Web Algorithmics
A.Y. 2024/2025
Learning objectives
The course has the goal of putting the student in touch with advanced techniques concerning the harvesting of large-scale document collections, and with the construction of search engines, in particular in the case of the web, with its ranking algorithms.
Expected learning outcomes
The student should be able to design and implement a large-scale text indexer, with a particular focus on the ranking algorithms applied to the results.
Lesson period: First semester
Assessment methods: Esame
Assessment result: voto verbalizzato in trentesimi
Single course
This course can be attended as a single course.
Course syllabus and organization
Single session
Responsible
Lesson period
First semester
Course syllabus
The course aims to put the student in contact with advanced techniques relating to the collection of large document collections and the construction of search engines, with particular attention to the web and its ranking mechanisms. The course requires knowledge of basic algorithmic techniques and network protocols. A subset of the following topics will be covered, also based on the interests of the students:
Anatomy of a search engine and indexing techniques.
Information harvesting: algorithmic, technical and social problems in creating crawlers.
Comparing visiting strategies.
Global web structure and algorithmic tools for its analysis.
Web graph representation: compression problems.
Parallel and distributed crawlers. Consistent hashing techniques.
Network centrality: closeness, harmonic centrality, Katz, PageRank, dominant eigenvector, etc.
Techniques for the construction of inverse indices.
Methodologies for the construction of large distributed indices.
Methodologies for building large dictionaries: minimal perfect order-preserving maps, prefix dictionaries, lexicographic compression.
Instantaneous codes for index compression: unary, Elias γ, Elias δ, Golomb, etc.
Representation of Elias-Fano and quasi-succinct indexes.
Textual ranking techniques (TF/IDF, BM25, LSI, cosine, etc.).
Substring indices: trie, suffix trees and suffix vectors.
Vector-product quantization and vector databases.
Anatomy of a search engine and indexing techniques.
Information harvesting: algorithmic, technical and social problems in creating crawlers.
Comparing visiting strategies.
Global web structure and algorithmic tools for its analysis.
Web graph representation: compression problems.
Parallel and distributed crawlers. Consistent hashing techniques.
Network centrality: closeness, harmonic centrality, Katz, PageRank, dominant eigenvector, etc.
Techniques for the construction of inverse indices.
Methodologies for the construction of large distributed indices.
Methodologies for building large dictionaries: minimal perfect order-preserving maps, prefix dictionaries, lexicographic compression.
Instantaneous codes for index compression: unary, Elias γ, Elias δ, Golomb, etc.
Representation of Elias-Fano and quasi-succinct indexes.
Textual ranking techniques (TF/IDF, BM25, LSI, cosine, etc.).
Substring indices: trie, suffix trees and suffix vectors.
Vector-product quantization and vector databases.
Prerequisites for admission
Discrete mathematics, linear algebra, programming, algorithms.
Teaching methods
Classes.
Teaching Resources
Assessment methods and Criteria
Esame orale.
INF/01 - INFORMATICS - University credits: 6
Lessons: 48 hours
Professor:
Vigna Sebastiano
Shifts:
Turno
Professor:
Vigna SebastianoProfessor(s)