Course Code:
CEID_NE5597
Type:
Period:
Winter Semester
Division:
Instructors:
Credit Points:
5
- Introductory concepts (user modeling, logical representation of documents, the retrieval process).
- Performance evaluation metrics (recall, precision, average precision, R-precision, precision histograms, the NDCG metric, the harmonic mean, user-oriented metrics).
- Information retrieval modeling.
- Set-theoretic models (Boolean model, fuzzy set model, extended Boolean model), algebraic models (vector space model, latent semantic indexing model, topic models), probabilistic models (classical probabilistic model and language models).
- Web information retrieval and its peculiarities.
- Web search engines (crawler, indexer). The HITS (Hyperlink-Induced Topic Search) algorithm. The Google search engine and the PageRank metric. The SALSA algorithm and other variants of link-based web search.
- Machine learning techniques in information retrieval: learning to rank, language models, vector representations of words (word embeddings such as word2vec with its CBOW and skip-gram architectures), LSTMs, Transformers, BERT, GPT.
- Indexing structures (inverted files, signature files, bitmaps).
- Storage techniques in distributed information retrieval (MapReduce, Apache Spark).
- Full-text indexing structures in main memory (suffix trees, suffix arrays, directed acyclic word graphs (DAWGs) for strings) and in secondary memory (supra-suffix array, prefix B-tree, string B-tree).
- Compression algorithms for text and for indexing structures.
- Text mining.
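As an illustrative sketch (not part of the official course material), the evaluation metrics listed in the syllabus can be computed for a single query as follows. The function names, the document ids, and the choice of the log2(rank + 1) discount for NDCG are assumptions made for this example; other NDCG variants exist.

```python
import math

def precision_recall(retrieved, relevant):
    """Set-based precision and recall for an unranked answer set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def average_precision(ranking, relevant):
    """Sum of precision@k at each rank k holding a relevant document,
    divided by the total number of relevant documents (missed ones add 0)."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def r_precision(ranking, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    relevant = set(relevant)
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranking[:r] if doc in relevant) / r

def ndcg(ranking, gains, k=None):
    """NDCG@k with a log2(rank + 1) discount; gains maps doc -> graded relevance."""
    if k is None:
        k = len(ranking)
    dcg = sum(gains.get(doc, 0) / math.log2(i + 2) for i, doc in enumerate(ranking[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

ranking = ["d3", "d1", "d7", "d2", "d5"]   # hypothetical system output
relevant = {"d1", "d2", "d9"}              # hypothetical relevance judgments
p, r = precision_recall(ranking, relevant)
print(p, r, f_measure(p, r))               # 0.4, 0.666..., 0.5
print(average_precision(ranking, relevant))  # (1/2 + 2/4) / 3 = 0.333...
print(r_precision(ranking, relevant))        # 1 hit in the top 3 = 0.333...
```

Average precision divides by all relevant documents rather than only the retrieved ones, so the unretrieved document d9 lowers the score; this matches the usual definition behind mean average precision (MAP).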
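To make the inverted-file and Boolean-model topics concrete, here is a minimal in-memory sketch: each term maps to a sorted postings list of document ids, and a conjunctive (AND) query is answered by merging postings lists, starting from the shortest one. The tokenization (lowercasing and splitting on whitespace), the function names, and the sample documents are simplifying assumptions; a real indexer would add normalization, stop-word handling, and compressed on-disk postings.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a sorted list of the document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def intersect(p1, p2):
    """Linear-time merge intersection of two sorted postings lists."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out

def boolean_and(index, terms):
    """Answer term1 AND term2 AND ..., processing the shortest lists first."""
    postings = sorted((index.get(t.lower(), []) for t in terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result

docs = {  # hypothetical toy collection
    1: "information retrieval models",
    2: "web search engines",
    3: "retrieval on the web",
    4: "web information retrieval",
}
index = build_inverted_index(docs)
print(boolean_and(index, ["web", "retrieval"]))  # [3, 4]
```

Starting from the shortest postings list keeps every intermediate result no larger than that list, which is the standard heuristic for ordering conjunctive query processing.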
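The PageRank metric mentioned in the web search topics can be sketched with power iteration on a small link graph. This version assumes a damping factor of 0.85, uniform teleportation, and that dangling pages (pages without out-links) spread their score uniformly over all pages; these are common conventions, not necessarily the exact formulation used in the course, and the graph below is invented for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-8, max_iter=200):
    """Power-iteration PageRank. `links` maps each page to the pages it links to.
    Scores always sum to 1 because dangling mass is redistributed uniformly."""
    pages = set(links) | {q for outs in links.values() for q in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Total score held by pages with no out-links.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Teleportation term plus the uniformly shared dangling mass.
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        # Each page splits its damped score evenly among its out-links.
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}  # toy graph
scores = pagerank(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

In this toy graph page C collects links from A, B, and D and ends up with the highest score, while D, which has no in-links, keeps only the teleportation share (1 - 0.85) / 4.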