Information Retrieval

Course Code: CEID_NE5597
Period: Winter Semester
Instructors: 
Credit Points: 5

Course outline

  • Introductory notions (user modeling, document logical representation, retrieval process).
  • Performance evaluation metrics (recall, precision, average precision, R-precision, precision histograms, NDCG, harmonic mean, user-oriented metrics).
  • Information retrieval modeling.
  • Set-oriented models (Boolean model, fuzzy set model, extended Boolean model), algebraic models (vector space models, latent semantic indexing model, topic models), probabilistic models (classical and language models).
  • Web information retrieval and its peculiarities.
  • Web search engines (crawler, indexer). The HITS algorithm (Hyperlink-Induced Topic Search). The Google search engine (the PageRank metric). The SALSA algorithm and variants in web searching.
  • Machine learning techniques in information retrieval (learning to rank, language models, vector representations of words (word embeddings such as word2vec with its CBOW and skip-gram architectures), LSTM, Transformers, BERT, GPT).
  • Indexing structures (inverted files, signature files, bitmaps).
  • Storage techniques in distributed information retrieval (MapReduce, Apache Spark).
  • Full-text indexing structures in main memory (suffix trees, suffix arrays, directed acyclic word graphs (DAWGs)) and in secondary memory (supra-suffix array, prefix B-tree, string B-tree).
  • Compression algorithms for text and for indexing structures.
  • Text mining.
