The general objective of the course is to present methods and computational tools of linear algebra, with an emphasis on solving problems in Data Science. The mathematical methods target the effective manipulation of the field's two fundamental objects, namely graphs and matrices. The field is advancing rapidly and has many applications; the course covers advanced matrix methods and their use in data science. Students will learn the mathematical foundations of matrix computations and explore their use in data science tasks such as data reduction, classification, clustering, and optimization. The course combines theoretical concepts with practical applications, using software tools for hands-on experience. By the end of the course, students will have encountered and used theoretical and practical tools that are essential in the area, will understand these tools' strengths and weaknesses, and will be able to select methods based on the characteristics of the problem. They will also be able to apply and combine these techniques and to follow the rapidly evolving research literature on the topic.
Contents: Matrix computations as kernels in Data Science applications. From graphs to vectors to tensors. The many views of matrix multiplication. Classical matrix factorizations. The CS decomposition and the GSVD. Rank-revealing factorizations. Least squares and linear regression. Total least squares. Regularization techniques and ridge regression. Solving with iterative methods: descent methods, Krylov subspaces, row projection methods. Dimensionality reduction and clustering applications: approximating with low-rank matrices. Nonnegative least squares and nonnegative matrix factorization (NMF). Tensors and their decompositions. Randomized numerical linear algebra for very large problems: randomized projections, sketching, CUR, Blendenpik, randomized SVD. Matrix functions and applications in computing centrality indices. Computing the trace and selected matrix elements. The impact of HPC and new architectures: novel floating-point number representations, probabilistic error analyses, stochastic rounding, mixed-precision arithmetic. Parallelism in matrix computations. Communication-avoiding and asynchronous algorithms. Numerical libraries.