Advanced Machine Learning
Enrollment in this course is by invitation only

The course concentrates on topics such as the applicability of common statistical methods to big data, supervised and unsupervised learning methods, and applications of modern machine learning methods: decision trees, support vector machines, neural networks, and factorization machines.

Course Description

This course covers methods that became influential, with few exceptions, during the two decades around the turn of the twenty-first century. The main motivation for developing these methods was the tremendous increase in the volume and complexity of data in the 1990s. One important takeaway of the course is that even traditional methods, like linear regression, need to be reviewed and adjusted for the new realities of the big data era.

Key features of this course are the lecturer's fundamental academic background combined with lifelong teaching experience in multiple subjects related to data analysis, as well as experience solving real-life problems. On the one hand, this course contains deep conceptual content presented in an accessible form with a large number of interactive tools and examples. On the other hand, it contains a large number of workshops that help students acquire hands-on experience.

This course illustrates methods with examples from a broad range of application areas in which the lecturer has had real experience during his career as a data scientist. One area in which the lecturer has specialized for an especially long time is the financial industry. Depending on demand from the audience, the lecturer can include discussions of and insights into applications in specific areas.

Projects in the course use R and Python. Familiarity with both languages at least at an introductory level is a requirement.

Course Contents

Important Note: Changes may occur to the syllabus at the instructor's discretion. When changes are made, students will be notified via email and in-class announcement.

    SESSION 1: Introduction

  • Data Analysis from Gauss to Google.
  • Review of regression methods, linear models with a large number of predictors, selection of predictors.
  • Review of Principal Component Analysis applied to regression analysis (PCR).

    SESSION 2: Regularization Techniques

  • Regularization and shrinkage of parameters in regression analysis.
  • Ridge and lasso regression methods.
  • Comparison with other regression methods on data with a large number of predictors.

    SESSION 3: Decision Trees

  • Decision trees for regression and classification: their assumptions, strengths, and limitations.
  • Comparison of tree regression with linear and generalized linear models.
  • Review of measures of classification decision quality: confusion matrix and log-loss.

    SESSION 4: Bagging and Boosting

  • Random forests, methods of bagging and boosting: parallel and sequential ways of achieving complexity.
  • Applications

    SESSION 5: Support Vector Machines

  • Support vector machines for regression and classification
  • Kernel trick and review of selected kernels
  • Applications and comparison with other methods for regression and classification

    SESSION 6: Filtering

  • Recommender algorithms
  • Regression vs. collaborative filtering
  • Review of factorization machines
  • Applications for classification, prediction and collaborative filtering

    SESSION 7: Neural Networks

  • Neural networks: from biological to artificial neural networks.
  • Statistical models as neural networks.
  • Applications

    SESSION 8: Deep Learning

  • Introduction to deep learning: motivation, comparison with other methods.
  • Backpropagation algorithm
  • Introduction to TensorFlow

    SESSION 9: Deep Learning Continued

  • Introduction to deep learning: Keras basics, main architectures, sequential networks.
  • Fitting and tuning models; using models for prediction.

    SESSION 10: NLP

  • Introduction to natural language processing.
  • Feature extraction and engineering.
  • Example project

Requirements

  • College-level statistical analysis
  • Familiarity with Python and/or R

Recommended Books

G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning: with Applications in R (ISLR), Springer, 2013 (Recommended).

Software and Hardware

This course is taught using R (http://cran.r-project.org) and Python (https://www.python.org/). Students are advised to bring laptops with R and Python installed to all sessions.

Course Staff

Yuri Balasanov

Yuri Balasanov has been a faculty member at the University of Chicago since 1997. He teaches in the Graduate Program on Financial Mathematics (MSFM) and the Graduate Program on Analytics (MScA). He has also been founder and President of Research Software International, Inc. since 1991 and of iLykei Teaching Tech Corp since 2015. Dr. Balasanov earned his Master's degree in Applied Mathematics and his Ph.D. in Probability Theory and Mathematical Statistics from the Lomonosov Moscow State University, Russia, where he studied under Andrey Kolmogorov and leading members of his school. His primary expertise and research interests are in stochastic modeling and advanced data analysis, with applications in various fields including trading, risk management, finance and economics, business analytics, marketing, biology, and medical studies. Dr. Balasanov has been a financial industry practitioner for more than 20 years, working at leading financial institutions as head quant, quantitative trader, and risk manager.

Effort Required

6-8 hours per week