Overview of Statistical Machine Learning
Director: Nic Schraudolph (SML, NICTA and adjunct with CSL, RSISE)
The course is a general introduction to the methods and practice
of statistical machine learning.
Prerequisites and Assumed Knowledge
A bachelor's degree in a relevant subject area;
confident use of a common programming language;
mathematical training at second-year undergraduate level,
including basic linear algebra and probability theory.
Dates
- Registration: by 04 Apr 06
- Course Dates: 25 Apr to 01 Jun 06 (6 weeks)
- Lectures: Tue & Thu, 10-12
- Tutorial/Exercise sessions: once a week, time and place TBD
- Assignments Due: by 09 Jun 06
- Notification: by 26 Jun 06
Presenters
- Simon Guenter
- Nic Schraudolph
- Doug Aberdeen
- SVN Vishwanathan
- Alex Smola
(all SML, NICTA and adjunct with CSL, RSISE)
Location
NICTA on Northbourne Ave., or RSISE on the ANU campus, depending on majority
of participants.
Workload
- Weekly contact hours: 4h lecture, 2h tutorial
- Total contact hours: 24h lecture, 12h tutorial
- Assignments: 3 required, 5h each, 15h total
- Preparation/Reading: 1.5h per week, 9h total
- Total workload: 24 + 12 + 15 + 9 = 60h (3 units)
Assessment
Only a pass or fail mark will be awarded. To pass the course, students
must obtain a pass mark on at least 3 of the assignments offered
(at least 4 assignments will be offered).
Detailed Syllabus
DRAFT - subject to change at the discretion of the
course organizer.
- Bayesian Inference
- frequentists vs. Bayesians
- derivation of Bayes' Rule
- use for inference
Assignment 1 (theory): Ovarian Cancer Screening
Reading: Euro coin tosses (MacKay)
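The inference step above can be sketched numerically. The sketch below applies Bayes' rule to a generic screening test; all the probabilities are illustrative placeholders, not figures from the assignment or the readings:

```python
# Bayes' rule for a screening test:
#   P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
# All numbers below are illustrative, not taken from the course materials.

prior = 0.01          # P(disease): base rate in the screened population
sensitivity = 0.95    # P(positive | disease)
false_pos = 0.05      # P(positive | no disease)

# Total probability of a positive test (law of total probability).
p_positive = sensitivity * prior + false_pos * (1 - prior)

# Posterior probability of disease given a positive result.
posterior = sensitivity * prior / p_positive
print(round(posterior, 3))  # → 0.161
```

Even with a 95%-sensitive test, the low base rate keeps the posterior around 16% — the kind of counter-intuitive result the screening assignment turns on.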
- Maximum Likelihood Modeling
- regression, classification, density estimation
- maximum likelihood loss functions
Reading: Maximum Likelihood--Mixture of Gaussians (Schiele)
- Density Estimation
- parametric vs. non-parametric
- classification via density estimation
- semi-parametric and mixture models
- Expectation-Maximisation (EM) algorithm
Assignment 2 (programming): EM
Reading: A Gentle Tutorial of the EM Algorithm (pages 1-3)
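A minimal sketch of EM for a two-component 1-D Gaussian mixture, in the spirit of the tutorial above; the synthetic data, initial guesses, and iteration count are illustrative assumptions, not course-specified values:

```python
import math
import random

random.seed(0)
# Synthetic 1-D data from two Gaussians (illustrative, not course data).
data = ([random.gauss(-2.0, 1.0) for _ in range(200)] +
        [random.gauss(3.0, 1.0) for _ in range(200)])

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Initial guesses for mixing weights, means, and standard deviations.
w, mu, sigma = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]

for _ in range(50):
    # E-step: responsibility of each component for each point.
    resp = []
    for x in data:
        p = [w[k] * gauss_pdf(x, mu[k], sigma[k]) for k in range(2)]
        s = sum(p)
        resp.append([pk / s for pk in p])
    # M-step: re-estimate parameters from the responsibilities.
    for k in range(2):
        nk = sum(r[k] for r in resp)
        w[k] = nk / len(data)
        mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
        sigma[k] = math.sqrt(sum(r[k] * (x - mu[k]) ** 2
                                 for r, x in zip(resp, data)) / nk)

print([round(m, 1) for m in sorted(mu)])
```

The recovered means land close to the generating values -2 and 3; the programming assignment would flesh out exactly this E/M loop.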
- Least Squares Regression
- linear vs. non-linear models
- simple gradient descent
- singular value decomposition
- basis functions, generalized least squares
- classification via regression
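The "simple gradient descent" item above can be shown on the smallest possible case: fitting a line by minimising squared error. The data, learning rate, and iteration count are illustrative assumptions:

```python
# Fit y = a*x + b by minimising the sum of squared errors with simple
# (batch) gradient descent. Data and step size are illustrative.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # exactly y = 2x + 1

a, b, lr = 0.0, 0.0, 0.02
for _ in range(5000):
    # Gradient of sum_i (a*x_i + b - y_i)^2 with respect to a and b.
    ga = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys))
    gb = sum(2 * (a * x + b - y) for x, y in zip(xs, ys))
    a -= lr * ga
    b -= lr * gb

print(round(a, 3), round(b, 3))  # → 2.0 1.0
```

For linear least squares the same solution can be read off in closed form (e.g. via the singular value decomposition, also on the syllabus); gradient descent is shown here because it generalises to the non-linear models that follow.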
- Neural Networks
- biological background
- learning in neural networks
- backpropagation algorithm
Assignment 3 (programming): implement neural network
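A useful habit when implementing backpropagation (and a common sanity check for Assignment 3) is to verify the analytic gradient against a finite-difference approximation. The one-hidden-unit network below is an illustrative toy, not the assignment specification:

```python
import math

# A one-hidden-unit network y = v * tanh(w * x), with squared-error loss
# on a single example. Backpropagation gives dL/dw analytically; a
# finite-difference check confirms it. All values are illustrative.
x, target = 0.5, 0.8
w, v = 0.3, 0.7

def loss(w_, v_):
    return 0.5 * (v_ * math.tanh(w_ * x) - target) ** 2

# Forward pass.
h = math.tanh(w * x)
y = v * h
err = y - target

# Backward pass (chain rule): dL/dv and dL/dw.
grad_v = err * h
grad_w = err * v * (1 - h ** 2) * x   # tanh'(z) = 1 - tanh(z)^2

# Central finite-difference approximation of dL/dw.
eps = 1e-6
fd_w = (loss(w + eps, v) - loss(w - eps, v)) / (2 * eps)
print(abs(grad_w - fd_w) < 1e-8)  # → True
```

Scaling this up to vectors of weights and multiple layers is essentially what the backpropagation algorithm organises.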
- Classical (Batch) Optimization
- Newton, quasi-Newton
- conjugate gradient
Reading: An Introduction to the Conjugate Gradient Method Without the Agonizing Pain (Shewchuk), chapters 1-4
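The conjugate gradient recurrence is short enough to sketch directly. The 2x2 system below is an illustrative example, assuming A is symmetric positive definite as the method requires:

```python
# Conjugate gradient for A x = b, A symmetric positive definite.
# Small illustrative 2x2 example; in exact arithmetic CG solves an
# n-dimensional SPD system in at most n iterations.

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjgrad(A, b, iters=10, tol=1e-12):
    x = [0.0] * len(b)
    r = [bi - ri for bi, ri in zip(b, matvec(A, x))]  # residual b - A x
    d = r[:]                                          # search direction
    for _ in range(iters):
        rr = dot(r, r)
        if rr < tol:
            break
        Ad = matvec(A, d)
        alpha = rr / dot(d, Ad)                       # exact line search
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        beta = dot(r, r) / rr                         # keeps directions conjugate
        d = [ri + beta * di for ri, di in zip(r, d)]
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjgrad(A, b)
print([round(v, 4) for v in x])  # → [0.0909, 0.6364]
```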
- Stochastic (Online) Optimization
- need for online learning
- direct (gradient-free) methods
- gradient step size adaptation
Assignment 4
- Overfitting, Validation, and Regularisation
- empirical vs. true risk
- cross-validation techniques
- Ockham's razor, regularization
- minimum description length
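The mechanics of k-fold cross-validation can be sketched with a deliberately trivial "model" (the training mean), so the splitting and averaging stand out; the data and fold count are illustrative:

```python
# k-fold cross-validation: hold out each fold in turn, fit on the rest,
# and average the held-out error. The "model" is just the training mean,
# keeping the sketch free of fitting code.

def k_fold_cv(data, k):
    folds = [data[i::k] for i in range(k)]  # simple interleaved split
    errors = []
    for i in range(k):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        model = sum(train) / len(train)     # "fit": training mean
        errors.append(sum((x - model) ** 2 for x in test) / len(test))
    return sum(errors) / k                  # cross-validated MSE

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(round(k_fold_cv(data, 3), 3))  # → 3.75
```

Because every error is measured on data the model never saw, the cross-validated score estimates the true risk rather than the (optimistic) empirical risk.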
- Reinforcement Learning (Doug Aberdeen)
- dynamic programming
- function approximation
- simulation
- policy based methods
- Tesauro's backgammon
Assignment 5 (programming): reinforcement learning
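The dynamic-programming item above can be sketched as value iteration on a toy MDP. The chain below (walk left/right, reward on reaching the last state) is an illustrative construction, not the assignment's environment:

```python
# Value iteration (dynamic programming) on a tiny deterministic chain MDP:
# states 0..3, actions left/right, reward 1 for entering terminal state 3.
# The MDP and discount factor are illustrative.

gamma = 0.9
n = 4
V = [0.0] * n

def step(s, a):             # deterministic transition and reward
    s2 = max(0, min(n - 1, s + a))
    return s2, (1.0 if s2 == n - 1 and s != n - 1 else 0.0)

for _ in range(100):        # synchronous Bellman backups until convergence
    V_new = V[:]
    for s in range(n - 1):  # state n-1 is terminal; its value stays 0
        V_new[s] = max(r + gamma * V[s2]
                       for s2, r in (step(s, a) for a in (-1, 1)))
    V = V_new

print([round(v, 3) for v in V])  # → [0.81, 0.9, 1.0, 0.0]
```

The values decay by a factor of gamma per step from the goal, and the greedy policy with respect to V (always move right) is optimal; function approximation and policy-based methods take over when the state space is too large for such a table.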
- Kernel Methods 1 (Alex Smola / SVN Vishwanathan)
- Kernel Methods 2 (Alex Smola / SVN Vishwanathan)
Assignment 6: kernel methods
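As a foretaste of the kernel lectures, the kernel trick can be shown on the smallest interesting case: a kernel perceptron with a Gaussian (RBF) kernel separating XOR, which no linear classifier can. The kernel width and epoch count are illustrative assumptions:

```python
import math

# Kernel perceptron with a Gaussian (RBF) kernel on the XOR problem.
# The dual coefficients alpha count mistakes on each training example.

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [-1, 1, 1, -1]

def k(a, b, gamma=2.0):      # RBF kernel: exp(-gamma * ||a - b||^2)
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

alpha = [0] * len(X)
for _ in range(20):          # perceptron epochs
    for i, (x, y) in enumerate(zip(X, Y)):
        f = sum(alpha[j] * Y[j] * k(X[j], x) for j in range(len(X)))
        if y * f <= 0:       # mistake: add this example to the expansion
            alpha[i] += 1

preds = [1 if sum(alpha[j] * Y[j] * k(X[j], x) for j in range(len(X))) > 0
         else -1 for x in X]
print(preds == Y)  # → True
```

The classifier never touches feature vectors explicitly; it only evaluates the kernel, which is the structural idea the two kernel-methods lectures develop.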
10/05 - N. Schraudolph