Applications

These notes are now complete for Winter, 1999.

Required readings

Chapter 9, pages 265-275 (Nettalk, auto-encoders and PCA, character recognition), and 4 articles from the coursepack:

"Neural network approaches to solving hard problems", by Base & Liang. Skip sections 10.1, 10.2, 10.4.2, 10.8 and 10.9.
Mind Like A Steel Trap, by Rose.
Neural Networks in Real World Applications, section on Neural Networks for Speech Processing and Consumer Electronics, by Mozer.
Neural Networks Expand SP's Horizons, by Haykin, esp. case studies 2 and 3.

Applications of back-propagation networks

Character recognition - the le Cun et al. model of hand-written digit recognition (see the relevant section of Chapter 9, and section 10.6 of the Base & Liang coursepack article)
Main features:
- Preprocessing of inputs
- Hierarchical architecture with local receptive fields
- Weight-sharing/equality-constraints
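
The local receptive field and weight-sharing ideas listed above can be illustrated with a minimal NumPy sketch. This is not the le Cun et al. network itself: the image size, the 3x3 kernel values, and the function name `shared_weight_layer` are assumptions chosen only for illustration. Each hidden unit sees one small patch of the image, and all units in the feature map use the same weights.

```python
import numpy as np

def shared_weight_layer(image, kernel, bias=0.0):
    """One feature map: each hidden unit sees a local patch, all units share `kernel`."""
    kh, kw = kernel.shape
    height, width = image.shape
    out = np.empty((height - kh + 1, width - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]          # local receptive field
            out[i, j] = np.tanh(np.sum(patch * kernel) + bias)
    return out

# Hypothetical 16x16 input containing a vertical edge at column 8.
image = np.zeros((16, 16))
image[:, 8:] = 1.0

# Hypothetical 3x3 vertical-edge detector (illustrative values, not learned).
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

fmap = shared_weight_layer(image, kernel)

# Free parameters with sharing versus one private weight set per hidden unit.
shared_params = kernel.size + 1
unshared_params = fmap.size * (kernel.size + 1)
```

With these assumed sizes the shared map has 10 free parameters, whereas giving each of the 196 hidden units its own weights would need 1960. The same detector responds wherever the edge appears, which is the point of the equality constraint.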
Time-series prediction (see section 10.3 of Base and Liang article, and the Mind Like a Steel Trap article in coursepack)
- Standard approaches: MA, AR and ARMA models. These are all linear, so they cannot capture nonlinear dynamics and are therefore unsuitable for most practical problems.
- Neural network approaches:
  1. tapped delay lines - sensitive only to a fixed number of steps back in time, but allow for precise computation to be made over that limited window
  2. recurrent networks - sensitive arbitrarily far back in time, but info is blurred in time
- Applications:
  - Predicting geyser eruptions (covered in assignment 2)
  - Predicting breakouts in steel casting.
    Ghahramani and Hinton used a version of the Mixture of Competing Experts architecture, a modular architecture that can divide a problem into different regimes and model them separately.
    
    Each expert network was linear but used recurrent connections (a linear Kalman filter). The nonlinear gating network computed coefficients that were used to combine the experts' outputs into a single global prediction. The network was thereby able to handle very different regimes of the input, such as startup conditions versus normal operating conditions. A probabilistic interpretation of the outputs allowed the network to indicate its own certainty that it could account for each piece of data, and thereby detect abnormal operating conditions.
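
The way a gating network combines the experts' outputs can be sketched as follows. This is a simplified illustration, not the Ghahramani and Hinton model: the experts here are static linear maps rather than recurrent Kalman filters, and the weight values, the two-element input, and the names `mixture_predict`, `expert_W` and `gate_W` are all assumptions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mixture_predict(x, expert_W, gate_W):
    """Combine linear experts using softmax gating coefficients.

    x: (d,) input; expert_W, gate_W: (n_experts, d) weight matrices.
    """
    expert_out = expert_W @ x        # one prediction per linear expert
    g = softmax(gate_W @ x)          # gating coefficients: non-negative, sum to 1
    return float(g @ expert_out), g

# Hypothetical weights: the input is [1, s], where the sign of s marks the regime.
expert_W = np.array([[2.0, 0.0],     # expert 0 predicts +2
                     [-1.0, 0.0]])   # expert 1 predicts -1
gate_W = np.array([[0.0, 5.0],
                   [0.0, -5.0]])

y_a, g_a = mixture_predict(np.array([1.0, 1.0]), expert_W, gate_W)
y_b, g_b = mixture_predict(np.array([1.0, -1.0]), expert_W, gate_W)
```

In the first regime the gate hands almost all responsibility to expert 0, in the second to expert 1, so the global prediction switches between the two local models.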
Process control - the ALVINN (Autonomous Land Vehicle In a Neural Network) system for autonomous vehicle navigation (see section 10.5 in Base and Liang)
Major features & design issues:
- Simple feed-forward architecture - potential drawbacks?
- Output representation for steering controls
- Generalization to new road conditions - use diverse training set, and train on translated images to achieve translation-invariance
- Output reliability monitoring: use of a secondary auto-encoder network for input reconstruction
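
The reliability-monitoring idea in the last point can be sketched as follows, under loud assumptions: a linear bottleneck fitted by SVD stands in for the trained secondary auto-encoder, the data are synthetic, and the correlation-based score and the function names are illustrative rather than taken from ALVINN. The idea is that inputs resembling the training set are reconstructed well, while unfamiliar inputs are not.

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    """Fit a k-unit linear bottleneck by SVD (a stand-in for a trained auto-encoder)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]                       # rows span the bottleneck subspace

def reconstruct(x, mean, components):
    code = (x - mean) @ components.T          # encode into the bottleneck
    return mean + code @ components           # decode back to input space

def reliability(x, mean, components):
    """Correlation between an input and its reconstruction; low values flag novelty."""
    x_hat = reconstruct(x, mean, components)
    return float(np.corrcoef(x, x_hat)[0, 1])

# Synthetic "training images": 50-d vectors lying near a 3-d subspace.
rng = np.random.default_rng(0)
basis = rng.normal(size=(3, 50))
train = rng.normal(size=(200, 3)) @ basis + 0.01 * rng.normal(size=(200, 50))
mean, comps = fit_linear_autoencoder(train, k=3)

familiar = rng.normal(size=3) @ basis         # same structure as the training data
novel = rng.normal(size=50)                   # unlike anything seen in training

r_familiar = reliability(familiar, mean, comps)
r_novel = reliability(novel, mean, comps)
```

A controller could compare such a score against a threshold and distrust the steering output when the input is poorly reconstructed.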
Speech generation & recognition (see relevant section of Chapter 9 in textbook, section 10.4.1 in Base and Liang article, and Neural Networks in Real World Applications article - section by Mozer in coursepack.)
- Nettalk - converting text to speech
- 'Radar' the talking robot (demo), using a speech recognition chip by Sensory Inc.
  Figure 1. Mozer's architecture for speaker-independent recognition.
  "Figure 1: the basic processing stages that occur when an acoustic signal is presented for speaker-independent recognition: segmentation (where the utterance begins and ends in the continuous input stream), digital filtering for time normalization and acoustic feature extraction, and a neural network recognizer."
  A huge amount of work goes into the preprocessing and data collection.
  "Speaker-independent recognition with a vocabulary of a dozen alternatives achieves recognition accuracies greater than 95% in actual usage" (Mozer, 1996).
- Glove-talk (video): an illustration of the use of neural networks in adaptive interfaces (no reading, but see Adaptive Mixtures of Local Experts paper for background)
  Major features:
  - Learns to map from hand gestures, which are input via a dataglove, to speech, which is output via a synthesizer.
  - A neural network learns the mapping, using vowel and consonant networks
  - The two networks above use radial basis functions (RBFs) for their hidden units. The RBFs allow the hidden layer to develop "local features", each representing a small region of the high-dimensional input space. This tends to make training faster and is good for interpolation across the vowel space. That is, a wide range of input parameters (including intermediate values not previously trained on) can be mapped to a single vowel sound.
  - A third "V/C network" decides which of the first two nets is appropriate, as in the mixture of competing experts architecture.
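
The "local feature" behaviour of RBF hidden units described above can be sketched as follows. This is not the Glove-talk network: the two-dimensional input space, the three centres, the width, and the name `rbf_hidden` are assumptions made for illustration. Each hidden unit responds strongly only to inputs near its own centre.

```python
import numpy as np

def rbf_hidden(x, centers, width):
    """Gaussian RBF activations: each unit responds to a small region around its centre."""
    d2 = np.sum((centers - x) ** 2, axis=1)       # squared distance to each centre
    return np.exp(-d2 / (2.0 * width ** 2))

# Hypothetical centres in a 2-d hand-position space, one per vowel.
centers = np.array([[0.0, 0.0],
                    [1.0, 0.0],
                    [0.0, 1.0]])
width = 0.3

at_center = rbf_hidden(np.array([1.0, 0.0]), centers, width)
nearby = rbf_hidden(np.array([0.9, 0.1]), centers, width)   # not a trained point
winner = int(np.argmax(nearby))
```

An input that falls between trained points still activates mainly the nearest unit, which is why a range of hand positions can map onto the same vowel.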
Image compression and data preprocessing with neural networks (see section 10.7 in Base and Liang, and case studies 2 and 3 in Haykin's article).
Key points:
- Equivalence between Principal Components Analysis (PCA) and multi-layer auto-encoder networks in the 3-layer case: the mapping learned by the hidden layer can be shown to be equivalent to PCA, a well-known technique for dimension reduction that projects the data onto the subspace containing the maximum variance in the data. PCA selects the dimensions corresponding to the eigenvectors of the data covariance matrix that have the largest eigenvalues.
- Demo: encoding 2-d images of "Gaussian blobs", compressed through a bottleneck of hidden units: emergence of value-codes versus variable-codes, depending on compression rate.
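
The PCA side of this equivalence can be sketched directly from the description above: take the eigenvectors of the data covariance matrix with the largest eigenvalues, project onto them, and reconstruct. The synthetic data, the dimensions, and the function name `pca` are assumptions for illustration. The projection plays the role of the hidden layer's code and the reconstruction plays the role of the output layer.

```python
import numpy as np

def pca(X, k):
    """Return the mean, top-k eigenvectors (as columns) and their eigenvalues."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (len(X) - 1)               # data covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    return mean, eigvecs[:, order], eigvals[order]

# Synthetic 5-d data that mostly varies within a 2-d subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2)) * np.array([3.0, 1.0])
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 5))

mean, components, variances = pca(X, k=2)
codes = (X - mean) @ components                  # bottleneck code (2 numbers per item)
X_hat = mean + codes @ components.T              # reconstruction from the code
mse = float(np.mean((X - X_hat) ** 2))
```

Because almost all of the variance lies in the two retained directions, the reconstruction error is close to the noise level even though each item is compressed from five numbers to two.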
