<b>Optimization of Entropy with Neural Networks</b>

N. N. Schraudolph. Optimization of Entropy with Neural Networks. Ph.D. Thesis, University of California, San Diego, 1995.
Introduction only. Related papers: Chapter 2 (SchSej92), Chapter 3 (SchSej93, SchSej95), Chapter 4 (VioSchSej96).

Abstract

The goal of unsupervised learning algorithms is to discover concise yet informative representations of large data sets; the minimum description length principle and exploratory projection pursuit are two representative attempts to formalize this notion. When implemented with neural networks, both suggest the minimization of entropy at the network's output as an objective for unsupervised learning.

The empirical computation of entropy or its derivative with respect to parameters of a neural network unfortunately requires explicit knowledge of the local data density; this information is typically not available when learning from data samples. This dissertation discusses and applies three methods for making density information accessible in a neural network: parametric modelling, probabilistic networks, and nonparametric estimation.

By imposing their own structure on the data, parametric density models implement impoverished but tractable forms of entropy such as the log-variance. We have used this method to improve the adaptive dynamics of an anti-Hebbian learning rule which has proven successful in extracting disparity from random stereograms.

In probabilistic networks, node activities are interpreted as the defining parameters of a stochastic process. The entropy of the process can then be calculated from its parameters, and hence optimized. The popular logistic activation function defines a binomial process in this manner; by optimizing the information gain of this process we derive a novel nonlinear Hebbian learning algorithm.

The nonparametric technique of Parzen window or kernel density estimation leads us to an entropy optimization algorithm in which the network adapts in response to the distance between pairs of data samples. We discuss distinct implementations for data-limited or memory-limited operation, and describe a maximum likelihood approach to setting the kernel shape, the regularizer for this technique. This method has been applied with great success to the problem of pose alignment in computer vision.

These experiments demonstrate a range of techniques that allow neural networks to learn concise representations of empirical data by minimizing its entropy. We have found that simple gradient descent in various entropy-based objective functions can lead to novel and useful algorithms for unsupervised neural network learning.
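
Illustration (not from the thesis)

The abstract's remark that a Parzen window entropy estimate depends on the distance between pairs of samples can be made concrete with a small sketch. This is not code from the dissertation. It assumes one-dimensional samples, a fixed Gaussian kernel of width `sigma`, and a leave-one-out density estimate. The function name `parzen_entropy` is hypothetical.

```python
import math


def parzen_entropy(samples, sigma):
    """Estimate differential entropy (in nats) of 1-D samples.

    Assumption for this sketch: the density at each sample is estimated
    from all *other* samples (leave-one-out) with a Gaussian kernel of
    fixed width sigma. The result therefore depends only on pairwise
    differences between samples.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total_log_density = 0.0
    for i, xi in enumerate(samples):
        # Average kernel response of every other sample at xi.
        density = sum(
            norm * math.exp(-0.5 * ((xi - xj) / sigma) ** 2)
            for j, xj in enumerate(samples)
            if j != i
        ) / (n - 1)
        # Note: a sample very far from all others can underflow to
        # density 0.0, in which case math.log raises ValueError.
        total_log_density += math.log(density)
    # Entropy is the negative mean log-density over the samples.
    return -total_log_density / n


tight = [0.0, 0.1, 0.2, 0.3]
spread = [0.0, 1.0, 2.0, 3.0]
print(parzen_entropy(tight, 0.5) < parzen_entropy(spread, 0.5))
```

Closely clustered samples yield a lower estimate than widely spaced ones, which is the quantity a network would reduce when minimizing output entropy. The kernel width is held fixed here; the maximum likelihood approach to setting the kernel shape mentioned in the abstract is not reproduced.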

BibTeX Entry

@phdthesis{Schraudolph95,
     author = {Nicol N. Schraudolph},
      title = {\href{http://nic.schraudolph.org/pubs/Schraudolph95.pdf}{\bf
               Optimization of Entropy with Neural Networks}},
     school = {University of California, San Diego},
       year =  1995,
   b2h_type = {Other},
  b2h_topic = {>Entropy Optimization},
   b2h_note = {<a href="b2hd-intro">Introduction only</a> &nbsp;&nbsp;&nbsp; Related papers: &nbsp; <a href="b2hd-SchSej92">Chapter 2</a> &nbsp; <a href="b2hd-SchSej93">Chapter 3</a> &nbsp; <a href="b2hd-SchSej95">Chapter 3</a> &nbsp; <a href="b2hd-VioSchSej96">Chapter 4</a>},
   abstract = {
    The goal of unsupervised learning algorithms is to discover concise yet
    informative representations of large data sets; the minimum description
    length principle and exploratory projection pursuit are two representative
    attempts to formalize this notion.  When implemented with neural networks,
    both suggest the minimization of entropy at the network's output as an
    objective for unsupervised learning.
    The empirical computation of entropy or its derivative with respect to
    parameters of a neural network unfortunately requires explicit knowledge
    of the local data density; this information is typically not available
    when learning from data samples.  This dissertation discusses and applies
    three methods for making density information accessible in a neural
    network: parametric modelling, probabilistic networks, and nonparametric
    estimation.
    By imposing their own structure on the data, parametric density models
    implement impoverished but tractable forms of entropy such as the
    log-variance.  We have used this method to improve the adaptive dynamics
    of an anti-Hebbian learning rule which has proven successful in extracting
    disparity from random stereograms.
    In probabilistic networks, node activities are interpreted as the defining
    parameters of a stochastic process.  The entropy of the process can then
    be calculated from its parameters, and hence optimized.  The popular
    logistic activation function defines a binomial process in this manner;
    by optimizing the information gain of this process we derive a novel
    nonlinear Hebbian learning algorithm.
    The nonparametric technique of Parzen window or kernel density estimation
    leads us to an entropy optimization algorithm in which the network adapts
    in response to the distance between pairs of data samples.  We discuss
    distinct implementations for data-limited or memory-limited operation,
    and describe a maximum likelihood approach to setting the kernel shape,
    the regularizer for this technique.  This method has been applied with
    great success to the problem of pose alignment in computer vision.
    These experiments demonstrate a range of techniques that allow neural
    networks to learn concise representations of empirical data by
    minimizing its entropy.  We have found that simple gradient descent 
    in various entropy-based objective functions can lead to novel and
    useful algorithms for unsupervised neural network learning.
}}