Restricted Boltzmann Machines, in R

Some time back I watched an excellent video of Geoff Hinton presenting deep learning at Google, http://www.youtube.com/watch?v=AyzOUbkUf3M, and I decided that I really needed to learn more about this technology.

Well, the best way to learn is to do, so I decided to start coding a Restricted Boltzmann Machine (RBM) in R.  I found this really good article and code online, http://bayesianthink.blogspot.com/2013/05/the-restricted-boltzmann-machine-rbm.html, and decided to see if I could turn it into an S3 object and make it run a little faster…

# Set up the training data 
image_rows <- 5 
image_cols <- 9 
up <- c(0,0,0,0,1,0,0,0,0,
        0,0,0,1,1,1,0,0,0,
        0,0,1,1,1,1,1,0,0, 
        0,1,1,1,1,1,1,1,0, 
        1,1,1,1,1,1,1,1,1) 
down <- c(1,1,1,1,1,1,1,1,1, 
          0,1,1,1,1,1,1,1,0, 
          0,0,1,1,1,1,1,0,0, 
          0,0,0,1,1,1,0,0,0, 
          0,0,0,0,1,0,0,0,0) 
# Reverse each vector so that image() draws the arrows the right way up 
up <- rev(up) 
down <- rev(down)
training_set <- rbind(up, down)

This gives us two arrows, one pointing up and the other pointing down:

[Images: the up-arrow and down-arrow training patterns]
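
If you want to check the patterns yourself, you can render them with the same image() trick used later in the post for the reconstructions (the par() calls just put the two plots side by side):

# Quick sanity check: render the two training patterns side by side
par(mfrow=c(1,2))
image( t( matrix(up,   nrow=image_rows, byrow=TRUE) ), xaxt="n", yaxt="n" )
image( t( matrix(down, nrow=image_rows, byrow=TRUE) ), xaxt="n", yaxt="n" )
par(mfrow=c(1,1))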

Next, we can run the RBM with the following code.  It’s based on the reference above, but I have added bias units and recoded it to make better use of R’s vector functions, which gave about a 10x speedup on my Mac Mini.  The loop is single-sample contrastive divergence (CD-1): a forward pass to the hidden units, a reconstruction of the visible units, a second forward pass, and a weight update from the difference of the two outer products.  I haven’t yet added mini-batches, convergence testing, or anything else that would make it industrial strength; that would only complicate things for now.

# Logistic sigmoid, used for both hidden and visible unit activations
rbm.prob <- function(a) {
  1/(1 + exp(-a))
}
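
As a quick check of the squashing behaviour:

rbm.prob(c(-4, 0, 4))  # approximately 0.018, 0.500, 0.982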

fit.rbm <- function(data, maxiter=5000, alpha=0.01, hidden=3) {
  # Add 1 to the sizes for the bias units
  hstates <- hidden + 1
  vstates <- ncol(data) + 1
  init.std <- 0.5
  w <- matrix(data=rnorm(hstates * vstates, 0, init.std), nrow=vstates)

  for (iter in seq_len(maxiter)) {
    # Pick a training sample at random
    pos <- sample(1:nrow(data), 1)
    visible.state <- c(data[pos,], 1)  # Add in bias unit

    # Positive phase: forward pass, to get hidden unit activations
    hidden.prob <- as.vector( rbm.prob( visible.state %*% w ) )
    hidden.state <- ifelse(hidden.prob > runif(hstates,0,1), 1, 0)
    hidden.state[length(hidden.state)] <- 1  ## Clamp bias unit
    iter0 <- visible.state %o% hidden.state  # Positive associations

    # Reconstruction pass, to get visible unit activations from hidden
    visible.prob <- as.vector( rbm.prob( w %*% hidden.state ) )
    visible.state <- ifelse(visible.prob > runif(vstates,0,1), 1, 0)
    visible.state[length(visible.state)] <- 1  ## Clamp bias unit

    # Negative phase: go forward again, keeping the probabilities rather
    # than sampling, as is usual for the final step of CD-1
    hidden.prob <- as.vector( rbm.prob( visible.state %*% w ) )
    hidden.prob[length(hidden.prob)] <- 1  ## Clamp bias unit
    iter1 <- visible.state %o% hidden.prob  # Negative associations

    # Update the weights by the difference of the two association matrices
    w <- w + alpha * (iter0 - iter1)
  }

  rc <- list(weights=w, visible=vstates, hidden=hstates)
  class(rc) <- "rbm"
  rc
}
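
The snippets below assume a fitted model called r, trained on the two arrows; something like this (the set.seed is my addition, purely to make the run repeatable):

set.seed(1)  # Optional, for a reproducible run (not in the original)
r <- fit.rbm(training_set)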

And now we can take a look at the output graphically.  If we show the visible layer an upward arrow, sample the hidden layer, and then reconstruct, we can see some of the images it “dreams” about:

[Images: five sampled reconstructions (“dreams”) of the up arrow]

Here’s the code that draws the little picture, in case you’d like to try it:

dream.rbm <- function(x, sample) {
  # Add bias unit to the visible input
  visible.state <- c(sample, 1)
  hidden.prob <- as.vector( rbm.prob( visible.state %*% x$weights ) )
  hidden.state <- ifelse(hidden.prob > runif(x$hidden,0,1), 1, 0)
  hidden.state[x$hidden] <- 1  ## Clamp bias unit, as in training
  dream.prob <- as.vector( rbm.prob( x$weights %*% hidden.state ) )
  dream.state <- ifelse(dream.prob > runif(x$visible,0,1), 1, 0)
  # Remove the bias unit before returning
  dream.state[-length(dream.state)]
}

d <- dream.rbm(r, up)
image( t( matrix(d, nrow=image_rows, byrow=TRUE) ), xaxt="n", yaxt="n" )

Oh, and one more fun thing to play with. We can set the hidden state to whatever we want and see what type of images are recalled in the visible layer:

try_hidden.rbm <- function(x, h) {
  hidden.state <- c(h, 1)  # Add bias unit
  visible.prob <- as.vector( rbm.prob( x$weights %*% hidden.state ) )
  visible.state <- ifelse(visible.prob > runif(x$visible,0,1), 1, 0)
  # Remove the bias unit before returning
  visible.state[-length(visible.state)]
}

d2 <- try_hidden.rbm(r, c(0,1,0))
image( t( matrix(d2, nrow=image_rows, byrow=TRUE) ), xaxt="n", yaxt="n" )

And using (0,1,0) results in images that look like this:

[Images: three sampled visible states for hidden state (0,1,0)]

This is one of the building blocks of deep learning.  The next two steps are:

  • Stack two or more unsupervised stages together, to find even deeper structure in the data.  Hinton’s video shows some of the things that are possible and should really whet your appetite for diving into deep learning.
  • Build a supervised model that uses the hidden layer units as inputs.  For example, logistic regression is much easier with the 3 hidden units than with the 45 visible units.  There’s a rough sketch of both ideas after this list.
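
Here’s that sketch.  The hidden_features helper is my own addition, just for illustration: it runs an image forward through the fitted RBM and returns the hidden probabilities, which can feed a plain glm logistic regression.  With only two training rows the fit is degenerate (glm will warn about perfect separation), so treat this as the shape of the pipeline rather than a real model:

# Hypothetical helper (my addition): hidden-unit probabilities for one image
hidden_features <- function(x, v) {
  p <- as.vector( rbm.prob( c(v, 1) %*% x$weights ) )
  p[-x$hidden]  # Drop the bias unit
}

# Hidden probabilities for each training image: a 2 x 3 feature matrix
features <- t( apply(training_set, 1, function(v) hidden_features(r, v)) )
labels <- c(1, 0)  # 1 = up arrow, 0 = down arrow
model <- glm(y ~ ., data=data.frame(features, y=labels), family=binomial)

# The same features could also be fed to a second RBM, the core of stacking:
# r2 <- fit.rbm(features, hidden=2)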

Enjoy!
