Friday 23 October 2015

Installing RNNLIB on Ubuntu 14.04

I have previously looked at CURRENNT, but now I'll go through how to use RNNLIB, which is a different rnn library. RNNLIB doesn't use cuda, but it can do multi-dimensional rnns and also can use the CTC loss function which can be handy. Let's get right into it. Download RNNLIB from sourceforge here.

You'll have to install the netcdf library:

sudo apt-get install libnetcdf-dev

You should also install boost: sudo apt-get install libboost-all-dev. I have libboost1.55 installed already.

There is an installation guide over here if you want some additional pointers. First run configure and make:


Fixing the errors

The first error I got was:

Mdrnn.hpp:239:6:   required from here
Helpers.hpp:311:41: error: ‘range_min_size’ was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation [-fpermissive]
size_t size = range_min_size(r1, r2, r3);

To fix this one you need to put templates code above the code that uses them. Moving the 4 templates for range_min_size to the top of the file should fix all the issues in Helpers.hpp. I cut lines 532-547 and pasted the lines to line 163 of Helpers.hpp.

Running make again I got: Container.hpp:157:24: error: ‘resize’ was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation. To fix this one change on line 157 of containers.hpp "resize" to "this->resize."

That was all I needed to do to get this version installed.

Thursday 22 October 2015

Running the CURRENNT rnn library

Previously I have made posts about installing Currennt and writing data files for Currennt. This post will be about actually running it assuming it is installed and the data files have been written.

The first thing you'll need to specify is the config file, which I called config.cfg. Its contents look like this:

max_epochs_no_best   = 20
max_epochs = 100
learning_rate = 1e-5
network = network.jsn
train = true
train_file =
val_file =
weights_dist = normal
weights_normal_sigma = 0.1
weights_normal_mean = 0
stochastic = true
validate_every = 1
parallel_sequences = 20
input_noise_sigma = 0
shuffle_fractions = true
shuffle_sequences = false

Many of the options are described on the currennt wiki. The network.jsn is a file that I will describe down below and contains the network configuration e,g, number of layers, layer types etc. The files and were created using the MATLAB script in a previous post. If you set parallel sequences too high you'll run out of memory on your GPU, but higher means it will train faster.

The network configuration

Info on the network configuration can be found over here. The network I used looks like this:

"layers": [
"size": 20,
"name": "input",
"type": "input"
"size": 200,
"name": "level_1",
"bias": 1.0,
"type": "blstm"
"size": 200,
"name": "level_2",
"bias": 1.0,
"type": "blstm"
"size": 4,
"name": "output",
"bias": 1.0,
"type": "softmax"
"size": 4,
"name": "postoutput",
"type": "multiclass_classification"

actually running stuff

Now to run things, just do:

/path/to/currennt/build/current config.cfg

and everything should work, just wait for it to finish training. This will output a file called trained_network.jsn which we will need during testing. For testing, you will need another HDF5 file just like the training one, except with the test sequences in it. For testing we have another config file which I called ff_config:

network = trained_network.jsn
cuda = 0
ff_output_format = csv
ff_output_file = test_results
ff_input_file =

Notice that cuda is off, I found it ran faster in feed forward mode without cuda. To do testing run:

/path/to/currennt/build/current ff_config.cfg

Now all you need to do is parse the output file test_results.csv if you need the individual predictions, it is in comma separated variable format so it is not hard to work out.

Writing HDF5 files for Currennt

In a previous post I explained how to install currennt. This post will deal with writing the data files. The next post will be about actually running things. Currennt uses data written to HDF5 files, so whatever format your data is in you have to write it to a HDF5 file then use currennt to process it. This post will use matlab to generate the HDF5 files, but e.g. python could easily do it as well.

Recurrent neural networks predict things about sequences, so for this example we'll be using protein sequences as the example. Predicting things like phonemes for speech files is exactly the same. For each amino acid in a protein, we'll be predicting the secondary structure, which can be C, E, H or X. The training information will be a L by 20 matrix, this means the protein is of length L and each amino acid has 20 features (we'll use PSSM features, of which there are 20, extracted using PSI-BLAST.). I will just be using 400 training proteins and 100 test proteins, it is often beneficial to try things out on small datasets because you get the results quicker and can figure out problems quicker.

HDF5 files

Currennt needs the data written to HDF5 files, but it needs a heap of extra parameters written to the files as well as the data, this section will specify what you need to write to them, then I'll put some actual MATLAB code in the next section.

The data we'll be using is protein data, there will be 400 training proteins, each of length L (the sequences can be anywhere from about 20 to 2000 in length). There are 20 features per amino acid, this means each protein is represented by a matrix of size 20 by L, and there are 400 matrices.

These parameters are for classification, if you want something else like regression you may have to use different dimensions and variables e.g. numLabels only makes sense for classification. HDF5 files have dimensions and variables, and they need to be specified separately. The dimensions that currennt needs specified in the HDF5 files are as follows:

numTimesteps: the total number of amino acids if you lined 
up all the proteins end to end
inputPattSize: the number of features per amino acid, in this case 20.
numSeqs: the number of proteins
numLabels: the number of classes we want to predict, in this case 4
maxLabelLength: the length of the longest string used to hold
the class label name
maxTargStringLength: to be honest I'm not sure about this one, I
just set it to be 5000 and things seem to work
maxSeqTagLength: this is another one I'm not sure about. I set
it to 800 and it seems to work.

Now we have specified the dimensions, now we want to specify the variables i.e. the actual data we will be using for features and labels. They are specified like so:

seqTags, size: maxSeqTagLength x numSeqs
numTargetClasses: size: 1
inputs, size: inputPattSize x numTimesteps
seqLengths, size: numSeqs
targetClasses, size: numTimesteps
labels, size: maxLabelLength x numLabels

You then just have to write the relevant data to each of the variables and run currennt. Some MATLAB code that does all this stuff is provided below. Note that this matlab file will just create one file for e.g. training. If you want a separate dataset for e.g. testing (strongly recommended) then you will need to run the code a second time changing the file names and protein dataset.

Tuesday 20 October 2015

Installing Currennt on ubuntu 14.04

This post is all about how to install Currennt on ubuntu 14.04. The next post will deal with how to write the data files that Currennt needs as well as actually running currennt. Currennt is a recurrent neural net library that is sped up using cuda. Installing it is not as simple as it could be.

First step: grap the source code from here: currennt on sourceforge. I am using

You'll first need to install cuda:

sudo apt-get update
sudo apt-get install cuda

You'll also have to install the netcdf library:

sudo apt-get install libnetcdf-dev

You should also install boost: sudo apt-get install libboost-all-dev. I have libboost1.55 installed already.

Now you can start the installation. Extract the Currennt zip file to a directory and cd to that directory. Now, create a folder called build and cd into that, then run make.

mkdir build
cd build
cmake ..

When I did this I got the error:nvcc fatal : Value 'compute_13' is not defined for option 'gpu-architecture'. This can be fixed by editing the CMakeLists.txt, change compute_13 to compute_30, and rerun cmake and make.

The next make got a little bit further, but I got the error: error: namespace "thrust" has no member "transform_reduce". This one can be fixed by adding the line: #include <thrust/transform_reduce.h>to the top of the file /currennt_lib/src/layers/Layer.hpp .

That was all I had to do this time, previously I have had a heap of different propblems when installing currennt on older computers, but this time seemed relatively painless.

Monday 5 October 2015

Numerically Checking Neural Net Gradients

It is highly recommended by everyone that implements neural networks that you should numerically check your gradient calculations. This is because it is very easy to introduce small errors into the back-propagation equations that will make it seem like the network is working, but it doesn't do quite as well as it should. A simple numerical check can let you know that all your numbers are right.

This tutorial will go through the process of how to numerically compute the derivatives for a simple 1 hidden layer neural network. The exact same process is used for checking the derivatives of e.g. LSTMs, which are much more complicated.

The main thing to remember is the error measure you are using for your neural network. If you are using Mean Squared Error then the error \(E = \dfrac{1}{N}\sum^N_{n=1} \sum^K_{k=1} (y_k - x_k)^2 \) is what we want to minimise. In the previous equation, \(y_k\) is the desired label and \(x_k\) is the network output. \(N\) is the minibatch size, if you are just putting in a single input at a time then \(N\) is one. This single number \(E\) tells us how good our network is at predicting the output, the lower the value of \(E\) the better. Our aim is to adjust the weights so that \(E\) gets smaller.

For a neural net with 1 hidden layer we have 2 sets of weights and 2 sets of biases. The output is computed as follows:

\( a = \sigma (i*W_1 + b_1) \)
\( x = \sigma ( a*W_2 + b_2) \)

\(i\) is our input vector, \(a\) is the hidden layer activation, and \(x\) is the network output.

The way to numerically check the gradient is to pick one of the weights e.g. element (1,1) of \(W_1\), and to add and subtract a small number to/from it e.g. 0.0001. This way we have \(W_1^+\) and \(W_1^-\) (note that only element (1,1) is changed, all the other weights stay the same for now). We now have to compute \(x^+\) and \(x^-\) using the new slightly modified weights. Then we can compute \(E^+\) and \(E^-\) using \(x^+\) and \(x^-\). The gradient of E is then \((E^+ - E^-)/(2*0.0001)\).

Now that we have the derivative of \(E\) with respect to weight (1,1) of \(W_1\), we have to do it for all the other weights as well. This follows the exact same procedure, we just add/subtract a small number from a different weight. The final matrix of derivatives should exactly match the gradient as calculated by back-propagation. This python code: implements both back-propagation and the numerical check for a simple single hidden layer nn.