Getting started with deep learning

By Vedant Misra | September 30, 2016

AWS just announced the release of their new P2 instances (topping out at the p2.16xlarge), making this an especially great time to get started using a cloud service for deep learning. If you’re a developer or engineer curious about what all the fuss is about, the best way to find out is to spin up an instance and try to build something.

We’ve written up a quick getting-started guide on the best options we found for creating a versatile development environment. This writeup will help you set up a GPU-enabled EC2 instance capable of running TensorFlow, Keras, Caffe, Theano, TensorBoard, and various other useful packages.

Choose an instance type

While we initially used the g2 instance family, we’re switching over to p2 instances after running a few quick benchmarks. We found that training AlexNet on a p2.xlarge took less than half as long per batch as on a g2.2xlarge, presumably owing to the NVidia K80 in the p2.xlarge, with its 12 GiB of GPU memory (compared to 8 GiB on the NVidia K520). In our case, that was a 2.2x speedup for an instance that’s 1.4x the price on an hourly basis, so each training batch works out to roughly 1.6x cheaper (2.2 / 1.4 ≈ 1.6), which makes using the p2.xlarge a no-brainer.

Set up an instance

So, after creating an EC2 account, launch a p2.xlarge running Ubuntu 14.04. While you could use Ubuntu 16.04, the newer long-term-support (LTS) Ubuntu release, many of NVidia’s GPU drivers haven’t been released for 16.04 yet, making 14.04 the smart choice for now.

Configure your instance with at least 20 GB of storage and a security group that allows inbound connections via SSH. Later you might want to open up ports for other services you’ll run on the machine (we’ll use 8888 for Jupyter below), but for now, SSH is enough.
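
If you’d rather script the launch than click through the console, a minimal sketch with the AWS CLI might look like the following. The AMI ID, key pair, and security group ID are placeholders for your own values, and /dev/sda1 assumes the usual root device name for Ubuntu AMIs.

# Launch a p2.xlarge with 20 GB of root storage and an SSH-only security group
aws ec2 run-instances \
    --image-id ami-xxxxxxxx \
    --instance-type p2.xlarge \
    --key-name your-key-pair \
    --security-group-ids sg-xxxxxxxx \
    --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":20}}]'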

Prepare the instance

The first time you SSH into your instance, run the following to update the installed packages and pull in some common build tools:

# Refresh the package lists and upgrade everything that's installed
sudo apt-get update
sudo apt-get upgrade
# Install compilers and common development dependencies
sudo apt-get install build-essential cmake g++ gfortran git pkg-config python-dev software-properties-common wget
# Remove packages that are no longer needed and clear the apt lists
sudo apt-get autoremove
sudo rm -rf /var/lib/apt/lists/*

Next, install the NVidia drivers you need to take full advantage of the GPU in your instance. Find your graphics card model with

lspci | grep -i nvidia
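
On a p2.xlarge this should report a Tesla K80. The exact output varies, but expect a line along these lines:

00:1e.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)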

Visit the NVidia site to find the latest drivers for your platform, but don’t download anything yet. There’s a cleaner way to install the drivers, which is to use apt-get. Check this PPA to see if the drivers you need exist in the repository—generally, the more recent the driver, the faster and more stable it is.

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-367 # Or whichever driver package is labelled
                                # the "current long-lived branch release"

Restart your system:

sudo shutdown -r 0

When you reboot, you’ll have NVidia drivers installed.
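
A quick way to confirm that the driver is actually working is nvidia-smi, which ships with the driver package:

nvidia-smi  # should print a table listing the Tesla K80, the driver version, and GPU memory usage

If this errors out instead of listing your GPU, the driver didn’t install correctly.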

Set up Docker

We originally built a custom AMI as a deep learning environment but quickly switched over to using Docker instead. Using Docker means you won’t need to install anything on your Ubuntu 14.04 machine besides Docker itself; you can then use Docker to install everything else.

Docker saves you the trouble of compiling binaries from scratch, checking out source code, downloading and reconciling dependencies, and other things that aren’t any fun.

Follow the instructions here to install Docker.
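
At the time of writing, one common route on Ubuntu is Docker’s convenience script. Adding your user to the docker group afterward is optional, but saves you from typing sudo before every docker command (log out and back in for the group change to take effect):

curl -sSL https://get.docker.com/ | sh
sudo usermod -aG docker ubuntu  # assumes the default ubuntu user on your AMI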

Then, follow the instructions here to install nvidia-docker, which you’ll need so your containers can take advantage of the NVidia drivers you just installed. Usually, you can just download and install their .deb package:

wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0-rc.3/nvidia-docker_1.0.0.rc.3-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb
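
To verify that containers can actually see the GPU, run nvidia-smi inside NVidia’s official CUDA image (this pulls the image from Docker Hub the first time, so it may take a minute):

nvidia-docker run --rm nvidia/cuda nvidia-smi

If this prints the same GPU table you saw on the host, you’re in business.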

Select useful images

We recommend one of the following two options for the Docker images in your setup:

  • This popular Docker image is an all-in-one environment. The image comes with TensorFlow, Theano, Torch, Caffe, Keras, Jupyter/IPython, and the standard Python numerical computing packages (matplotlib, scikit-learn, pandas, scipy, numpy). This is great if you want to get started quickly in a way where things just work.

  • If you prefer a more modular setup, you can run multiple Docker containers, each with one framework. For that, use Kaixhin’s images here (see the example after this list).
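
For example, to drop into a Torch environment in its own container (kaixhin/cuda-torch is one of Kaixhin’s CUDA-enabled images; swap in one of his other images for a different framework):

nvidia-docker run -it kaixhin/cuda-torch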

If you go with the first option above, which is probably the simplest approach for now, you’ll need to clone the repository containing the GPU Dockerfile, build a Docker image from it, and run a container on your machine:

   git clone https://github.com/saiprashanths/dl-docker
   cd dl-docker
   docker build -t floydhub/dl-docker:gpu -f Dockerfile.gpu .
   # Wait a long time...

   nvidia-docker run -it -p 8888:8888 -p 6006:6006 -v /sharedfolder:/root/sharedfolder floydhub/dl-docker:gpu bash

Note the paths provided for creating a volume with -v in the nvidia-docker run command: the part before the colon is a directory on the host, and the part after is where it shows up inside the container. Modify these to reflect whatever directory you prefer to use as a shared volume.
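
One thing to note: if the host directory doesn’t exist, Docker will create it owned by root, so you may prefer to create it yourself first:

sudo mkdir -p /sharedfolder
sudo chown ubuntu:ubuntu /sharedfolder  # assumes the default ubuntu user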

With the Docker container running, you’re all set. If you don’t like writing and running code in bash, use Jupyter instead: start it inside the container (see below), open port 8888 in your instance’s security group, and visit the-address-of-your-new-machine:8888 in your browser. You can always find the public DNS address of your instance by viewing its details in the EC2 pane of the AWS management console.
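
To start Jupyter, the dl-docker repository ships a helper script; per its README, running something like the following from the bash prompt inside the container should bring up the notebook server on port 8888:

sh run_jupyter.sh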

This gives you a browser-based environment where you can write and run IPython notebook files in such a way that import tensorflow and import keras just work.
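
To sanity-check the environment without opening a notebook, you can also try the imports straight from the container’s shell (the version strings you see will depend on the image):

python -c "import tensorflow; print(tensorflow.__version__)"
python -c "import keras; print(keras.__version__)"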