By Vedant Misra | September 30, 2016
AWS just announced the release of their new p2.16xlarge instances, making this an especially great time to get started using a cloud service for deep learning. If you’re a developer or engineer interested in learning what all the fuss is about, the best way to learn is to spin up an instance and try to build something.
We’ve written up a quick getting started guide on the best options we found for quickly creating a versatile development environment. This writeup will help you set up a GPU-enabled EC2 instance capable of running Tensorflow, Keras, Caffe, Theano, Tensorboard, and various other useful packages.
Choose an instance type
While we initially used the g2 instance family, we're switching over to p2 instances after running a few quick benchmarks. We found that training on a p2.xlarge took less than half as long per batch as on a g2.2xlarge, presumably owing to the NVidia K80s in the p2.xlarge, with their 12 GiB of GPU memory (compared to 8 GiB in the NVidia K520). In our case, it was a 2.2x speedup for an instance that's 1.4x the price on an hourly basis, which makes the p2.xlarge a no-brainer.
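To make that tradeoff concrete, you can work out the effective cost per batch from the two figures above (2.2x faster, 1.4x the hourly price):

```shell
# Relative cost per batch of training on the p2.xlarge vs. the g2.2xlarge:
# (hourly price ratio) / (speedup) = 1.4 / 2.2
awk 'BEGIN { printf "%.2f\n", 1.4 / 2.2 }'
# -> 0.64, i.e. each batch costs roughly 64% as much on the p2.xlarge
```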
Set up an instance
So, after creating an EC2 account, launch a p2.xlarge running Ubuntu 14.04. While you could use Ubuntu 16.04, the newer long-term-support (LTS) Ubuntu release, many of NVidia's GPU drivers haven't been released for 16.04, making 14.04 the smart choice for now.
Configure your instance to have at least 20GB of storage and to use a security group that allows inbound connections via SSH. Later you might want to open up ports for other services you’ll run on the machine, but for now, SSH is enough.
Prepare the instance
The first time you SSH into your instance, run the following to update all installed packages:
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install build-essential cmake g++ gfortran git pkg-config python-dev software-properties-common wget
sudo apt-get autoremove
sudo rm -rf /var/lib/apt/lists/*
Next, install the NVidia drivers that you need to take advantage of all of the GPU power in your instance. Find your graphics card model with
lspci | grep -i nvidia
Visit the NVidia site to find the latest drivers for your platform, but don't download anything yet. There's a cleaner way to install the drivers, which is to use apt-get. Check this PPA to see if the drivers you need exist in the repository; generally, the more recent the driver, the faster and more stable it is.
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-367  # Or whichever driver package is labelled
                                 # the "current long-lived branch release"
Restart your system:
sudo shutdown -r 0
When you reboot, you’ll have NVidia drivers installed.
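You can confirm the driver is working by running nvidia-smi, which prints the GPU model, memory, and driver version. The check below is guarded so it degrades gracefully on a machine where the driver isn't present:

```shell
# Print GPU status if the NVidia driver is installed and loaded
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi
else
    echo "nvidia-smi not found; driver is not installed on this machine"
fi
```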
Setting up docker
We originally built a custom AMI as a deep learning environment but quickly switched over to using docker instead. Using docker means you won’t need to install anything on your Ubuntu 14.04 machine besides docker itself, and then you can use docker to install everything else.
Docker saves you the trouble of compiling binaries from scratch, checking out source code, downloading and reconciling dependencies, and other things that aren’t any fun.
Follow the instructions here to install docker.
Then, follow the instructions here to install nvidia-docker, which you'll need to take advantage of the NVidia drivers you just installed. Usually, you can install nvidia-docker by downloading and installing their .deb package directly:
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0-rc.3/nvidia-docker_1.0.0.rc.3-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb
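To verify that containers can reach the GPU, run nvidia-smi inside NVidia's CUDA base image. This is the smoke test from the nvidia-docker README, guarded here so it only attempts the run where nvidia-docker actually exists:

```shell
# If this prints your GPU's status table, docker containers can see the hardware
if command -v nvidia-docker >/dev/null 2>&1; then
    nvidia-docker run --rm nvidia/cuda nvidia-smi
else
    echo "nvidia-docker not installed on this machine"
fi
```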
Selecting useful images
We recommend one of the following two options for docker images for your setup:
1. This popular docker image is an all-in-one environment. The image comes with TensorFlow, Theano, Torch, Caffe, Keras, Jupyter/IPython, and the standard Python numerical computing packages (matplotlib, scikit-learn, pandas, scipy, numpy). This is great if you want to get started quickly in a way where things just work.
2. If you prefer a more modular setup, you can run multiple docker containers, each with one framework. For this, use Kaixhin's images here.
If you go with option 1 above, which is probably the simplest approach for now, you’ll need to clone the repository containing the GPU dockerfile, build a docker container using that Dockerfile, and run it on your machine.
git clone https://github.com/saiprashanths/dl-docker
cd dl-docker
docker build -t floydhub/dl-docker:gpu -f Dockerfile.gpu .
# Wait a long time...
nvidia-docker run -it -p 8888:8888 -p 6006:6006 -v /sharedfolder:/root/sharedfolder floydhub/dl-docker:gpu bash
Note the paths provided for creating a volume using -v in the command. Modify this to reflect whatever directory you prefer to use as a shared volume on your docker container.
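For example, to use a folder in your home directory instead of /sharedfolder (a minimal sketch; substitute whatever host path you like):

```shell
# Create the host-side directory for the shared volume
mkdir -p "$HOME/sharedfolder"
# Then mount it in the container by changing the -v flag to:
#   -v "$HOME/sharedfolder":/root/sharedfolder
echo "Created $HOME/sharedfolder"
```

Anything you write to /root/sharedfolder inside the container will then persist in that host directory after the container exits.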
With the docker container running, you're all set. If you don't like writing and running code in bash, use Jupyter instead by visiting the-address-of-your-new-machine:8888 in your browser. You can always find the public DNS address of your instance by viewing information for your instance in the EC2 pane in the AWS management console.
This will give you access to a browser-based environment where you can write and run IPython notebook files in such a way that import tensorflow and import keras just work.
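Once inside the container (or a Jupyter terminal), a quick way to confirm the frameworks are importable is a small check loop; it reports rather than fails if a package is missing, so it's safe to run anywhere:

```shell
# Check that each framework imports cleanly in the container's Python
for pkg in tensorflow keras; do
    if python -c "import $pkg" 2>/dev/null; then
        echo "$pkg OK"
    else
        echo "$pkg not available in this environment"
    fi
done
```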