July 31, 2017
Over the summer I looked a bit into practical machine learning and decided to follow the fast.ai course. Their emphasis is on the code rather than dousing you with heavy maths from the beginning, which is nice for someone who prefers learning in a practical way rather than reading dry theory.
What follows in this post is a couple of gotchas I encountered while trying to set up AWS and a local installation for this course.
The first thing you learn in the course is that you have to set up AWS. This takes quite a few steps, including registering the AWS account, setting up secret keys and the CLI, and applying for access to the p2.xlarge instance type.
You should definitely do this first, because gaining access to the p2.xlarge might take a few days.
What I do not think is fully explained, though, is the AWS pricing model. The p2.xlarge instance itself costs $0.90 per hour when running, which is mentioned. But the storage volume also costs $0.10 per GB per month, regardless of whether the instance is running. On top of that, the elastic IPs you have tied up cost $0.005 per hour while the instance is stopped (i.e. most of the time).
That can easily cost you an extra ~$15 a month on top of the instance running cost. I don’t feel the course is transparent enough about this.
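To make the arithmetic concrete, here is a rough sketch that adds up the rates quoted above. The volume size, the hours of use, and the single elastic IP are my own example assumptions, not figures from the course, and AWS prices may well have changed since.

```python
# Rates quoted in this post (USD); treat them as a snapshot, not current pricing.
INSTANCE_PER_HOUR = 0.90      # p2.xlarge while running
STORAGE_PER_GB_MONTH = 0.10   # volume, billed whether running or stopped
ELASTIC_IP_PER_HOUR = 0.005   # one elastic IP while the instance is stopped


def monthly_cost(volume_gb, hours_running, hours_in_month=720):
    """Return (idle_cost, running_cost) in USD for one month."""
    hours_stopped = hours_in_month - hours_running
    idle_cost = (volume_gb * STORAGE_PER_GB_MONTH
                 + hours_stopped * ELASTIC_IP_PER_HOUR)
    running_cost = hours_running * INSTANCE_PER_HOUR
    return idle_cost, running_cost


# Assumed example: a 128 GB volume and 20 hours of training in a 30-day month.
idle, running = monthly_cost(volume_gb=128, hours_running=20)
print("idle overhead: ${:.2f}, instance time: ${:.2f}".format(idle, running))
```

The point of splitting the result in two is that the idle part is owed even in a month where you never start the instance.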
Definitely register the account and apply for the p2.xlarge instance, because you’ll be needing the computing power. But while you’re waiting for the application to be approved, you should set up your environment locally.
What I found was that it takes a bit of work to understand Jupyter Notebooks, the surrounding libraries, and how to prepare the data before you can even start training your ML model. Learning that stuff while running on a heavy-duty AWS instance is a bit wasteful, not to mention costly.
After installing Anaconda, set up a Python environment by running
conda create -n py27 python=2.7 anaconda in a terminal that supports Bash. The course runs on Python 2.7, but I used Python 3.6 with only minor changes to the code. There are some good docs on setting up the different envs. Then run
source activate py27 to work from that environment. Inside the env you can install packages as necessary (
conda install theano, etc), and run the notebook (
One gotcha for me was that the Jupyter Notebook did not pick up the dependencies I had installed inside my environment. Turns out you have to do
conda install nb_conda so your Notebook can select a specific environment, and thereby use the correct packages. Spent far too long trying to figure this out.
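A quick way to see which environment a notebook is really using is to ask the interpreter itself. This is only a small diagnostic sketch (the helper name is made up); paste it into a notebook cell and check that the paths point inside your conda env rather than the base install.

```python
import sys


def describe_interpreter():
    """Report which Python installation the current process is running under."""
    return {
        "executable": sys.executable,       # full path of the running interpreter
        "prefix": sys.prefix,               # root directory of the active environment
        "version": sys.version.split()[0],  # e.g. the 2.7.x or 3.6.x version string
    }


for key, value in sorted(describe_interpreter().items()):
    print("{}: {}".format(key, value))
```

If the prefix does not contain the name of the env you created, the kernel is running somewhere else and will not see the packages you installed.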
Another gotcha was that I had to install the 1.2.2 version of Keras (
pip install keras==1.2.2 or
pip install git+git://email@example.com), because the provided course notebooks and utils are not compatible with the newer Keras 2.0 API.
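To confirm which Keras ended up in the environment, something like the following can be run from a notebook cell. It is a sketch under a simplifying assumption: it only checks that the major version is 1, not that it is exactly 1.2.2, and the helper name is my own.

```python
def is_keras_1(version):
    """Return True if a version string such as '1.2.2' is in the Keras 1.x series."""
    major = int(version.split(".")[0])
    return major == 1


try:
    import keras  # only importable if Keras is installed in this environment
except ImportError:
    print("Keras is not installed in this environment")
else:
    if is_keras_1(keras.__version__):
        print("Keras {} is in the 1.x series".format(keras.__version__))
    else:
        print("Keras {} is not 1.x; the course notebooks expect the older API"
              .format(keras.__version__))
```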
You will also have to edit the
keras.json file located in
~/.keras so you use Theano as the backend for the course:
"image_dim_ordering": "th",
"backend": "theano"
This is needed because the settings default to TensorFlow.
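If you would rather not edit the file by hand, the same change can be scripted. This is a minimal sketch: the function name is hypothetical, it assumes the directory holding the file already exists, and it leaves any other keys in the file untouched.

```python
import json
import os


def use_theano_backend(path):
    """Set the two Theano options in a keras.json file, keeping other keys."""
    settings = {}
    if os.path.exists(path):
        with open(path) as f:
            settings = json.load(f)
    settings["image_dim_ordering"] = "th"
    settings["backend"] = "theano"
    with open(path, "w") as f:
        json.dump(settings, f, indent=4)
    return settings


# Location mentioned in this post; uncomment to apply it to your own config.
# use_theano_backend(os.path.expanduser("~/.keras/keras.json"))
```

The call is left commented out on purpose, so nothing is overwritten until you decide to run it.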
Minor sidenote for macOS people using the fish shell (like me). If you want to activate your environment without using Bash, you can add the following to your
source (conda info --root)/etc/fish/conf.d/conda.fish
This’ll allow you to
activate py27 with fish.
Hope this will be of some help to others getting started on this course!