Setting Up AWS for Data Mining (and Jupyter Notebooks)

My experience setting up a deep learning environment on AWS, including how to create an instance, keep your data after you shut it down, set up a budget warning, and edit your files on the server without being charged.

Set up AWS account

Go to https://aws.amazon.com and set up your account

Tip: If you are a student, check AWS Educate to see whether you are eligible for $100 in credit.

It may take a few days for Amazon to confirm your school information and send the promo code.

THE EASY WAY (for spot instance)

fast.ai provides scripts to automate the whole process.

AWS Spot instances Tutorial

Initialization (Only done once)

brew install jq
git clone --depth=1 https://github.com/slavivanov/ec2-spotter.git
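Before running the scripts, it can help to confirm that the command-line tools are actually installed. The helper below is a minimal sketch; the function name and the tool list are my own assumptions (jq and git come from the commands above, and the AWS CLI is used later in this post to terminate instances).

```shell
# Print the name of every required tool that is not on PATH.
# Prints nothing when all tools are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: missing_tools aws jq git
```

If the function prints anything, install those tools first.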
Create an Instance
start_spot_no_swap.sh takes the following parameters:

  • ami: Select an image based on the region you picked and whether you want the Fast.ai image or the Amazon one (the Amazon images below are version 1.3, from April 2017):

    Region      Fast.ai        Amazon
    us-east-1   ami-31ecfb26   ami-fb8e19ed
    us-west-2   ami-bc508adc   ami-638c1e03
    eu-west-1   ami-b43d1ec7   ami-c5afaaa3

  • subnetId: Use the subnet ID that create_vpc.sh printed.
  • securityGroupId: Use the security group ID that create_vpc.sh printed.
. ec2-spotter/fast_ai/create_vpc.sh
. ec2-spotter/fast_ai/start_spot_no_swap.sh --ami ami-53b23433 --subnetId subnet-9f69c3d6 --securityGroupId sg-a62f2ede
# instance_ip seen in the output
instance_ip=instance_ip_from_previous_step
ssh -i ~/.ssh/aws-key-fast-ai.pem ubuntu@$instance_ip
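If the IP address has scrolled out of the terminal, it can probably be recovered with the AWS CLI instead of copying it by hand. This is a sketch under the assumption that you know the instance ID; the function name is mine and is not part of ec2-spotter.

```shell
# Print the public IPv4 address of an instance, given its instance ID.
instance_public_ip() {
    aws ec2 describe-instances \
        --instance-ids "$1" \
        --query 'Reservations[0].Instances[0].PublicIpAddress' \
        --output text
}

# Example (the instance ID is a placeholder):
# instance_ip=$(instance_public_ip i-0123456789abcdef0)
```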
Detach the volume and make it permanent

The script will also terminate the instance from Step 1

sh ec2-spotter/fast_ai/config_from_instance.sh

Once the script prints its final message, this step is done.

Launch an instance (whenever you need one)

sh ec2-spotter/fast_ai/start_spot.sh

Wait until "Initializing" disappears from the Status Checks column in the Instances panel.
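Instead of watching the console, the wait can likely be scripted with the AWS CLI's built-in waiter, which polls until the instance passes its status checks or gives up after a timeout. A minimal sketch, assuming you have the instance ID at hand (the function name is my own):

```shell
# Block until the instance passes its status checks.
# Returns non-zero if the waiter times out.
wait_until_ready() {
    aws ec2 wait instance-status-ok --instance-ids "$1"
}

# Example (placeholder ID):
# wait_until_ready i-0123456789abcdef0 && echo "ready to connect"
```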

Connect
ssh -i YOUR-KEY.pem ubuntu@YOUR-PUBLIC-DNS -L8888:localhost:8888
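The -L8888:localhost:8888 option forwards port 8888 on your laptop to port 8888 on the instance, so a notebook server started on the instance becomes reachable at http://localhost:8888 in your local browser. A sketch of starting the server once you are logged in, assuming Jupyter is already installed on the image (the function name is mine):

```shell
# Run this on the instance, inside the SSH session.
# Starts Jupyter without opening a browser, on port 8888 by default.
start_notebook() {
    jupyter notebook --no-browser --port="${1:-8888}"
}

# Example: start_notebook        # serves on port 8888
```

If you pick a different port, change both numbers in the -L option to match.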
Terminate the instance
# Replace $instance_id with your instance ID.
aws ec2 terminate-instances --instance-ids "$instance_id"
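If you do not remember the instance ID, one way to find it is to ask the AWS CLI for running Spot instances. This is a sketch with a function name of my own; it lists every running Spot instance in the default region, so check the output before terminating anything.

```shell
# Print the IDs of all running Spot instances in the current region.
running_spot_instance_ids() {
    aws ec2 describe-instances \
        --filters "Name=instance-state-name,Values=running" \
                  "Name=instance-lifecycle,Values=spot" \
        --query 'Reservations[].Instances[].InstanceId' \
        --output text
}

# Example: instance_id=$(running_spot_instance_ids)
```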

This is by far the most effortless way to set up an AWS instance.


If you want more customization, you can set everything up manually.

Create Instance

http://wiki.fast.ai/index.php/AWS_install fast.ai's tutorial on setting up AWS. It provides scripts that simplify launching your instance, and it gives solutions to common setup problems.

http://forums.fast.ai/t/great-summary-of-how-to-use-aws/7651 Clear instructions for setting up an AWS server and running Jupyter notebooks.

The standard choice is to launch an on-demand instance from the EC2 console. A more budget-friendly choice is to launch a Spot instance.

Later, this post covers how to combine some of these choices to save money.

Spot Instance

https://aws.amazon.com/ec2/spot/
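To get a feel for how much a Spot instance costs right now, the current price per availability zone can be queried from the command line. A minimal sketch: the function name is mine, and p2.xlarge in the example is only an assumed GPU instance type, so substitute whichever type you plan to use.

```shell
# Print the current Spot price per availability zone for an instance type.
# Passing "now" as the start time returns the price currently in effect.
current_spot_prices() {
    aws ec2 describe-spot-price-history \
        --instance-types "$1" \
        --product-descriptions "Linux/UNIX" \
        --start-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        --query 'SpotPriceHistory[].[AvailabilityZone,SpotPrice]' \
        --output text
}

# Example: current_spot_prices p2.xlarge
```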

750h Free Tier

t2.micro instance

Run your instance

You can find the following instructions by clicking Connect in the Instances panel.

To access your instance:

  1. Open an SSH client. On macOS you can use Terminal.
  2. Rename your private key file from your_key.pem.txt to your_key.pem and move it to wherever you want to keep it. The Connect wizard automatically detects the key you used to launch the instance.
  3. Your key must not be publicly viewable for SSH to work. Use this command if needed:
chmod 400 YOUR-KEY.pem
  4. Connect to your instance using its Public DNS:
ssh -i YOUR-KEY.pem ubuntu@YOUR-PUBLIC-DNS -L8888:localhost:8888

To find YOUR-PUBLIC-DNS, open the Instances panel in the EC2 console and click Connect.
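Typing the full ssh command every time gets tedious. One option is to store the same settings in ~/.ssh/config so that a short alias is enough. The helper below is a sketch: the function name and the alias aws-fastai are my own placeholders, while User, IdentityFile, and LocalForward mirror the ubuntu user, -i, and -L parts of the command above.

```shell
# Print an ~/.ssh/config entry.
# Arguments: alias, public DNS name, path to the private key.
ssh_config_entry() {
    printf 'Host %s\n' "$1"
    printf '    HostName %s\n' "$2"
    printf '    User ubuntu\n'
    printf '    IdentityFile %s\n' "$3"
    printf '    LocalForward 8888 localhost:8888\n'
}

# Example:
# ssh_config_entry aws-fastai YOUR-PUBLIC-DNS ~/.ssh/YOUR-KEY.pem >> ~/.ssh/config
# ssh aws-fastai
```

Since the public DNS name usually changes each time a new instance is launched, the HostName line needs updating after every launch.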

Using fast.ai AMI

keep updated

cd fastai
git pull
conda env update # once a month

check for status

which python
python --version
pip list --format=columns  # the legacy format was removed in pip 18

Persistent Volume

AWS Spot instances Tutorial

A Spot instance is a budget-friendly choice. One downside is that once you shut down the instance, everything on it is lost. The tutorial offers two ways to solve this: the first is to create an external EBS volume, and the second is to swap the root volume. I prefer the second. See the link for a comparison and details.
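For reference, the first approach (an external EBS volume) can be sketched with plain AWS CLI calls. This is a generic illustration, not the tutorial's own scripts: the function names, the gp2 volume type, and the /dev/sdf device name are my assumptions. The volume has to be created in the same availability zone as the instance.

```shell
# Create a gp2 volume and print its volume ID.
# Arguments: availability zone, size in GiB.
create_data_volume() {
    aws ec2 create-volume \
        --availability-zone "$1" \
        --size "$2" \
        --volume-type gp2 \
        --query 'VolumeId' \
        --output text
}

# Attach a volume to a running instance.
# Arguments: volume ID, instance ID.
attach_data_volume() {
    aws ec2 attach-volume \
        --volume-id "$1" \
        --instance-id "$2" \
        --device /dev/sdf
}

# Example (placeholder IDs):
# vol=$(create_data_volume us-east-1a 50)
# attach_data_volume "$vol" i-0123456789abcdef0
```

A new volume still has to be formatted and mounted on the instance before it can hold data.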

Keep the AMI running (with lower cost)

Tutorial

If you want to be able to edit files on the server but do not want to pay for a GPU you are not using, one option is to switch temporarily to a free-tier t2.micro.

I have not tried this yet.
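As a rough idea of what the switch could look like with the AWS CLI, here is an untested sketch. It assumes an EBS-backed on-demand instance, because the instance must be stopped before its type can be changed and that generally does not apply to Spot instances. The function name is my own.

```shell
# Stop an instance, change its type, and start it again.
# Arguments: instance ID, new instance type (for example t2.micro).
# Each step runs only if the previous one succeeded.
switch_instance_type() {
    aws ec2 stop-instances --instance-ids "$1" >/dev/null &&
    aws ec2 wait instance-stopped --instance-ids "$1" &&
    aws ec2 modify-instance-attribute --instance-id "$1" \
        --instance-type "{\"Value\": \"$2\"}" &&
    aws ec2 start-instances --instance-ids "$1" >/dev/null
}

# Example (placeholder ID):
# switch_instance_type i-0123456789abcdef0 t2.micro
```

Switching back to the GPU type would be the same call with the original instance type.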

Budget Warning

Create Budget

Billing -> Dashboard -> Budgets -> Create Budget
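The same budget can probably be created from the command line. The sketch below assumes a monthly cost budget in USD with an email alert at 80% of the limit. The budget name monthly-cost, the 80% threshold, and the function name are illustrative choices of mine, not values from the console walkthrough.

```shell
# Create a monthly cost budget with an email alert at 80% of the limit.
# Arguments: AWS account ID, limit in USD, email address.
create_monthly_budget() {
    budget="{\"BudgetName\": \"monthly-cost\", \"BudgetLimit\": {\"Amount\": \"$2\", \"Unit\": \"USD\"}, \"TimeUnit\": \"MONTHLY\", \"BudgetType\": \"COST\"}"
    alert="[{\"Notification\": {\"NotificationType\": \"ACTUAL\", \"ComparisonOperator\": \"GREATER_THAN\", \"Threshold\": 80, \"ThresholdType\": \"PERCENTAGE\"}, \"Subscribers\": [{\"SubscriptionType\": \"EMAIL\", \"Address\": \"$3\"}]}]"
    aws budgets create-budget \
        --account-id "$1" \
        --budget "$budget" \
        --notifications-with-subscribers "$alert"
}

# Example (placeholder account ID and address):
# create_monthly_budget 123456789012 20 you@example.com
```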

Simple Notification Service

Use this if you want to receive the budget warning through another channel, such as SMS.

Create Topic
Edit Topic Policy
Subscribe to Topic
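The first and last of these steps can be sketched with the AWS CLI as follows. The function names, the topic name, and the phone number are placeholders of mine. The topic policy step is left out here: the policy has to allow AWS Budgets to publish to the topic, and I would edit that in the console.

```shell
# Create an SNS topic and print its ARN. Argument: topic name.
create_alert_topic() {
    aws sns create-topic --name "$1" --query 'TopicArn' --output text
}

# Subscribe a phone number to a topic via SMS.
# Arguments: topic ARN, phone number in E.164 format.
subscribe_sms() {
    aws sns subscribe \
        --topic-arn "$1" \
        --protocol sms \
        --notification-endpoint "$2"
}

# Example (placeholder values):
# arn=$(create_alert_topic budget-alerts)
# subscribe_sms "$arn" +15555550100
```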

(Other) Not using the fast.ai AMI

You'll need to install all the packages and set up the environment yourself. I am using the fast.ai AMI, so I don't need this yet.

Here is a guide to installing CUDA 8 on Ubuntu 16.04: http://www.diegoacuna.me/installing-cuda-8-on-ubuntu-16-04/

# install python3, numpy, jupyter, scipy, and matplotlib...
sudo apt-get update
sudo apt-get install python3-pip
export LC_ALL=C
sudo -H pip3 install pip --upgrade
sudo -H pip3 install numpy jupyter scipy matplotlib
sudo apt-get install emacs24
# install pytorch

sudo reboot
nvcc --version

Further reading on securing SSH: https://thepcspy.com/read/making-ssh-secure/