Posts tagged with: sono

Training a Bird Classifier with TensorFlow and TFLearn

If you are new to our AMIs, head over to our TensorFlow README to learn how to get started, or check out our previous blog entry on getting started with TensorFlow.

Intro

This entry is a walkthrough using our latest TensorFlow AMI to train a model based on the example in Adam Geitgey's Medium article on machine learning. I am specifically using a g2.2xlarge EC2 instance to train the model, to show the training benefits of using a GPU instance over a CPU instance. Adam Geitgey's article articulated a number of things really well: his code example split out the different steps needed to train the model, and the steps matched up with sections of the article itself, allowing you to get a good understanding of what he was explaining. The example he used is based on the CIFAR-10 example code and uses a combination of datasets to train a bird classifier. You can read more about the CIFAR datasets here and the referenced TFLearn code example here.

Creating the Classifier

Before we start, create a directory named bird_classifier in the ubuntu user's home directory. We will carry out all operations in this directory as the ubuntu user.

mkdir ~/bird_classifier
cd ~/bird_classifier

Next we need our dataset. You can download the dataset referenced in the article from S3. It is a combination of the CIFAR-10 dataset and the Caltech-UCSD Birds-200-2011 dataset; in total there are ~74K images.

wget <dataset-url>
unzip <downloaded-archive>

From this you will get the dataset file: full_dataset.pkl

Download the Training Code

Next we need to get the code used in the article. I have provided a couple of options for obtaining it below:

Option 1 - use wget. The command below will pull the script from a gist and save it locally:

wget -O <script-name> <gist-url>

Option 2 - copy the code below into a file that is in the same directory as full_dataset.pkl.
In my case, I copied the code into a file alongside full_dataset.pkl:

from __future__ import division, print_function, absolute_import

# Import tflearn and some helpers
import tflearn
from tflearn.data_utils import shuffle
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression
from tflearn.data_preprocessing import ImagePreprocessing
from tflearn.data_augmentation import ImageAugmentation
import pickle

# Load the data set
X, Y, X_test, Y_test = pickle.load(open("full_dataset.pkl", "rb"))

# Shuffle the data
X, Y = shuffle(X, Y)

# Make sure the data is normalized
img_prep = ImagePreprocessing()
img_prep.add_featurewise_zero_center()
img_prep.add_featurewise_stdnorm()

# Create extra synthetic training data by flipping, rotating and blurring the
# images in our data set.
img_aug = ImageAugmentation()
img_aug.add_random_flip_leftright()
img_aug.add_random_rotation(max_angle=25.)
img_aug.add_random_blur(sigma_max=3.)

# Define our network architecture:
# Input is a 32x32 image with 3 color channels (red, green and blue)
network = input_data(shape=[None, 32, 32, 3],
                     data_preprocessing=img_prep,
                     data_augmentation=img_aug)

# Step 1: Convolution
network = conv_2d(network, 32, 3, activation='relu')

# Step 2: Max pooling
network = max_pool_2d(network, 2)

# Step 3: Convolution again
network = conv_2d(network, 64, 3, activation='relu')

# Step 4: Convolution yet again
network = conv_2d(network, 64, 3, activation='relu')

# Step 5: Max pooling again
network = max_pool_2d(network, 2)

# Step 6: Fully-connected 512 node neural network
network = fully_connected(network, 512, activation='relu')

# Step 7: Dropout - throw away some data randomly during training to prevent over-fitting
network = dropout(network, 0.5)

# Step 8: Fully-connected neural network with two outputs (0=isn't a bird, 1=is a bird)
# to make the final prediction
network = fully_connected(network, 2, activation='softmax')

# Tell tflearn how we want to train the network
network = regression(network, optimizer='adam',
                     loss='categorical_crossentropy',
                     learning_rate=0.001)

# Wrap the network in a model object
model = tflearn.DNN(network, tensorboard_verbose=0,
                    checkpoint_path='bird-classifier.tfl.ckpt')

# Train it! We'll do 100 training passes and monitor it as it goes.
model.fit(X, Y, n_epoch=100, shuffle=True,
          validation_set=(X_test, Y_test),
          show_metric=True, batch_size=96,
          snapshot_epoch=True,
          run_id='bird-classifier')

# Save the model when training is complete to a file
model.save("bird-classifier.tfl")
print("Network trained and saved as bird-classifier.tfl!")

Train it!

At this point, all we need to do is run our Python script. The script carries out the following:

- Runs through our dataset 100 times (n_epoch=100)
- Takes roughly ~60 minutes (this is on a g2.2xlarge, an EC2 instance with a single GPU)
- Produces our model file: bird-classifier.tfl
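Before kicking off training, it can help to verify how the image shrinks as it flows through the network defined above. This is a back-of-the-envelope sketch (not part of the original article) that assumes TFLearn's defaults: 'same' padding with stride 1 for conv_2d, and a stride equal to the kernel size for max_pool_2d.

```python
# Trace the tensor shapes through the bird-classifier network.
# Assumes TFLearn defaults: conv_2d keeps spatial size ('same' padding,
# stride 1); max_pool_2d divides each spatial dimension by its kernel size.

def conv_same(h, w, c_out):
    # 'same' padding, stride 1: spatial size unchanged, channels become c_out
    return (h, w, c_out)

def max_pool(h, w, c, k):
    # non-overlapping pooling: each spatial dimension shrinks by a factor of k
    return (h // k, w // k, c)

shape = (32, 32, 3)                  # input image: 32x32, 3 color channels
shape = conv_same(*shape[:2], 32)    # Step 1: 32x32x32
shape = max_pool(*shape, 2)          # Step 2: 16x16x32
shape = conv_same(*shape[:2], 64)    # Step 3: 16x16x64
shape = conv_same(*shape[:2], 64)    # Step 4: 16x16x64
shape = max_pool(*shape, 2)          # Step 5: 8x8x64

flattened = shape[0] * shape[1] * shape[2]
print(shape)      # (8, 8, 64)
print(flattened)  # 4096 values feeding the 512-node fully-connected layer
```

So the 512-node fully-connected layer in Step 6 sees 4096 inputs per image, which is where most of the network's parameters live.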
$ python2 <training-script>

# OUTPUT BELOW
I tensorflow/stream_executor/] successfully opened CUDA library locally
I tensorflow/stream_executor/] successfully opened CUDA library locally
I tensorflow/stream_executor/] successfully opened CUDA library locally
I tensorflow/stream_executor/] successfully opened CUDA library locally
.. ..... ........ Concatenated output ........... ..... ..
I tensorflow/core/common_runtime/gpu/] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
---------------------------------
Run id: bird-classifier
Log directory: /tmp/tflearn_logs/
---------------------------------
Preprocessing...
Calculating mean over all dataset (this may take long)...
.. ..... ........ Concatenated output ........... ..... ..
--
Training Step: 59200 | total loss: 0.16163
| Adam | epoch: 100 | loss: 0.16163 - acc: 0.9332 | val_loss: 0.24135 - val_acc: 0.9387 -- iter: 56780/56780
--
Network trained and saved as bird-classifier.tfl!

Inference (Let's test some images)

The script above created our trained model, bird-classifier.tfl. Next, we will download the inference script provided in the article along with some images from the internet, and test it. The command below saves the inference script:

wget -O <inference-script> <gist-url>

Next, we will create a directory to store our test images and download some Creative Commons images from the net. The test set has a total of six images: three that are birds and three that are not.

mkdir -p test_images
cd test_images
wget -O bird_bullocks_oriole.jpg <image-url>
wget -O bird_mount_bluebird.jpg <image-url>
wget -O bird_african_fish_eagle.jpg <image-url>
wget -O not_a_bird_stop_sign.jpg <image-url>
wget -O not_a_bird_airplane.jpg <image-url>
wget -O not_a_bird_creativecommons_logo.jpg <image-url>

Let's run our inference script against the images.
I created a simple loop to go through and test each image:

# Make sure you are in the directory you downloaded the images to
for f in test_images/*.jpg; do echo "File: ${f}"; python2 <inference-script> ${f} 2>/dev/null; echo ""; done

Here's our output:

File: test_images/bird_african_fish_eagle.jpg
That's a bird!

File: test_images/bird_bullocks_oriole.jpg
That's a bird!

File: test_images/bird_mount_bluebird.jpg
That's a bird!

File: test_images/not_a_bird_airplane.jpg
That's not a bird!

File: test_images/not_a_bird_creativecommons_logo.jpg
That's not a bird!

File: test_images/not_a_bird_stop_sign.jpg
That's a bird!

Pretty good - we got one false positive in the bunch (not_a_bird_stop_sign.jpg). I left this in here because it reveals an interesting anomaly; Adam's article has a section that speaks to this: "How accurate is 95% accurate?"

Lastly, if you would like to run the inference script on a single image rather than using the bash for loop above, run the following:

python2 <inference-script> test_images/bird_african_fish_eagle.jpg

GPU vs. CPU Performance

To see the value of a GPU, I ran the training overnight on a CPU instance (c4.4xlarge). The results are below:

c4.4xlarge - $0.838 per hour - 16 cores (hyper-threaded) maxed out - ~123 minutes
g2.2xlarge - $0.65 per hour - single GPU - ~65 minutes

Conclusion

There you have it: a trained bird classifier based on the Medium article, using Bitfusion's TensorFlow AMI. If you are interested in scientific computing or deep learning, I encourage you to take a look at our AMI offerings. They are sure to speed up your development, prototyping and GPU cluster creation. Additionally, if you have trained models and are looking for solid infrastructure to serve them, contact us here.

Questions or comments? Please post them in the comment section below or join our community Bitfusion-AWS Slack Channel.

Get Started!

Are you currently developing AI applications, but spending too much time wrangling machines and setting up your infrastructure?
We are currently offering a Free 30-Day Trial of Bitfusion Flex!  
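Coming back to the stop-sign false positive above: Adam's "How accurate is 95% accurate?" point can be made concrete with precision and recall. This is a minimal sketch (my addition, not from the article) tallying the six test images, where all three birds were caught and one of the three non-birds was misflagged.

```python
from __future__ import division

# Tally of the six test images above: 3 birds, 3 non-birds,
# with the stop sign incorrectly classified as a bird.
true_positives = 3   # birds correctly called birds
false_positives = 1  # the stop sign
false_negatives = 0  # no birds were missed
true_negatives = 2   # the airplane and the Creative Commons logo

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)

print("precision: %.2f" % precision)  # 0.75 - when it says "bird", it is right 75% of the time
print("recall: %.2f" % recall)        # 1.00 - it found every actual bird
```

With only six images this is illustrative rather than statistically meaningful, but it shows why a single accuracy number can hide a bias toward false positives or false negatives.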

tensorflow tutorial

Bitfusion Boost 16 GPU Caffe Cluster - Quick Start on Amazon AWS with CloudFormation

Our AWS Marketplace AMIs have been updated since this post to make launching them with Boost even easier. Please refer to our latest tutorial post, titled: Deploy Bitfusion Boost on AWS faster than ever.

We recently published our Boost AMIs to the AWS Marketplace and walked through potential cluster configurations. Today, we are going to expand on that and set up a Bitfusion Boost cluster on AWS. We will be explicitly setting this up for the Caffe deep learning framework. At the end of this tutorial, you will have a cluster comprised of:

One g2.8xlarge as a client, where the application runs
Three g2.8xlarges as servers

This configuration will give your application a total of 16 GPUs!

1. Subscribe to the Bitfusion AMIs

This walkthrough leverages AWS CloudFormation (CFN) templates. Using our template will enable you to get a Bitfusion Boost cluster up and running in minutes. In order to use the CloudFormation template, you need to be signed into the AWS console and be an active subscriber to the AMIs used in the template.

What does it mean to subscribe to a product? Subscribing to a product means that you have accepted the terms of the product as shown on the product's listing page, including pricing terms and the software seller's End User License Agreement, and that you agree to use the product in accordance with the AWS Customer Agreement. All Bitfusion AMIs are priced on an hourly basis, and you will only incur charges on top of the base AWS instance charges while the cluster is up and running; simply subscribing to one of our AMIs does not cost you anything.

WARNING: In Step 1 and Step 2, DO NOT launch directly using the "1-Click Launch" option, as this will automatically launch an instance. This is not required, as all instances will launch via the CFN. For both AMIs below, make sure the "Manual Launch" tab is selected, then simply click on "Accept Software Terms."
[container]
[row]
[column md="6"]
Step 1: Accept Bitfusion Boost Server Software Terms
Boost Server AMI Software Terms
[/column]
[column md="6"]
Step 2: Accept Bitfusion Boost Caffe Client Software Terms
Boost Caffe Client AMI Software Terms
[/column]
[/row]
[/container]

2. Create an AWS Key Pair

The AWS key pair uses public-key cryptography to provide secure login to your AWS cluster. You will need to create one to access the Bitfusion client, unless you have created one previously, in which case you can re-use that key and skip directly to Section 3.

Create Key Pair

[container]
[row]
[column md="4"]
Step 1: Select us-east-1 as your region
Our CFN template currently only supports us-east-1.
[/column]
[column md="4"]
Step 2: Create and name your key pair
In the navigation pane, under "Network & Security", select "Key Pairs", then choose the "Create Key Pair" button.
[/column]
[column md="4"]
Step 3: Download and save the key pair
The key pair will download automatically. Make sure you keep this file, as it is required to log in to the client machine.
[/column]
[/row]
[/container]

3. Create a Bitfusion Boost Cluster

The Bitfusion Boost template is specifically configured for running a Boost cluster. If you modify any of the AWS template configurations, you may be unable to run the cluster or tools.

Launch Bitfusion AWS Template

[container]
[row]
[column md="4"]
Step 1: Accept the template
Accept the template already specified, and click "Next".
[/column]
[column md="4"]
Step 2: Specify the template parameters
On the specify-details page, enter a name for your cluster (e.g. BitfusionCluster), accept the default parameters, select your "KeyName" and then click "Next".
[/column]
[column md="4"]
Step 3: Accept default options
On the options page, accept the defaults and click "Next".
[/column]
[/row]
[/container]

4. Launch the Cluster

Finish creating your AWS cluster and log in to the Bitfusion Boost client.
[container]
[row]
[column md="4"]
Step 1: Create the cluster
On the review page, check the box that allows CloudFormation to create the necessary IAM roles, and click "Create".
[/column]
[column md="4"]
Step 2: Monitor the provisioning process
The cluster stack spins up over a period of 10 to 15 minutes. Watch for the status to change from CREATE_IN_PROGRESS to CREATE_COMPLETE. You may need to refresh the page to see the status change.
[/column]
[column md="4"]
Step 3: Log in to the Bitfusion client
From the Amazon EC2 console page, click on the client. Copy the IP address and log in via SSH:

ssh -l ubuntu -i <your-key.pem> <client-ip>
[/column]
[/row]
[/container]

5. Take It for a Spin

Once you have logged in, you can query how many GPUs you have and test out Caffe.

How many GPUs do you have? You can query the number of GPUs available to you with the following command:

bfboost client /usr/local/cuda-7.0/samples/bin/x86_64/linux/release/deviceQuery

Caffe

Run the following commands to test out Caffe and see it running on all 16 GPUs:

cd /opt/caffe-gpu
./data/mnist/
./examples/mnist/
bfboost client "./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt -gpu all"

For more information on using Boost, please refer to our official documentation.

6. Deleting the Cluster

Select your cluster on the CloudFormation Management page and click Delete Stack. For more information, see Deleting a Stack on the AWS CloudFormation Console.
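For repeatable setups, the console steps above can also be scripted with the AWS CLI. The sketch below is my addition, not from the original post: the template URL and key-pair name are placeholders you would substitute, and the --capabilities flag mirrors the IAM-role checkbox from Step 1 of section 4.

```shell
# Launch the stack in us-east-1 (the only region the template supports)
aws cloudformation create-stack \
    --stack-name BitfusionCluster \
    --template-url <bitfusion-boost-template-url> \
    --parameters ParameterKey=KeyName,ParameterValue=<your-key-pair> \
    --capabilities CAPABILITY_IAM \
    --region us-east-1

# Block until the stack reaches CREATE_COMPLETE (10 to 15 minutes)
aws cloudformation wait stack-create-complete --stack-name BitfusionCluster

# Tear everything down when you are done (section 6)
aws cloudformation delete-stack --stack-name BitfusionCluster
```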


