
Intro to TensorBoard

Now that we're constantly validating the data and saving our model, we can start thinking of ways to visualize the ins and outs of our model, or to do exploratory data analysis while it trains or after it is done training. In his talk at the TensorFlow Dev Summit 2017, Dandelion Mane described TensorBoard as a flashlight to shine on the black box of deep neural networks. Sometimes shining a bright light is ill-advised.
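For a taste of what the post covers, here is a minimal TensorFlow 1.x sketch that emits scalar summaries for TensorBoard; the toy loss variable and log directory are illustrative, not taken from the post:

import tensorflow as tf

# A toy scalar standing in for a real training loss.
loss = tf.Variable(1.0, name="loss")
decay = tf.assign(loss, loss * 0.95)
tf.summary.scalar("loss", loss)

merged = tf.summary.merge_all()
with tf.Session() as sess:
    writer = tf.summary.FileWriter("/tmp/demo_logs", sess.graph)
    sess.run(tf.global_variables_initializer())
    for step in range(100):
        summary, _ = sess.run([merged, decay])
        writer.add_summary(summary, step)
    writer.close()

# Then point TensorBoard at the logs: tensorboard --logdir=/tmp/demo_logs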
Read More

tutorial

Monitoring and Checkpointing in TensorFlow

In our last post we gave a basic introduction to TensorFlow 1.0. What we want to do now is take that foundation and move it forward. One of the most important parts of deep learning is understanding what is going on while the code is running. As our problems get more complicated and our datasets get larger, training time can go from minutes to days. If we've picked a model with poor hyperparameters, or just a bad model in general, we don't want to wait hours to make an adjustment. And if we have a great model and great hyperparameters but don't train for enough steps, we don't want to start from scratch. Or do we…
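As a preview, a minimal TensorFlow 1.x checkpointing sketch is below; the paths and the toy "training" loop are illustrative, not from the post:

import tensorflow as tf

# A counter standing in for real training state.
step = tf.Variable(0, name="global_step", trainable=False)
increment = tf.assign_add(step, 1)
saver = tf.train.Saver()

with tf.Session() as sess:
    # Resume from the newest checkpoint if one exists; otherwise start fresh.
    latest = tf.train.latest_checkpoint("/tmp/demo_ckpt")
    if latest:
        saver.restore(sess, latest)
    else:
        sess.run(tf.global_variables_initializer())

    for _ in range(1000):
        current_step = sess.run(increment)
        if current_step % 100 == 0:
            # Writes /tmp/demo_ckpt/model-<step> checkpoint files.
            saver.save(sess, "/tmp/demo_ckpt/model", global_step=current_step)

If training is interrupted, re-running the script picks up from the last saved step instead of starting over.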
Read More

tutorial

Intro to TensorFlow

In a series of blog posts, we want to show a step-by-step guide on how to get from a basic TensorFlow model to best-in-class architectures. Deep learning is a hot topic, and it's easy to find both introductory and advanced resources, but it can be difficult to see how to get from the intro material to more advanced models. We at Bitfusion would like to help fill in the missing steps along the journey to becoming a deep learning expert.
Read More

tutorial

Training a Bird Classifier with TensorFlow and TFLearn

If you are new to our AMIs, head over to our TensorFlow README on how to get started, or check out our previous blog entry, Intro to TensorFlow. This entry is a walkthrough using our latest TensorFlow AMI to train a model based on the example in Adam Geitgey's Medium article on machine learning. I am specifically using a g2.2xlarge EC2 instance to train the model, to show the training benefits of using a GPU instance over a CPU instance.

Adam Geitgey's article articulated a number of things really well: his code example split out the different steps needed to train the model, and the steps matched sections of the article itself, giving you a good understanding of what he was explaining. The example he used is based on the CIFAR-10 example code and uses a combination of datasets to train a bird classifier. You can read more about the CIFAR datasets here and the referenced TFLearn code example here.

Creating the Classifier

Before we start, create a directory named bird_classifier in the ubuntu user's home directory. We will carry out all operations in this directory as the ubuntu user.

mkdir ~/bird_classifier
cd ~/bird_classifier

Next we need our dataset. You can download the dataset referenced in the article from S3. It is a combination of the CIFAR-10 dataset and the Caltech-UCSD Birds-200-2011 dataset; in total there are ~74K images.

wget https://s3-us-west-2.amazonaws.com/ml-is-fun/data.zip
unzip data.zip

From this you will get the dataset file: full_dataset.pkl

Download the Training Code

Next we need the code used in the article. I have provided a couple of options to obtain it below.

Option 1 - use wget. The command below will pull the code from a gist and save it as bird_classifier.py:

wget -O bird_classifier.py https://gist.githubusercontent.com/sono-bfio/89a91da65a12175fb1169240cde3a87b/raw/b859d1673e0a81ebd42d7799d7c1df71517c175b/bird_classifier.py

Option 2 - copy the code below to a file in the same directory as full_dataset.pkl. In my case, I copied it to a file called bird_classifier.py.

from __future__ import division, print_function, absolute_import

# Import tflearn and some helpers
import tflearn
from tflearn.data_utils import shuffle
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression
from tflearn.data_preprocessing import ImagePreprocessing
from tflearn.data_augmentation import ImageAugmentation
import pickle

# Load the data set
X, Y, X_test, Y_test = pickle.load(open("full_dataset.pkl", "rb"))

# Shuffle the data
X, Y = shuffle(X, Y)

# Make sure the data is normalized
img_prep = ImagePreprocessing()
img_prep.add_featurewise_zero_center()
img_prep.add_featurewise_stdnorm()

# Create extra synthetic training data by flipping, rotating and blurring the
# images in our data set.
img_aug = ImageAugmentation()
img_aug.add_random_flip_leftright()
img_aug.add_random_rotation(max_angle=25.)
img_aug.add_random_blur(sigma_max=3.)
# Define our network architecture:

# Input is a 32x32 image with 3 color channels (red, green and blue)
network = input_data(shape=[None, 32, 32, 3],
                     data_preprocessing=img_prep,
                     data_augmentation=img_aug)

# Step 1: Convolution
network = conv_2d(network, 32, 3, activation='relu')

# Step 2: Max pooling
network = max_pool_2d(network, 2)

# Step 3: Convolution again
network = conv_2d(network, 64, 3, activation='relu')

# Step 4: Convolution yet again
network = conv_2d(network, 64, 3, activation='relu')

# Step 5: Max pooling again
network = max_pool_2d(network, 2)

# Step 6: Fully-connected 512 node neural network
network = fully_connected(network, 512, activation='relu')

# Step 7: Dropout - throw away some data randomly during training to prevent over-fitting
network = dropout(network, 0.5)

# Step 8: Fully-connected neural network with two outputs (0=isn't a bird, 1=is a bird) to make the final prediction
network = fully_connected(network, 2, activation='softmax')

# Tell tflearn how we want to train the network
network = regression(network, optimizer='adam',
                     loss='categorical_crossentropy',
                     learning_rate=0.001)

# Wrap the network in a model object
model = tflearn.DNN(network, tensorboard_verbose=0,
                    checkpoint_path='bird-classifier.tfl.ckpt')

# Train it! We'll do 100 training passes and monitor it as it goes.
model.fit(X, Y, n_epoch=100, shuffle=True,
          validation_set=(X_test, Y_test),
          show_metric=True, batch_size=96,
          snapshot_epoch=True,
          run_id='bird-classifier')

# Save model when training is complete to a file
model.save("bird-classifier.tfl")
print("Network trained and saved as bird-classifier.tfl!")

Train it!

At this point, all we need to do is run our Python script. The script:

- runs through our dataset 100 times (n_epoch=100),
- takes roughly ~60 minutes (this is on a g2.2xlarge, an EC2 instance with a single GPU), and
- produces our model file, bird-classifier.tfl.

$ python2 bird_classifier.py

# OUTPUT BELOW
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
.. ..... ........ Concatenated output ........... ..... ..
I tensorflow/core/common_runtime/gpu/gpu_device.cc:806] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
---------------------------------
Run id: bird-classifier
Log directory: /tmp/tflearn_logs/
---------------------------------
Preprocessing... Calculating mean over all dataset (this may take long)...
.. ..... ........ Concatenated output ........... ..... ..
--
Training Step: 59200 | total loss: 0.16163
| Adam | epoch: 100 | loss: 0.16163 - acc: 0.9332 | val_loss: 0.24135 - val_acc: 0.9387 -- iter: 56780/56780
--
Network trained and saved as bird-classifier.tfl!

Inference (Let's test some images)

The script above created our trained model, bird-classifier.tfl. Next, we will download the inference script provided in the article along with some images from the internet, and test it.
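Before downloading it, here is a rough sketch of what an inference script like this does. This is a hypothetical simplification, not the actual gist: it rebuilds the same network so the saved weights line up, loads the model file, and classifies one image passed on the command line (the PIL-based image loading is an assumption).

from __future__ import division, print_function, absolute_import

import sys
import numpy as np
from PIL import Image

import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression
from tflearn.data_preprocessing import ImagePreprocessing
from tflearn.data_augmentation import ImageAugmentation

# Rebuild the exact network used during training so the saved
# weights can be restored into an identical graph.
img_prep = ImagePreprocessing()
img_prep.add_featurewise_zero_center()
img_prep.add_featurewise_stdnorm()
img_aug = ImageAugmentation()
img_aug.add_random_flip_leftright()
img_aug.add_random_rotation(max_angle=25.)
img_aug.add_random_blur(sigma_max=3.)

network = input_data(shape=[None, 32, 32, 3],
                     data_preprocessing=img_prep,
                     data_augmentation=img_aug)
network = conv_2d(network, 32, 3, activation='relu')
network = max_pool_2d(network, 2)
network = conv_2d(network, 64, 3, activation='relu')
network = conv_2d(network, 64, 3, activation='relu')
network = max_pool_2d(network, 2)
network = fully_connected(network, 512, activation='relu')
network = dropout(network, 0.5)
network = fully_connected(network, 2, activation='softmax')
network = regression(network, optimizer='adam',
                     loss='categorical_crossentropy', learning_rate=0.001)

model = tflearn.DNN(network)
model.load("bird-classifier.tfl")

# Load the image passed on the command line and scale it to the
# 32x32 RGB shape the network expects, with pixel values in [0, 1]
# (assuming the training data was stored that way).
img = Image.open(sys.argv[1]).convert("RGB").resize((32, 32), Image.ANTIALIAS)
data = np.asarray(img, dtype="float32") / 255.0

# predict() returns class probabilities; index 1 means "bird".
prediction = model.predict([data])
if np.argmax(prediction[0]) == 1:
    print("That's a bird!")
else:
    print("That's not a bird!")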
The command below downloads the actual script from the article and saves it as infer.py:

wget -O infer.py https://gist.githubusercontent.com/ageitgey/a40dded08e82e59724c70da23786bbf0/raw/7c78536295f1ab8cce62d5c63ed57212cafd8950/r_u_a_bird.py

Next, we will create a directory to store our test images and download some Creative Commons images from the net. The test set has a total of six images: three that are birds and three that are not.

mkdir -p test_images
cd test_images
wget -O bird_bullocks_oriole.jpg http://www.5ensesmag.com/wp-content/uploads/2013/03/800px-Bullocks_Oriole.jpg
wget -O bird_mount_bluebird.jpg http://climate.audubon.org/sites/default/files/bird_images/Mountain_Bluebird_FlickrCC_1.jpg
wget -O bird_african_fish_eagle.jpg http://www.nature.org/cs/groups/webcontent/@web/@africa/documents/media/african-fish-eagle-720x400.jpg
wget -O not_a_bird_stop_sign.jpg https://upload.wikimedia.org/wikipedia/commons/f/fd/Free_creative_commons_Rural_Stop_Landscape,_Antelope_Island,_Utah_\(4594258122\).jpg
wget -O not_a_bird_airplane.jpg http://blogs.voanews.com/student-union/files/2012/01/airplane-flickr-shyb.jpg
wget -O not_a_bird_creativecommons_logo.jpg https://www.raspberrypi.org/wp-content/uploads/2014/03/creative_commons.j

Let's run our inference script against the images. I created a simple loop to go through and test each image:

# Make sure you are in the directory where you downloaded infer.py to
for f in test_images/*.jpg; do echo "File: ${f}"; python2 infer.py ${f} 2>/dev/null; echo ""; done

Here's our output:

File: test_images/bird_african_fish_eagle.jpg
That's a bird!

File: test_images/bird_bullocks_oriole.jpg
That's a bird!

File: test_images/bird_mount_bluebird.jpg
That's a bird!

File: test_images/not_a_bird_airplane.jpg
That's not a bird!

File: test_images/not_a_bird_creativecommons_logo.jpg
That's not a bird!

File: test_images/not_a_bird_stop_sign.jpg
That's a bird!

Pretty good: we got one false positive in the bunch ("not_a_bird_stop_sign.jpg"). I left this in here as it reveals an interesting anomaly; Adam's article has a section that speaks to this: "How accurate is 95% accurate?"

Lastly, if you would like to run the inference script with a single image rather than using the bash for loop above, run the following:

python2 infer.py test_images/bird_african_fish_eagle.jpg

GPU vs. CPU Performance

To see the value of a GPU, I ran the training overnight on a CPU instance (c4.4xlarge). The results are below:

c4.4xlarge - $0.838 per hour - 16 cores (hyper-threaded) maxed out - ~123 minutes
g2.2xlarge - $0.65 per hour - single GPU - ~65 minutes

Conclusion

There you have it: a trained bird classifier based on the Medium article, using Bitfusion's TensorFlow AMI. If you are interested in scientific computing or deep learning, I encourage you to take a look at our AMI offerings. They are sure to speed up your development, prototyping, and GPU cluster creation. Additionally, if you have trained models and are looking for solid infrastructure to serve them, contact us here.

Questions or comments? Please post them in the comment section below or join our community Bitfusion-AWS Slack Channel. Get started!

Are you currently developing AI applications, but spending too much time wrangling machines and setting up your infrastructure? We are currently offering a Free 30-Day Trial of Bitfusion Flex!
Read More

tensorflow tutorial

Easy TensorFlow Model Training on AWS

Recently Google released TensorFlow 0.8, which among other features provides distributed computing support. While this is great for power users, the most important step for most people trying to get started with machine learning or deep learning is simply to have a powerful, pre-configured instance. To solve this problem, we recently released the Bitfusion Ubuntu 14 TensorFlow AMI, built on TensorFlow 0.8 and configured to work equally well across CPU and GPU AWS instances.
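For context, distributed TensorFlow is driven by a cluster specification. Here is a minimal sketch of that API with placeholder host names; it is illustrative only, not taken from the AMI:

import tensorflow as tf

# Describe the processes that make up the cluster (placeholder addresses).
cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
})

# Each process starts a server identifying its own job and task...
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# ...and ops can then be pinned to devices anywhere in the cluster.
with tf.device("/job:ps/task:0"):
    weights = tf.Variable(tf.zeros([784, 10]))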
Read More

tensorflow tutorial

Tutorial for model creation and training on NVIDIA DIGITS

Train neural network models with Caffe using the sample MNIST or CIFAR datasets, and learn how to use your own dataset to train models.

Before you start this tutorial, you should have launched an NVIDIA DIGITS instance on Nimbix and have the URL to the DIGITS UI on Nimbix. If you haven't launched the NVIDIA DIGITS instance on Nimbix, go here to learn how to do that, or watch this screencast.

We prepared the NVIDIA DIGITS application with the three most common data sets used for learning with NVIDIA DIGITS and Caffe:

- MNIST, available in /db/mnist, with training and test data sets in /db/mnist/train and /db/mnist/test respectively
- CIFAR-10, available at /db/cifar10, with training and test data sets in /db/cifar10/train and /db/cifar10/test respectively
- CIFAR-100, available at /db/cifar100, with training and test data sets in /db/cifar100/train and /db/cifar100/test respectively

DIGITS has auto-completion, so when you begin to type these paths, you will see them offered to you.

Creating a Model

Select the blue Images button under Datasets and choose "Classification." This will guide you through setting up a data set of images that you can train using deep learning. Enter the settings in the screenshot below to configure your training and validation databases.

Training with Caffe

Now you are ready to train your model! Go back to the main DIGITS page and select Images > Classification to create a new classification model. On this page, the defaults will work fine for a first-time training run. You simply need to select the data set you already configured and select the GPUs in your instance at the bottom of the page. Kick it off and watch it run.

Custom Data Sets

If you have a custom data set, you can upload it using Filezilla to drop.jarvice.com. See "How do I transfer files to and from JARVICE?" for instructions on using Filezilla. Many applications, including NVIDIA DIGITS, have an SSH server running. If you would like to upload or download your data while DIGITS is running using scp, rsync, or another tool, you can configure your SSH keys prior to launching your job; see "How do I upload my SSH public key? What is it used for?" for information on configuring password-less SSH. You can upload data sets using these methods to your /data directory, referred to as drop.jarvice.com or your "drop", and then enter this path instead of /db/mnist (see the example transfer commands at the end of this entry). If you have not yet created your own custom data set, there are instructions on the DIGITS GitHub page on how to structure your image database and prepare labels.

Now you are ready to train your deep learning models with GPUs in the cloud!

Want more?

If you are interested in enabling more advanced deep learning applications in the cloud like NVIDIA DIGITS, or in customizing your own machine learning environment in the cloud, please don't hesitate to live chat with us right on this page by clicking the blue/white bubble on the bottom right, or email us at support@bitfusion.io. You can also learn more about NVIDIA DIGITS here.
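Returning to the custom data set step above, here is what the transfer might look like with scp or rsync. The username, key path, and dataset directory are placeholder values, not actual Nimbix account details:

# Hypothetical example: push a local image dataset to your Nimbix drop.
scp -r -i ~/.ssh/id_rsa my_dataset/ nimbix_user@drop.jarvice.com:/data/my_dataset

# Or sync it with rsync, which can resume interrupted uploads.
rsync -avz -e "ssh -i ~/.ssh/id_rsa" my_dataset/ nimbix_user@drop.jarvice.com:/data/my_dataset/

Once uploaded, /data/my_dataset is the path to enter in DIGITS instead of /db/mnist.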
Read More

tutorial

Running Boost Machine Images on Spot Instances

One of the nifty features of AWS is that you can use spot instances instead of on-demand instances to significantly reduce costs. To use spot instances, we need to create a spot instance request, which includes the maximum price we are willing to pay per hour as well as a few other constraints, such as the instance type and availability zone. You can find a detailed discussion of all the AWS spot instance parameters in the following AWS user guide.
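As a preview, the same request can be made programmatically. Below is a minimal boto3 sketch; the price, AMI ID, region, and availability zone are placeholder values, not recommendations:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Request one g2.2xlarge at a maximum price of $0.65/hour (placeholder values).
response = ec2.request_spot_instances(
    SpotPrice="0.65",
    InstanceCount=1,
    LaunchSpecification={
        "ImageId": "ami-12345678",  # hypothetical AMI ID
        "InstanceType": "g2.2xlarge",
        "Placement": {"AvailabilityZone": "us-east-1a"},
    },
)

# The request ID can be used to track when the instance is fulfilled.
print(response["SpotInstanceRequests"][0]["SpotInstanceRequestId"])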
Read More

tutorial machine images

Bitfusion Boost 16 GPU Caffe Cluster

We recently published our Boost AMIs to the AWS Marketplace and walked through potential cluster configurations. Today, we are going to expand on that and set up a Bitfusion Boost cluster on AWS. We will be explicitly setting this up for the Caffe deep learning framework. At the end of this tutorial, you will have a cluster comprising:

- one g2.8xlarge as a client, where the application runs, and
- three g2.8xlarges as servers.

This configuration will give your application a total of 16 GPUs!

1. Subscribe to the Bitfusion AMIs

This walkthrough leverages AWS's CloudFormation (CFN) templates. Using our template will enable you to get a Bitfusion Boost cluster up and running in minutes. In order to use the CloudFormation template, you need to be signed into the AWS console and be an active subscriber to the AMIs used in the template.
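If you prefer to drive the template from code rather than the console, a CloudFormation stack can also be created with boto3. A minimal sketch is below; the stack name, template URL, and key pair name are placeholders, not the actual Bitfusion template values:

import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

# Launch the cluster stack from a template URL (placeholder values).
cfn.create_stack(
    StackName="bitfusion-boost-caffe",
    TemplateURL="https://s3.amazonaws.com/example-bucket/boost-cluster.template",  # hypothetical
    Parameters=[
        {"ParameterKey": "KeyName", "ParameterValue": "my-ec2-keypair"},  # hypothetical
    ],
    Capabilities=["CAPABILITY_IAM"],  # needed only if the template creates IAM resources
)

# Block until the stack has finished creating.
waiter = cfn.get_waiter("stack_create_complete")
waiter.wait(StackName="bitfusion-boost-caffe")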
Read More

tutorial
