In a series of blog posts, we want to provide a step-by-step guide for getting from a basic TensorFlow model to best-in-class architectures. Deep learning is a hot topic and it’s easy to find both introductory and advanced resources, but it can be difficult to see how to get from the intro material to more advanced models. We at Bitfusion would like to help fill in the missing steps along the journey to becoming a Deep Learning expert.
We want to state upfront that these posts are meant as an entry point into using the framework. We hope to evolve the content on our blog to cover more advanced topics in the future, moving from the abstracted TensorFlow layers API to the lower-level core TensorFlow API for additional flexibility. We decided to use TensorFlow as our first framework because its community usage and support provide a large amount of supplemental material (it is also what we use most internally). We will introduce and implement other deep learning frameworks in future posts.
Before we start, let's go over some details on following along with this series of posts.
The code for these blog posts is going to live here. For every blog post we will create a folder whose name corresponds to the title of the post, and its README will give you an idea of the flow. For example, we might use the content of blog post two as the starting point for blog post ten. In this way we hope to show not only the code behind a post, but also how we got from point A to point B. If you have any suggestions or want to see something specific, don’t hesitate to open an issue in the repo.
Unless otherwise noted, we will assume that you are following along with the Bitfusion AMI found here. The Bitfusion AMI bundles many programs on top of TensorFlow, making it a one-stop shop for doing Deep Learning with TensorFlow on AWS.
If you have never used Bitfusion’s AMI, follow our very detailed installation guide.
Why Use These Posts?
We are creating these blog posts in an effort to document how one might go from A all the way to Z in deep learning using TensorFlow. If you’re a fan of our AMIs, this will also be a great way to stay on top of our product offerings that help make deep learning easier on both the individual and enterprise level. We will also try to stay on top of new TensorFlow features as they are released.
Deep Learning Programs
Most deep learning programs follow some very common architectures and execution patterns, and knowing these patterns makes the runtime environment easier to explain. From a very simple perspective, a typical deep learning program reads data, trains a model, and makes predictions. A typical lifecycle might look something like Figure 1.
To go into a little more detail, the typical lifecycle of a deep learning model will:
- Read data in
- Train a model by:
  - Defining a model structure
  - Setting hyperparameters (learning rate, initial model weights and biases, etc.)
  - Setting an optimizer
  - Running the training loop a set number of times (usually determined by the number of epochs or the number of steps)
  - Periodically checking the model against a validation dataset
- Once you have a model candidate that performs well on the training data and validation data, try it against some held-out test dataset
- If the model meets specifications, deploy to production
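The lifecycle above can be sketched without any framework. This is a toy illustration, not TensorFlow code: `read_data`, `mse`, `train`, the linear model y = w*x + b, and the 0.05 "production threshold" are all hypothetical stand-ins chosen so the whole loop fits in plain Python.

```python
import random

def read_data(n=1000, seed=0):
    """Hypothetical "Read data" step: noisy samples of y = 2x + 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)))
    return data

def mse(params, data):
    """Mean squared error of the linear model y = w*x + b on a dataset."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(train_set, valid_set, learning_rate=0.1, batch_size=32, steps=500, eval_every=100, seed=0):
    rng = random.Random(seed)
    w, b = 0.0, 0.0                      # initial weights (a hyperparameter choice)
    history = []
    for step in range(1, steps + 1):     # run for a fixed number of steps
        batch = rng.sample(train_set, batch_size)
        # Gradients of the batch MSE with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / batch_size
        grad_b = sum(2 * (w * x + b - y) for x, y in batch) / batch_size
        # Optimizer: plain minibatch stochastic gradient descent
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        if step % eval_every == 0:       # periodic validation check
            history.append((step, mse((w, b), valid_set)))
    return (w, b), history

data = read_data()
train_set, valid_set, test_set = data[:700], data[700:850], data[850:]
params, history = train(train_set, valid_set)
test_mse = mse(params, test_set)         # held-out test before "deploying"
meets_spec = test_mse < 0.05             # hypothetical production threshold
print(params, history[-1], test_mse, meets_spec)
```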
The “Read Data” step represents a function or group of functions that load data into the model space. You will actually read data three times, at different stages. Those already familiar with machine learning will know that a typical dataset is split into training, validation, and test sets: the training data is used to train the model, the validation data to check the model during training, and the test data to evaluate the final model. This is a common strategy for assessing model fitness.
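A minimal sketch of such a split, assuming you start from one pool of examples: shuffle once with a fixed seed, then carve out disjoint slices. `split_dataset` is a hypothetical helper; the sizes below simply match the 50,000/10,000/10,000 split of the MNIST pickle used later in this post, which ships already split, so you will not need to do this yourself for MNIST.

```python
import random

def split_dataset(examples, n_train, n_valid, seed=0):
    """Shuffle once, then carve out disjoint train/validation/test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # everything left over is held out for testing
    return train, valid, test

train, valid, test = split_dataset(range(70000), 50000, 10000)
print(len(train), len(valid), len(test))
```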
This is a very simplistic view of the structure and glosses over a lot of the details and potential permutations. One of the benefits of TensorFlow is the ability to define a multitude of models in many different ways. This framework is just intended as a starting point.
Neural Networks and TensorFlow 1.0
In this post we want to show a simple neural network built with some of the new functionality in TensorFlow 1.0. As of TensorFlow 1.0, TF Layers is part of core TensorFlow, and there are plans to make TF Learn core in 1.1. More details can be found here.
We could spend days learning about neural networks, but that is out of scope for this post. However, I highly recommend you learn more about them. Some resources that I have found extremely helpful as easy introductions are:
Both resources also go into more detail on topics we will implement later in this series, such as convolutional neural networks and recurrent neural networks.
Hello TensorFlow 1.0
As of TensorFlow version 1.0, defining a simple model is extremely straightforward. We are going to build a basic model using tf.layers and tf.contrib.learn. These additions to TensorFlow are great for creating robust models in very few lines of code. So let’s build our first neural network on the MNIST dataset.
MNIST Neural Net in TensorFlow 1.0
Let’s start by spinning up a Bitfusion TensorFlow 1.0 AMI found here. If you need more instruction on getting the instance up and running, consult our installation guide. Once your instance is running, it is time to get the code. First navigate to the /home/ubuntu/pynb/ directory. This is where the Jupyter notebook server on this instance looks for files, so our code needs to be in this directory. Then clone the code from GitHub:
git clone https://github.com/bitfusionio/blog-tutorial-code.git
Before we can run the neural network, we also need to give it some data. We are going to train on the famous MNIST handwritten digit dataset. To grab the data run:
We also want to suppress the warnings TensorFlow currently prints, for aesthetic purposes. If this causes any issues with the code in the future, we will remove it and edit our code accordingly.
Implemented Features in This Post
Figure 2 represents what we will implement:
- Read data in
- Define model
- Train model
- Test model
You are free to change the number of hidden layers, the learning rate, and the batch size. Tuning these hyperparameters is how you will iterate on the model.
In TensorFlow 1.0 and future releases, tf.contrib.learn has become more important as a front end that makes training deep learning models easier. In fact, tf.contrib.learn models are covered in the TensorFlow tutorials on the TensorFlow website.
Note: From this point forward we assume you are following along either in the Jupyter notebook or in the .py files provided. The code in this section corresponds to dnn_model.py, and the code in the next section to custom_model.py.
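One concrete way to see what changing the number of hidden layers does is to count trainable parameters. The sketch below assumes a fully connected network with 784 inputs (28 × 28 MNIST pixels) and 10 output classes; `count_params` and the grid values are hypothetical, with (512, 128) matching the hidden layers used later in this post.

```python
import itertools

def count_params(hidden_units, n_inputs=784, n_classes=10):
    """Weights plus biases in a fully connected net with the given hidden layer sizes."""
    sizes = [n_inputs] + list(hidden_units) + [n_classes]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

# A hypothetical grid of settings; each combination is one model iteration to try
grid = itertools.product(
    [(512, 128), (256,), (512, 256, 128)],  # hidden layer sizes
    [0.1, 0.01],                             # learning rate
    [50, 100],                               # batch size
)
for hidden_units, learning_rate, batch_size in grid:
    print(hidden_units, learning_rate, batch_size, count_params(hidden_units))
```

The learning rate and batch size do not change the parameter count; they change how those parameters are updated.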
First let's read the data into our environment:
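import gzip
import cPickle  # Python 2; under Python 3 use pickle.load(f, encoding='latin1')
with gzip.open('../data/mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = cPickle.load(f)  # each set is an (images, labels) pair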
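If you are on Python 3, a loader along these lines should work; it is a sketch, and `load_mnist` is a hypothetical helper name. It also shows the nesting the rest of the code relies on: each of the three sets is an (images, labels) pair, which is why features are `train_set[0]` and labels are `train_set[1]`. The demo writes a tiny fake file with the same nesting; the real file stores NumPy arrays, so loading it also requires NumPy.

```python
import gzip
import os
import pickle
import tempfile

def load_mnist(path):
    """Load the (train, valid, test) tuple from a gzipped pickle.

    encoding='latin1' lets Python 3 read a pickle that was written by Python 2.
    """
    with gzip.open(path, 'rb') as f:
        return pickle.load(f, encoding='latin1')

# Hypothetical stand-in data: two 784-pixel images and their labels per split
fake_split = ([[0.0] * 784, [1.0] * 784], [3, 7])
path = os.path.join(tempfile.mkdtemp(), 'fake_mnist.pkl.gz')
with gzip.open(path, 'wb') as f:
    pickle.dump((fake_split, fake_split, fake_split), f, protocol=2)  # protocol 2 is Python 2 readable

train_set, valid_set, test_set = load_mnist(path)
images, labels = train_set
print(len(images), len(images[0]), labels)
```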
Next we define our Deep Neural Network (DNN) model:
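import tensorflow as tf
feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input(train_set[0])  # infer columns from the image array
classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,
                                            hidden_units=[512, 128], n_classes=10)  # same layer sizes as the custom model below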
Now we run the training for 10,000 steps with a batch size of 100. For those unfamiliar with Stochastic Gradient Descent, I would recommend looking at this section of Karpathy’s class. We are choosing 10,000 steps arbitrarily for illustration; a more typical approach is to train for a certain number of epochs, where an epoch is one full pass through all of the data. For this problem there are 50,000 training examples, so we are effectively running the algorithm for (100 * 10000)/50000 = 20 epochs:
classifier.fit(x=train_set[0], y=train_set[1], batch_size=100, steps=10000)
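The epoch arithmetic, plus one way to draw minibatches, can be sketched in plain Python. `epochs_run` and `minibatch_indices` are hypothetical helpers; the batching scheme (reshuffle at the start of every epoch, skip a leftover partial batch) is a common convention chosen for illustration and is not necessarily how tf.contrib.learn samples batches internally.

```python
import random
from collections import Counter

def epochs_run(batch_size, steps, n_examples):
    """Number of full passes over the data implied by a step budget."""
    return batch_size * steps / n_examples

def minibatch_indices(n_examples, batch_size, steps, seed=0):
    """Yield `steps` minibatches of indices, reshuffling at the start of every epoch."""
    rng = random.Random(seed)
    order, pos = [], n_examples  # pos == n_examples forces a shuffle on the first step
    for _ in range(steps):
        if pos + batch_size > n_examples:  # a leftover partial batch is skipped
            order = list(range(n_examples))
            rng.shuffle(order)
            pos = 0
        yield order[pos:pos + batch_size]
        pos += batch_size

print(epochs_run(100, 10000, 50000))

# Scaled-down check: 500 examples, batch size 10, 1,000 steps is also 20 epochs
counts = Counter(i for batch in minibatch_indices(500, 10, 1000) for i in batch)
print(set(counts.values()))  # every example was visited the same number of times
```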
Finally we test the accuracy of the model against some held-out test set:
score = classifier.evaluate(x=test_set[0], y=test_set[1])
print('Accuracy: {0:f}'.format(score['accuracy']))
That’s it: a deep neural network in fewer than 20 lines of code.
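The accuracy that evaluation reports is the fraction of test examples whose highest-scoring class matches the true label. Here is a plain-Python sketch of that arithmetic; it is not the tf.contrib.learn implementation, and the logits and labels are made-up values.

```python
def argmax(values):
    """Index of the largest value (the role tf.argmax plays per example)."""
    return max(range(len(values)), key=values.__getitem__)

def accuracy(logits, labels):
    """Fraction of examples whose highest-scoring class matches the true label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Three hypothetical examples with 3 classes each; the third is misclassified (1.0 > 0.9)
logits = [[2.0, 0.1, -1.0], [0.0, 3.0, 0.5], [1.0, 0.2, 0.9]]
labels = [0, 1, 2]
print(accuracy(logits, labels))
```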
Writing our Own Model with tf.layers
The next thing we want to illustrate is how to build your own model. While it might not be particularly useful for this example, being able to write your own model greatly increases the flexibility of architectures that you can use. Also, for the time being, a DNN is the “most” complex model there is in tf.contrib.learn. We want to build more complex models in future posts.
For simplicity we will implement exactly the same DNN model as before, but with the layers library in TensorFlow. This code will also be an important building block for future posts.
Read the data:
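import gzip
import cPickle  # Python 2; under Python 3 use pickle.load(f, encoding='latin1')
with gzip.open('../data/mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = cPickle.load(f)  # each set is an (images, labels) pair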
Next we define our DNN model:
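import tensorflow as tf
from tensorflow.contrib import learn
def fully_connected_model(features, labels):
    # One-hot encode the integer digit labels (10 classes)
    labels = tf.one_hot(tf.cast(labels, tf.int32), 10, 1, 0)
    # Two hidden ReLU layers (512 and 128 units) feeding a 10-unit output layer
    layer1 = tf.layers.dense(features, 512, activation=tf.nn.relu, name='fc1')
    layer2 = tf.layers.dense(layer1, 128, activation=tf.nn.relu, name='fc2')
    logits = tf.layers.dense(layer2, 10, activation=None, name='out')
    # Mean softmax cross-entropy between the logits and the one-hot labels
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
    # Plain gradient descent (learning rate 0.01); passing the global step increments it on each update
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss, tf.contrib.framework.get_global_step())
    # Predicted class = index of the largest logit
    return tf.argmax(logits, 1), loss, train_op
classifier = learn.Estimator(model_fn=fully_connected_model)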
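To make the tf.layers.dense calls less opaque: each dense layer computes activation(inputs · kernel + bias), with a kernel of shape [inputs, units]. The sketch below does this in plain Python for a single example on a hypothetical 3-input, 2-hidden-unit, 2-class network with hand-picked weights, rather than the 784-512-128-10 model above.

```python
def dense(inputs, weights, biases, activation=None):
    """One fully connected layer for a single example; weights has shape [len(inputs)][len(biases)]."""
    outputs = [
        sum(x * w_row[j] for x, w_row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]
    return [activation(o) for o in outputs] if activation else outputs

def relu(z):
    return max(0.0, z)

x = [1.0, 2.0, -1.0]
hidden = dense(x, weights=[[1.0, -1.0], [0.5, -0.5], [0.0, 1.0]], biases=[0.5, 0.0], activation=relu)
logits = dense(hidden, weights=[[1.0, -1.0], [1.0, 1.0]], biases=[0.0, 0.5])  # no activation, like 'out'
prediction = max(range(len(logits)), key=logits.__getitem__)  # same role as tf.argmax(logits, 1)
print(hidden, logits, prediction)
```

The second hidden unit's pre-activation is negative (-3.0), so the ReLU clips it to zero.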
Now we run the training for 10,000 steps with a batch size of 100 (again 20 epochs, as we calculated earlier):
classifier.fit(x=train_set[0], y=train_set[1], batch_size=100, steps=10000)
Finally, we test the accuracy of the model against some held-out test set:
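score = classifier.evaluate(x=test_set[0], y=test_set[1], metrics={'accuracy': learn.MetricSpec(metric_fn=tf.contrib.metrics.streaming_accuracy)})
print('Accuracy: {0:f}'.format(score['accuracy']))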
There is a little more to unpack in this code. First, the loss and the optimizer need to be stated explicitly as part of the model definition. The classifier’s fit call only takes x, y, batch_size, and steps; everything else about training has to be handled inside the model function. Again, Karpathy’s class covers this well if you need more details. We also have to define the model as a function and pass it to the learn.Estimator class. For more details on the Estimator class, see the TensorFlow documentation.
To clarify further, the loss tells the algorithm how far its predictions are from the actual labels, and the optimizer tells the algorithm how to update its weights to classify the labels correctly. These changes are mostly due to syntax differences between learn.Estimator and the DNNClassifier from the previous model. The last major change is that we have to state explicitly which metrics the classifier will evaluate. We add accuracy as a metric to compute (by default it only reports the loss and the global step).
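Here is a plain-Python sketch of what the loss and optimizer do, assuming a single linear softmax layer and one training example instead of the two-hidden-layer network above. `one_hot` mirrors the idea of tf.one_hot, `cross_entropy` computes softmax cross-entropy, and `gradient_descent_step` applies one plain gradient-descent update; all names and numbers are hypothetical.

```python
import math

def one_hot(label, num_classes):
    """Same idea as tf.one_hot: a 1 at the label's index, 0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def softmax(logits):
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Softmax cross-entropy: how far the predicted distribution is from the one-hot target."""
    return -sum(t * math.log(p) for t, p in zip(target, softmax(logits)))

def linear(W, b, x):
    """Logits of a single linear layer; W has shape [len(x)][len(b)]."""
    return [sum(xi * W[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]

def gradient_descent_step(W, b, x, target, learning_rate):
    """One plain gradient-descent update on a single example."""
    # For softmax cross-entropy, d(loss)/d(logit_j) = softmax_j - target_j
    grad = [p - t for p, t in zip(softmax(linear(W, b, x)), target)]
    new_W = [[W[i][j] - learning_rate * x[i] * grad[j] for j in range(len(b))]
             for i in range(len(x))]
    new_b = [b[j] - learning_rate * grad[j] for j in range(len(b))]
    return new_W, new_b

x = [1.0, 0.5]                           # hypothetical 2-feature input
target = one_hot(2, 3)                   # true class is 2 out of 3
W = [[0.0] * 3 for _ in range(2)]
b = [0.0] * 3
before = cross_entropy(linear(W, b, x), target)   # equal logits, so the loss is ln 3
W, b = gradient_descent_step(W, b, x, target, learning_rate=0.5)
after = cross_entropy(linear(W, b, x), target)
print(before, after)                     # the loss goes down after one update
```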
Running the Model
To run the final model, use the Jupyter notebook found in the repository or simply run:
python custom_model.py
You should see some output with information about the GPU and TensorFlow, followed by something that looks like this:
loss = 2.31664, step = 1
loss = 1.54881, step = 101
loss = 0.94132, step = 201
loss = 0.636779, step = 301
loss = 0.598778, step = 401
loss = 0.516336, step = 501
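A rough sanity check on the first log line: with small random initial weights, the 10 logits start out nearly equal, so the softmax gives roughly 1/10 to every class. The expected cross-entropy of such a near-uniform guess is -ln(1/10) = ln 10 ≈ 2.3026, close to the 2.31664 reported at step 1.

```python
import math

num_classes = 10
initial_loss = -math.log(1.0 / num_classes)   # equals ln(10)
print(round(initial_loss, 4))                 # 2.3026
```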
After the model trains to completion, it should output a final accuracy that looks like this:
Accuracy: 0.957400
Hopefully this post provided a simple introduction to defining your own deep neural network using TensorFlow. We will continue to build on these foundations and make more and more complex models with more robust features.
The next post in this series can be found here and it covers monitoring and checkpointing your models in TensorFlow.