Basic Tensorflow
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# A Very Basic Primer on Tensorflow\n", | |
"\n", | |
"Tensorflow, like any library, is a collection of abstractions. These range from very high-level optimizers to low-level computations over CPUs and GPUs. At its core, machine learning is just boatloads of fancy math run in a specific sequence but because Tensorflow is heavily optimized, the core of what is actually going on is extremely hard to find. The abstractions are too opaque.\n", | |
"\n", | |
"This is a quick notebook that moves up the various levels of abstraction, which can help to clarify your mental model of Tensorflow's abstraction ladder and (in my case) remind you how to break out of the pre-defined higher-level abstractions provided by Tensorflow and start building your own. We'll go through four iterations of the same basic \"machine learning\" problem:\n", | |
"\n", | |
"1. Pure Python\n", | |
"2. Tensorflow with basic arithmetic\n", | |
"3. Tensorflow with automatic differentiation\n", | |
"4. Tensorflow with fully abstracted optimization\n", | |
"\n", | |
"We'll start by solving for $a$ and $b$ in $$y = ax + b$$ $$x=2$$ $$y=3$$\n", | |
"\n", | |
"We could of course solve this using straight-up algebra, but as you scale up in machine learning, it becomes intractable to solve things with a closed form solution, so instead we'll use [gradient descent](https://en.wikipedia.org/wiki/Gradient_descent).\n", | |
"\n", | |
"In traditional training approaches we'll randomly initialize the weights we're solving for, but rather than do this every time, we'll do it up front, [XKCD](https://xkcd.com/221/) style:\n", | |
"\n", | |
"$$a=-1.55$$ \n", | |
"$$b=0.68$$\n", | |
"\n", | |
"Now, because we want $y$ to be as close to $ax + b$ as possible, what we actually want to ensure is that we minimize:\n", | |
"\n", | |
"$$cost = (ax + b - y)^2$$\n", | |
"\n", | |
"To do this with gradient descent, we need to compute the derivative of the cost function with respect to $a$ and $b$. In other words, if we increase $a$ by one (and hold everything else constant, how much does it change the cost? Same for $b$. Applying some basic calculus.\n", | |
"\n", | |
"$$\\frac{\\partial cost}{\\partial a} = 2x \\cdot (ax + b - y)$$ \n", | |
"$$\\frac{\\partial cost}{\\partial b} = 2 \\cdot (ax + b - y)$$\n", | |
"\n", | |
"Now, using the equations for gradient descent (and a learning rate of 0.01) to direct us towards an optimal solution we get the following update equations:\n", | |
"\n", | |
"$$a' = a - 0.01 \\cdot 2x \\cdot (ax + b - y)$$ \n", | |
"$$b' = b - 0.01 \\cdot 2 \\cdot (ax + b - y)$$\n", | |
"\n", | |
"Okay, now that we've derived this, let's implement gradient descent with basic Python." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Pure Python" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": { | |
"collapsed": false, | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"step: 0 cost: 23.794884 a: -1.3332000000000002 b: 0.7884\n", | |
"step: 10 cost: 2.892902393090664 a: -0.062341372323120064 b: 1.4238293138384401\n", | |
"step: 20 cost: 0.35170939500901494 a: 0.3807796315628811 b: 1.6453898157814408\n", | |
"step: 30 cost: 0.04275965163326921 a: 0.535286371973398 b: 1.722643185986699\n", | |
"step: 40 cost: 0.0051985753970310924 a: 0.5891595412046727 b: 1.7495797706023362\n", | |
"step: 50 cost: 0.000632025405407898 a: 0.6079439538154768 b: 1.7589719769077383\n", | |
"step: 60 cost: 7.683953440574272e-05 a: 0.6144936735028069 b: 1.7622468367514035\n", | |
"step: 70 cost: 9.341893533342854e-06 a: 0.6167774195464775 b: 1.763388709773239\n", | |
"step: 80 cost: 1.1357561633248718e-06 a: 0.6175737125545689 b: 1.763786856277285\n", | |
"step: 90 cost: 1.3808143476743876e-07 a: 0.6178513627584928 b: 1.7639256813792468\n", | |
"y: 3 result: 2.9998560372180294\n" | |
] | |
} | |
], | |
"source": [ | |
"a = -1.55\n", | |
"b = 0.68\n", | |
"x = 2\n", | |
"y = 3\n", | |
"\n", | |
"def cost(a, b, x, y):\n", | |
" return (a * x + b - y)**2\n", | |
"\n", | |
"for i in range(100):\n", | |
" # Compute the new a and b\n", | |
" a_ = a - 0.01 * 2 * x * (a * x + b - y)\n", | |
" b_ = b - 0.01 * 2 * (a * x + b - y)\n", | |
" \n", | |
" # Swap them in at once so they don't interfere with one another\n", | |
" a = a_\n", | |
" b = b_\n", | |
" \n", | |
" # Print status every 10 iterations\n", | |
" if i % 10 == 0:\n", | |
" print(\"step:\", i, \"cost:\", cost(a, b, x, y), \"a:\", a, \"b:\", b)\n", | |
" \n", | |
"print(\"y:\", y, \"result:\", a*x + b)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"That's all well and good but there's no Tensorflow! Ah, yes, well let's do the rough equivalent in Tensorflow primitives. This seems a lot more verbose than the straight Python, but it comes with the benefit of being able to be compiled for the GPU (if you have a GPU and a complicated enough problem for this to be an improvement, it doesn't help here).\n", | |
"\n", | |
"## Tensorflow with basic arithmetic\n", | |
"First, a pre-amble that's shared across all tensorflow examples:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import tensorflow as tf\n", | |
"\n", | |
"# You would traditionally want to use tf.random_normal([1]) rather than a hardcoded constant.\n", | |
"a = tf.Variable(-1.55, name=\"a\")\n", | |
"b = tf.Variable(0.68, name=\"b\")\n", | |
"x = tf.placeholder(tf.float32, name=\"x\")\n", | |
"y = tf.placeholder(tf.float32, name=\"y\")\n", | |
"\n", | |
"# We'll feed this in as inputs and outputs when actually running\n", | |
"to_solve = {x: 2, y: 3}\n", | |
"\n", | |
"# This is only used to compute the end result, not in training\n", | |
"y_ = a * x + b\n", | |
"\n", | |
"# Cost using a tf convenience function\n", | |
"cost = tf.square(a * x + b - y)\n", | |
"\n", | |
"# Variables need to be initialized as part of a global operation at the start\n", | |
"init = tf.global_variables_initializer()" | |
] | |
}, | |
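{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Note that nothing has actually been computed yet: `cost` is just a node in Tensorflow's graph, and numbers only appear once we evaluate it inside a session. A quick illustration (the exact tensor name printed may differ):\n", | |
"\n", | |
"```python\n", | |
"print(cost)   # a symbolic Tensor, e.g. Tensor(\"Square:0\", dtype=float32), not a number\n", | |
"with tf.Session() as sess:\n", | |
"    sess.run(init)\n", | |
"    print(sess.run(cost, feed_dict=to_solve))   # roughly 29.3764 with the initial a and b\n", | |
"```" | |
] | |
}, | |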
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now on to the actual training using basic arithmetic:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"step 0 cost: 23.7949 a: -1.3332 b: 0.7884\n", | |
"step 10 cost: 2.8929 a: -0.0623414 b: 1.42383\n", | |
"step 20 cost: 0.351709 a: 0.38078 b: 1.64539\n", | |
"step 30 cost: 0.0427597 a: 0.535286 b: 1.72264\n", | |
"step 40 cost: 0.00519857 a: 0.58916 b: 1.74958\n", | |
"step 50 cost: 0.000632022 a: 0.607944 b: 1.75897\n", | |
"step 60 cost: 7.68375e-05 a: 0.614494 b: 1.76225\n", | |
"step 70 cost: 9.34235e-06 a: 0.616777 b: 1.76339\n", | |
"step 80 cost: 1.13578e-06 a: 0.617574 b: 1.76379\n", | |
"step 90 cost: 1.3798e-07 a: 0.617851 b: 1.76393\n", | |
"y: 3 result: 2.99986\n" | |
] | |
} | |
], | |
"source": [ | |
"# Gradient descent with manually derived gradients\n", | |
"a_ = a - 0.01 * 2 * x * (a * x + b - y)\n", | |
"b_ = b - 0.01 * 2 * (a * x + b - y)\n", | |
"\n", | |
"# Swap in the new values on each step\n", | |
"step = tf.group(a.assign(a_), b.assign(b_))\n", | |
"\n", | |
"# Kick off a session\n", | |
"with tf.Session() as sess:\n", | |
" # If there were any random values they would be initialized here\n", | |
" sess.run(init)\n", | |
" \n", | |
" # Run 100 steps\n", | |
" for i in range(100):\n", | |
" sess.run(step, feed_dict=to_solve)\n", | |
" \n", | |
" # Print status every 10 steps\n", | |
" if i % 10 == 0:\n", | |
" print(\"step\", i, \"cost:\", sess.run(cost, feed_dict=to_solve), \"a:\", sess.run(a), \"b:\", sess.run(b))\n", | |
" \n", | |
" # See how close we got\n", | |
" print(\"y:\", 3, \"result:\", sess.run(y_, feed_dict=to_solve))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Ayyy, that pretty closely approximates the raw python version! But we can do a lot better than needing to derive our own gradients for everything. Let's use the auto-differentiation built into Tensorflow for this instead. As an aside, automatic differentiation is super clever and fun to implement if you're curious how it manages to programmatically compute what you spent a year of high school math learning.\n", | |
"\n", | |
"## Tensorflow with automatic differentiation\n", | |
"**Note:** This relies on the Python environment from the above cell" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"step 0 err: 23.7949 a: -1.3332 b: 0.7884\n", | |
"step 10 err: 2.8929 a: -0.0623414 b: 1.42383\n", | |
"step 20 err: 0.351709 a: 0.38078 b: 1.64539\n", | |
"step 30 err: 0.0427597 a: 0.535286 b: 1.72264\n", | |
"step 40 err: 0.00519857 a: 0.58916 b: 1.74958\n", | |
"step 50 err: 0.000632022 a: 0.607944 b: 1.75897\n", | |
"step 60 err: 7.68375e-05 a: 0.614494 b: 1.76225\n", | |
"step 70 err: 9.34235e-06 a: 0.616777 b: 1.76339\n", | |
"step 80 err: 1.13578e-06 a: 0.617574 b: 1.76379\n", | |
"step 90 err: 1.3798e-07 a: 0.617851 b: 1.76393\n", | |
"y: 3 result: 2.99986\n" | |
] | |
} | |
], | |
"source": [ | |
"# Tensorflow automatically computes the gradients!\n", | |
"derr_da, derr_db = tf.gradients(cost, [a, b], name=\"gradient\")\n", | |
"\n", | |
"# We still specify the update equations for gradient descent\n", | |
"a_ = a - derr_da * 0.01\n", | |
"b_ = b - derr_db * 0.01\n", | |
"\n", | |
"# Swap in the new values on each step\n", | |
"step = tf.group(a.assign(a_), b.assign(b_))\n", | |
"\n", | |
"# Run another session as before\n", | |
"with tf.Session() as sess:\n", | |
" sess.run(init)\n", | |
" for i in range(100):\n", | |
" sess.run(step, feed_dict=to_solve)\n", | |
" if i % 10 == 0:\n", | |
" print(\"step\", i, \"err:\", sess.run(cost, feed_dict=to_solve), \"a:\", sess.run(a), \"b:\", sess.run(b))\n", | |
" print(\"y:\", 3, \"result:\", sess.run(y_, feed_dict=to_solve))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Alas, even in the last example we need to know _how_ gradient descent can be used to optimize a given cost function. Unsurprisingly, Tensorflow also abstracts this away.\n", | |
"\n", | |
"## Tensorflow with fully abstracted optimization\n", | |
"**Note:** This (again) relies on the Python environment from the above cells" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": { | |
"collapsed": false, | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"step 0 err: 23.7949 a: -1.3332 b: 0.7884\n", | |
"step 10 err: 2.8929 a: -0.0623414 b: 1.42383\n", | |
"step 20 err: 0.351709 a: 0.38078 b: 1.64539\n", | |
"step 30 err: 0.0427597 a: 0.535286 b: 1.72264\n", | |
"step 40 err: 0.00519857 a: 0.58916 b: 1.74958\n", | |
"step 50 err: 0.000632022 a: 0.607944 b: 1.75897\n", | |
"step 60 err: 7.68375e-05 a: 0.614494 b: 1.76225\n", | |
"step 70 err: 9.34235e-06 a: 0.616777 b: 1.76339\n", | |
"step 80 err: 1.13578e-06 a: 0.617574 b: 1.76379\n", | |
"step 90 err: 1.3798e-07 a: 0.617851 b: 1.76393\n", | |
"y: 3 result: 2.99986\n" | |
] | |
} | |
], | |
"source": [ | |
"# Initialize a magical optimizer with a learning rate of 0.01 as before\n", | |
"optimizer = tf.train.GradientDescentOptimizer(0.01)\n", | |
"train = optimizer.minimize(cost)\n", | |
"\n", | |
"# Run another session as before\n", | |
"with tf.Session() as sess:\n", | |
" sess.run(init)\n", | |
" for i in range(100):\n", | |
" sess.run(train, feed_dict=to_solve)\n", | |
" if i % 10 == 0:\n", | |
" print(\"step\", i, \"err:\", sess.run(cost, feed_dict=to_solve), \"a:\", sess.run(a), \"b:\", sess.run(b))\n", | |
" print(\"y:\", 3, \"result:\", sess.run(y_, feed_dict=to_solve))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"As you can see, all four versions produce the exact same result! Hopefully this makes things a bit clearer." | |
] | |
} | |
], | |
"metadata": { | |
"anaconda-cloud": {}, | |
"kernelspec": { | |
"display_name": "Python [conda root]", | |
"language": "python", | |
"name": "conda-root-py" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.5.2" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 1 | |
} |