Wednesday, August 9, 2017

Solving a simple linear regression using TensorFlow

In 2014 I blogged about how Google was leading the big data and machine learning world. Since then, they have been busy, and one of their latest creations is TensorFlow.

Using SAP products has paid my salary for the last 15 years, and when I learned that SAP is using Google's TensorFlow, I thought I should start learning more about it and see what the fuss is all about.

In this blog post, I will be talking about:
  • Linear regression
  • Machine Learning
  • Programming languages
  • TensorFlow
You have been warned!

Let's start with a simple linear regression problem. We have the following table that contains the age and the weight of a certain animal.

Age   Weight
1     2.00
2     3.10
3     4.15
4     5.40
6     7.00

Using linear regression, I want to find an equation where I can input a new value of age (for example, 7) and get a prediction of the corresponding weight.

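Using the explanation from this article, the equation for linear regression is: y = a + bx. For n data points, the standard least-squares equations for a and b are:

$$b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}$$

$$a = \frac{\sum y - b \sum x}{n}$$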

In our case, x is the age and y is the weight. Applying the above formulas gives a = 1.09324324324324 and b = 1.01148648648649.
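To make the manual calculation easier to check, here is a minimal sketch in plain Python that applies the standard least-squares formulas to the table above. The function name `fit_line` and the variable names are my own choices for illustration; they do not come from the linked article.

```python
# Closed-form least-squares fit of weight = a + b * age.
ages = [1, 2, 3, 4, 6]
weights = [2.00, 3.10, 4.15, 5.40, 7.00]


def fit_line(xs, ys):
    """Return (a, b) that minimize the sum of squared errors of y = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope first, then the intercept derived from it.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


a, b = fit_line(ages, weights)
print("a = %.8f, b = %.8f" % (a, b))
```

The printed values should agree with the a and b quoted above, apart from the number of decimals shown.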

Applying the linear regression formula to the original data, plus the new age of 7, gives the following predictions:

Age (x)   Weight (y)   Predicted weight (y)
1         2.00         2.10472972972973
2         3.10         3.11621621621621
3         4.15         4.1277027027027
4         5.40         5.13918918918919
6         7.00         7.16216216216216
7                      8.17364864864865

For the age of 7, the predicted weight is 8.17.
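The predictions can be reproduced with a few lines of Python. This is only a sketch: the coefficients are copied from the manual calculation in this post, and `predict_weight` is a hypothetical helper name I made up for the example.

```python
# Coefficients copied from the manual calculation in this post.
A = 1.09324324324324  # intercept
B = 1.01148648648649  # slope


def predict_weight(age):
    """Predict the weight for a given age with y = a + b*x."""
    return A + B * age


# The original ages plus the new age of 7.
for age in [1, 2, 3, 4, 6, 7]:
    print("age %d -> predicted weight %.2f" % (age, predict_weight(age)))
```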

Now let's do the same using TensorFlow.

The instructions to install and configure TensorFlow can be found here.

The code in TensorFlow looks like this:

import numpy as np
import tensorflow as tf

# Note: this uses the TensorFlow 1.x API (placeholders and sessions).

# Model parameters
b = tf.Variable([.3], dtype=tf.float32)
a = tf.Variable([-.3], dtype=tf.float32)

# Model input and output
x = tf.placeholder(tf.float32)
linear_model = a + b * x
y = tf.placeholder(tf.float32)

# loss
loss = tf.reduce_sum(tf.square(linear_model - y)) # sum of the squares

# optimizer
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

# training data
x_train = [1,2,3,4,6]
y_train = [2,3.10,4.15,5.40,7]

# training loop
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init) # set a and b to their (deliberately wrong) starting values
for i in range(1000):
  sess.run(train, {x:x_train, y:y_train})

# evaluate training accuracy
curr_a, curr_b, curr_loss = sess.run([a, b, loss], {x:x_train, y:y_train})
print("a: %s b: %s loss: %s"%(curr_a, curr_b, curr_loss))

And the output is:
a: [ 1.09324002] b: [ 1.01148736] loss: 0.106047

These match the manually calculated values to about five decimal places.
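The loss reported by TensorFlow can be cross-checked by hand as well. The sketch below assumes the loss is the plain sum of squared differences between predicted and actual weights (which is what `tf.reduce_sum(tf.square(...))` computes) and uses the coefficients from the manual calculation; the variable names are mine.

```python
# Coefficients from the manual calculation in this post.
a = 1.09324324324324
b = 1.01148648648649

x_train = [1, 2, 3, 4, 6]
y_train = [2, 3.10, 4.15, 5.40, 7]

# Sum of squared errors between the predicted and the actual weights.
sse = sum(((a + b * x) - y) ** 2 for x, y in zip(x_train, y_train))
print("sum of squared errors: %.6f" % sse)
```

The result should land very close to the loss TensorFlow printed, which suggests both approaches arrive at essentially the same line.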

Now I will go through the main parts of the code.

These lines define the original data. In machine learning terminology, this is the training data:

x_train = [1,2,3,4,6]
y_train = [2,3.10,4.15,5.40,7]

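In these lines we train the model. Each of the 1000 iterations runs one gradient descent step, which adjusts a and b slightly to reduce the loss: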

sess = tf.Session()
sess.run(init) 
for i in range(1000):
  sess.run(train, {x:x_train, y:y_train})
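To get a feel for what each training step does, here is a rough plain-Python sketch of gradient descent on the same data. It is not TensorFlow's implementation: I am assuming the same starting values (a = -0.3, b = 0.3), the same learning rate (0.01) and the same sum-of-squares loss as in the code above, and I derive the gradients by hand.

```python
x_train = [1, 2, 3, 4, 6]
y_train = [2, 3.10, 4.15, 5.40, 7]

# Same (deliberately wrong) starting values as in the TensorFlow code.
a, b = -0.3, 0.3
learning_rate = 0.01

for step in range(1000):
    # Prediction error for every training point with the current a and b.
    errors = [(a + b * x) - y for x, y in zip(x_train, y_train)]
    # Gradients of loss = sum(error^2) with respect to a and b.
    grad_a = 2 * sum(errors)
    grad_b = 2 * sum(e * x for e, x in zip(errors, x_train))
    # Move both parameters a small step against the gradient.
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b

loss = sum(((a + b * x) - y) ** 2 for x, y in zip(x_train, y_train))
print("a: %.6f b: %.6f loss: %.6f" % (a, b, loss))
```

Because this version uses Python's 64-bit floats instead of float32, the last digits may differ a little from the TensorFlow output.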

If you need more details about getting started with TensorFlow try this link: https://www.tensorflow.org/get_started/get_started


My next blog post will be about loading data from an SAP BW source into TensorFlow. In the meantime, go ahead and experiment with TensorFlow.


Cheers!