Theano Example: Artificial Neural Network

For the neural network model itself, see the UFLDL tutorial; it is not described in detail here.

http://ufldl.stanford.edu/wiki/index.php/%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C

In [1]:

import theano
import theano.tensor as T

import numpy as np
from load import mnist

Using gpu device 1: Tesla K10.G2.8GB (CNMeM is disabled)

Here we use a simple three-layer neural network: input, hidden layer, output.

For the activation functions, the hidden layer uses sigmoid and the output layer uses softmax. With one sample per row of $X$, the model is:

$$ \begin{aligned} h & = \sigma (X W_h) \\ o & = \text{softmax} (h W_o) \end{aligned} $$

In [2]:

def model(X, w_h, w_o):
    """
    input:
        X: input data
        w_h: hidden unit weights
        w_o: output unit weights
    output:
        pyx: probability of y given x
    """
    # hidden layer
    h = T.nnet.sigmoid(T.dot(X, w_h))
    # output layer
    pyx = T.nnet.softmax(T.dot(h, w_o))
    return pyx
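
To make the forward pass concrete, here is a minimal NumPy sketch of the same computation, evaluated eagerly rather than symbolically. The names `sigmoid`, `softmax`, `model_np` and the `*_demo` arrays are illustrative and not part of the original notebook; the input is random data, not MNIST.

```python
import numpy as np

def sigmoid(z):
    # Element-wise logistic function.
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Row-wise softmax; subtracting the row maximum avoids overflow.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def model_np(X, w_h, w_o):
    # Same structure as model(), but computed directly on arrays.
    h = sigmoid(X.dot(w_h))
    return softmax(h.dot(w_o))

rng = np.random.RandomState(0)
X_demo = rng.rand(4, 784)                  # 4 made-up samples
w_h_demo = rng.randn(784, 625) * 0.01
w_o_demo = rng.randn(625, 10) * 0.01
p = model_np(X_demo, w_h_demo, w_o_demo)
print(p.shape)          # (4, 10)
print(p.sum(axis=1))    # each row sums to 1 up to rounding
```

Each row of the result is a probability distribution over the 10 classes, which is what the softmax output layer is meant to produce.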

Train the network with stochastic gradient descent:

In [3]:

def sgd(cost, params, lr=0.05):
    """
    input:
        cost: cost function
        params: parameters
        lr: learning rate
    output:
        update rules
    """
    grads = T.grad(cost=cost, wrt=params)
    updates = []
    for p, g in zip(params, grads):
        updates.append([p, p - g * lr])
    return updates
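
The update rule `p - g * lr` can be tried out on a toy problem. The sketch below is not from the original notebook: it applies the same rule in NumPy to the one-dimensional function (w - 3)^2, with the gradient written by hand instead of obtained from `T.grad`. The helper name `sgd_step` is hypothetical.

```python
import numpy as np

def sgd_step(params, grads, lr=0.05):
    # Same rule as sgd(): p <- p - lr * g for each parameter.
    return [p - lr * g for p, g in zip(params, grads)]

w = np.array([0.0])
for _ in range(200):
    grad = 2.0 * (w - 3.0)        # derivative of (w - 3)^2 with respect to w
    (w,) = sgd_step([w], [grad])
print(w)                          # close to 3, the minimizer
```

With `lr=0.05` the distance to the minimum shrinks by a factor of 0.9 per step, so 200 steps are more than enough here.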

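For the MNIST handwritten-digit problem we use a 784 × 625 × 10 network: an input layer of size 784, a hidden layer of size 625, and an output layer of size 10. The final output gives the probability of each digit from 0 to 9.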

To be able to update the weights, we need to declare them as shared variables:

In [4]:

def floatX(X):
    return np.asarray(X, dtype=theano.config.floatX)

def init_weights(shape):
    return theano.shared(floatX(np.random.randn(*shape) * 0.01))

The variables are therefore initialized as:

In [5]:

X = T.matrix()
Y = T.matrix()

w_h = init_weights((784, 625))
w_o = init_weights((625, 10))

The model output is:

In [6]:

py_x = model(X, w_h, w_o)

The predicted label is:

In [7]:

y_x = T.argmax(py_x, axis=1)
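
As a small illustration of how a row-wise `argmax` turns probabilities into labels, and how accuracy is then computed, consider the NumPy sketch below. The probabilities and true labels are made up for the example.

```python
import numpy as np

probs = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.2, 0.6]])
pred = np.argmax(probs, axis=1)      # index of the largest entry in each row
true = np.array([1, 0, 1])           # made-up ground-truth labels
accuracy = np.mean(pred == true)     # fraction of rows predicted correctly
print(pred)                          # [1 0 2]
print(accuracy)                      # 2 of 3 correct
```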

The cost function of the model is:

In [8]:

cost = T.mean(T.nnet.categorical_crossentropy(py_x, Y))
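
For one-hot targets, the categorical cross-entropy of a sample is the negative log of the probability assigned to the true class. The NumPy sketch below uses made-up values, and `categorical_crossentropy_np` is an illustrative name rather than a Theano function.

```python
import numpy as np

def categorical_crossentropy_np(p, y):
    # Per-sample cross-entropy: -sum_k y_k * log(p_k).
    return -np.sum(y * np.log(p), axis=1)

p_demo = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])
y_demo = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])
losses = categorical_crossentropy_np(p_demo, y_demo)
print(losses)           # -log(0.7) and -log(0.1)
print(losses.mean())    # the scalar that would be minimized
```

The second sample puts only 0.1 on the true class, so it contributes a much larger loss than the first.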

The update rule is:

In [9]:

updates = sgd(cost, [w_h, w_o])

Define the training and prediction functions:

In [10]:

train = theano.function(inputs=[X, Y], outputs=cost, updates=updates, allow_input_downcast=True)
predict = theano.function(inputs=[X], outputs=y_x, allow_input_downcast=True)

Training:

Load the MNIST data:

In [11]:

trX, teX, trY, teY = mnist(onehot=True)
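
The `load` module is not shown in this notebook. Assuming `onehot=True` means the labels are returned as one-hot rows (which matches the matrix-valued `Y` and the `argmax` over `teY` used later), the encoding could look like the sketch below. `one_hot` is a hypothetical helper, not the actual code in `load`.

```python
import numpy as np

def one_hot(labels, n_classes=10):
    # Hypothetical helper: row i has a single 1 in column labels[i].
    out = np.zeros((len(labels), n_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

labels = np.array([3, 0, 9])
Y_demo = one_hot(labels)
print(Y_demo.shape)                  # (3, 10)
print(np.argmax(Y_demo, axis=1))     # [3 0 9], recovers the integer labels
```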

Train for 100 epochs; the accuracy on the test set reaches 0.956:

In [12]:

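for i in range(100):
    for start, end in zip(range(0, len(trX), 128), range(128, len(trX), 128)):
        # use a separate name so the symbolic `cost` is not overwritten
        batch_cost = train(trX[start:end], trY[start:end])
    # print the epoch index and the test-set accuracy
    print("{0:03d} {1}".format(i, np.mean(np.argmax(teY, axis=1) == predict(teX))))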
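
The mini-batch boundaries in the loop above come from zipping two ranges, and `zip` stops at the shorter one, so only full batches of 128 are used. The sketch below shows this, assuming a training set of 60000 samples (the usual MNIST training size; the actual length of `trX` depends on `load`). `batch_bounds` is an illustrative name.

```python
def batch_bounds(n, batch_size=128):
    # Same pairing as the training loop: (start, end) for each full batch.
    return list(zip(range(0, n, batch_size), range(batch_size, n, batch_size)))

bounds = batch_bounds(60000)       # 60000 is an assumed training-set size
print(len(bounds))                 # 468
print(bounds[0], bounds[-1])       # (0, 128) (59776, 59904)
```

Under that assumption the last 96 samples (indices 59904 to 59999) are never seen during training. This is a small fraction of the data, but worth knowing when adapting the loop.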
