Course videos: https://study.163.com/courses-search?keyword=CS231

Course homepage: http://cs231n.stanford.edu/2017/

References:

https://github.com/Halfish/cs231n/tree/master/assignment2/cs231n

https://github.com/wjbKimberly/cs231n_spring_2017_assignment/blob/master/assignment2/TensorFlow.ipynb

My code: https://github.com/Doraemonzzz/CS231n

This post reviews the key points of Assignment 2.

Preparation

If you get an error when loading the data, you need to modify the following function in data_utils.py:

def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000,
                     subtract_mean=True):

Find the line

cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'

and change it to the directory where you store the dataset.

1. Fully connected neural networks

For convenience, assume the input $x$ has shape $(N, d_1, \ldots, d_k)$ and is reshaped into a matrix $X \in \mathbb{R}^{N \times D}$ with $D = d_1 \cdots d_k$; the weights are $w \in \mathbb{R}^{D \times M}$ and the bias is $b \in \mathbb{R}^{M}$.

Affine layer: forward

This part is easy; we only need to know that the output is

$$\text{out} = Xw + b$$

The corresponding code is

N = x.shape[0]
X = x.reshape(N, -1)
out = X.dot(w) + b
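
As a quick illustration (the shapes here are just an example, not from the assignment), a batch of CIFAR-10-sized inputs is flattened from $(N, 3, 32, 32)$ to $(N, 3072)$ before the matrix multiply:

import numpy as np

# hypothetical shapes: 4 CIFAR-10 images and a layer with 100 output units
x = np.random.randn(4, 3, 32, 32)
w = np.random.randn(3 * 32 * 32, 100)
b = np.random.randn(100)

N = x.shape[0]
X = x.reshape(N, -1)      # shape (4, 3072)
out = X.dot(w) + b        # shape (4, 100)
print(X.shape, out.shape)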
Affine layer: backward

At the end of the last assignment we derived

$$\frac{\partial f}{\partial X} = F w^T, \qquad \frac{\partial f}{\partial w} = X^T F, \qquad \frac{\partial f}{\partial b} = \mathbf{1}^T F = \sum_{i=1}^{N} F_{i,:}$$

where $F$ is the upstream gradient passed into the backward pass. Each identity can be verified directly from the definition, and an easy way to remember them is to match the matrix dimensions. With these formulas, the corresponding code follows:

# reshape the input
N = x.shape[0]
X = x.reshape(N, -1)
dx = dout.dot(w.T)
# reshape back to the original shape
dx = np.reshape(dx, x.shape)
dw = X.T.dot(dout)
db = np.sum(dout, axis=0)
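
As a sanity check (my own snippet, not the assignment's test code), the formulas above can be compared against a centered-difference numerical gradient of a random linear function of the output:

import numpy as np

np.random.seed(0)
x = np.random.randn(4, 5)       # already 2-D, so X = x
w = np.random.randn(5, 3)
b = np.random.randn(3)
c = np.random.randn(4, 3)       # defines the scalar function f = sum(out * c)

def f(x, w, b):
    return np.sum((x.dot(w) + b) * c)

# analytic gradients; here the upstream gradient dout = df/dout equals c
dout = c
dx = dout.dot(w.T)
dw = x.T.dot(dout)
db = np.sum(dout, axis=0)

# centered-difference numerical gradient for dw
h = 1e-5
dw_num = np.zeros_like(w)
for i in range(w.shape[0]):
    for j in range(w.shape[1]):
        wp, wm = w.copy(), w.copy()
        wp[i, j] += h
        wm[i, j] -= h
        dw_num[i, j] = (f(x, wp, b) - f(x, wm, b)) / (2 * h)

print(np.max(np.abs(dw - dw_num)))   # essentially zero: f is linear in w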
ReLU layer: forward

Nothing much to say here; directly from the definition $\text{out} = \max(0, x)$, applied elementwise:

out = np.copy(x)
out[out < 0] = 0
ReLU layer: backward

Simply zero the gradient at positions where the input is less than $0$:

dx = np.copy(dout)
dx[x < 0] = 0
Two-layer network

Implement a two-layer neural network with the architecture affine - relu - affine - softmax.

Step 1, initialization:

W1 = np.random.randn(input_dim, hidden_dim) * weight_scale
b1 = np.zeros(hidden_dim)
W2 = np.random.randn(hidden_dim, num_classes) * weight_scale
b2 = np.zeros(num_classes)
self.params["W1"] = W1
self.params["b1"] = b1
self.params["W2"] = W2
self.params["b2"] = b2

Step 2, the forward pass:

W1 = self.params["W1"]
b1 = self.params["b1"]
W2 = self.params["W2"]
b2 = self.params["b2"]

# hidden layer
z1, cache1 = affine_relu_forward(X, W1, b1)
# output scores
scores, cache2 = affine_forward(z1, W2, b2)

Step 3, the backward pass:

# loss and dout
loss, dout = softmax_loss(scores, y)
# add the regularization term
loss += self.reg * (np.sum(W2 ** 2) + np.sum(W1 ** 2)) / 2
# compute dW2, db2
dz1, dW2, db2 = affine_backward(dout, cache2)
# add the regularization term
dW2 += self.reg * W2
# compute dW1, db1
dx, dW1, db1 = affine_relu_backward(dz1, cache1)
dW1 += self.reg * W1
# store in the grads dict
grads["W1"] = dW1
grads["b1"] = db1
grads["W2"] = dW2
grads["b2"] = db2
Multilayer network

This part generalizes the previous one.

Step 1, initialization; note that we handle the first layer, the last layer, and the hidden layers separately:

for i in range(self.num_layers):
	if i == 0:
		W = np.random.randn(input_dim, hidden_dims[i]) * weight_scale
		b = np.zeros(hidden_dims[i])
	elif i == self.num_layers - 1:
		W = np.random.randn(hidden_dims[i-1], num_classes) * weight_scale
		b = np.zeros(num_classes)
	else:
		W = np.random.randn(hidden_dims[i-1], hidden_dims[i]) * weight_scale
		b = np.zeros(hidden_dims[i])
        
	self.params["W"+str(i+1)] = W
	self.params["b"+str(i+1)] = b

Step 2, the forward pass; here we distinguish the output layer from the other layers:

x = X
# record the caches
Cache = {}
Cache_dropout = {}
for i in range(self.num_layers):
	W = self.params["W"+str(i+1)]
	b = self.params["b"+str(i+1)]
	if i < self.num_layers - 1:
		x, cache = affine_relu_forward(x, W, b)
	else:
		x, cache = affine_forward(x, W, b)
	# store the cache
	Cache["cache"+str(i+1)] = cache
    
# output scores
scores = x

Step 3, the backward pass; again we distinguish the output layer from the other layers:

# loss and dout
loss, dout = softmax_loss(scores, y)
# add the regularization term
for i in range(self.num_layers):
	W = self.params["W"+str(i+1)]
	loss += self.reg * (np.sum(W ** 2)) / 2
# compute dWi, dbi
for i in range(self.num_layers, 0, -1):
	cache = Cache["cache"+str(i)]
	W = self.params["W"+str(i)]
	if i == self.num_layers:
		dz, dW, db = affine_backward(dout, cache)
	else:
		dz, dW, db = affine_relu_backward(dz, cache)

	# add the regularization term
	dW += self.reg * W
	# store in the grads dict
	grads["W"+str(i)] = dW
	grads["b"+str(i)] = db

This part only requires some care; it is not particularly difficult.

SGD+Momentum

The update rule used here is:

v = config["momentum"] * v - config['learning_rate'] * dw
next_w = w + v
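
For context, in the assignment these update rules live in cs231n/optim.py as functions of the form update(w, dw, config). A minimal sketch of sgd_momentum in that style is shown below (the default hyperparameter values are my assumptions, not copied from the file):

import numpy as np

def sgd_momentum(w, dw, config=None):
    # sketch in the style of cs231n/optim.py; defaults here are assumptions
    if config is None:
        config = {}
    config.setdefault('learning_rate', 1e-2)
    config.setdefault('momentum', 0.9)
    v = config.get('velocity', np.zeros_like(w))

    v = config['momentum'] * v - config['learning_rate'] * dw
    next_w = w + v

    config['velocity'] = v
    return next_w, config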
RMSProp
config['cache'] = config['decay_rate'] * config['cache'] + (1 - config['decay_rate']) * dx * dx
next_x = x - config['learning_rate'] * dx / (np.sqrt(config['cache']) + config['epsilon'])
Adam
# increment t before the bias correction; in the assignment's default config t starts at 0,
# so incrementing it afterwards would divide by zero on the first step
config['t'] += 1
config['m'] = config['beta1'] * config['m'] + (1 - config['beta1']) * dx
config['v'] = config['beta2'] * config['v'] + (1 - config['beta2']) * dx * dx
first_unbias = config['m'] / (1 - config['beta1'] ** config['t'])
second_unbias = config['v'] / (1 - config['beta2'] ** config['t'])
next_x = x - config['learning_rate'] * first_unbias / (np.sqrt(second_unbias) + config['epsilon'])

2. Batch normalization

Batch normalization: Forward

The formulas are as follows:

$$\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)},\qquad \sigma^2 = \frac{1}{m}\sum_{i=1}^{m}\big(x^{(i)}-\mu\big)^2,\qquad \hat x^{(i)} = \frac{x^{(i)}-\mu}{\sqrt{\sigma^2+\epsilon}},\qquad y^{(i)} = \gamma\odot\hat x^{(i)} + \beta$$

There are two cases. First, the training phase, with the corresponding code:

# sample mean
sample_mean = np.mean(x, axis=0)
# sample variance
sample_var = np.var(x, axis=0)

# coefficient reused in the backward pass
k = np.sqrt(sample_var + eps)
x1 = (x - sample_mean) / k
out = gamma * x1 + beta

cache.append(k)
cache.append(sample_mean)
cache.append(x1)

running_mean = momentum * running_mean + (1 - momentum) * sample_mean
running_var = momentum * running_var + (1 - momentum) * sample_var

The last two lines keep running averages of the mean and variance; these are used at test time, where the corresponding code is:

x1 = (x - running_mean) / np.sqrt(running_var + eps)
out = gamma * x1 + beta
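
A quick sanity check (my own snippet, not part of the assignment): before the $\gamma, \beta$ scaling, the train-time normalization should give roughly zero mean and unit variance for every feature.

import numpy as np

np.random.seed(0)
x = 5 + 3 * np.random.randn(200, 10)   # arbitrary mean and scale
eps = 1e-5

sample_mean = np.mean(x, axis=0)
sample_var = np.var(x, axis=0)
x1 = (x - sample_mean) / np.sqrt(sample_var + eps)

print(np.abs(x1.mean(axis=0)).max())   # ~1e-16
print(x1.std(axis=0).round(3))         # ~1.0 for every feature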

A remark on the extra cached quantities: k corresponds to $\sqrt{\sigma^2+\epsilon}$, x1 corresponds to $\hat x = (x-\mu)/\sqrt{\sigma^2+\epsilon}$, and out is the output $\gamma\odot\hat x+\beta$.

All of these are needed in the backward pass.

Batch Normalization: backward

Here I compute the gradient by direct differentiation, which actually completes the optional part of the assignment; this is one of the hardest parts of the whole assignment.

Suppose batch normalization produces the matrix $Y \in \mathbb R^{m\times D}$ whose rows are

$$y^{(i)} = \gamma \odot \hat x^{(i)} + \beta$$

Note that $\beta, \gamma$ are actually vectors, i.e.

$$\beta = (\beta_1, \ldots, \beta_D),\qquad \gamma = (\gamma_1, \ldots, \gamma_D)$$

and $\beta_j, \gamma_j$ act on the $j$-th component, i.e.

$$y^{(i)}_j = \gamma_j \hat x^{(i)}_j + \beta_j$$

Suppose the downstream function is $f = f(Y)$, and the upstream gradient passed into the backward pass is

$$\text{dout} = \left[\frac{\partial f}{\partial y^{(i)}_j}\right] \in \mathbb R^{m\times D}$$

We now compute the partial derivatives of $f$ with respect to each quantity. For $\beta$,

$$\frac{\partial f}{\partial \beta_j} = \sum_{i=1}^m \frac{\partial f}{\partial y^{(i)}_j}$$

which corresponds to the code

dbeta = np.sum(dout, axis=0)

For $\gamma$,

$$\frac{\partial f}{\partial \gamma_j} = \sum_{i=1}^m \frac{\partial f}{\partial y^{(i)}_j}\,\hat x^{(i)}_j$$

which corresponds to the code

dgamma = np.sum(dout * x1, axis=0)

Next we focus on computing $\frac{\partial \hat x_k^{(s)}}{\partial x_k^{(i)}}$. First recall the formulas

$$\hat x^{(s)}_k = \frac{x^{(s)}_k - \mu_k}{\sqrt{\sigma_k^2+\epsilon}},\qquad \mu_k = \frac 1m\sum_{i=1}^m x^{(i)}_k,\qquad \sigma_k^2 = \frac 1m \sum_{i=1}^m \big(x_k^{(i)} - \mu_k\big)^2$$

so we have

$$\frac{\partial \mu_k}{\partial x^{(i)}_k} = \frac 1m,\qquad \frac{\partial \sigma_k^2}{\partial x^{(i)}_k} = \frac{2}{m}\big(x^{(i)}_k - \mu_k\big)$$

With this preparation, we can now compute $\frac{\partial \hat x_k^{(s)}}{\partial x_k^{(i)}}$:

$$\frac{\partial \hat x_k^{(s)}}{\partial x_k^{(i)}} = \frac{\mathbb 1\{s=i\} - \frac 1m}{\sqrt{\sigma_k^2+\epsilon}} - \frac{\big(x^{(s)}_k - \mu_k\big)\big(x^{(i)}_k - \mu_k\big)}{m\,\big(\sigma_k^2+\epsilon\big)^{3/2}}$$

Therefore

$$\frac{\partial f}{\partial x^{(i)}_k} = \sum_{s=1}^m \frac{\partial f}{\partial y^{(s)}_k}\,\gamma_k\,\frac{\partial \hat x_k^{(s)}}{\partial x_k^{(i)}}
= \frac{\gamma_k}{\sqrt{\sigma_k^2+\epsilon}}\frac{\partial f}{\partial y^{(i)}_k}
- \frac{\gamma_k}{m\sqrt{\sigma_k^2+\epsilon}}\sum_{s=1}^m \frac{\partial f}{\partial y^{(s)}_k}
- \frac{\gamma_k \big(x^{(i)}_k - \mu_k\big)}{m\big(\sigma_k^2+\epsilon\big)^{3/2}}\sum_{s=1}^m \frac{\partial f}{\partial y^{(s)}_k}\big(x^{(s)}_k - \mu_k\big)$$

We compute the expression above in three parts. The first term is

$$\frac{\gamma_k}{\sqrt{\sigma_k^2+\epsilon}}\frac{\partial f}{\partial y^{(i)}_k}$$

Writing the numerator in matrix form,

$$\left[\gamma_k \frac{\partial f}{\partial y^{(i)}_k}\right]_{i,k} \in \mathbb R^{m\times D}$$

which, by numpy broadcasting, is simply the matrix

gamma * dout

Note that k stores the row vector

$$k = \Big(\sqrt{\sigma_1^2+\epsilon}, \ldots, \sqrt{\sigma_D^2+\epsilon}\Big)$$

so, again by numpy broadcasting, the first term can be computed as

t1 = gamma * dout / k

Next, the second term:

$$-\frac{\gamma_k}{m\sqrt{\sigma_k^2+\epsilon}}\sum_{s=1}^m \frac{\partial f}{\partial y^{(s)}_k}$$

Again using numpy broadcasting, it is not hard to get

m = x.shape[0]
t2 = - gamma / m * np.sum(dout, axis=0).reshape(1, -1) / k

Finally we compute

$$-\frac{\gamma_k \big(x^{(i)}_k - \mu_k\big)}{m\big(\sigma_k^2+\epsilon\big)^{3/2}}\sum_{s=1}^m \frac{\partial f}{\partial y^{(s)}_k}\big(x^{(s)}_k - \mu_k\big)$$

This term is more involved, so we first compute

$$\sum_{s=1}^m \frac{\partial f}{\partial y^{(s)}_k}\big(x^{(s)}_k - \mu_k\big)$$

Start with the centered matrix:

t3 = x - sample_mean

Next, it is easy to see that $\frac{\partial f}{\partial y^{(s)}_k}(x_k^{(s)}-\mu_k)$ is the elementwise product of the upstream gradient matrix and the centered matrix, so $\sum_{s=1}^m\frac{\partial f}{\partial y^{(s)}_k}(x_k^{(s)}-\mu_k)$ is that product summed over the $m$ rows (axis 0); the code is:

t4 = np.sum(dout * t3, axis=0).reshape(1, -1)

Finally, using numpy broadcasting the whole third term can be computed as

t5 = - gamma / m * t3 / (k ** 3) * t4

Adding the three terms gives the full gradient:

dx = t1 + t2 + t5
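
To gain some confidence in the derivation, here is a self-contained numerical check of the dx formula (my own snippet; the notebook does the same job with its eval_numerical_gradient_array helper):

import numpy as np

np.random.seed(0)
m, D = 6, 4
x = np.random.randn(m, D)
gamma = np.random.randn(D)
beta = np.random.randn(D)
R = np.random.randn(m, D)          # defines the scalar function f = sum(out * R)
eps = 1e-5

def bn_forward(x):
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# analytic gradient, assembled exactly as t1 + t2 + t5 above
dout = R
sample_mean = x.mean(axis=0)
k = np.sqrt(x.var(axis=0) + eps)
t1 = gamma * dout / k
t2 = - gamma / m * np.sum(dout, axis=0).reshape(1, -1) / k
t3 = x - sample_mean
t4 = np.sum(dout * t3, axis=0).reshape(1, -1)
t5 = - gamma / m * t3 / (k ** 3) * t4
dx = t1 + t2 + t5

# centered-difference numerical gradient
h = 1e-5
dx_num = np.zeros_like(x)
for i in range(m):
    for j in range(D):
        xp, xm = x.copy(), x.copy()
        xp[i, j] += h
        xm[i, j] -= h
        dx_num[i, j] = (np.sum(bn_forward(xp) * R) - np.sum(bn_forward(xm) * R)) / (2 * h)

print(np.max(np.abs(dx - dx_num)))   # should be tiny, on the order of 1e-9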
Fully Connected Nets with Batch Normalization

This part modifies the fully connected net. Because each block now has the structure affine_batchnorm_relu, we first write the following helper functions:

def affine_batchnorm_relu_forward(x, W, b, gamma, beta, bn_params):
    # affine
    x, cache_affine = affine_forward(x, W, b)
    #batchnorm
    x, cache_batch = batchnorm_forward(x, gamma, beta, bn_params)
    #relu
    x, cache_relu = relu_forward(x)
    
    return x, (cache_affine, cache_batch, cache_relu)

def affine_batchnorm_relu_backward(dout, cache):
    cache_affine, cache_batch, cache_relu = cache
    #relu
    dx = relu_backward(dout, cache_relu)
    #batchnorm
    dx, dgamma, dbeta = batchnorm_backward(dx, cache_batch)
    # affine
    dx, dw, db = affine_backward(dx, cache_affine)
    
    return dx, dw, db, dgamma, dbeta

These helpers just modularize the code. Next, modify the fully connected net; first the initialization:

for i in range(self.num_layers):
	if i == 0:
		W = np.random.randn(input_dim, hidden_dims[i]) * weight_scale
		b = np.zeros(hidden_dims[i])
	elif i == self.num_layers - 1:
		W = np.random.randn(hidden_dims[i-1], num_classes) * weight_scale
		b = np.zeros(num_classes)
	else:
		W = np.random.randn(hidden_dims[i-1], hidden_dims[i]) * weight_scale
		b = np.zeros(hidden_dims[i])
	
	if self.use_batchnorm and i != self.num_layers - 1:
		gamma = np.ones(hidden_dims[i])
		beta = np.zeros(hidden_dims[i])
		self.params["gamma"+str(i+1)] = gamma
		self.params["beta"+str(i+1)] = beta
	
	self.params["W"+str(i+1)] = W
	self.params["b"+str(i+1)] = b

Next, the forward pass:

x = X
# record the caches
Cache = {}
Cache_dropout = {}
for i in range(self.num_layers):
	W = self.params["W"+str(i+1)]
	b = self.params["b"+str(i+1)]
	if i < self.num_layers - 1:
		#batchnorm
		if self.use_batchnorm:
			gamma = self.params["gamma"+str(i+1)]
			beta = self.params["beta"+str(i+1)]
			x, cache = affine_batchnorm_relu_forward(x, W, b, gamma, beta, self.bn_params[i])
		else:
			x, cache = affine_relu_forward(x, W, b)
	else:
		x, cache = affine_forward(x, W, b)
	# store the cache
	Cache["cache"+str(i+1)] = cache
	
# output scores
scores = x

Finally, the backward pass:

# loss and dout
loss, dout = softmax_loss(scores, y)
# add the regularization term
for i in range(self.num_layers):
	W = self.params["W"+str(i+1)]
	loss += self.reg * (np.sum(W ** 2)) / 2
# compute dWi, dbi
dz, dW, db, dgamma, dbeta = 0, 0, 0, 0, 0
for i in range(self.num_layers, 0, -1):
	cache = Cache["cache"+str(i)]
	W = self.params["W"+str(i)]
	if i == self.num_layers:
		dz, dW, db = affine_backward(dout, cache)
	else:
		if self.use_batchnorm:
			dz, dW, db, dgamma, dbeta = affine_batchnorm_relu_backward(dz, cache)
			grads["gamma"+str(i)] = dgamma
			grads["beta"+str(i)] = dbeta
		else:
			dz, dW, db = affine_relu_backward(dz, cache)

	# add the regularization term
	dW += self.reg * W
	# store in the grads dict
	grads["W"+str(i)] = dW
	grads["b"+str(i)] = db

3. Dropout

Dropout forward pass

The exact formulas are in the course notes, so I give the code directly. First the forward pass, which has a training branch and a test branch:

if mode == 'train':
	mask = (np.random.rand(x.shape[0], x.shape[1]) < p) / p
	out = x * mask
elif mode == 'test':
	out = x
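
A quick check of the inverted-dropout scaling (my own snippet; note that the code above treats p as the keep probability): dividing the mask by p keeps the expected activation unchanged, which is why the test branch needs no extra scaling.

import numpy as np

np.random.seed(0)
p = 0.5                     # keep probability, matching the code above
x = np.ones((10000, 100))

mask = (np.random.rand(*x.shape) < p) / p
out = x * mask

print((mask == 0).mean())   # ~0.5 of the activations are dropped
print(out.mean())           # ~1.0, i.e. E[out] = x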
Dropout backward pass

Next, the backward pass, again split into a training branch and a test branch:

if mode == 'train':
	dx = dout * mask
elif mode == 'test':
	dx = dout
Fully-connected nets with Dropout

We only need to add one more check. The forward pass:

x = X
# record the caches
Cache = {}
Cache_dropout = {}
for i in range(self.num_layers):
	W = self.params["W"+str(i+1)]
	b = self.params["b"+str(i+1)]
	if i < self.num_layers - 1:
		#batchnorm
		if self.use_batchnorm:
			gamma = self.params["gamma"+str(i+1)]
			beta = self.params["beta"+str(i+1)]
			x, cache = affine_batchnorm_relu_forward(x, W, b, gamma, beta, self.bn_params[i])
		else:
			x, cache = affine_relu_forward(x, W, b)
			
		if self.use_dropout:
			x, cache_dropout = dropout_forward(x, self.dropout_param)
			Cache_dropout["cache"+str(i+1)] = cache_dropout
	else:
		x, cache = affine_forward(x, W, b)
	# store the cache
	Cache["cache"+str(i+1)] = cache
	
# output scores
scores = x

The backward pass:

# loss and dout
loss, dout = softmax_loss(scores, y)
# add the regularization term
for i in range(self.num_layers):
	W = self.params["W"+str(i+1)]
	loss += self.reg * (np.sum(W ** 2)) / 2
# compute dWi, dbi
dz, dW, db, dgamma, dbeta = 0, 0, 0, 0, 0
for i in range(self.num_layers, 0, -1):
	cache = Cache["cache"+str(i)]
	W = self.params["W"+str(i)]
	if i == self.num_layers:
		dz, dW, db = affine_backward(dout, cache)
	else:
		if self.use_dropout:
			cache_dropout = Cache_dropout["cache"+str(i)]
			dz = dropout_backward(dz, cache_dropout)
		if self.use_batchnorm:
			dz, dW, db, dgamma, dbeta = affine_batchnorm_relu_backward(dz, cache)
			grads["gamma"+str(i)] = dgamma
			grads["beta"+str(i)] = dbeta
		else:
			dz, dW, db = affine_relu_backward(dz, cache)

	# add the regularization term
	dW += self.reg * W
	# store in the grads dict
	grads["W"+str(i)] = dW
	grads["b"+str(i)] = db

4. Running a convolutional network on CIFAR-10

To ease the discussion, define the following variables: $x$ is the image data, with shape $(N, C, H, W)$; $w$ contains the convolution filters, with shape $(F, C, HH, WW)$; $b$ is the bias, with shape $(F,)$. Here $N$ is the number of images, $C$ is the number of channels ($3$ for RGB images), $H, W$ are the image height and width, $F$ is the number of filters, and $HH, WW$ are the filter height and width. In addition, let stride be the step size and pad the amount of zero padding. The padded data x_ then has shape $(N, C, H_2, W_2)$, where

$$H_2 = H + 2\times \text{pad},\qquad W_2 = W + 2\times \text{pad}$$

The output has shape $(N, F, H_1, W_1)$, where

$$H_1 = 1 + \frac{H + 2\times \text{pad} - HH}{\text{stride}},\qquad W_1 = 1 + \frac{W + 2\times \text{pad} - WW}{\text{stride}}$$

Write the output as out and consider its $(i, j)$-th slice $\text{out}[i][j]\in \mathbb R^{H_1\times W_1}$.

This slice is computed from $x_1 = x\_[i]\in \mathbb R^{C\times H_2\times W_2}$ and $w_1 = w[j]\in \mathbb R^{C\times HH \times WW}$; its $(s, t)$-th element is computed as follows:

x1 = x_[i]
w1 = w[j]
res[s][t] = np.sum(x1[:, s*stride: s*stride+HH, t*stride: t*stride+WW] * w1) + b[j]

(Note: thanks to numpy broadcasting, in practice $b[j]$ can be added once at the end, outside the double loop.)

To simplify the discussion of the backward pass, write

x1[:, s*stride: s*stride+HH, t*stride: t*stride+WW]

as $x'\in \mathbb R^{C\times HH\times WW}$; note that $w_1\in \mathbb R^{C\times HH \times WW}$ and $b[j]\in \mathbb R$ is the corresponding bias. Then

$$\text{out}[i][j][s][t] = \sum_{c,p,q} x'_{c,p,q}\, (w_1)_{c,p,q} + b[j]$$

and therefore

$$\frac{\partial\, \text{out}[i][j][s][t]}{\partial x'} = w_1,\qquad \frac{\partial\, \text{out}[i][j][s][t]}{\partial w_1} = x',\qquad \frac{\partial\, \text{out}[i][j][s][t]}{\partial b[j]} = 1$$

Write the upstream gradient as dout, whose $(i,j)$-th slice has $(s,t)$-th element

$$\text{dout}[i][j][s][t] = \frac{\partial f}{\partial\, \text{out}[i][j][s][t]}$$

where $f$ is the function applied to out downstream. By the chain rule, each window position $(s, t)$ contributes

$$\frac{\partial f}{\partial w_1} \mathrel{+}= \text{dout}[i][j][s][t]\cdot x',\qquad \frac{\partial f}{\partial x'} \mathrel{+}= \text{dout}[i][j][s][t]\cdot w_1,\qquad \frac{\partial f}{\partial b[j]} \mathrel{+}= \text{dout}[i][j][s][t]$$

The corresponding code is:

for s in range(H1):
	for t in range(W1):
		dw1 += x1[:, s*stride: s*stride+HH, t*stride: t*stride+WW] * dout1[s][t]
		dx1[:, s*stride: s*stride+HH, t*stride: t*stride+WW] += w1 * dout1[s][t]
		db1 += dout1[s][t]

The rest is just a matter of looping over all the indices.

Convolution: Naive forward pass

First, zero-pad the input using np.pad:

stride = conv_param["stride"]
pad = conv_param["pad"]
x_ = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), "constant")

Then compute the output dimensions:

# input dimensions
N, C, H, W = x.shape
F, C, HH, WW = w.shape
# output dimensions
H1 = 1 + (H + 2 * pad - HH) // stride
W1 = 1 + (W + 2 * pad - WW) // stride
out = np.zeros((N, F, H1, W1))

Then compute the result directly from the definition, here with loops:

for i in range(N):
	for j in range(F):
		x1 = x_[i]
		w1 = w[j]
		res = np.zeros((H1, W1))
		for s in range(H1):
			for t in range(W1):
				res[s][t] = np.sum(x1[:, s*stride: s*stride+HH, t*stride: t*stride+WW] * w1)
		res += b[j]
		out[i][j] = res
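
As a small worked example (my own check, calling the conv_forward_naive function that this code goes into, in the assignment's cs231n/layers.py): convolving a $4\times 4$ image of ones with a single $3\times 3$ filter of ones, pad 1, stride 1, gives 9 at interior positions and 4 at the corners.

import numpy as np
from cs231n.layers import conv_forward_naive

x = np.ones((1, 1, 4, 4))
w = np.ones((1, 1, 3, 3))
b = np.zeros(1)
conv_param = {'stride': 1, 'pad': 1}

out, _ = conv_forward_naive(x, w, b, conv_param)
print(out.shape)        # (1, 1, 4, 4), since H1 = 1 + (4 + 2 - 3) // 1 = 4
print(out[0, 0, 0, 0])  # 4.0: the corner window only covers a 2x2 block of ones
print(out[0, 0, 1, 1])  # 9.0: the interior window covers a full 3x3 block of ones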
Aside: Image processing via convolutions

If you hit the following error in this part:

cannot import name imread

just install Pillow:

pip install Pillow
Convolution: Naive backward pass

Most of this was covered above; the rest is just looping over every index. First, the setup:

x, w, b, conv_param = cache
stride = conv_param["stride"]
pad = conv_param["pad"]
# zero padding
x_ = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), "constant")
# input dimensions
N, C, H, W = x.shape
F, C, HH, WW = w.shape
H2, W2 = x_.shape[2:]
# gradient arrays (dx is initialized at the padded size)
dx = np.zeros_like(x_)
dw = np.zeros_like(w)
db = np.zeros_like(b)

Then loop over everything:

for i in range(N):
	for j in range(F):
		t1 = dout[i][j]
		# get the output dimensions
		H1, W1 = t1.shape
		# initialize the gradients for this slice
		dx1 = np.zeros((C, H2, W2))
		dw1 = np.zeros((C, HH, WW))
		db1 = 0
		# the current x and w slices
		x1 = x_[i]
		w1 = w[j]
		# the current dout slice
		dout1 = dout[i][j]
		for s in range(H1):
			for t in range(W1):
				dw1 += x1[:, s*stride: s*stride+HH, t*stride: t*stride+WW] * dout1[s][t]
				dx1[:, s*stride: s*stride+HH, t*stride: t*stride+WW] += w1 * dout1[s][t]
				db1 += dout1[s][t]
		db[j] += db1
		dx[i] += dx1
		dw[j] += dw1

The loops over $i, j$ just iterate over the first dimension of $dx, dw$. Note that what we actually compute is the gradient with respect to the padded input, so the final output should be

dx = dx[:, :, pad: pad+H, pad: pad+W]
Max pooling: Naive forward

This part is similar to the convolution forward pass, except that the convolution operation is replaced by taking the maximum over each window:

pool_height = pool_param["pool_height"]
pool_width = pool_param["pool_width"]
stride = pool_param["stride"]
# input dimensions
N, C, H, W = x.shape
# output dimensions
H1 = 1 + (H - pool_height) // stride
W1 = 1 + (W - pool_width) // stride

out = np.zeros((N, C, H1, W1))
for i in range(N):
    for j in range(C):
        x1 = x[i][j]
        res = np.zeros((H1, W1))
        for s in range(H1):
            for t in range(W1):
                res[s][t] = np.max(x1[s*stride: s*stride+pool_height, t*stride: t*stride+pool_width])
        out[i][j] = res
Max pooling: Naive backward

Because of how max pooling works, we only need to route each dout value to the position of the maximum element in its window; the code mirrors the convolution backward pass:

x, pool_param = cache
pool_height = pool_param["pool_height"]
pool_width = pool_param["pool_width"]
stride = pool_param["stride"]
# input dimensions
N, C, H, W = x.shape
# output dimensions
H1 = 1 + (H - pool_height) // stride
W1 = 1 + (W - pool_width) // stride

dx = np.zeros_like(x)
for i in range(N):
	for j in range(C):
		# the current dout slice
		dout1 = dout[i][j]
		x1 = x[i][j]
		dx1 = np.zeros((H, W))
		for s in range(H1):
			for t in range(W1):
				# flatten the window
				temp = x1[s*stride: s*stride+pool_height, t*stride: t*stride+pool_width].flatten()
				# index of the maximum element
				index = np.argmax(temp)
				# recover its row and column inside the window
				m, n = index // pool_width, index % pool_width
				dx1[s*stride + m][t*stride + n] += dout1[s][t]
		dx[i][j] = dx1

I could not find a function that directly gives the row and column of a matrix's maximum element here, so I compute them by hand:

# flatten the window
temp = x1[s*stride: s*stride+pool_height, t*stride: t*stride+pool_width].flatten()
# index of the maximum element
index = np.argmax(temp)
# recover its row and column inside the window
m, n = index // pool_width, index % pool_width

and then accumulate dout at the corresponding position:

dx1[s*stride + m][t*stride + n] += dout1[s][t]
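
(As an aside, numpy does provide this directly: np.unravel_index converts the flat argmax index back into row and column coordinates, so the manual computation above could be written as follows.)

import numpy as np

window = np.array([[1.0, 5.0, 2.0],
                   [0.0, 3.0, 7.0]])

# flat index of the maximum, converted back to (row, col)
m, n = np.unravel_index(np.argmax(window), window.shape)
print(m, n)   # 1 2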
Fast layers

This part uses Cython. At first I got the following error:

error: Unable to find vcvarsall.bat

I eventually fixed it by following a blog post; in practice you only need to download a single installer package (link in the original post).

Three-layer ConvNet

I feel the problem statement here was not entirely clear (or perhaps I misread it). The network architecture is:

conv - relu - 2x2 max pool - affine - relu - affine - softmax

The weights belong to the conv layer and the two affine layers. At first the dimensions were unclear to me, until I noticed the following code:

# pass conv_param to the forward pass for the convolutional layer
filter_size = W1.shape[2]
conv_param = {'stride': 1, 'pad': (filter_size - 1) // 2}

# pass pool_param to the forward pass for the max-pooling layer
pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}
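
Plugging these values into the convolution output-size formula from earlier confirms that, with stride 1 and pad = (filter_size - 1) // 2, an odd filter size $F_s$ leaves the spatial dimensions unchanged:

$$H_1 = 1 + \frac{H + 2\cdot\frac{F_s-1}{2} - F_s}{1} = 1 + (H - 1) = H,\qquad W_1 = W$$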

So after the convolution the data keeps the same spatial dimensions as the input, and the initialization goes as follows:

C, H, W = input_dim
F, HH, WW = num_filters, filter_size, filter_size
W1 = np.random.randn(F, C, HH, WW) * weight_scale
b1 = np.zeros(F)
# as argued above, the first conv layer preserves the last two spatial dimensions,
# so the total number of features per image after the conv is
n = F * H * W
# W2 is applied after the 2x2 max pool, so its first dimension is n // 4
W2 = np.random.randn(n // 4, hidden_dim) * weight_scale
b2 = np.zeros(hidden_dim)
W3 = np.random.randn(hidden_dim, num_classes) * weight_scale
b3 = np.zeros(num_classes)
self.params["W1"] = W1
self.params["b1"] = b1
self.params["W2"] = W2
self.params["b2"] = b2
self.params["W3"] = W3
self.params["b3"] = b3

The forward pass:

X1, cache1 = conv_forward_fast(X, W1, b1, conv_param)
X2, cache2 = relu_forward(X1)
X3, cache3 = max_pool_forward_fast(X2, pool_param)
X4, cache4 = affine_forward(X3, W2, b2)
X5, cache5 = relu_forward(X4)
X6, cache6 = affine_forward(X5, W3, b3)
scores = X6

The backward pass:

loss, dz6 = softmax_loss(scores, y)
loss += self.reg * (np.sum(W1 ** 2) + np.sum(W2 ** 2) + np.sum(W3 ** 2)) / 2

dz5, dW3, db3 = affine_backward(dz6, cache6)
dz4 = relu_backward(dz5, cache5)
dz3, dW2, db2 = affine_backward(dz4, cache4)
dz2 = max_pool_backward_fast(dz3, cache3)
dz1 = relu_backward(dz2, cache2)
dz, dW1, db1 = conv_backward_fast(dz1, cache1)

grads["W3"] = dW3
grads["b3"] = db3
grads["W2"] = dW2
grads["b2"] = db2
grads["W1"] = dW1
grads["b1"] = db1
Spatial batch normalization: forward

Here batch normalization is applied per channel, so the statistics must be computed over the $(N, H, W)$ axes for each channel; the array therefore has to be transposed before being flattened to two dimensions. The code is:

N, C, H, W = x.shape

# move the channel axis last, then flatten to (N*H*W, C) so that
# batchnorm_forward normalizes each channel separately
x1 = x.transpose(0, 2, 3, 1).reshape(-1, C)
out, cache = batchnorm_forward(x1, gamma, beta, bn_param)
out = out.reshape(N, H, W, C).transpose(0, 3, 1, 2)
Spatial batch normalization: backward

The backward pass works the same way:

N, C, H, W = dout.shape

# same transpose + reshape as in the forward pass
dout1 = dout.transpose(0, 2, 3, 1).reshape(-1, C)
dx, dgamma, dbeta = batchnorm_backward(dout1, cache)
dx = dx.reshape(N, H, W, C).transpose(0, 3, 1, 2)
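
A small check (my own snippet) of why the transpose matters: flattening with transpose(0, 2, 3, 1).reshape(-1, C) makes the column means equal the per-channel means, while a plain reshape(-1, C) mixes channels.

import numpy as np

np.random.seed(0)
N, C, H, W = 2, 3, 4, 5
x = np.random.randn(N, C, H, W)

per_channel = x.mean(axis=(0, 2, 3))
good = x.transpose(0, 2, 3, 1).reshape(-1, C).mean(axis=0)
bad = x.reshape(-1, C).mean(axis=0)

print(np.allclose(per_channel, good))   # True
print(np.allclose(per_channel, bad))    # False: the columns mix channels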

5. TensorFlow

If loading the data fails, change the following path to wherever you keep the dataset:

cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
TensorFlow Details

The tricky point in this part is where the number 5408 comes from. Since padding='VALID' is used, the official documentation gives the formula:

out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
out_width  = ceil(float(in_width - filter_width + 1) / float(strides[2]))

Plugging in the numbers for this part (a $32\times 32$ input, a $7\times 7$ filter, stride 2):

$$\text{out\_height} = \text{out\_width} = \left\lceil \frac{32 - 7 + 1}{2} \right\rceil = 13$$

so with 32 filters the output has shape $13\times 13\times 32$, which flattens to $13 \times 13 \times 32 = 5408$.

Reference: see the link in the original post.

Training a specific model

Since I am not very familiar with TensorFlow, this part mainly follows other people's solutions; the code is as follows:

# define model
def complex_model(X,y,is_training):
    #conv1
    Wconv1 = tf.get_variable("Wconv1", shape=[7, 7, 3, 32])
    bconv1 = tf.get_variable("bconv1", shape=[32])
    #Affine layer
    W1 = tf.get_variable("W1", shape=[5408, 1024])
    b1 = tf.get_variable("b1", shape=[1024])
    #Affine layer
    W2 = tf.get_variable("W2", shape=[1024, 10])
    b2 = tf.get_variable("b2", shape=[10])

    #conv
    a1 = tf.nn.conv2d(X, Wconv1, strides=[1,1,1,1], padding='VALID') + bconv1
    #relu
    h1 = tf.nn.relu(a1)
    #Spatial Batch Normalization Layer
    axis = [0, 1, 2]
    mean, variance = tf.nn.moments(h1, axis)
    offset = tf.Variable(tf.zeros([32]))
    scala = tf.Variable(tf.ones([32]))
    bn1 = tf.nn.batch_normalization(h1, mean, variance, offset, scala, 0.001)
    #Max Pooling
    p1 = tf.nn.max_pool(bn1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")
    #Affine layers
    p1_flat = tf.reshape(p1, [-1, 5408])
    a2 = tf.matmul(p1_flat, W1) + b1
    #relu
    h2 = tf.nn.relu(a2)
    h2_flat = tf.reshape(h2, [-1, 1024])
    #Affine layer 
    y_out = tf.matmul(h2_flat, W2) + b2

    return y_out

The Spatial Batch Normalization Layer is the relatively tricky part; the rest can be written by analogy with the layers before it.

Summary

This assignment was really hard and rewards repeated study; I will probably go back and optimize the convolution code later.