
Reading Notes – Deep Learning Introduction with PyTorch, Chapter 5 (including an RNN implementation of handwritten digit recognition) (LSTM/GRU code explained)

Posted: 2022-01-02 13:18:22


Contents

1. Advantages of RNNs (memory)

2. Structure and principle of recurrent neural networks

3. LSTM (Long Short-Term Memory network)

4. GRU

5. Differences between LSTM, RNN and GRU

6. Convergence problems

7. PyTorch implementation of recurrent neural networks

(1) RNN, LSTM, GRU

(2) LSTM + fully connected layer for handwritten digit recognition

8. Word embeddings (word vectors)

9. N-Gram model: word prediction

10. Sequence prediction

(1) Fully connected approach

(2) Recurrent neural network approach

(3) LSTM approach

(4) GRU approach

1. Advantages of RNNs (memory)

RNNs are very effective on data with a sequential structure: they can exploit both the temporal and the semantic information contained in the sequence.

2. Structure and principle of recurrent neural networks

The hidden state at each time step is determined not only by the input at that step but also by the hidden state of the previous step.
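
A minimal sketch of this recurrence with explicit tensors (the dimensions are illustrative, and a single combined bias is used for simplicity; nn.RNN keeps separate input and hidden biases):

import torch

# one RNN step: h_t depends on the current input x_t and the previous hidden state h_{t-1}
input_dim, hidden_dim = 3, 4
W_ih = torch.randn(hidden_dim, input_dim)   # input-to-hidden weights
W_hh = torch.randn(hidden_dim, hidden_dim)  # hidden-to-hidden weights
b = torch.zeros(hidden_dim)                 # combined bias (illustrative simplification)

x_t = torch.randn(input_dim)       # input at time t
h_prev = torch.zeros(hidden_dim)   # hidden state at time t-1

# the standard update with a tanh nonlinearity, as used by nn.RNN
h_t = torch.tanh(W_ih @ x_t + W_hh @ h_prev + b)
print(h_t.shape)  # torch.Size([4])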

Deep network structure: several recurrent layers stacked on top of each other (figure omitted).

Bidirectional recurrent neural network:

The network reads the sequence first in the forward direction and then in the reverse direction; the two sets of outputs are combined to form the network's final output.
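
As a quick illustration (the dimensions are arbitrary), setting bidirectional=True in nn.RNN concatenates the two directions, so the output's last dimension becomes 2 * hidden_size:

import torch
from torch import nn

# bidirectional RNN: forward and backward outputs are concatenated per time step
birnn = nn.RNN(input_size=20, hidden_size=50, num_layers=2, bidirectional=True)

x = torch.randn(10, 4, 20)   # (seq_len, batch, input_size)
out, h_n = birnn(x)
print(out.shape)   # torch.Size([10, 4, 100]) -> 2 * hidden_size
print(h_n.shape)   # torch.Size([4, 4, 50])   -> num_layers * 2 directions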

Recurrent neural networks handle short-term dependencies well, but they are much less effective at capturing long-term dependencies.

3. LSTM (Long Short-Term Memory network)
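
A minimal sketch of a single LSTM step using PyTorch's built-in nn.LSTMCell (dimensions are illustrative): the LSTM keeps a separate cell state c alongside the hidden state h, and its input, forget and output gates control what is written, kept and exposed.

import torch
from torch import nn

# one LSTM step: hidden state h plus a separate cell state c
cell = nn.LSTMCell(input_size=20, hidden_size=50)

x_t = torch.randn(4, 20)      # batch of 4 inputs at time t
h_prev = torch.zeros(4, 50)   # previous hidden state
c_prev = torch.zeros(4, 50)   # previous cell state

h_t, c_t = cell(x_t, (h_prev, c_prev))
print(h_t.shape, c_t.shape)   # torch.Size([4, 50]) torch.Size([4, 50])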

4. GRU
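
Similarly, a minimal sketch of one GRU step with nn.GRUCell (illustrative dimensions): the GRU keeps only a hidden state h, using update and reset gates instead of the LSTM's three gates and cell state.

import torch
from torch import nn

# one GRU step: only a hidden state h, controlled by update and reset gates
cell = nn.GRUCell(input_size=20, hidden_size=50)

x_t = torch.randn(4, 20)      # batch of 4 inputs at time t
h_prev = torch.zeros(4, 50)   # previous hidden state

h_t = cell(x_t, h_prev)
print(h_t.shape)              # torch.Size([4, 50])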

5. Differences between LSTM, RNN and GRU

6. Convergence problems

RNNs can be difficult to train to convergence.

Cause: the error surface of an RNN is rough and uneven, so gradients can blow up during backpropagation through time.

Solution: gradient clipping, i.e. rescaling the gradients whenever their norm exceeds a threshold.
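
A minimal sketch of where gradient clipping goes in a PyTorch training step (the model, data and the max_norm value of 5.0 are arbitrary choices for illustration):

import torch
from torch import nn, optim

# illustrative model and data, just to show the position of the clipping call
rnn = nn.RNN(input_size=20, hidden_size=50, num_layers=2)
fc = nn.Linear(50, 1)
x = torch.randn(30, 4, 20)   # (seq_len, batch, input_size)
y = torch.randn(30, 4, 1)

criterion = nn.MSELoss()
params = list(rnn.parameters()) + list(fc.parameters())
optimizer = optim.Adam(params, lr=1e-2)

out, _ = rnn(x)
loss = criterion(fc(out), y)

optimizer.zero_grad()
loss.backward()
# clip the gradient norm before the optimizer step to tame exploding gradients
torch.nn.utils.clip_grad_norm_(params, max_norm=5.0)
optimizer.step()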

7. PyTorch implementation of recurrent neural networks

(1) RNN, LSTM, GRU

Compared with a standard RNN, an LSTM performs three additional linear transformations; their weights are concatenated with the original one, so the weight matrices are 4× as large, and likewise the biases. In other words, an LSTM performs four operations of the kind a standard RNN performs once, so it has roughly 4× the parameters.

GRU: its gate weights are 3× those of a standard RNN; the network keeps only a hidden state h0 (there is no separate cell state c0); the rest is similar to the LSTM.

from torch import nn

basic_rnn = nn.RNN(input_size=20, hidden_size=50, num_layers=2)
# input_size: input dimension
# hidden_size: hidden (output) dimension
# num_layers: number of stacked recurrent layers
# nonlinearity: activation function
# bias: whether to use a bias term
# batch_first: layout of the input; default False, i.e. (seq(num_step), batch, input_dim),
#              with the sequence length first and the batch second
# dropout: dropout between layers; off by default, set to a number in (0, 1) to enable it
# bidirectional: whether to use a bidirectional RNN; default False

lstm = nn.LSTM(input_size=20, hidden_size=50, num_layers=2)
gru = nn.GRU(input_size=20, hidden_size=50, num_layers=2)
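
Continuing from the snippet above, a quick sanity check of the 4× / 3× claims: the first layer's input-to-hidden weight has hidden_size rows for the plain RNN, 4 * hidden_size rows for the LSTM and 3 * hidden_size rows for the GRU.

# weight_ih_l0 is the input-to-hidden weight matrix of layer 0
print(basic_rnn.weight_ih_l0.shape)  # torch.Size([50, 20])  -> hidden_size rows
print(lstm.weight_ih_l0.shape)       # torch.Size([200, 20]) -> 4 * hidden_size rows
print(gru.weight_ih_l0.shape)        # torch.Size([150, 20]) -> 3 * hidden_size rows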

(2) LSTM + fully connected layer for handwritten digit recognition

class Rnn(nn.Module):
    def __init__(self, in_dim=None, hidden_dim=None, n_layer=None):
        super(Rnn, self).__init__()
        self.lstm = nn.LSTM(in_dim, hidden_dim, n_layer, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 10)

    def forward(self, x):
        x = x.view(x.size(0), 1, -1)  # reshape to (batch, seq_len=1, features)
        out, _ = self.lstm(x)
        out = out[:, -1, :]           # keep the output of the last time step
        out = self.classifier(out)
        return out
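
The notes only report the final accuracy; a minimal training sketch for the model above under the usual MNIST setup (the hidden size, batch size and learning rate here are assumptions, not values from the book):

import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# hypothetical setup: each 28x28 image is flattened into a single step of length 784
train_set = datasets.MNIST('./data', train=True, download=True,
                           transform=transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

model = Rnn(in_dim=28 * 28, hidden_dim=128, n_layer=2)  # hidden_dim=128 is an assumption
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(10):
    for img, label in train_loader:
        out = model(img)              # (batch, 10) class scores
        loss = criterion(out, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()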

Accuracy: 97.42% (after 10 training epochs).

8. Word embeddings (word vectors)

Each dimension of a word vector represents some attribute of the word; the smaller the angle between two word vectors, the closer the words are in meaning.

import torch
from torch import nn
from torch.autograd import Variable

word_to_ix = {'hello': 0, 'world': 1}
embeds = nn.Embedding(2, 5)
hello_idx = torch.LongTensor([word_to_ix['hello']])
hello_idx = Variable(hello_idx)
hello_embed = embeds(hello_idx)
print(hello_embed)
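
A small sketch of the "smaller angle = closer meaning" idea using cosine similarity (the embeddings here are randomly initialized, so the numbers carry no semantic meaning; with trained embeddings, related words would score higher):

import torch
from torch import nn
import torch.nn.functional as F

embeds = nn.Embedding(3, 5)          # 3 words, 5-dimensional vectors
vecs = embeds(torch.LongTensor([0, 1, 2]))   # (3, 5)

# cosine similarity is the cosine of the angle between two vectors;
# values closer to 1 mean a smaller angle
sim_01 = F.cosine_similarity(vecs[0], vecs[1], dim=0)
sim_02 = F.cosine_similarity(vecs[0], vecs[2], dim=0)
print(sim_01.item(), sim_02.item())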

9. N-Gram model: word prediction

import torch
import torch.nn as nn
from torch.autograd import Variable
import torch.nn.functional as F
from torch import optim

CONTEXT_SIZE = 2
EMBEDDING_DIM = 10

test_sentence = """When forty winters shall besiege thy brow,
And dig deep trenches in thy beauty's field,
Thy youth's proud livery so gazed on now,
Will be a totter'd weed of small worth held:
Then being asked, where all thy beauty lies,
Where all the treasure of thy lusty days;
To say, within thine own deep sunken eyes,
Were an all-eating shame, and thriftless praise.
How much more praise deserv'd thy beauty's use,
If thou couldst answer 'This fair child of mine
Shall sum my count, and make my old excuse,'
Proving his beauty by succession thine!
This were to be new made when thou art old,
And see thy blood warm when thou feel'st it cold.""".split()

# every pair of consecutive words is the context, the following word is the target
trigram = [((test_sentence[i], test_sentence[i + 1]), test_sentence[i + 2])
           for i in range(len(test_sentence) - 2)]

vocb = set(test_sentence)
word_to_ix = {word: i for i, word in enumerate(vocb)}
idx_to_word = {word_to_ix[word]: word for word in word_to_ix}


class NgramModel(nn.Module):
    def __init__(self, vocb_size, context_size, n_dim):
        super().__init__()
        self.n_word = vocb_size
        self.embedding = nn.Embedding(self.n_word, n_dim)
        self.linear1 = nn.Linear(context_size * n_dim, 128)
        self.linear2 = nn.Linear(128, self.n_word)

    def forward(self, x):
        emb = self.embedding(x)
        emb = emb.view(1, -1)
        out = self.linear1(emb)
        out = F.relu(out)
        out = self.linear2(out)
        log_prob = F.log_softmax(out, 1)
        return log_prob


net = NgramModel(len(vocb), CONTEXT_SIZE, EMBEDDING_DIM)
# the model already outputs log-probabilities, so NLLLoss is the matching criterion
criterion = nn.NLLLoss()
optimizer = optim.SGD(net.parameters(), lr=1e-2, weight_decay=1e-5)

epoches = 200
for epoch in range(epoches):
    train_loss = 0
    for word, label in trigram:
        word = Variable(torch.LongTensor([word_to_ix[i] for i in word]))
        label = Variable(torch.LongTensor([word_to_ix[label]]))
        out = net(word)
        loss = criterion(out, label)
        train_loss += loss.item()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    if (epoch + 1) % 20 == 0:
        print('epoch: {}, Loss : {:.6f}'.format(epoch + 1, train_loss / len(trigram)))

net = net.eval()

# predict the word following a sample context
word, label = trigram[20]
print('input: {}'.format(word))
print('label: {}'.format(label), end="\n\n")
word = Variable(torch.LongTensor([word_to_ix[i] for i in word]))
out = net(word)
pred_label_idx = out.max(1)[1].data[0]
print(pred_label_idx)
predict_word = idx_to_word[int(pred_label_idx)]
print('real word is "{}", predicted word is "{}"'.format(label, predict_word))

10. Sequence prediction
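
All four approaches below import toTs, cudAvl, input_size, train and real from a seqInit module that these notes do not reproduce. A plausible minimal sketch of it is shown here; only the imported names come from the notes, while the data file, normalization and train/real split are assumptions (input_size = 3 is inferred from the reshape(-1, 3) in the fully connected code):

# seqInit.py — hypothetical reconstruction of the helper module
import numpy as np
import torch

def toTs(x):
    """Convert a numpy array to a torch tensor."""
    return torch.from_numpy(x)

def cudAvl(x):
    """Move a tensor or module to the GPU when one is available."""
    return x.cuda() if torch.cuda.is_available() else x

# look-back window used by the fully connected model
input_size = 3

# assumed data loading: a 1-D monthly series normalized to [0, 1]
data = np.loadtxt('data.csv', delimiter=',', dtype=np.float32)  # assumed file name
data = (data - data.min()) / (data.max() - data.min())

real = data            # the full series, used for testing and plotting
train = data[:-24]     # assumed split: hold out the last 24 points for evaluation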

(1) Fully connected approach

# torch modules
import torch
from torch import nn, optim
from torch.autograd import Variable
from torch.nn import init

# helpers from the seqInit initialization file
from seqInit import toTs, cudAvl
from seqInit import input_size
from seqInit import train, real

# plotting tools
import numpy as np
import matplotlib.pyplot as plt


# define the fully connected model
class fcModel(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        ly, self.linear = 1, nn.Sequential()
        for hid in hidden_dim:
            layer = nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU(True))
            self.linear.add_module('layer_{}'.format(ly), layer)
            ly, in_dim = ly + 1, hid
        self.linear.add_module('layer_{}'.format(ly), nn.Linear(in_dim, out_dim))
        # initialize the model parameters with kaiming_normal
        self.weightInit(init.kaiming_normal_)

    def forward(self, x):
        x = self.linear(x)
        return x

    def weightInit(self, func):
        for name, param in self.named_parameters():
            if 'weight' in name:
                func(param)


# input dimension is input_size, output is 1, three hidden layers of sizes [20, 10, 5]
hidden = [20, 10, 5]
fc = cudAvl(fcModel(input_size, hidden, 1))

# loss function and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(fc.parameters(), lr=1e-2)


# function to build the dataset
def create_dataset(dataset, look_back):
    dataX, dataY = [], []
    for i in range(look_back, len(dataset)):
        x = dataset[i - look_back: i]
        y = dataset[i]
        dataX.append(x)
        dataY.append(y)
    return np.array(dataX), np.array(dataY)


# build the training set
trainX, trainY = create_dataset(train, input_size)
print(trainX.shape, trainY.shape)
# build the test set
testX, realY = create_dataset(real, input_size)
print(testX.shape, realY.shape)

# prepare the inputs
fcx = trainX.reshape(-1, 3)
fcx = torch.from_numpy(fcx)
fcy = trainY.reshape(-1, 1)
fcy = torch.from_numpy(fcy)
print(fcx.shape, fcy.shape)

%%time
# train the FC model
frq, sec = 100, 10
loss_set = []
for e in range(1, frq + 1):
    inputs = cudAvl(Variable(fcx))
    target = cudAvl(Variable(fcy))
    # forward
    outputs = fc(inputs)
    loss = criterion(outputs, target)
    # reset gradients and update parameters
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # print training information
    print_loss = loss.item()
    current = e // sec
    loss_set.append((e, print_loss))
    if e % sec == 0:
        print_info = 'Epoch[{}/{}], Loss: {:.5f}'.format(current, frq // sec, print_loss)
        print(print_info)

# plot the loss curve
pltX = np.array([loss[0] for loss in loss_set])
pltY = np.array([loss[1] for loss in loss_set])
plt.title('loss function output curve')
plt.plot(pltX, pltY)
plt.show()

# test
px, ry = create_dataset(real, input_size)
px = px.reshape(-1, 3)
ry = ry.reshape(-1, 1)
print(px.shape, ry.shape)
px = torch.from_numpy(px)
px = cudAvl(Variable(px))
py = np.array(fc(px).data)

# plot real vs. predicted values
plt.plot(py, 'r', label='prediction')
plt.plot(ry, 'b', label='real')
plt.legend(loc='best')

The predicted values closely track the real values.

(2) Recurrent neural network approach

# torch modules
import torch
from torch import nn, optim
from torch.autograd import Variable
from torch.nn import init

# helpers from the seqInit initialization file
from seqInit import toTs, cudAvl
from seqInit import input_size
from seqInit import train, real

# plotting tools
import numpy as np
import matplotlib.pyplot as plt


# define the RNN model
class rnnModel(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim, layer_num):
        super().__init__()
        self.rnnLayer = nn.RNN(in_dim, hidden_dim, layer_num)
        self.fcLayer = nn.Linear(hidden_dim, out_dim)
        optim_range = np.sqrt(1.0 / hidden_dim)
        self.weightInit(optim_range)

    def forward(self, x):
        out, _ = self.rnnLayer(x)
        out = out[12:]            # keep the outputs from step 12 onward
        out = self.fcLayer(out)
        return out

    def weightInit(self, gain=1):
        # orthogonal initialization of the recurrent weights
        for name, param in self.named_parameters():
            if 'rnnLayer.weight' in name:
                init.orthogonal_(param, gain)


# input dim 1, output dim 1, hidden dim 10, 2 RNN layers
rnn = cudAvl(rnnModel(1, 10, 1, 2))

# loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(rnn.parameters(), lr=1e-2)


# prepare the inputs
def create_dataset(dataset):
    data = dataset.reshape(-1, 1, 1)
    return torch.from_numpy(data)


trainX = create_dataset(train[:-1])
trainY = create_dataset(train[1:])[12:]
print(trainX.shape, trainY.shape)

# train the RNN model
frq, sec = 2000, 200
loss_set = []
for e in range(1, frq + 1):
    inputs = cudAvl(Variable(trainX))
    target = cudAvl(Variable(trainY))
    # forward
    output = rnn(inputs)
    loss = criterion(output, target)
    # update gradients
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # print training information
    print_loss = loss.item()
    loss_set.append((e, print_loss))
    if e % sec == 0:
        print('Epoch[{}/{}], loss = {:.5f}'.format(e, frq, print_loss))

# plot the loss curve
pltX = np.array([loss[0] for loss in loss_set])
pltY = np.array([loss[1] for loss in loss_set])
plt.title('loss function output curve')
plt.plot(pltX, pltY)
plt.show()

# test
rnn = rnn.eval()
px = real[:-1].reshape(-1, 1, 1)
px = torch.from_numpy(px)
ry = real[1:].reshape(-1)
varX = cudAvl(Variable(px, volatile=True))
py = rnn(varX).data
py = np.array(py).reshape(-1)
print(px.shape, py.shape, ry.shape)

# plot real vs. predicted values for the last 24 points
plt.plot(py[-24:], 'r', label='prediction')
plt.plot(ry[-24:], 'b', label='real')
plt.legend(loc='best')

(3) LSTM approach

!jupyter nbconvert --to python seqInit.ipynb

import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'

# torch modules
import torch
from torch import nn, optim
from torch.autograd import Variable
from torch.nn import init

# helpers from the seqInit initialization file
from seqInit import toTs, cudAvl
from seqInit import input_size
from seqInit import train, real

# plotting tools
import numpy as np
import matplotlib.pyplot as plt


# define the LSTM model
class lstmModel(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim, layer_num):
        super().__init__()
        self.lstmLayer = nn.LSTM(in_dim, hidden_dim, layer_num)
        self.relu = nn.ReLU()
        self.fcLayer = nn.Linear(hidden_dim, out_dim)
        self.weightInit(np.sqrt(1.0 / hidden_dim))

    def forward(self, x):
        out, _ = self.lstmLayer(x)
        out = self.relu(out)
        out = out[12:]            # keep the outputs from step 12 onward
        out = self.fcLayer(out)
        return out

    # initialize the weights
    def weightInit(self, gain):
        for name, param in self.named_parameters():
            if 'lstmLayer.weight' in name:
                init.orthogonal_(param)


# input dim 1, output dim 1, hidden dim 5, 2 LSTM layers
lstm = cudAvl(lstmModel(1, 5, 1, 2))

# loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(lstm.parameters(), lr=1e-2)

# prepare the inputs
train = train.reshape(-1, 1, 1)
x = torch.from_numpy(train[:-1])
y = torch.from_numpy(train[1:])[12:]
print(x.shape, y.shape)

%%time
# train the LSTM model
frq, sec = 3500, 350
loss_set = []
for e in range(1, frq + 1):
    inputs = cudAvl(Variable(x))
    target = cudAvl(Variable(y))
    # forward
    output = lstm(inputs)
    loss = criterion(output, target)
    # update parameters
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # print training information
    print_loss = loss.item()
    loss_set.append((e, print_loss))
    if e % sec == 0:
        print('Epoch[{}/{}], Loss: {:.5f}'.format(e, frq, print_loss))

# plot the loss curve
pltX = np.array([loss[0] for loss in loss_set])
pltY = np.array([loss[1] for loss in loss_set])
plt.title('loss function output curve')
plt.plot(pltX, pltY)
plt.show()

lstm = lstm.eval()

# predict and compare with the real values
px = real[:-1].reshape(-1, 1, 1)
px = torch.from_numpy(px)
ry = real[1:].reshape(-1)
varX = cudAvl(Variable(px, volatile=True))
py = lstm(varX).data
py = np.array(py).reshape(-1)
print(px.shape, py.shape, ry.shape)

# plot real vs. predicted values for the last 24 points
plt.plot(py[-24:], 'r', label='prediction')
plt.plot(ry[-24:], 'b', label='real')
plt.legend(loc='best')

(4) GRU approach

!jupyter nbconvert --to python seqInit.ipynb

import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'

# torch modules
import torch
from torch import nn, optim
from torch.autograd import Variable
from torch.nn import init

# helpers from the seqInit initialization file
from seqInit import toTs, cudAvl
from seqInit import input_size
from seqInit import train, real

# plotting tools
import numpy as np
import matplotlib.pyplot as plt


# define the GRU model
class gruModel(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim, hidden_layer):
        super().__init__()
        self.gruLayer = nn.GRU(in_dim, hidden_dim, hidden_layer)
        self.fcLayer = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        out, _ = self.gruLayer(x)
        out = out[12:]            # keep the outputs from step 12 onward
        out = self.fcLayer(out)
        return out


# input dim 1, output dim 1, hidden dim 5, 2 GRU layers
gru = cudAvl(gruModel(1, 5, 1, 2))

# loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(gru.parameters(), lr=1e-2)

# prepare the inputs
train = train.reshape(-1, 1, 1)
x = torch.from_numpy(train[:-1])
y = torch.from_numpy(train[1:])[12:]
print(x.shape, y.shape)

%%time
# train the GRU model
frq, sec = 4000, 400
loss_set = []
for e in range(1, frq + 1):
    inputs = cudAvl(Variable(x))
    target = cudAvl(Variable(y))
    # forward
    output = gru(inputs)
    loss = criterion(output, target)
    # update parameters
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # print training information
    print_loss = loss.item()
    loss_set.append((e, print_loss))
    if e % sec == 0:
        print('Epoch[{}/{}], Loss: {:.5f}'.format(e, frq, print_loss))

# plot the loss curve
pltX = np.array([loss[0] for loss in loss_set])
pltY = np.array([loss[1] for loss in loss_set])
plt.title('loss function output curve')
plt.plot(pltX, pltY)
plt.show()

gru = gru.eval()

# predict and compare with the real values
px = real[:-1].reshape(-1, 1, 1)
px = torch.from_numpy(px)
ry = real[1:].reshape(-1)
varX = cudAvl(Variable(px, volatile=True))
py = gru(varX).data
py = np.array(py).reshape(-1)
print(px.shape, py.shape, ry.shape)

# plot real vs. predicted values for the last 24 points
plt.plot(py[-24:], 'r', label='prediction')
plt.plot(ry[-24:], 'b', label='real')
plt.legend(loc='best')
