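Related video:
PyTorch Dynamic Neural Networks (莫烦 Python tutorial)

Notes: The previous note, "PyTorch Notes, Getting Started: Writing a Simple Neural Network 3: CNN (Using the MNIST Dataset as an Example)", recorded how to write a simple CNN. This note records the next step: using the GPU to speed up training of that network.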

1. Move the neural network to the GPU

# Move the neural network to the GPU (its parameters and buffers are moved in place)
cnn.cuda()
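
`cnn.cuda()` assumes that a CUDA device is available. As a minimal sketch, the same step can be written so that it falls back to the CPU when no GPU is visible. `TinyNet` below is a hypothetical stand-in for the CNN in this note, and `device`, `net`, and `returned` are illustrative names only.

```python
import torch
import torch.nn as nn

# Use the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# TinyNet is a hypothetical stand-in for the CNN defined in this note.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

net = TinyNet()
# Module.to() moves the parameters in place and returns the module itself,
# so reassigning the result is optional for modules.
returned = net.to(device)

# Collect the device type reported by every parameter.
param_device_types = {p.device.type for p in net.parameters()}
print(device, param_device_types)
```

Because the module is moved before the optimizer is created in the scripts of this note, the optimizer ends up holding the parameters that already live on the chosen device.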

2. Move the test data to the GPU

# Move the test data to the GPU; for tensors .cuda() returns a copy, so reassign the result
test_x = test_x.cuda()
test_y = test_y.cuda()
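
Unlike a module, a tensor is not moved in place, which is why the results above are assigned back to `test_x` and `test_y`. The following sketch illustrates this with hypothetical tensors `x`, `x_dev`, and `y_dev`; it uses `.to(device)` so that it also runs on a machine without a GPU.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.zeros(2, 3)      # created on the CPU
x_dev = x.to(device)       # returns a tensor on `device`; x itself is not moved

print(x.device.type)       # the original tensor still reports the CPU
print(x_dev.device.type)   # matches device.type

# Operands of an operation generally have to live on the same device,
# so the second tensor is created directly on `device`.
y_dev = torch.ones(2, 3, device=device)
total = (x_dev + y_dev).sum().item()   # .item() copies the scalar back to Python
print(total)
```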

3. (During training) Move the training data to the GPU; the predictions are already there

# Train the neural network
for epoch in range(EPOCH):
    for step, (batch_x, batch_y) in enumerate(train_loader):
        # Move the training batch to the GPU
        batch_x = batch_x.cuda()
        batch_y = batch_y.cuda()
        output = cnn(batch_x)
        loss = loss_func(output, batch_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Print progress every 50 steps
        if step % 50 == 0:
            # No gradients are needed for evaluation
            with torch.no_grad():
                test_output = cnn(test_x)
            # test_output is computed on the GPU, so the predictions are already there;
            # an extra .cuda() call is not needed
            predict_y = torch.max(test_output, 1)[1]
            accuracy = (predict_y == test_y).sum().item() / test_y.size(0)
            print('Epoch', epoch, '|', 'Step', step, '|', 'Loss', loss.item(), '|', 'Test Accuracy', accuracy)
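
The accuracy check inside the loop can be pulled into a small helper. This is only a sketch: `evaluate_accuracy` is a hypothetical name, and the toy identity layer exists purely to make the result easy to verify by hand. The CNN in this note has no dropout or batch-norm layers, so switching to `eval()` mode is a precaution rather than a requirement here.

```python
import torch
import torch.nn as nn

def evaluate_accuracy(model, inputs, labels):
    """Fraction of rows in `inputs` whose arg-max class equals `labels`."""
    was_training = model.training
    model.eval()                        # matters for layers such as dropout or batch norm
    with torch.no_grad():               # no autograd graph is built during evaluation
        logits = model(inputs)
    model.train(was_training)           # restore the previous mode
    predictions = logits.argmax(dim=1)  # class index with the highest score per row
    return (predictions == labels).sum().item() / labels.size(0)

# Toy check: an identity linear layer, so the predicted class is the arg-max of the input.
toy = nn.Linear(3, 3, bias=False)
with torch.no_grad():
    toy.weight.copy_(torch.eye(3))

toy_x = torch.tensor([[2.0, 0.0, 0.0],
                      [0.0, 3.0, 0.0],
                      [0.0, 0.0, 1.0],
                      [5.0, 0.0, 0.0]])
toy_y = torch.tensor([0, 1, 2, 1])      # the last label is deliberately wrong
acc = evaluate_accuracy(toy, toy_x, toy_y)
print(acc)  # 0.75
```

The helper works unchanged on the GPU as long as the model, `inputs`, and `labels` are all on the same device.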

4. (During prediction) Move the data back to the CPU

# Predict
with torch.no_grad():
    test_output = cnn(test_x[:100])
# To convert a CUDA tensor to a NumPy array, first move it back to the CPU;
# otherwise this raises: TypeError: can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
predict_y = torch.max(test_output, 1)[1].cpu().numpy()
real_y = test_y[:100].cpu().numpy()
print(predict_y)
print(real_y)
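
A related pitfall: calling `.numpy()` on a tensor that still requires gradients also fails, so floating-point outputs such as probabilities need `detach()` before the conversion. The sketch below uses hypothetical logits rather than the output of the CNN in this note, and it runs on the CPU when no GPU is present.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical logits for three samples and two classes;
# requires_grad=True mimics the output of a model in training.
logits = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]],
                      device=device, requires_grad=True)

# detach() drops the autograd history, cpu() copies to host memory
# (it returns the tensor unchanged if it is already there), numpy() converts.
probs = torch.softmax(logits, dim=1).detach().cpu().numpy()

# Integer class indices carry no gradient, so cpu().numpy() is enough for them.
pred = logits.argmax(dim=1).cpu().numpy()

print(pred)
print(probs.shape)
```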

5. Comparison

Training on the CPU:

Epoch 0 | Step 0 | Loss 2.3023581504821777 | Test Accuracy 0.2795
Epoch 0 | Step 50 | Loss 0.36932313442230225 | Test Accuracy 0.839
Epoch 0 | Step 100 | Loss 0.17208492755889893 | Test Accuracy 0.9025
Epoch 0 | Step 150 | Loss 0.2834635376930237 | Test Accuracy 0.9025
Epoch 0 | Step 200 | Loss 0.10628349334001541 | Test Accuracy 0.9365
Epoch 0 | Step 250 | Loss 0.07513977587223053 | Test Accuracy 0.949
Epoch 0 | Step 300 | Loss 0.15143314003944397 | Test Accuracy 0.952
Epoch 0 | Step 350 | Loss 0.19321243464946747 | Test Accuracy 0.958
Epoch 0 | Step 400 | Loss 0.08455082774162292 | Test Accuracy 0.963
Epoch 0 | Step 450 | Loss 0.08475902676582336 | Test Accuracy 0.9635
Epoch 0 | Step 500 | Loss 0.14322614669799805 | Test Accuracy 0.966
Epoch 0 | Step 550 | Loss 0.22640569508075714 | Test Accuracy 0.966
Epoch 1 | Step 0 | Loss 0.04606473818421364 | Test Accuracy 0.969
Epoch 1 | Step 50 | Loss 0.35338521003723145 | Test Accuracy 0.9715
Epoch 1 | Step 100 | Loss 0.039717815816402435 | Test Accuracy 0.972
Epoch 1 | Step 150 | Loss 0.10654418915510178 | Test Accuracy 0.9695
Epoch 1 | Step 200 | Loss 0.032110925763845444 | Test Accuracy 0.9745
Epoch 1 | Step 250 | Loss 0.012637133710086346 | Test Accuracy 0.971
Epoch 1 | Step 300 | Loss 0.0625436082482338 | Test Accuracy 0.9735
Epoch 1 | Step 350 | Loss 0.032693102955818176 | Test Accuracy 0.975
Epoch 1 | Step 400 | Loss 0.05973822623491287 | Test Accuracy 0.976
Epoch 1 | Step 450 | Loss 0.22700577974319458 | Test Accuracy 0.9805
Epoch 1 | Step 500 | Loss 0.03670699521899223 | Test Accuracy 0.9725
Epoch 1 | Step 550 | Loss 0.14919476211071014 | Test Accuracy 0.9785
Time cost: 164.68248105049133 s

Training on the GPU:

Epoch 0 | Step 0 | Loss 2.295382499694824 | Test Accuracy 0.1795
Epoch 0 | Step 50 | Loss 0.4366167187690735 | Test Accuracy 0.851
Epoch 0 | Step 100 | Loss 0.1392095685005188 | Test Accuracy 0.915
Epoch 0 | Step 150 | Loss 0.374984472990036 | Test Accuracy 0.925
Epoch 0 | Step 200 | Loss 0.11992576718330383 | Test Accuracy 0.9435
Epoch 0 | Step 250 | Loss 0.09971962124109268 | Test Accuracy 0.955
Epoch 0 | Step 300 | Loss 0.15602746605873108 | Test Accuracy 0.9635
Epoch 0 | Step 350 | Loss 0.10646170377731323 | Test Accuracy 0.963
Epoch 0 | Step 400 | Loss 0.10151582956314087 | Test Accuracy 0.9675
Epoch 0 | Step 450 | Loss 0.050429973751306534 | Test Accuracy 0.97
Epoch 0 | Step 500 | Loss 0.07986892014741898 | Test Accuracy 0.966
Epoch 0 | Step 550 | Loss 0.11002516746520996 | Test Accuracy 0.9665
Epoch 1 | Step 0 | Loss 0.07174035906791687 | Test Accuracy 0.9745
Epoch 1 | Step 50 | Loss 0.1582135409116745 | Test Accuracy 0.9685
Epoch 1 | Step 100 | Loss 0.09163351356983185 | Test Accuracy 0.9805
Epoch 1 | Step 150 | Loss 0.13820190727710724 | Test Accuracy 0.9775
Epoch 1 | Step 200 | Loss 0.0733216404914856 | Test Accuracy 0.978
Epoch 1 | Step 250 | Loss 0.01615101844072342 | Test Accuracy 0.9785
Epoch 1 | Step 300 | Loss 0.0749548077583313 | Test Accuracy 0.978
Epoch 1 | Step 350 | Loss 0.05822641775012016 | Test Accuracy 0.977
Epoch 1 | Step 400 | Loss 0.033135443925857544 | Test Accuracy 0.98
Epoch 1 | Step 450 | Loss 0.07146552950143814 | Test Accuracy 0.9835
Epoch 1 | Step 500 | Loss 0.13729988038539886 | Test Accuracy 0.9795
Epoch 1 | Step 550 | Loss 0.07742690294981003 | Test Accuracy 0.98
Time cost: 7.764967918395996 s

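As the logs show, training on the CPU took 164.7 s and reached an accuracy of 0.9785 at the last logged step (measured on the first 2000 test images);
training on the GPU took 7.8 s and reached 0.98, which is roughly a 21x speedup at essentially the same accuracy.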
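
The speedup follows directly from the two "Time cost" values in the logs above. Both numbers come from a single run each on hardware that this note does not specify, so the ratio describes these runs only. The variable names are illustrative.

```python
# The two "Time cost" values printed by the runs above.
cpu_seconds = 164.68248105049133
gpu_seconds = 7.764967918395996

speedup = cpu_seconds / gpu_seconds
saved = cpu_seconds - gpu_seconds

print(f"speedup: {speedup:.1f}x")    # speedup: 21.2x
print(f"time saved: {saved:.1f} s")  # time saved: 156.9 s
```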

6. Complete code

Training on the CPU:

import torch
import torch.nn as nn
from torch.autograd import Variable
import torch.utils.data as Data
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
from torchsummary import summary
import time
    
# Define the neural network
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.output_layer = nn.Linear(32*7*7, 10)
        
    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = x.reshape(x.size(0), -1)
        output = self.output_layer(x)
        return output

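# Hyperparameters
EPOCH = 2
BATCH_SIZE = 100
LR = 0.001
DOWNLOAD = False # Set to False if the MNIST dataset has already been downloaded, otherwise True

# Load the MNIST training data (downloaded first when DOWNLOAD is True)
train_data = datasets.MNIST(
    root='./data', # Where the data is stored
    train=True, # True for the training set, False for the test set
    transform=transforms.ToTensor(), # Scales 0~255 to 0~1
    download=DOWNLOAD
)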

# Old attribute names (deprecated)
print(train_data.train_data.size())
print(train_data.train_labels.size())

# New attribute names
print(train_data.data.size())
print(train_data.targets.size())

# Show a few images from the dataset
for i in range(2):
    print(train_data.targets[i].item())
    plt.imshow(train_data.data[i].numpy(), cmap='gray')
    plt.show()
    
# DataLoader
train_loader = Data.DataLoader(
    dataset=train_data,
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=2
)

# Once train_data has been downloaded, test_data is available as well
test_data = datasets.MNIST(
    root='./data',
    train=False
)

print(test_data.data.size())
print(test_data.targets.size())

# Create the network
cnn = CNN()
print(cnn)

# Inspect the network structure
model = CNN()
if torch.cuda.is_available():
    model.cuda()
summary(model, input_size=(1,28,28))

# Optimizer
optimizer = torch.optim.Adam(cnn.parameters(), lr=LR)

# Loss function
loss_func = nn.CrossEntropyLoss()

# To save time, use only the first 2000 test samples
# (Variable and volatile=True are obsolete since PyTorch 0.4, so a plain tensor is used)
test_x = torch.unsqueeze(test_data.data, dim=1).type(torch.FloatTensor)[:2000] / 255 # Scale 0~255 to 0~1

test_y = test_data.targets[:2000]

# # Use the whole test set
# test_x = torch.unsqueeze(test_data.data, dim=1).type(torch.FloatTensor) / 255 # Scale 0~255 to 0~1

# test_y = test_data.targets

# Start timing
start = time.time()

# Train the neural network
for epoch in range(EPOCH):
    for step, (batch_x, batch_y) in enumerate(train_loader):
        output = cnn(batch_x)
        loss = loss_func(output, batch_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Print progress every 50 steps
        if step % 50 == 0:
            # No gradients are needed for evaluation
            with torch.no_grad():
                test_output = cnn(test_x)
            predict_y = torch.max(test_output, 1)[1]
            accuracy = (predict_y == test_y).sum().item() / test_y.size(0)
            print('Epoch', epoch, '|', 'Step', step, '|', 'Loss', loss.item(), '|', 'Test Accuracy', accuracy)

# Stop timing
end = time.time()

# Training time
print('Time cost:', end - start, 's')

# Predict
with torch.no_grad():
    test_output = cnn(test_x[:100])
predict_y = torch.max(test_output, 1)[1].numpy()
real_y = test_y[:100].numpy()
print(predict_y)
print(real_y)

# Show the predictions next to the true labels
for i in range(10):
    print('Predict', predict_y[i])
    print('Real', real_y[i])
    plt.imshow(test_data.data[i].numpy(), cmap='gray')
    plt.show()

Training on the GPU:

import torch
import torch.nn as nn
from torch.autograd import Variable
import torch.utils.data as Data
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
from torchsummary import summary
import time

    
# Define the neural network
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.output_layer = nn.Linear(32*7*7, 10)
        
    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = x.reshape(x.size(0), -1)
        output = self.output_layer(x)
        return output

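# Hyperparameters
EPOCH = 2
BATCH_SIZE = 100
LR = 0.001
DOWNLOAD = False # Set to False if the MNIST dataset has already been downloaded, otherwise True

# Load the MNIST training data (downloaded first when DOWNLOAD is True)
train_data = datasets.MNIST(
    root='./data', # Where the data is stored
    train=True, # True for the training set, False for the test set
    transform=transforms.ToTensor(), # Scales 0~255 to 0~1
    download=DOWNLOAD
)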

# Old attribute names (deprecated)
print(train_data.train_data.size())
print(train_data.train_labels.size())

# New attribute names
print(train_data.data.size())
print(train_data.targets.size())

# Show a few images from the dataset
for i in range(2):
    print(train_data.targets[i].item())
    plt.imshow(train_data.data[i].numpy(), cmap='gray')
    plt.show()
    
# DataLoader
train_loader = Data.DataLoader(
    dataset=train_data,
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=2
)

# Once train_data has been downloaded, test_data is available as well
test_data = datasets.MNIST(
    root='./data',
    train=False
)

print(test_data.data.size())
print(test_data.targets.size())

# Create the network
cnn = CNN()
# Move the neural network to the GPU
cnn.cuda()
print(cnn)

# Inspect the network structure
model = CNN()
if torch.cuda.is_available():
    model.cuda()
summary(model, input_size=(1,28,28))

# Optimizer
optimizer = torch.optim.Adam(cnn.parameters(), lr=LR)

# Loss function
loss_func = nn.CrossEntropyLoss()

# To save time, use only the first 2000 test samples
# (Variable and volatile=True are obsolete since PyTorch 0.4, so a plain tensor is used)
test_x = torch.unsqueeze(test_data.data, dim=1).type(torch.FloatTensor)[:2000] / 255 # Scale 0~255 to 0~1

test_y = test_data.targets[:2000]

# # Use the whole test set
# test_x = torch.unsqueeze(test_data.data, dim=1).type(torch.FloatTensor) / 255 # Scale 0~255 to 0~1

# test_y = test_data.targets

# Move the test data to the GPU
test_x = test_x.cuda()
test_y = test_y.cuda()

# Start timing
start = time.time()

# Train the neural network
for epoch in range(EPOCH):
    for step, (batch_x, batch_y) in enumerate(train_loader):
        # Move the training batch to the GPU
        batch_x = batch_x.cuda()
        batch_y = batch_y.cuda()
        output = cnn(batch_x)
        loss = loss_func(output, batch_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Print progress every 50 steps
        if step % 50 == 0:
            # No gradients are needed for evaluation
            with torch.no_grad():
                test_output = cnn(test_x)
            # test_output is computed on the GPU, so the predictions are already there;
            # an extra .cuda() call is not needed
            predict_y = torch.max(test_output, 1)[1]
            accuracy = (predict_y == test_y).sum().item() / test_y.size(0)
            print('Epoch', epoch, '|', 'Step', step, '|', 'Loss', loss.item(), '|', 'Test Accuracy', accuracy)

# Stop timing
end = time.time()

# Training time
print('Time cost:', end - start, 's')

# Predict
with torch.no_grad():
    test_output = cnn(test_x[:100])
# To convert a CUDA tensor to a NumPy array, first move it back to the CPU;
# otherwise this raises: TypeError: can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
predict_y = torch.max(test_output, 1)[1].cpu().numpy()
real_y = test_y[:100].cpu().numpy()
print(predict_y)
print(real_y)

# Show the predictions next to the true labels
for i in range(10):
    print('Predict', predict_y[i])
    print('Real', real_y[i])
    plt.imshow(test_data.data[i].numpy(), cmap='gray')
    plt.show()
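
The CPU and GPU scripts differ only in where the model and the tensors live, so they can in principle be merged into one device-agnostic script. The sketch below shows the idea on synthetic data instead of MNIST. The small `nn.Sequential` model, the random batches, and the `sync` helper are all hypothetical stand-ins. Because CUDA kernels are launched asynchronously, the sketch also waits for the GPU before reading the clock, which may make short timings more reliable.

```python
import time
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def sync():
    # CUDA kernels run asynchronously; wait for them before reading the clock.
    if device.type == "cuda":
        torch.cuda.synchronize()

torch.manual_seed(0)
# Synthetic stand-in for MNIST: 8 batches of 16 random 1x28x28 images with random labels.
batches = [(torch.rand(16, 1, 28, 28), torch.randint(0, 10, (16,))) for _ in range(8)]

# Hypothetical small model; the CNN class from this note could be used instead.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_func = nn.CrossEntropyLoss()

sync()
start = time.perf_counter()
for batch_x, batch_y in batches:
    # The same two lines work on both the CPU and the GPU.
    batch_x, batch_y = batch_x.to(device), batch_y.to(device)
    loss = loss_func(model(batch_x), batch_y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
sync()
elapsed = time.perf_counter() - start

final_loss = loss.item()
print(f"device={device.type} elapsed={elapsed:.4f}s last_loss={final_loss:.4f}")
```

With this pattern the only line that decides between CPU and GPU is the definition of `device`.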