PyTorch Notes for Beginners: GPU Acceleration (Using a CNN on the MNIST Dataset as an Example)

Note: the earlier post "PyTorch Notes for Beginners: Writing a Simple Neural Network 3: CNN (Using the MNIST Dataset as an Example)" showed how to write a simple CNN. This post shows how to go a step further and use a GPU to speed the network up.

炼丹的蜗牛@/" · Published 2020-11-18 16:46:01

Related video:
PyTorch 动态神经网络 (莫烦 Python 教学)

1. Move the neural network to the GPU

# Move the neural network to the GPU
cnn.cuda()
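
Calling `.cuda()` fails on a machine without a usable CUDA device. A common alternative, sketched here on the assumption that the same script should also run on CPU-only machines, is to choose a `torch.device` once and call `.to(device)`. `TinyNet` is a hypothetical stand-in for the `CNN` class in the full listing, not part of the original code.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the CNN class from the full listing.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

net = TinyNet()
net.to(device)  # like .cuda(), this moves the module's parameters in place

param_device = next(net.parameters()).device
print('Parameters live on:', param_device)
```

With this pattern every later `.cuda()` call in the post would become `.to(device)`.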

2. Move the test data to the GPU

# Move the test data to the GPU
test_x = test_x.cuda()
test_y = test_y.cuda()
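
Note the asymmetry with section 1: calling `.cuda()` on a module moves its parameters in place, whereas calling it on a tensor returns a tensor on the GPU and leaves the original where it was. That is why the result is assigned back to `test_x` and `test_y`. A small sketch of the difference, using `.to(device)` so that it also runs without a GPU (the layer and tensor are made up for illustration):

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

layer = nn.Linear(3, 1)
returned = layer.to(device)      # modules are moved in place ...
print(returned is layer)         # ... and the same object is returned

x = torch.zeros(2, 3)            # created on the CPU
x_dev = x.to(device)             # tensors are not moved in place
print(x.device, x_dev.device)    # x itself is still on the CPU

out = layer(x_dev)               # input and parameters now share a device
print(out.shape)
```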

3. (During training) Move the training data to the GPU

# Train the neural network
for epoch in range(EPOCH):
    for step, (batch_x, batch_y) in enumerate(train_loader):
        # Move the training data to the GPU
        batch_x = batch_x.cuda()
        batch_y = batch_y.cuda()
        output = cnn(batch_x)
        loss = loss_func(output, batch_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Print progress every 50 steps
        if step%50 == 0:
            with torch.no_grad(): # evaluation only, no gradients needed
                test_output = cnn(test_x)
            # cnn and test_x are already on the GPU, so the predictions are too;
            # no extra .cuda() call is needed here
            predict_y = torch.max(test_output, 1)[1].data.squeeze()
            accuracy = (predict_y == test_y).sum().item() / test_y.size(0)
            print('Epoch', epoch, '|', 'Step', step, '|', 'Loss', loss.data.item(), '|', 'Test Accuracy', accuracy)
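
The periodic accuracy check inside the loop could also be factored into a helper. The sketch below is one possible shape for it, not the author's code: it switches the model to evaluation mode, disables gradient tracking, and processes the test set in chunks so that a large test set does not have to fit through the network at once. The `accuracy` function, its `batch_size` default, and the `AlwaysZero` model are all hypothetical names introduced for the illustration.

```python
import torch
import torch.nn as nn

def accuracy(model, x, y, batch_size=500):
    """Fraction of correct predictions, computed without tracking gradients."""
    was_training = model.training
    model.eval()                      # disable dropout / batch-norm updates, if any
    correct = 0
    with torch.no_grad():             # no autograd graph is built for the test pass
        for i in range(0, x.size(0), batch_size):
            logits = model(x[i:i + batch_size])
            correct += (logits.argmax(dim=1) == y[i:i + batch_size]).sum().item()
    model.train(was_training)         # restore the previous mode
    return correct / y.size(0)

# Tiny self-check with a hypothetical model that always predicts class 0.
class AlwaysZero(nn.Module):
    def forward(self, x):
        logits = torch.zeros(x.size(0), 10)
        logits[:, 0] = 1.0
        return logits

demo_x = torch.rand(8, 1, 28, 28)
demo_y = torch.tensor([0, 0, 1, 0, 2, 0, 0, 3])
demo_acc = accuracy(AlwaysZero(), demo_x, demo_y)
print(demo_acc)
```

In the training loop this would be called as `accuracy(cnn, test_x, test_y)`, with `test_x` and `test_y` already on the same device as `cnn`.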

4. (During prediction) Move the data back to the CPU

# Predict
test_output = cnn(test_x[:100])
# A CUDA tensor has to be moved back to the CPU before it can be converted to NumPy,
# otherwise: TypeError: can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
predict_y = torch.max(test_output, 1)[1].cpu().data.numpy().squeeze()
real_y = test_y[:100].cpu().numpy()
print(predict_y)
print(real_y)
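
NumPy conversion has a second requirement besides the device: calling `.numpy()` on a tensor that requires gradients also raises an error. A small helper that handles both cases might look like the sketch below; `to_numpy` is a hypothetical name, and the `scores` tensor is made-up data standing in for the network output.

```python
import torch

def to_numpy(t):
    """Detach from autograd, copy to host memory if needed, then convert."""
    return t.detach().cpu().numpy()

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
scores = torch.tensor([[0.1, 0.9], [0.8, 0.2]], device=device, requires_grad=True)

labels = to_numpy(scores.argmax(dim=1))   # predicted class per row
raw = to_numpy(scores)                    # works although scores requires grad
print(labels)
print(raw.shape)
```

On a tensor that is already on the CPU, `.cpu()` simply returns the tensor itself, so the helper is safe to use in the CPU-only version as well.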

5. Comparison

Training on the CPU:

Epoch 0 | Step 0 | Loss 2.3023581504821777 | Test Accuracy 0.2795
Epoch 0 | Step 50 | Loss 0.36932313442230225 | Test Accuracy 0.839
Epoch 0 | Step 100 | Loss 0.17208492755889893 | Test Accuracy 0.9025
Epoch 0 | Step 150 | Loss 0.2834635376930237 | Test Accuracy 0.9025
Epoch 0 | Step 200 | Loss 0.10628349334001541 | Test Accuracy 0.9365
Epoch 0 | Step 250 | Loss 0.07513977587223053 | Test Accuracy 0.949
Epoch 0 | Step 300 | Loss 0.15143314003944397 | Test Accuracy 0.952
Epoch 0 | Step 350 | Loss 0.19321243464946747 | Test Accuracy 0.958
Epoch 0 | Step 400 | Loss 0.08455082774162292 | Test Accuracy 0.963
Epoch 0 | Step 450 | Loss 0.08475902676582336 | Test Accuracy 0.9635
Epoch 0 | Step 500 | Loss 0.14322614669799805 | Test Accuracy 0.966
Epoch 0 | Step 550 | Loss 0.22640569508075714 | Test Accuracy 0.966
Epoch 1 | Step 0 | Loss 0.04606473818421364 | Test Accuracy 0.969
Epoch 1 | Step 50 | Loss 0.35338521003723145 | Test Accuracy 0.9715
Epoch 1 | Step 100 | Loss 0.039717815816402435 | Test Accuracy 0.972
Epoch 1 | Step 150 | Loss 0.10654418915510178 | Test Accuracy 0.9695
Epoch 1 | Step 200 | Loss 0.032110925763845444 | Test Accuracy 0.9745
Epoch 1 | Step 250 | Loss 0.012637133710086346 | Test Accuracy 0.971
Epoch 1 | Step 300 | Loss 0.0625436082482338 | Test Accuracy 0.9735
Epoch 1 | Step 350 | Loss 0.032693102955818176 | Test Accuracy 0.975
Epoch 1 | Step 400 | Loss 0.05973822623491287 | Test Accuracy 0.976
Epoch 1 | Step 450 | Loss 0.22700577974319458 | Test Accuracy 0.9805
Epoch 1 | Step 500 | Loss 0.03670699521899223 | Test Accuracy 0.9725
Epoch 1 | Step 550 | Loss 0.14919476211071014 | Test Accuracy 0.9785
Time cost: 164.68248105049133 s

Training on the GPU:

Epoch 0 | Step 0 | Loss 2.295382499694824 | Test Accuracy 0.1795
Epoch 0 | Step 50 | Loss 0.4366167187690735 | Test Accuracy 0.851
Epoch 0 | Step 100 | Loss 0.1392095685005188 | Test Accuracy 0.915
Epoch 0 | Step 150 | Loss 0.374984472990036 | Test Accuracy 0.925
Epoch 0 | Step 200 | Loss 0.11992576718330383 | Test Accuracy 0.9435
Epoch 0 | Step 250 | Loss 0.09971962124109268 | Test Accuracy 0.955
Epoch 0 | Step 300 | Loss 0.15602746605873108 | Test Accuracy 0.9635
Epoch 0 | Step 350 | Loss 0.10646170377731323 | Test Accuracy 0.963
Epoch 0 | Step 400 | Loss 0.10151582956314087 | Test Accuracy 0.9675
Epoch 0 | Step 450 | Loss 0.050429973751306534 | Test Accuracy 0.97
Epoch 0 | Step 500 | Loss 0.07986892014741898 | Test Accuracy 0.966
Epoch 0 | Step 550 | Loss 0.11002516746520996 | Test Accuracy 0.9665
Epoch 1 | Step 0 | Loss 0.07174035906791687 | Test Accuracy 0.9745
Epoch 1 | Step 50 | Loss 0.1582135409116745 | Test Accuracy 0.9685
Epoch 1 | Step 100 | Loss 0.09163351356983185 | Test Accuracy 0.9805
Epoch 1 | Step 150 | Loss 0.13820190727710724 | Test Accuracy 0.9775
Epoch 1 | Step 200 | Loss 0.0733216404914856 | Test Accuracy 0.978
Epoch 1 | Step 250 | Loss 0.01615101844072342 | Test Accuracy 0.9785
Epoch 1 | Step 300 | Loss 0.0749548077583313 | Test Accuracy 0.978
Epoch 1 | Step 350 | Loss 0.05822641775012016 | Test Accuracy 0.977
Epoch 1 | Step 400 | Loss 0.033135443925857544 | Test Accuracy 0.98
Epoch 1 | Step 450 | Loss 0.07146552950143814 | Test Accuracy 0.9835
Epoch 1 | Step 500 | Loss 0.13729988038539886 | Test Accuracy 0.9795
Epoch 1 | Step 550 | Loss 0.07742690294981003 | Test Accuracy 0.98
Time cost: 7.764967918395996 s

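Training on the CPU took 164.7 s and ended at a test accuracy of 0.9785,
while training on the GPU took 7.8 s and ended at 0.98. The accuracy is essentially the same; the difference is in the training time.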
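
To put the two results in proportion, the ratio of the logged times and the size of the accuracy gap can be worked out directly. The values below are copied from the logs above; the figure of 2000 test samples comes from the full listing in the next section.

```python
cpu_seconds = 164.68248105049133   # "Time cost" from the CPU log
gpu_seconds = 7.764967918395996    # "Time cost" from the GPU log

speedup = cpu_seconds / gpu_seconds
print(f'GPU speedup: {speedup:.1f}x')

# Final logged accuracies, both measured on the same 2000 test images.
cpu_acc, gpu_acc, n_test = 0.9785, 0.98, 2000
extra_correct = round((gpu_acc - cpu_acc) * n_test)
print('Difference in correctly classified images:', extra_correct)
```

The accuracy gap amounts to only a handful of images, and the two runs start from different random weights (their first logged losses differ), so it should not be read as a GPU effect.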

6. Complete code

Training on the CPU:

import torch
import torch.nn as nn
import torch.utils.data as Data
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
from torchsummary import summary
import time

# Define the neural network
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.output_layer = nn.Linear(32*7*7, 10)

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = x.reshape(x.size(0), -1)
        output = self.output_layer(x)
        return output

# Hyperparameters
EPOCH = 2
BATCH_SIZE = 100
LR = 0.001
DOWNLOAD = False # set to False if the MNIST dataset has already been downloaded

# Download the MNIST data
train_data = datasets.MNIST(
    root='./data', # where the dataset is stored
    train=True, # True for the training set, False for the test set
    transform=transforms.ToTensor(), # scales pixel values from 0~255 to 0~1
    download=DOWNLOAD
)

# Old attribute names (deprecated; newer torchvision versions may emit a warning)
print(train_data.train_data.size())
print(train_data.train_labels.size())

# New attribute names
print(train_data.data.size())
print(train_data.targets.size())

# Show a few images from the dataset
for i in range(2):
    print(train_data.targets[i].item())
    plt.imshow(train_data.data[i].numpy(), cmap='gray')
    plt.show()

# DataLoader
train_loader = Data.DataLoader(
    dataset=train_data,
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=2
)

# Once train_data has been downloaded, the test data is available as well
test_data = datasets.MNIST(
    root='./data',
    train=False
)

print(test_data.data.size())
print(test_data.targets.size())

# Create the network
cnn = CNN()
print(cnn)

# Inspect the network structure
model = CNN()
if torch.cuda.is_available():
    model.cuda()
summary(model, input_size=(1,28,28))

# Optimizer
optimizer = torch.optim.Adam(cnn.parameters(), lr=LR)

# Loss function
loss_func = nn.CrossEntropyLoss()

# To save time, use only the first 2000 test samples
# (Variable and volatile=True are obsolete since PyTorch 0.4, so a plain tensor is used)
test_x = torch.unsqueeze(test_data.data, dim=1).type(torch.FloatTensor)[:2000]/255 # scale 0~255 to 0~1

test_y = test_data.targets[:2000]

# # Use the whole test set
# test_x = torch.unsqueeze(test_data.data, dim=1).type(torch.FloatTensor)/255 # scale 0~255 to 0~1

# test_y = test_data.targets

# Start the timer
start = time.time()

# Train the neural network
for epoch in range(EPOCH):
    for step, (batch_x, batch_y) in enumerate(train_loader):
        output = cnn(batch_x)
        loss = loss_func(output, batch_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Print progress every 50 steps
        if step%50 == 0:
            with torch.no_grad(): # evaluation only, no gradients needed
                test_output = cnn(test_x)
            predict_y = torch.max(test_output, 1)[1].data.squeeze()
            accuracy = (predict_y == test_y).sum().item() / test_y.size(0)
            print('Epoch', epoch, '|', 'Step', step, '|', 'Loss', loss.data.item(), '|', 'Test Accuracy', accuracy)

# Stop the timer
end = time.time()

# Training time
print('Time cost:', end - start, 's')

# Predict
test_output = cnn(test_x[:100])
predict_y = torch.max(test_output, 1)[1].data.numpy().squeeze()
real_y = test_y[:100].numpy()
print(predict_y)
print(real_y)

# Show the predicted and actual labels
for i in range(10):
    print('Predict', predict_y[i])
    print('Real', real_y[i])
    plt.imshow(test_data.data[i].numpy(), cmap='gray')
    plt.show()

Training on the GPU:

import torch
import torch.nn as nn
import torch.utils.data as Data
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
from torchsummary import summary
import time

# Define the neural network
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.output_layer = nn.Linear(32*7*7, 10)

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = x.reshape(x.size(0), -1)
        output = self.output_layer(x)
        return output

# Hyperparameters
EPOCH = 2
BATCH_SIZE = 100
LR = 0.001
DOWNLOAD = False # set to False if the MNIST dataset has already been downloaded

# Download the MNIST data
train_data = datasets.MNIST(
    root='./data', # where the dataset is stored
    train=True, # True for the training set, False for the test set
    transform=transforms.ToTensor(), # scales pixel values from 0~255 to 0~1
    download=DOWNLOAD
)

# Old attribute names (deprecated; newer torchvision versions may emit a warning)
print(train_data.train_data.size())
print(train_data.train_labels.size())

# New attribute names
print(train_data.data.size())
print(train_data.targets.size())

# Show a few images from the dataset
for i in range(2):
    print(train_data.targets[i].item())
    plt.imshow(train_data.data[i].numpy(), cmap='gray')
    plt.show()

# DataLoader
train_loader = Data.DataLoader(
    dataset=train_data,
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=2
)

# Once train_data has been downloaded, the test data is available as well
test_data = datasets.MNIST(
    root='./data',
    train=False
)

print(test_data.data.size())
print(test_data.targets.size())

# Create the network
cnn = CNN()
# Move the neural network to the GPU
cnn.cuda()
print(cnn)

# Inspect the network structure
model = CNN()
if torch.cuda.is_available():
    model.cuda()
summary(model, input_size=(1,28,28))

# Optimizer
optimizer = torch.optim.Adam(cnn.parameters(), lr=LR)

# Loss function
loss_func = nn.CrossEntropyLoss()

# To save time, use only the first 2000 test samples
# (Variable and volatile=True are obsolete since PyTorch 0.4, so a plain tensor is used)
test_x = torch.unsqueeze(test_data.data, dim=1).type(torch.FloatTensor)[:2000]/255 # scale 0~255 to 0~1

test_y = test_data.targets[:2000]

# # Use the whole test set
# test_x = torch.unsqueeze(test_data.data, dim=1).type(torch.FloatTensor)/255 # scale 0~255 to 0~1

# test_y = test_data.targets

# Move the test data to the GPU
test_x = test_x.cuda()
test_y = test_y.cuda()

# Start the timer
start = time.time()

# Train the neural network
for epoch in range(EPOCH):
    for step, (batch_x, batch_y) in enumerate(train_loader):
        # Move the training data to the GPU
        batch_x = batch_x.cuda()
        batch_y = batch_y.cuda()
        output = cnn(batch_x)
        loss = loss_func(output, batch_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Print progress every 50 steps
        if step%50 == 0:
            with torch.no_grad(): # evaluation only, no gradients needed
                test_output = cnn(test_x)
            # cnn and test_x are already on the GPU, so the predictions are too;
            # no extra .cuda() call is needed here
            predict_y = torch.max(test_output, 1)[1].data.squeeze()
            accuracy = (predict_y == test_y).sum().item() / test_y.size(0)
            print('Epoch', epoch, '|', 'Step', step, '|', 'Loss', loss.data.item(), '|', 'Test Accuracy', accuracy)

# Stop the timer
end = time.time()

# Training time
print('Time cost:', end - start, 's')

# Predict
test_output = cnn(test_x[:100])
# A CUDA tensor has to be moved back to the CPU before it can be converted to NumPy,
# otherwise: TypeError: can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
predict_y = torch.max(test_output, 1)[1].cpu().data.numpy().squeeze()
real_y = test_y[:100].cpu().numpy()
print(predict_y)
print(real_y)

# Show the predicted and actual labels
for i in range(10):
    print('Predict', predict_y[i])
    print('Real', real_y[i])
    plt.imshow(test_data.data[i].numpy(), cmap='gray')
    plt.show()
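
One caveat about the timing in the GPU listing: CUDA kernels are launched asynchronously, so `time.time()` can return before queued GPU work has finished. In the loop above the calls to `.item()` force a synchronization at every logging step, so the reported figure should be close, but a stricter measurement would synchronize explicitly. A minimal sketch, where `timed` is a hypothetical helper and the matrix multiplication is only a placeholder workload:

```python
import time
import torch

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds), waiting for the GPU."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()      # finish any earlier queued GPU work
    start = time.perf_counter()
    result = fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()      # wait until the GPU work of fn is done
    return result, time.perf_counter() - start

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
a = torch.rand(256, 256, device=device)

product, seconds = timed(torch.matmul, a, a)
print(product.shape, f'{seconds:.6f} s')
```

On a CPU-only machine the synchronization calls are skipped and the helper reduces to an ordinary wall-clock measurement.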