I. Existing dataset + existing model
1. Command structure
python tools/train.py ${CONFIG_FILE} [ARGS]
tools/train.py is the script for training models, and the CONFIG_FILE that follows is the configuration file, likewise given as a path.
2. Example
After cd mmpretrain, run:
python tools/train.py configs/resnet/resnet18_8xb16_cifar10.py
Because the config file 'resnet/resnet18_8xb16_cifar10.py' already includes a built-in dataset configuration, no dataset setup is needed; just run the command. However, only being able to apply a fixed, existing model to an existing dataset does not match real application needs.
_base_ = [
    '../_base_/models/resnet18_cifar.py',
    '../_base_/datasets/cifar10_bs16.py',
    '../_base_/schedules/cifar10_bs128.py',
    '../_base_/default_runtime.py'
]
This is the content of 'resnet/resnet18_8xb16_cifar10.py'. From it we can infer, for training our own models later:
① to customize the model, modify the '../_base_/models/resnet18_cifar.py' part;
② to customize the dataset, modify the '../_base_/datasets/cifar10_bs16.py' part;
③ to customize the training schedule, modify the '../_base_/schedules/cifar10_bs128.py' part;
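Instead of editing the _base_ files in place, the same keys can also be overridden in a derived config; a minimal sketch, where the filename and learning-rate value are illustrative assumptions:

```python
# derived config, e.g. my_resnet18_cifar10.py (hypothetical filename)
_base_ = [
    '../_base_/models/resnet18_cifar.py',
    '../_base_/datasets/cifar10_bs16.py',
    '../_base_/schedules/cifar10_bs128.py',
    '../_base_/default_runtime.py'
]

# fields set here override the same keys inherited from _base_
optim_wrapper = dict(optimizer=dict(lr=0.05))
```

This keeps the shared _base_ files untouched, so other configs that inherit them are unaffected.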
3. Output files
Whether or not training succeeds, log files are generated in the following location:
mmpretrain/work_dirs/resnet18_8xb16_cifar10
II. Custom model + existing dataset
1. Approach 1
(1) Implement the full forward pass in PyTorch and verify the dimensions
We build a simple convolutional network; the original code is as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 3)  # kernel_size=2, stride=3
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

din = torch.randn(16, 3, 32, 32)
net = Net()
print(net(din).shape)
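Dimension checks like the one above follow the standard conv/pool size formula; a standalone sketch, assuming no padding or dilation:

```python
def out_size(h, k, s=1, p=0):
    # output spatial size of a conv/pool layer: floor((h + 2p - k) / s) + 1
    return (h + 2 * p - k) // s + 1

h = 32
h = out_size(h, 5)     # conv1: 5x5 kernel        -> 28
h = out_size(h, 2, 3)  # pool: kernel 2, stride 3 -> 9
h = out_size(h, 5)     # conv2: 5x5 kernel        -> 5
h = out_size(h, 2, 3)  # pool: kernel 2, stride 3 -> 2
print(h)  # -> 2, so the flattened size is 16 * 2 * 2, not 16 * 5 * 5
```

Note that with stride-3 pooling the final map is 2x2, which is exactly the kind of size mismatch that adaptive pooling resolves later in these notes.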
(2) Convert the code and rewrite the config file following MMPretrain's conventions
Note!!!
The original code corresponds only to the Backbone in the MMPretrain architecture, while the Head in that architecture is mandatory because it contains the Loss. Therefore, in practice, the original network code has to be split into at least two parts.
① Before pasting in the code to modify it, let's first examine the main structure of the original config. This should in fact already be familiar from the Nov 18 session, so this is review:
auto_scale_lr = dict(base_batch_size=128)
……
model = dict(
    backbone=dict(
        depth=18,
        num_stages=4,
        out_indices=(3, ),
        style='pytorch',
        type='ResNet_CIFAR'),
    head=dict(
        in_channels=512,
        loss=dict(loss_weight=1.0, type='CrossEntropyLoss'),
        num_classes=10,
        type='LinearClsHead'),
    neck=dict(type='GlobalAveragePooling'),
    type='ImageClassifier')
……
work_dir = './work_dirs/resnet18_8xb16_cifar10'
② Taking resnet_cifar as an example, after modification we get:
import torch.nn as nn
import torch.nn.functional as F
from mmpretrain.registry import MODELS

@MODELS.register_module()
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 3)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return tuple([x])  # MMPretrain backbones return a tuple of feature maps
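Since this backbone's final feature is the 84-dim output of fc2, the model section of the config must point at the new backbone and the head's in_channels must match; a hedged sketch, where the exact values are assumptions derived from the code above:

```python
model = dict(
    type='ImageClassifier',
    backbone=dict(type='Net'),  # the custom backbone registered above
    neck=None,                  # the backbone already outputs a flat vector
    head=dict(
        type='LinearClsHead',
        num_classes=10,
        in_channels=84,         # must match the fc2 output size of Net
        loss=dict(type='CrossEntropyLoss', loss_weight=1.0)))
```

If in_channels does not match the backbone's output width, the head's linear layer will fail with a shape error at the first forward pass.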
③ Register it in the __init__.py file:
# Copyright (c) OpenMMLab. All rights reserved.
from .alexnet import AlexNet
……
from .hjs import Net

__all__ = [
    'LeNet5',
    ……
    'Net',
]
④ Enter the following command in the terminal:
python mmpretrain/tools/train.py mmpretrain/work_dirs/resnet18_8xb16_cifar10/resnet18_8xb16_cifar10.py
Multiple errors occurred!!
KeyError: 'Net is not in the mmpretrain::model registry. Please check whether the value of `Net` is correct or it was registered as expected. More details can be found at https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html#import-the-custom-module'
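Besides editing __init__.py, the URL in the error message describes MMEngine's custom_imports mechanism, which imports the module (and thereby triggers @MODELS.register_module()) when the config is loaded; a sketch, where the import path is hypothetical:

```python
# added to the training config; 'mmpretrain.models.backbones.hjs' is a
# hypothetical import path for the file that defines Net
custom_imports = dict(
    imports=['mmpretrain.models.backbones.hjs'],
    allow_failed_imports=False)
```

With allow_failed_imports=False, a wrong path fails loudly at config-load time instead of surfacing later as the registry KeyError above.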
⑤ Trial-and-error debugging followed: comment out the alexnet import → add an if guard before the shift module → comment out the custom module and remove the if → restore the commented-out alexnet line:
⑥ A size-mismatch error then occurred; we solved it with adaptive pooling:
def forward(self, x):
    x = self.pool(F.relu(self.conv1(x)))
    x = self.pool(F.relu(self.conv2(x)))
    x = nn.AdaptiveAvgPool2d((5, 5))(x)  # force the feature map to 5x5
    x = x.view(x.size(0), -1)            # flatten to (N, 16 * 5 * 5)
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    return tuple([x])
With that, it finally runs successfully.
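Why adaptive pooling fixes the mismatch can be checked in isolation: AdaptiveAvgPool2d maps any input spatial size to the requested output size, so the flattened feature always matches fc1's expected 16 * 5 * 5 = 400 inputs. A minimal sketch:

```python
import torch
import torch.nn as nn

# a 2x2 feature map, as produced by the stride-3 pooling above
x = torch.randn(16, 16, 2, 2)
x = nn.AdaptiveAvgPool2d((5, 5))(x)  # resized to 5x5 regardless of input size
x = x.view(x.size(0), -1)            # flatten
print(x.shape)  # torch.Size([16, 400])
```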
A new problem!!
No checkpoint output appears in work_dir!!!
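Checkpoint saving in MMEngine is handled by CheckpointHook under default_hooks, so if no checkpoints appear, those settings in the config are worth checking; a sketch, where the interval value is illustrative:

```python
# in the training config: save a checkpoint every epoch
default_hooks = dict(
    checkpoint=dict(type='CheckpointHook', interval=1))
```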
2. Approach 2
(1) Understand the dimension transforms in the official code via breakpoint debugging and visualization
Inspecting output dimensions:
use print(x.shape), or deliberately write an error on the next line so execution halts there for inspection.
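Besides print(x.shape) and deliberate errors, PyTorch forward hooks can log every layer's output shape without editing the model's forward method; a minimal sketch on a toy model (not the official code):

```python
import torch
import torch.nn as nn

def shape_hook(module, inputs, output):
    # called after each layer's forward; prints the layer name and output shape
    print(f'{module.__class__.__name__}: {tuple(output.shape)}')

model = nn.Sequential(nn.Conv2d(3, 6, 5), nn.ReLU(), nn.MaxPool2d(2, 2))
for layer in model:
    layer.register_forward_hook(shape_hook)

out = model(torch.randn(1, 3, 32, 32))
```

The same hooks can be attached to an MMPretrain model's submodules, which avoids restarting training runs just to probe shapes.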
(2) With that understanding, rewrite the official code directly following the conventions
Similar to past YOLO improvement approaches: replace some modules of the original model, such as Conv2d.