I. Existing dataset + existing model

1. Command structure

python tools/train.py ${CONFIG_FILE} [ARGS]

`tools/train.py` is the script that trains the model, and the `CONFIG_FILE` that follows is the configuration file, likewise specified as a path.

2. Example

After `cd mmpretrain`, run:

python tools/train.py configs/resnet/resnet18_8xb16_cifar10.py

Since the config file 'resnet/resnet18_8xb16_cifar10.py' already includes a built-in dataset definition, no dataset configuration is needed; the command can be run directly. However, only being able to apply a fixed, existing model to an existing dataset does not meet real-world needs.

_base_ = [
    '../_base_/models/resnet18_cifar.py',
    '../_base_/datasets/cifar10_bs16.py',
    '../_base_/schedules/cifar10_bs128.py',
    '../_base_/default_runtime.py'
]

This is the content of 'resnet/resnet18_8xb16_cifar10.py'. From it we can infer how to train our own models later:

① To customize the model, modify the '../_base_/models/resnet18_cifar.py' part;

② To customize the dataset, modify the '../_base_/datasets/cifar10_bs16.py' part;

③ To customize the training schedule, modify the '../_base_/schedules/cifar10_bs128.py' part.
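Rather than editing the `_base_` files in place, the usual convention is to write a new child config that inherits them and overrides only the fields that differ. A minimal sketch (the file name and overridden values here are hypothetical):

```python
# my_resnet18_cifar10.py -- hypothetical child config
_base_ = [
    '../_base_/models/resnet18_cifar.py',
    '../_base_/datasets/cifar10_bs16.py',
    '../_base_/schedules/cifar10_bs128.py',
    '../_base_/default_runtime.py'
]

# Override only what differs from the inherited bases;
# nested dicts are merged with the base values.
model = dict(head=dict(num_classes=10))
train_dataloader = dict(batch_size=32)  # hypothetical override
```

This keeps the shared `_base_` files untouched, so several experiments can inherit them side by side.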

3. Output files

Whether or not training succeeds, a log file is generated at the following location:

mmpretrain/work_dirs/resnet18_8xb16_cifar10

II. Custom model + existing dataset

1. Approach one

(1) Implement the full forward pass in PyTorch and check the dimensions

We build a simple convolutional network; the original code is as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 3)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

din = torch.randn(16, 3, 32, 32)
net = Net()
print(net(din).shape)
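The layer-by-layer sizes can be checked on paper with the standard output-size formula, floor((n + 2p - k) / s) + 1. A quick pure-Python trace (a sketch, independent of PyTorch) shows that with `MaxPool2d(2, 3)` (kernel 2, stride 3) a 32×32 input flattens to 16×2×2 = 64 features rather than the 16×5×5 = 400 that `fc1` expects, which is exactly the size mismatch that surfaces again later in this note:

```python
# Sketch: trace spatial sizes through conv/pool layers by hand.
def out_size(n, k, s=1, p=0):
    """Output size of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

n = 32                      # CIFAR-10 input is 32x32
n = out_size(n, k=5)        # conv1: 5x5 kernel -> 28
n = out_size(n, k=2, s=3)   # pool:  kernel 2, stride 3 -> 9
n = out_size(n, k=5)        # conv2: 5x5 kernel -> 5
n = out_size(n, k=2, s=3)   # pool again -> 2
print(n, 16 * n * n)        # 2 64  (not the 400 that fc1 expects)
```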

(2) Convert the code following MMPretrain's conventions and rewrite the configuration file

Important!!

The original code corresponds only to the Backbone in the MMPretrain architecture, while the Head, which contains the loss, is mandatory. In practice, the original network code therefore has to be split into at least two parts.

① Before pasting in the code to modify it, let us first look at the main structure of the original config; this should already be familiar from the 11/18 session, so this is review:

auto_scale_lr = dict(base_batch_size=128)

……
model = dict(
    backbone=dict(
        depth=18,
        num_stages=4,
        out_indices=(3, ),
        style='pytorch',
        type='ResNet_CIFAR'),
    head=dict(
        in_channels=512,
        loss=dict(loss_weight=1.0, type='CrossEntropyLoss'),
        num_classes=10,
        type='LinearClsHead'),
    neck=dict(type='GlobalAveragePooling'),
    type='ImageClassifier')
……
work_dir = './work_dirs/resnet18_8xb16_cifar10'

② Taking resnet_cifar as a model, we modify the code to obtain:

import torch.nn as nn
import torch.nn.functional as F

from mmpretrain.registry import MODELS

@MODELS.register_module()
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 3)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return tuple([x])

③ Register it in the `__init__.py` file

# Copyright (c) OpenMMLab. All rights reserved.
from .alexnet import AlexNet
……
from .hjs import Net

__all__ = [
    'LeNet5',
    ……
    'Net',
]

④ Enter the following command in the terminal:

python mmpretrain/tools/train.py mmpretrain/work_dirs/resnet18_8xb16_cifar10/resnet18_8xb16_cifar10.py

Multiple errors occurred!!

KeyError: 'Net is not in the mmpretrain::model registry. Please check whether the value of Net is correct or it was registered as expected. More details can be found at https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html#import-the-custom-module'
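As the link in the error message suggests, an alternative to editing the library's `__init__.py` is MMEngine's `custom_imports` mechanism: declare the module that defines the new backbone directly in the config, and it is imported (and therefore registered) when training starts. A sketch, assuming the backbone lives in a hypothetical module importable as `mmpretrain.models.backbones.hjs`:

```python
# Config fragment added at the top of the training config (not a script)
custom_imports = dict(
    imports=['mmpretrain.models.backbones.hjs'],  # hypothetical module path
    allow_failed_imports=False)  # fail loudly if the import path is wrong
```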

⑤ Debugging sequence: comment out alexnet → add an `if` before the shift module → comment out custom and remove the `if` → remove the comment before alexnet:

⑥ A size-mismatch error was then reported; we solved it with adaptive pooling:

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = nn.AdaptiveAvgPool2d((5, 5))(x)  # resize features to 5x5
        x = x.view(-1, 16 * 5 * 5)           # flatten
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return tuple([x])

With that, it finally ran successfully.

A new problem!!

No checkpoint output appears in work_dir!!!
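In MMEngine-based repos, checkpoint saving is handled by the `CheckpointHook` configured under `default_hooks`, inherited here from '../_base_/default_runtime.py'. If no `.pth` files appear, the first thing to check is this block in the merged config (a sketch; the interval value is an example):

```python
# Config fragment: make the runner save a checkpoint every epoch
default_hooks = dict(
    checkpoint=dict(type='CheckpointHook', interval=1, save_best='auto'))
```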

2. Approach two

(1) Understand the dimension transformations in the official code via breakpoint debugging and visualization

Inspecting output dimensions:

Use `print(x.shape)`, or deliberately break the next line so that execution stops there and the printed shape can be read.
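Beyond scattering `print(x.shape)` lines, PyTorch's `register_forward_hook` can log every layer's output shape without touching `forward()` at all. A minimal sketch on a throwaway model:

```python
import torch
import torch.nn as nn

def log_shapes(model, x):
    """Run one forward pass and collect (layer name, output shape) pairs."""
    shapes, handles = [], []
    for name, module in model.named_modules():
        if not list(module.children()):  # leaf layers only
            def hook(mod, inp, out, name=name):
                if isinstance(out, torch.Tensor):
                    shapes.append((name, tuple(out.shape)))
            handles.append(module.register_forward_hook(hook))
    model(x)
    for h in handles:  # clean up so the hooks do not linger
        h.remove()
    return shapes

net = nn.Sequential(nn.Conv2d(3, 6, 5), nn.ReLU(), nn.MaxPool2d(2, 2))
for name, shape in log_shapes(net, torch.randn(1, 3, 32, 32)):
    print(name, shape)
```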

(2) With that understanding, rewrite the official code directly according to the conventions

Similar to past YOLO improvement methods: replace some modules of the original model, such as Conv2d layers, with alternatives.
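As a sketch of this module-swap idea (not MMPretrain-specific; the `DWSeparableConv` block is a made-up stand-in), each `Conv2d` in a model can be replaced in place via `named_children` and `setattr`:

```python
import torch
import torch.nn as nn

class DWSeparableConv(nn.Module):
    """Hypothetical replacement block: depthwise conv + 1x1 pointwise conv."""
    def __init__(self, cin, cout, k):
        super().__init__()
        self.depthwise = nn.Conv2d(cin, cin, k, groups=cin)
        self.pointwise = nn.Conv2d(cin, cout, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

def replace_conv2d(module):
    """Recursively swap every k>1 Conv2d for a DWSeparableConv."""
    for name, child in module.named_children():
        if isinstance(child, nn.Conv2d) and child.kernel_size[0] > 1:
            setattr(module, name, DWSeparableConv(
                child.in_channels, child.out_channels, child.kernel_size[0]))
        else:
            replace_conv2d(child)

net = nn.Sequential(nn.Conv2d(3, 6, 5), nn.ReLU())
replace_conv2d(net)
print(net(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 6, 28, 28])
```

The replacement preserves the input/output channel counts and spatial size, so the surrounding layers need no changes.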