Project page: https://www.eetree.cn/project/2604
Video: 基于MAX78000的手势识别人机交互系统 (Bilibili)
(The project is a bit rough. I thought I had plenty of time: I worked on it for a while right after receiving the board, then left it untouched for months, and by the time I noticed the deadline was near it collided with final exams, so I had to wrap things up in a hurry.)
The model is a modified ResNet18; only the final version of the code is kept here.
Activate the conda environment first:

conda activate Maxim
(1) To use a different dataset, place the data under the data directory of ai8x-training and generate a txt index file (a minimal sketch follows below).
(2) Modify the data-loading path in datasets/gesture.py under ai8x-training.
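For illustration, here is a minimal sketch of generating such an index file. The directory layout, class discovery, and output filename are assumptions for the example, not necessarily what this project uses:

import os

DATA_ROOT = 'data/gesture'   # hypothetical layout: data/gesture/<class_name>/*.jpg
CLASSES = sorted(d for d in os.listdir(DATA_ROOT)
                 if os.path.isdir(os.path.join(DATA_ROOT, d)))

with open(os.path.join(DATA_ROOT, 'index.txt'), 'w') as f:
    for label, cls in enumerate(CLASSES):
        for fname in sorted(os.listdir(os.path.join(DATA_ROOT, cls))):
            if fname.lower().endswith(('.jpg', '.png')):
                # one "relative/path label" pair per line
                f.write(f'{cls}/{fname} {label}\n')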
1. Workflow and some details
Training
Under ai8x-training:
bash scripts/train-gesture.sh
#!/bin/sh
python train.py --epochs 200 --optimizer Adam --lr 0.001 --batch-size 64 --gpus 0 --deterministic --compress policies/schedule-gesture.yaml --model ai85net_gesture --dataset gesture --param-hist --pr-curves --embedding --device MAX78000 "$@"
Copy the checkpoint produced by training (under the logs directory) into self_proj/gesture of ai8x-synthesis; the file that matters is the QAT quantization checkpoint (qat_best.pth.tar).
In addition, two lines in ai8x-training/ai8x.py were commented out:
1808: # b = target_attr.op.bias.data
1830: # target_attr.op.bias.data = b_new
Quantization
Under ai8x-synthesis:
scripts/quantize_gesture.sh
#!/bin/sh
python quantize.py self_proj/gesture/qat_best.pth.tar self_proj/gesture/qat_best-ai8x-q.pth.tar --device MAX78000 -v "$@"
One line in ai8x-synthesis/izer/quantize.py was changed (the original line 159 commented out and replaced by the line below):
159: # bias_name = '.'.join([layer, operation, 'bias'])
160: bias_name = 'bias'
Evaluation
Under ai8x-training:
scripts/evaluate_gesture.sh
#!/bin/sh
python ./train.py --model ai85net_gesture --dataset gesture --confusion --evaluate --exp-load-weights-from ../ai8x-synthesis/self_proj/gesture/qat_best-ai8x-q.pth.tar -8 --device MAX78000 "$@"
Generating the npy sample file
Under ai8x-training:
./train.py --model ai85net_gesture --save-sample 10 --dataset gesture --evaluate --exp-load-weights-from ../ai8x-synthesis/self_proj/gesture/qat_best-ai8x-q.pth.tar -8 --device MAX78000 "$@"
This generates sample_fpr2.npy under the ai8x-training directory.
Move it to the test directory: ai8x-synthesis/tests/sample_fpr2.npy
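To verify the sample before conversion, it can be inspected with NumPy. A quick check; the expected shape is an assumption based on the model's 3x64x64 input:

import numpy as np

sample = np.load('tests/sample_fpr2.npy')
print(sample.shape, sample.dtype)   # expected: (3, 64, 64)
print(sample.min(), sample.max())   # values should lie in the signed 8-bit range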
Model conversion
Under ai8x-synthesis:
./ai8xize.py --verbose --test-dir demos --prefix ai85-gesture --checkpoint-file self_proj/gesture/qat_best-ai8x-q.pth.tar --config-file networks/gesture-chw.yaml --device MAX78000 --compact-data --mexpress --softmax --overwrite
The YAML configuration file used for model conversion is given below for reference; read it together with the model code.
YAML file:
--- # FaceNet sequential model ending with avg_pool, CHW(big data) data_format
arch: ai85net_gesture
dataset: gesture

layers:
- out_offset: 0x1000
  processors: 0x0000000000000007
  operation: conv2d
  max_pool: 1
  pool_stride: 3
  pad: 1
  kernel_size: 3x3
  activate: ReLU
  data_format: HWC
- processors: 0x0ffff00000000000  # 16
  out_offset: 0x0000
  operation: conv2d
  activate: ReLU
  write_gap: 1
  max_pool: 1
  pool_stride: 2
  kernel_size: 3x3
  pad: 1
  output_processors: 0x00000000ffffffff  # 32
- processors: 0x00000000ffffffff
  out_offset: 0x2000
  operation: passthrough
  write_gap: 1
  output_processors: 0x00000000ffffffff  # 32
  name: res1
- pad: 1
  operation: conv2d
  kernel_size: 3x3
  activate: ReLU
  out_offset: 0x4000
  processors: 0x00000000ffffffff
- operation: conv2d
  out_offset: 0x2004
  kernel_size: 3x3
  pad: 1
  name: res2
  write_gap: 1
  processors: 0x00000000ffffffff
# layer4 + blk2
- in_sequences: [res1, res2]
  processors: 0x00000000ffffffff
  in_offset: 0x2000
  out_offset: 0x0000
  operation: conv2d
  eltwise: add
  max_pool: 1
  pool_stride: 2
  kernel_size: 3x3
  pad: 1
- processors: 0x00000000ffffffff
  out_offset: 0x2000
  operation: passthrough
  write_gap: 1
  output_processors: 0x00000000ffffffff
  name: res3
- pad: 1
  operation: conv2d
  kernel_size: 3x3
  activate: ReLU
  out_offset: 0x4000
  processors: 0x00000000ffffffff
- operation: conv2d
  out_offset: 0x2004
  kernel_size: 3x3
  pad: 1
  name: res4
  write_gap: 1
  processors: 0x00000000ffffffff
# layer8 + blk3
- in_sequences: [res3, res4]
  processors: 0x00000000ffffffff
  in_offset: 0x2000
  out_offset: 0x0000
  operation: conv2d
  eltwise: add
  max_pool: 1
  pool_stride: 2
  kernel_size: 3x3
  pad: 1
- processors: 0xffffffffffffffff  # 64
  out_offset: 0x2000
  operation: passthrough
  write_gap: 1
  output_processors: 0xffffffffffffffff
  name: res5
- pad: 1
  operation: conv2d
  kernel_size: 3x3
  activate: ReLU
  out_offset: 0x4000
  processors: 0xffffffffffffffff
- operation: conv2d
  out_offset: 0x2004
  kernel_size: 3x3
  pad: 1
  name: res6
  write_gap: 1
  processors: 0xffffffffffffffff
# layer12 + blk4
- in_sequences: [res5, res6]
  processors: 0xffffffffffffffff
  in_offset: 0x2000
  out_offset: 0x0000
  operation: conv2d
  eltwise: add
  max_pool: 1
  pool_stride: 2
  kernel_size: 3x3
  pad: 1
- processors: 0xffffffffffffffff
  out_offset: 0x2000
  operation: passthrough
  write_gap: 1
  output_processors: 0xffffffffffffffff
  name: res7
- pad: 1
  operation: conv2d
  kernel_size: 3x3
  activate: ReLU
  out_offset: 0x4000
  processors: 0xffffffffffffffff
- operation: conv2d
  out_offset: 0x2004
  kernel_size: 3x3
  pad: 1
  name: res8
  write_gap: 1
  processors: 0xffffffffffffffff
# layer16
- in_sequences: [res7, res8]
  in_offset: 0x2000
  out_offset: 0x0000
  eltwise: add
  avg_pool: [2, 2]  # 64*2*2 -> 64*1*1; with [1,1] the result was 64*2*2 -> 16*2*2, which is wrong
  pool_stride: 1
  operation: None
  processors: 0xffffffffffffffff
  output_processors: 0xffffffffffffffff
# Layer 18 - Linear
- out_offset: 0x2000
  processors: 0xffffffffffffffff
  output_processors: 0x00000000000000f9
  operation: fc
  flatten: true
  output_width: 32
Model:
import torch
import torch.nn as nn
from torch.nn import functional as F

import ai8x


class ResBlk(nn.Module):
    """ResNet block."""

    def __init__(self, ch_in, ch_out, stride=1, bias=False, **kwargs):
        # Input and output channel counts are passed in explicitly
        super(ResBlk, self).__init__()
        self.ch_in = ch_in
        self.ch_out = ch_out
        # Downsampling is done by the pool (pool_stride=2), not the conv stride
        self.conv1 = ai8x.FusedMaxPoolConv2dReLU(ch_in, ch_out, kernel_size=3,
                                                 pool_size=1, pool_stride=2,
                                                 stride=stride, padding=1,
                                                 bias=bias, **kwargs)
        self.conv2 = ai8x.FusedConv2dReLU(ch_out, ch_out, kernel_size=3,
                                          stride=1, padding=1,
                                          bias=bias, **kwargs)
        self.resid1 = ai8x.Add()
        self.extra = ai8x.Conv2d(ch_out, ch_out, kernel_size=3, stride=stride,
                                 padding=1, bias=bias, **kwargs)

    def forward(self, x):
        """x: [b, ch, h, w]"""
        x = self.conv1(x)
        out = self.conv2(x)
        out = self.extra(out)
        # Shortcut: element-wise add via ai8x.Add()
        out = self.resid1(out, x)
        return out


class ResNet18(nn.Module):
    def __init__(self, num_classes=6, num_channels=1, dimensions=(64, 64),
                 bias=False, **kwargs):
        super(ResNet18, self).__init__()
        # 64x64 input -> 22x22 (with stride=2, padding=1 it would be 64 -> 32)
        self.conv2 = ai8x.FusedMaxPoolConv2dReLU(3, 16, kernel_size=3,
                                                 pool_size=1, pool_stride=3,
                                                 stride=1, padding=1,
                                                 bias=bias, **kwargs)
        # Four residual blocks; spatial size after each block:
        self.blk1 = ResBlk(16, 32, stride=1)   # 11x11
        self.blk2 = ResBlk(32, 32, stride=1)   # 6x6
        self.blk3 = ResBlk(32, 64, stride=1)   # 3x3
        self.blk4 = ResBlk(64, 64, stride=1)   # 2x2
        self.outlayer = ai8x.Linear(64 * 1 * 1, 6)

    def forward(self, x):
        x = self.conv2(x)
        x = self.blk1(x)
        x = self.blk2(x)
        x = self.blk3(x)
        x = self.blk4(x)
        # Whatever the input size, adaptive average pooling reduces it to [1, 1]
        x = F.adaptive_avg_pool2d(x, [1, 1])
        x = x.view(x.size(0), -1)
        x = self.outlayer(x)
        return x


def ai85net_gesture(pretrained=False, **kwargs):
    assert not pretrained
    return ResNet18(**kwargs)


models = [
    {
        'name': 'ai85net_gesture',
        'min_input': 1,
        'dim': 3,
    },
]
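As a quick smoke test of the model, the following minimal sketch can be used. It assumes the ai8x module from ai8x-training is importable, the model definition above is in scope, and the device has been configured via ai8x.set_device(), as train.py normally does:

import torch
import ai8x

ai8x.set_device(device=85, simulate=False, round_avg=False)  # AI85 == MAX78000
model = ai85net_gesture()
x = torch.randn(1, 3, 64, 64)   # one 3-channel 64x64 frame
y = model(x)
print(y.shape)                  # expected: torch.Size([1, 6]), one score per gesture class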
2. Additional notes
(1) If accuracy drops badly when evaluating the quantized model (QAT is used throughout here), replace every nn.xx layer in your model with its ai8x.xx counterpart; the corresponding class names can be found in ai8x.py.
(2) Use ai8x.Add() for residual connections in ai8x models.
(3) avg_pool: [2,2]
I'm not entirely sure of its exact semantics, but it can be understood via the shapes (see the sanity check below):
[2,2] takes 64*2*2 -> 64*1*1; with [1,1] the result was 64*2*2 -> 16*2*2, which is not what we want.
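As a plain-PyTorch sanity check of the arithmetic (this only mirrors the intended shapes, not the hardware's actual behavior):

import torch
import torch.nn as nn

x = torch.randn(1, 64, 2, 2)                  # feature map before the final pool
pool = nn.AvgPool2d(kernel_size=(2, 2), stride=1)
print(pool(x).shape)                          # torch.Size([1, 64, 1, 1])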
(4) My understanding is that processors and output_processors encode the channel count: processors for the input channels, output_processors for the output channels.
0xffffffffffffffff = 64
0x00000000ffffffff = 32
0x0ffff00000000000 = 16
0x0000000000000007 = 3
0x00000000000000f9 = 6
These are the rough values; I hadn't worked out the exact calculation, but the snippet below reproduces them.
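In fact each mask is a 64-bit bitmap with one bit per MAX78000 processor, so the channel count is simply the number of set bits:

# Popcount of each processor mask equals its channel count
for mask in (0xffffffffffffffff, 0x00000000ffffffff,
             0x0ffff00000000000, 0x0000000000000007,
             0x00000000000000f9):
    print(f'0x{mask:016x} -> {bin(mask).count("1")}')
# prints 64, 32, 16, 3, 6 -- matching the values above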
(5) [res3, res4] in in_sequences can also be written with layer numbers, e.g. [1, 3].
(6) It seems stride cannot be set in the YAML (it presumably defaults to 1); only pool_stride is available. I therefore modified the model so that every convolution uses stride=1 and downsampling is done by pooling instead.
(7) YAML configuration reference: MaximAI_Documentation/Guides/YAML Quickstart.md at main · analogdevicesinc/MaximAI_Documentation · GitHub
One more note, on gesture-chw.yaml: when the kernel is 1x1, pad must be set to 0 (kernel_size should be set explicitly at the same time, otherwise pad takes a default value).