Orange Pi AIpro Ascend C++ Classification Model Adaptation


flyfish

Table of Contents

  • Orange Pi AIpro Ascend C++ Classification Model Adaptation
    • Preface
    • 1. Handling the ResNet model from the PyTorch website
      • 1.1 Exporting the PyTorch model to ONNX format
      • 1.2 Full test, printing the top-1 result
      • 1.3 Full test, printing the top-5 results
    • 2. Python preprocessing for the ResNet model from the YOLOv8 project
    • 3. The original Ascend ResNet C++ preprocessing
    • 4. Preprocessing in the Python classification sample bundled with the Orange Pi AIpro
    • 5. Comparing the differences
      • 5.1 Normalize
      • 5.2 CenterCrop
    • 6. Adapting the ResNet C++ classification sample for the Orange Pi AIpro
      • Option 1: code
      • Option 2: code
    • 7. Why this works

Preface

The model can be obtained from several places; two of them are covered here:

the ResNet model from the PyTorch website

the ResNet model from the YOLOv8 (Ultralytics) project

Handling the model

First, check the SoC version of the Orange Pi AIpro.

The check above shows the SoC version is 310B4, so select Ascend310B4 when converting the model; a sketch of the commands follows below.

On the hardware side, an SSD can be added; once installed, it is detected automatically at boot.
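
The SoC version can be read with npu-smi, and the ONNX model is converted into an offline .om model with the ATC tool. This is a minimal sketch, assuming the file names used by the export script further below; adjust paths and the input tensor name to your environment:

    # check the SoC version (this board reports 310B4)
    npu-smi info

    # convert the static-batch ONNX model into an offline model for Ascend310B4
    atc --framework=5 --model=resnet_static.onnx --output=resnet_static \
        --input_shape="input:1,3,224,224" --soc_version=Ascend310B4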

1. Handling the ResNet model from the PyTorch website

1.1 Exporting the PyTorch model to ONNX format

The ResNet model obtained from the PyTorch website:

            # -*- coding: utf-8 -*-
            import torch
            import torchvision
            import onnx
            import onnxruntime
            import torch.nn as nn
            # Create a PyTorch ResNet50 model instance
            # Either download the pretrained weights online:
            #model = torchvision.models.resnet50(pretrained=True)
            # or load them from a local checkpoint:
            checkpoint_path ="/home/model/resnet50-19c8e357.pth"
            model = torchvision.models.resnet50().to("cpu")
            checkpoint = torch.load(checkpoint_path,map_location=torch.device('cpu'))
            model.load_state_dict(checkpoint)
            model.eval()
            batch_size = 1  
            input_shape = (batch_size, 3, 224, 224)
            input_data = torch.randn(input_shape)
            # Export the model to ONNX format
            output_path_static = "resnet_static.onnx"
            output_path_dynamic = "resnet_dynamic.onnx"
            # dynamic
            torch.onnx.export(model, input_data, output_path_dynamic,
                              input_names=["input"], output_names=["output"],
                              dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}})
            #static
            torch.onnx.export(model, input_data, output_path_static,
                              input_names=["input"], output_names=["output"])
            # Quick sanity check with onnxruntime
            session = onnxruntime.InferenceSession(output_path_dynamic)
            new_batch_size = 2  
            new_input_shape = (new_batch_size, 3, 224, 224)
            new_input_data = torch.randn(new_input_shape)
            outputs = session.run(["output"], {"input": new_input_data.numpy()})
            print(outputs)
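
Before copying the exported files to the board, it can be worth a quick structural check with the onnx package (imported above but otherwise unused). A minimal sketch, assuming both files were written by the export script above:

    import onnx

    for path in ("resnet_static.onnx", "resnet_dynamic.onnx"):
        onnx_model = onnx.load(path)          # parse the serialized protobuf
        onnx.checker.check_model(onnx_model)  # raises if the graph is malformed
        print(path, "passed the ONNX checker")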
            

1.2 Full test, printing the top-1 result

            # -*- coding: utf-8 -*-
            import onnxruntime
            import numpy as np
            from torchvision import datasets, models, transforms
            from PIL import Image
            import torch.nn as nn
            import torch
             
            def postprocess(outputs):
                res = list()
                outputs_exp = np.exp(outputs)
                outputs = outputs_exp / np.sum(outputs_exp, axis=1)[:,None]
                predictions = np.argmax(outputs, axis = 1)
                for pred, output in zip(predictions, outputs):
                    score = output[pred]
                    res.append((pred.tolist(),float(score)))
                return res
            onnx_model_path = "/home/model/resnet50_static.onnx"
             
            ort_session = onnxruntime.InferenceSession(onnx_model_path)
             
            transform = transforms.Compose([
                transforms.Resize(256),
                transforms.CenterCrop(224),
                transforms.ToTensor(),
                transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
            ])
             
            image = Image.open("/home/dog1_1024_683.jpg")
            image = transform(image).unsqueeze(0)  # add a batch dimension
             
            input_data = image.detach().numpy()
             
            outputs_np = ort_session.run(None, {'input': input_data})
            outputs = outputs_np[0]
            res = postprocess(outputs)
            print(res)
            

            [(162, 0.9634788632392883)]

1.3 Full test, printing the top-5 results

First, download the label file imagenet_classes.txt:

            curl -o imagenet_classes.txt https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt
            
            # -*- coding: utf-8 -*-
            import onnxruntime
            import numpy as np
            from torchvision import datasets, models, transforms
            from PIL import Image
            import torch.nn as nn
            import torch
            from onnx import numpy_helper
            import time
             
            with open("imagenet_classes.txt", "r") as f:
                categories = [s.strip() for s in f.readlines()]
                
                
            def softmax(x):
                e_x = np.exp(x - np.max(x))
                return e_x / e_x.sum()
            onnx_model_path = "/home/model/resnet50_static.onnx"
             
            ort_session = onnxruntime.InferenceSession(onnx_model_path)
             
            transform = transforms.Compose([
                transforms.Resize(256),
                transforms.CenterCrop(224),
                transforms.ToTensor(),
                transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
            ])
             
            image = Image.open("/home/dog1_1024_683.jpg")
            image = transform(image).unsqueeze(0)  # add a batch dimension
             
            session = onnxruntime.InferenceSession(onnx_model_path, providers=['CPUExecutionProvider'])
            latency = []
                
            start = time.time()
            input_arr = image.detach().numpy()
            output = session.run([], {'input':input_arr})[0]
            latency.append(time.time() - start)
            output = output.flatten()
            output = softmax(output) 
            top5_catid = np.argsort(-output)[:5]
            for catid in top5_catid:
                print(catid, categories[catid], output[catid])
              
            print("ONNX Runtime CPU Inference time = {} ms".format(format(sum(latency) * 1000 / len(latency), '.2f')))
            
            162 beagle 0.963479
            167 English foxhound 0.020814817
            166 Walker hound 0.011742038
            161 basset 0.0024754668
            164 bluetick 0.0004774033
            ONNX Runtime CPU Inference time = 20.01 ms
            

Preprocessing

In computer vision, many pretrained models (e.g. ResNet, VGG) are trained on the ImageNet dataset. Normalizing inputs with the same mean and standard deviation therefore keeps the input distribution consistent with what the pretrained model was trained on, which helps make full use of the pretrained weights.

transforms.Normalize standardizes each channel of the input image by subtracting the mean and dividing by the standard deviation.

Structure of the ImageNet dataset

Training set: more than 1.2 million images, used to train the model.

Validation set: 50,000 images, used for validation and hyperparameter tuning.

Test set: 100,000 images, used to evaluate final model performance.

Notes on using the ImageNet dataset

Preprocessing: when training on ImageNet, images are usually normalized; the commonly used mean and standard deviation are:

mean: 0.485, 0.456, 0.406

standard deviation: 0.229, 0.224, 0.225

Data augmentation: to improve generalization, training images are usually augmented, e.g. with random crops and horizontal flips.

Note that transforms.Resize is not used the same way everywhere: some pipelines resize to 256 (then center-crop to 224), others resize directly to 224. A short demo of the difference follows.
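
A minimal sketch of the difference, using the test image from the scripts above (assumed to be 1024x683, as its file name suggests): Resize(256) rescales the shorter side to 256 and keeps the aspect ratio, so the following CenterCrop(224) discards the borders, whereas Resize((224, 224)) squashes the whole image.

    from PIL import Image
    from torchvision import transforms

    img = Image.open("/home/dog1_1024_683.jpg")

    print(transforms.Resize(256)(img).size)        # shorter side -> 256, aspect ratio preserved
    print(transforms.Compose([transforms.Resize(256),
                              transforms.CenterCrop(224)])(img).size)  # (224, 224), borders cropped away
    print(transforms.Resize((224, 224))(img).size) # (224, 224), aspect ratio distorted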

2. Python preprocessing for the ResNet model from the YOLOv8 project

The ResNet model obtained from the YOLOv8 (Ultralytics) project.

YOLOv8 is provided by Ultralytics and supports the full range of vision AI tasks, including detection, segmentation, pose estimation, tracking, and classification.

The yolov8-cls-resnet50 configuration:

            # Parameters
            nc: 1000 # number of classes
            scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
              # [depth, width, max_channels]
              n: [0.33, 0.25, 1024]
              s: [0.33, 0.50, 1024]
              m: [0.67, 0.75, 1024]
              l: [1.00, 1.00, 1024]
              x: [1.00, 1.25, 1024]
            # YOLOv8.0n backbone
            backbone:
              # [from, repeats, module, args]
              - [-1, 1, ResNetLayer, [3, 64, 1, True, 1]] # 0-P1/2
              - [-1, 1, ResNetLayer, [64, 64, 1, False, 3]] # 1-P2/4
              - [-1, 1, ResNetLayer, [256, 128, 2, False, 4]] # 2-P3/8
              - [-1, 1, ResNetLayer, [512, 256, 2, False, 6]] # 3-P4/16
              - [-1, 1, ResNetLayer, [1024, 512, 2, False, 3]] # 4-P5/32
            # YOLOv8.0n head
            head:
              - [-1, 1, Classify, [nc]] # Classify
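
For reference, a hedged sketch of building this model from the configuration above and exporting it to ONNX with the ultralytics package (assumed to be installed; loading the yaml produces an untrained model, so in practice you would train it or load trained weights first):

    from ultralytics import YOLO

    model = YOLO("yolov8-cls-resnet50.yaml")      # build the classification model from the config above
    # model = YOLO("path/to/trained_weights.pt")  # or load trained weights instead
    model.export(format="onnx", imgsz=224)        # export an ONNX model with a 224x224 input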
            

The preprocessing for this classification model is as follows:

            IMAGENET_MEAN = 0.485, 0.456, 0.406  # RGB mean
            IMAGENET_STD = 0.229, 0.224, 0.225  # RGB standard deviation
            def classify_transforms(
                size=224,
                mean=DEFAULT_MEAN,
                std=DEFAULT_STD,
                interpolation=Image.BILINEAR,
                crop_fraction: float = DEFAULT_CROP_FRACTION,
            ):
                """
                Classification transforms for evaluation/inference. Inspired by timm/data/transforms_factory.py.
                Args:
                    size (int): image size
                    mean (tuple): mean values of RGB channels
                    std (tuple): std values of RGB channels
                    interpolation (T.InterpolationMode): interpolation mode. default is T.InterpolationMode.BILINEAR.
                    crop_fraction (float): fraction of image to crop. default is 1.0.
                Returns:
                    (T.Compose): torchvision transforms
                """
                import torchvision.transforms as T  # scope for faster 'import ultralytics'
                if isinstance(size, (tuple, list)):
                    assert len(size) == 2
                    scale_size = tuple(math.floor(x / crop_fraction) for x in size)
                else:
                    scale_size = math.floor(size / crop_fraction)
                    scale_size = (scale_size, scale_size)
                # Aspect ratio is preserved, crops center within image, no borders are added, image is lost
                if scale_size[0] == scale_size[1]:
                    # Simple case, use torchvision built-in Resize with the shortest edge mode (scalar size arg)
                    tfl = [T.Resize(scale_size[0], interpolation=interpolation)]
                else:
                    # Resize the shortest edge to matching target dim for non-square target
                    tfl = [T.Resize(scale_size)]
                tfl += [T.CenterCrop(size)]
                tfl += [
                    T.ToTensor(),
                    T.Normalize(
                        mean=torch.tensor(mean),
                        std=torch.tensor(std),
                    ),
                ]
                return T.Compose(tfl)
            

Standardized data distribution: deep learning models generally benefit from standardized inputs, i.e. inputs adjusted to zero mean and unit variance. This keeps all features on a similar scale and improves learning efficiency. For image data it means mapping pixel values from their original range (typically 0-255) into a more uniform range.

Faster convergence: subtracting the mean and dividing by the standard deviation helps optimizers such as gradient descent converge faster early in training, because the reduced input variance makes the learning process more stable and efficient.

Matching the pretrained weights: many pretrained models (especially those trained on ImageNet) were designed and trained under the assumption that inputs are normalized this way. When fine-tuning such models, or using them as feature extractors, keeping the same preprocessing keeps the data distribution consistent with what the model expects and helps preserve its performance.

Generalization: ImageNet is a large, diverse dataset whose statistics (such as its color distribution) largely represent natural images in general, so normalizing with the ImageNet statistics helps the model learn broadly applicable features and generalize better to new data.

If your task or dataset differs significantly from ImageNet, using the ImageNet mean and standard deviation directly may not be the best choice; computing the mean and standard deviation of your own dataset and normalizing with those values may work better, as sketched below.
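
A minimal sketch of computing per-channel statistics for your own data (the folder path is a placeholder; images are assumed to be readable by PIL):

    import numpy as np
    from PIL import Image
    from pathlib import Path

    pixel_sum = np.zeros(3)
    pixel_sq_sum = np.zeros(3)
    n_pixels = 0
    for p in Path("/home/my_dataset").rglob("*.jpg"):  # placeholder dataset folder
        arr = np.asarray(Image.open(p).convert("RGB"), dtype=np.float64) / 255.0
        pixels = arr.reshape(-1, 3)
        pixel_sum += pixels.sum(axis=0)
        pixel_sq_sum += (pixels ** 2).sum(axis=0)
        n_pixels += pixels.shape[0]

    mean = pixel_sum / n_pixels
    std = np.sqrt(pixel_sq_sum / n_pixels - mean ** 2)
    print("mean:", mean, "std:", std)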

The original code:

3. The original Ascend ResNet C++ preprocessing

            namespace {
                const float min_chn_0 = 123.675;
                const float min_chn_1 = 116.28;
                const float min_chn_2 = 103.53;
                const float var_reci_chn_0 = 0.0171247538316637;
                const float var_reci_chn_1 = 0.0175070028011204;
                const float var_reci_chn_2 = 0.0174291938997821;
            }
            Result SampleResnetQuickStart::ProcessInput(const string testImgPath)
            {
                // read image from file by cv
                imagePath = testImgPath;
                srcImage = imread(testImgPath);
                Mat resizedImage;
                // zoom image to modelWidth_ * modelHeight_
                resize(srcImage, resizedImage, Size(modelWidth_, modelHeight_));
                // get properties of image
                int32_t channel = resizedImage.channels();
                int32_t resizeHeight = resizedImage.rows;
                int32_t resizeWeight = resizedImage.cols;
                // data standardization
                float meanRgb[3] = {min_chn_2, min_chn_1, min_chn_0};
                float stdRgb[3]  = {var_reci_chn_2, var_reci_chn_1, var_reci_chn_0};
                // create malloc of image, which is shape with NCHW
                imageBytes = (float*)malloc(channel * resizeHeight * resizeWeight * sizeof(float));
                memset(imageBytes, 0, channel * resizeHeight * resizeWeight * sizeof(float));
                uint8_t bgrToRgb=2;
                // image to bytes with shape HWC to CHW, and switch channel BGR to RGB
                for (int c = 0; c < channel; ++c)
                {
                    for (int h = 0; h < resizeHeight; ++h)
                    {
                        for (int w = 0; w < resizeWeight; ++w)
                        {
                            int dstIdx = (bgrToRgb - c) * resizeHeight * resizeWeight + h * resizeWeight + w;
                        imageBytes[dstIdx] = static_cast<float>((resizedImage.at<cv::Vec3b>(h, w)[c] -
                            1.0f * meanRgb[c]) * 1.0f * stdRgb[c]);
                        }
                    }
                }
                return SUCCESS;
            }
            

4. Preprocessing in the Python classification sample bundled with the Orange Pi AIpro

            img_origin = Image.open(pic_path).convert('RGB')
            from torchvision import transforms
            normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
            trans_list = transforms.Compose([transforms.Resize(256),
                                    transforms.CenterCrop(224),
                                    transforms.ToTensor(),
                                    normalize])
            img = trans_list(img_origin)
            

Run:

            (base) HwHiAiUser@orangepiaipro:~/samples/model-adapter-models/cls/edge_infer$ ./run.sh 
            set env successfully!!
            start exec atc
            [Sample] init resource stage:
            Init resource success
            load model  mobilenetv3_100_bs1.om
            Init model resource
            [Model] create model output dataset:
            [Model] create model output dataset success
            [Model] class Model init resource stage success
            acl.mdl.execute exhaust  0:00:00.004750
            class result :  cat
            pic name:  cat
            pre cost:7050.8ms
            forward cost:6.8ms
            post cost:0.0ms
            total cost:7057.6ms
            FPS:0.1
            image name :./data/cat/cat.23.jpg, infer result: cat
            acl.mdl.execute exhaust  0:00:00.004660
            class result :  cat
            pic name:  cat
            pre cost:14.0ms
            forward cost:5.2ms
            post cost:0.0ms
            total cost:19.2ms
            FPS:52.2
            image name :./data/cat/cat.76.jpg, infer result: cat
            

5. Comparing the differences

Comparing the pipelines, the differences are as follows.

5.1 Normalize

The Normalize constants differ. YOLOv8 and PyTorch use IMAGENET_MEAN and IMAGENET_STD:

            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
            

The Ascend sample uses:

            namespace {
                const float min_chn_0 = 123.675;
                const float min_chn_1 = 116.28;
                const float min_chn_2 = 103.53;
                const float var_reci_chn_0 = 0.0171247538316637;
                const float var_reci_chn_1 = 0.0175070028011204;
                const float var_reci_chn_2 = 0.0174291938997821;
            }
            

5.2 CenterCrop

Both YOLOv8 and PyTorch apply a CenterCrop (center cropping) step, which the original Ascend C++ sample lacks.

6. Adapting the ResNet C++ classification sample for the Orange Pi AIpro

Based on the comparison, we only need to normalize with IMAGENET_MEAN and IMAGENET_STD and add the CenterCrop step.

So we can add a centercrop_and_resize function and call it from ProcessInput.

Option 1: code

            static const float IMAGENET_MEAN[3] = { 0.485, 0.456, 0.406 };
            static const float IMAGENET_STD[3] = { 0.229, 0.224, 0.225 };
            void centercrop_and_resize(const cv::Mat& src_img, cv::Mat& dst_img,int target_size)
            {
                int height = src_img.rows;
                int width = src_img.cols;
                if(height >= width)// hw
                {
                    cv::resize(src_img, dst_img,  cv::Size(target_size,target_size * height / width), 0, 0, cv::INTER_AREA);
                }
                else
                {
                    cv::resize(src_img, dst_img,  cv::Size(target_size * width  / height,target_size), 0, 0, cv::INTER_AREA);
                }
                height = dst_img.rows;
                width = dst_img.cols;
                cv::Point center(width/2, height/2);
                cv::Size size(target_size, target_size);
                cv::getRectSubPix(dst_img, size, center, dst_img);
            }
            Result SampleResnetQuickStart::ProcessInput(const string testImgPath)
            {
                // read image from file by cv
                imagePath = testImgPath;
                srcImage = imread(testImgPath);
                cv::cvtColor(srcImage, srcImage, cv::COLOR_BGR2RGB);
                Mat resizedImage;
                centercrop_and_resize(srcImage,resizedImage,224);
                // get properties of image
                int32_t channel = resizedImage.channels();
                int32_t resizeHeight = resizedImage.rows;
                int32_t resizeWeight = resizedImage.cols;
                std::vector<cv::Mat> rgbChannels(3);
                cv::split(resizedImage, rgbChannels);
                for (size_t i = 0; i < rgbChannels.size(); i++)
                {
                    // convertTo computes x * alpha + beta; with alpha = 1/(255*std) and beta = -mean/std
                    // this is equivalent to (x / 255.0 - mean) / std in a single pass
                    rgbChannels[i].convertTo(rgbChannels[i], CV_32FC1, 1.0 / (255.0 * IMAGENET_STD[i]), (0.0 - IMAGENET_MEAN[i]) / IMAGENET_STD[i]);
                }
                int len = channel * resizeHeight * resizeWeight * sizeof(float);
                imageBytes = (float *)malloc(len);
                memset(imageBytes, 0, len);
                int index = 0;
                for (int c = 0; c <3; c++)
                { // R,G,B
                    for (int h = 0; h < modelHeight_; ++h)
                    {
                        for (int w = 0; w < modelWidth_; ++w)
                        {
                            imageBytes[index] = rgbChannels[c].at<float>(h, w); // R->G->B
                            index++;
                        }
                    }
                }
                return SUCCESS;
            }
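
The alpha and beta passed to convertTo in Option 1 fold the division by 255 and the Normalize step into a single multiply-add: x * 1/(255*std) - mean/std equals (x/255 - mean)/std. A quick NumPy sketch (random data) to confirm the identity:

    import numpy as np

    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])

    x = np.random.randint(0, 256, size=(4, 3)).astype(np.float64)   # fake per-channel pixel values

    torchvision_style = (x / 255.0 - mean) / std                    # ToTensor() followed by Normalize()
    convertto_style = x * (1.0 / (255.0 * std)) - mean / std        # the alpha/beta form used by convertTo

    print(np.allclose(torchvision_style, convertto_style))          # True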
            

Option 2: code

The CenterCrop can also be written along these lines:

            char* centercrop_and_resize(cv::Mat& iImg, std::vector<int> iImgSize, cv::Mat& oImg)
            {
                if (iImg.channels() == 3)
                {
                    oImg = iImg.clone();
                    cv::cvtColor(oImg, oImg, cv::COLOR_BGR2RGB);
                }
                else
                {
                    cv::cvtColor(iImg, oImg, cv::COLOR_GRAY2RGB);
                }
                int h = iImg.rows;
                int w = iImg.cols;
                int m = min(h, w);
                int top = (h - m) / 2;
                int left = (w - m) / 2;
                cv::resize(oImg(cv::Rect(left, top, m, m)), oImg, cv::Size(iImgSize.at(0), iImgSize.at(1)));
                 
               
                
                return RET_OK;
            }
            

Usage:

            cv::Mat img = cv::imread(img_path);
            std::vector<int> imgSize = { 640, 640 };
            cv::Mat processedImg;
            centercrop_and_resize(img, imgSize, processedImg);
            

processedImg is the cv::Mat we want: the image has been center-cropped, resized to 640 x 640 (for the 224 x 224 ResNet classification model you would pass {224, 224} instead), and the channel order is RGB.

7. Why this works

The relationship between the two sets of Normalize values:

            # namespace {
            #     const float min_chn_0 = 123.675;
            #     const float min_chn_1 = 116.28;
            #     const float min_chn_2 = 103.53;
            #     const float var_reci_chn_0 = 0.0171247538316637;
            #     const float var_reci_chn_1 = 0.0175070028011204;
            #     const float var_reci_chn_2 = 0.0174291938997821;
            # }
            import numpy as np
            mean = np.array([0.485, 0.456, 0.406])
            std = np.array([0.229, 0.224, 0.225])
            print(mean * 255)# [123.675 116.28  103.53 ]
            print(1/(std*255))#[0.01712475 0.017507   0.01742919]
            

The two sets of values convert into each other:

            # 0.485 × 255 = 123.675
            # 0.456 × 255 = 116.28
            # 0.406 × 255 = 103.53
            # 0.229  × 255  =  58.395
            # 0.224  × 255  =  57.12
            # 0.225  × 255  = 57.375
            # 1 ÷ 58.395 = 0.017124754
            # 1 ÷ 57.12  = 0.017507003
            # 1 ÷ 57.375 = 0.017429194
            

The original end-to-end flow is as follows (shown as a diagram in the source post; roughly: initialize resources, load the model, ProcessInput, inference, post-process the output, release resources).

The adapted version only changes step 3, ProcessInput, by adding the CenterCrop handling shown above.

Links

            https://www.hiascend.com/zh/
            https://gitee.com/ascend
            Huawei's original ResNet image classification sample, with both C++ and Python versions:
            https://gitee.com/ascend/samples/tree/master/inference/modelInference/sampleResnetQuickStart/
            
