The ONNX Standard & the ONNX Runtime Accelerated Inference Engine




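When training models we can choose from many frameworks: some people prefer PyTorch, others TensorFlow or MXNet, and there is also Caffe, which was popular in the early days of deep learning. Each training framework produces its own model package, so deployment and inference need different dependency libraries, and even within one framework such as TensorFlow the differences between versions are large. To end this fragmentation, a standard for machine learning models was defined: ONNX (Open Neural Network Exchange), started by Facebook and Microsoft and now hosted by the LF AI & Data Foundation. Models produced by other frameworks (.pth, .pb, etc.) can be converted to this standard format and then deployed uniformly with tools such as ONNX Runtime. (Just as Java compiles to intermediate bytecode that runs on the JVM, the ONNX Runtime engine provides inference for exported .onnx model files.)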

ONNX Runtime homepage: onnxruntime.ai

Documentation and tutorials: onnxruntime.ai/docs

Companion sample repositories:





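  • ONNX defines a standard format for machine learning models and, alongside the TensorFlow and Caffe formats, is one of the mainstream model formats. It also ships ONNX Runtime, an accelerated inference package that parses, optimizes (for example by fusing Conv and BatchNorm) and runs ONNX models, which makes AI models easier to port and deploy. While working with SCRFD I found that loading and running the ONNX model with native ONNX Runtime is indeed much faster than cv2.dnn.readNet(onnx), a little more than twice as fast; another blogger reached the same conclusion: https://blog.csdn.net/woshicver/article/details/113764970.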

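  • ONNX Runtime can also accelerate training in mainstream frameworks, here mainly PyTorch; the official site claims training up to 1.4x faster than plain PyTorch. You only need to wrap the model after loading it: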

     from torch_ort import ORTModule
     model = ORTModule(model)


  • Besides ONNX Runtime there are other inference engines, such as Alibaba's **MNN** (lightweight and high-performance), Tencent's **NCNN** and NVIDIA's **TensorRT**.






import torch

# Function to convert a PyTorch model to ONNX
def Convert_ONNX(model, dummy_input, onnx_path="ImageClassifier.onnx"):
    # set the model to inference mode (dropout off, BatchNorm uses its running statistics)
    model.eval()

    # Export the model
    torch.onnx.export(model,                     # model being run
                      dummy_input,               # model input (or a tuple for multiple inputs)
                      onnx_path,                 # where to save the model
                      export_params=True,        # store the trained parameter weights inside the model file
                      opset_version=13,          # the ONNX opset version to export the model to
                      do_constant_folding=True,  # whether to execute constant folding for optimization
                      input_names=['modelInput'],    # the model's input names
                      output_names=['modelOutput'],  # the model's output names
                      dynamic_axes={'modelInput': {0: 'batch_size'},    # variable-length (batch) axis
                                    'modelOutput': {0: 'batch_size'}})
    print('Model has been converted to ONNX')

if __name__ == '__main__':
    # device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    device = torch.device("cpu")

    model = MobileNetV3()  # hypothetical: replace with the network class the checkpoint was trained with
    checkpoints = torch.load("./pth/mobileNetV3_SMART_CE.pth", map_location="cpu")
    model.load_state_dict(checkpoints)  # load all weights into the model (assumes the file holds a state_dict)
    model = model.to(device)

    # Dummy input used to trace the graph; 1x3x224x224 is an assumed input size
    sample = torch.randn(1, 3, 224, 224, device=device)
    Convert_ONNX(model, sample)
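After a successful export, it is worth checking that ONNX Runtime reproduces the PyTorch outputs. The helper below is a minimal sketch and not part of the original workflow. `compare_outputs` is a hypothetical name, and the tolerances `rtol=1e-3, atol=1e-5` are assumptions that you may need to loosen for FP16 or heavily fused graphs.

```python
import numpy as np


def compare_outputs(reference, candidate, rtol=1e-3, atol=1e-5):
    """Compare two lists of output arrays (e.g. PyTorch vs ONNX Runtime).

    Returns the largest absolute difference over all outputs and raises
    AssertionError if the counts, shapes or values do not match.
    """
    if len(reference) != len(candidate):
        raise AssertionError(f"output count differs: {len(reference)} vs {len(candidate)}")
    worst = 0.0
    for i, (ref, cand) in enumerate(zip(reference, candidate)):
        ref = np.asarray(ref, dtype=np.float64)
        cand = np.asarray(cand, dtype=np.float64)
        if ref.shape != cand.shape:
            raise AssertionError(f"output {i}: shape {ref.shape} vs {cand.shape}")
        # Element-wise |cand - ref| <= atol + rtol * |ref|
        np.testing.assert_allclose(cand, ref, rtol=rtol, atol=atol, err_msg=f"output {i}")
        worst = max(worst, float(np.max(np.abs(ref - cand))) if ref.size else 0.0)
    return worst
```

In practice you would pass `[model(sample).detach().numpy()]` as the reference and `sess.run(None, {'modelInput': sample.numpy()})` as the candidate. Here `sess` is an `onnxruntime.InferenceSession` opened on the exported file.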


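Running this export on MobileNetV3 raised:

RuntimeError: Exporting the operator relu6 to ONNX opset version 11 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub.

There are two workarounds:

  • Upgrade to a PyTorch version that natively supports exporting F.hardsigmoid to ONNX.
  • Replace the activation (F.hardsigmoid / F.relu6) with F.hardtanh clamped to [0, 6]. This form exports cleanly and is numerically equivalent:

import torch.nn as nn
import torch.nn.functional as F

class hswish(nn.Module):
    def forward(self, x):
        # x * ReLU6(x + 3) / 6, written with hardtanh bounded to [0, 6]
        out = x * F.hardtanh(x + 3, min_val=0., max_val=6., inplace=True) / 6
        return out

class hsigmoid(nn.Module):
    def forward(self, x):
        # ReLU6(x + 3) / 6; the default hardtanh bounds [-1, 1] would give wrong results
        out = F.hardtanh(x + 3, min_val=0., max_val=6., inplace=True) / 6
        return out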
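The sketch below shows why the hardtanh bounds matter. It uses NumPy stand-ins for the PyTorch activations, on the assumption that `np.clip` reproduces `F.relu6` and `F.hardtanh` for these inputs. It compares the MobileNetV3 hard-sigmoid ReLU6(x + 3) / 6 against two hardtanh forms: one with bounds [0, 6] and one with the default bounds.

```python
import numpy as np


def relu6(x):
    # Stand-in for F.relu6: clamp to [0, 6]
    return np.clip(x, 0.0, 6.0)


def hardtanh(x, min_val=-1.0, max_val=1.0):
    # Stand-in for F.hardtanh, with the same default bounds [-1, 1]
    return np.clip(x, min_val, max_val)


def hsigmoid_reference(x):
    # Hard-sigmoid as used in MobileNetV3: ReLU6(x + 3) / 6
    return relu6(x + 3.0) / 6.0


def hsigmoid_export_friendly(x):
    # hardtanh bounded to [0, 6] reproduces ReLU6 exactly
    return hardtanh(x + 3.0, 0.0, 6.0) / 6.0


def hswish_export_friendly(x):
    # Hard-swish: x * hard-sigmoid(x)
    return x * hsigmoid_export_friendly(x)


x = np.linspace(-8.0, 8.0, 161)
print(np.abs(hsigmoid_reference(x) - hsigmoid_export_friendly(x)).max())  # 0.0
# With the default bounds the result is capped at 1/6 instead of 1
print(hardtanh(8.0 + 3.0) / 6.0, hsigmoid_reference(8.0))
```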

3. Converting TF 1.x / TF 2.x checkpoints to ONNX

See: Converting TensorFlow 1.x & 2.x models to ONNX files

4. Using ONNX from Python


If using pip, run pip install --upgrade pip prior to downloading.

| Artifact | Description | Supported platforms |
|---|---|---|
| onnxruntime | CPU (Release, stable) | Windows (x64), Linux (x64, ARM64), Mac (x64) |
| ort-nightly | CPU (Dev, nightly) | Same as above |
| onnxruntime-gpu | GPU (Release) | Windows (x64), Linux (x64, ARM64) |
| ort-nightly-gpu | GPU (Dev) | Same as above |


pip install onnxruntime

import onnxruntime
self.ort_sess = onnxruntime.InferenceSession(rootPath + landmark_model_path)  # Create an inference session with onnxruntime.InferenceSession
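  • For the GPU build, install onnxruntime-gpu but import it exactly as you would the CPU build. The module name is still onnxruntime, with no '-gpu' suffix.

  • Then pass a provider when creating the session (see the post "ONNX requires specifying the provider"):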

self.ort_sess = onnxruntime.InferenceSession(rootPath + landmark_model_path,providers=['CUDAExecutionProvider'])  # Create inference session using ort.InferenceSession
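Hard-coding `['CUDAExecutionProvider']` fails awkwardly on machines without a working CUDA setup. The sketch below builds the providers list from what the installed build reports and always keeps the CPU provider as a fallback. `pick_providers` is a hypothetical helper and not part of ONNX Runtime.

```python
def pick_providers(available, prefer_gpu=True):
    """Return an ordered providers list for onnxruntime.InferenceSession.

    `available` is what onnxruntime.get_available_providers() returns.
    CUDA is listed first only if the installed build reports it;
    CPUExecutionProvider is always appended as a fallback.
    """
    providers = []
    if prefer_gpu and "CUDAExecutionProvider" in available:
        providers.append("CUDAExecutionProvider")
    providers.append("CPUExecutionProvider")
    return providers
```

Usage: `onnxruntime.InferenceSession(path, providers=pick_providers(onnxruntime.get_available_providers()))`. Afterwards, `sess.get_providers()` shows which providers the session actually registered.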


RuntimeError: D:\a\_work\1\s\onnxruntime\python\onnxruntime_pybind_state.cc:531 onnxruntime::python::CreateExecutionProviderInstance CUDA_PATH is set but CUDA wasn't able to be loaded. Please install the correct version of CUDA and cuDNN as mentioned in the GPU requirements page (https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements), make sure they're in the PATH, and that your GPU is supported.

Solution: see https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html

  • Check the CUDA version:

    nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2018 NVIDIA Corporation
    Built on Sat_Aug_25_21:08:04_Central_Daylight_Time_2018
    Cuda compilation tools, release 10.0, V10.0.130
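  • Check the cuDNN version (see "Checking CUDA and cuDNN versions on Linux and Windows"):

    # Use PyTorch to check the CUDA and cuDNN versions.
    # Note: these are the versions PyTorch was built with, which can differ from the system toolkit onnxruntime-gpu loads.
    import torch
    print(torch.version.cuda)              # CUDA version string, e.g. '10.0'
    print(torch.backends.cudnn.version())  # cuDNN version as an integer, e.g. 7605 means 7.6.5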
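  • Compare your versions against the table below. I installed onnxruntime-gpu==1.2:

    | ONNX Runtime | CUDA | cuDNN | Notes |
    |---|---|---|---|
    | 1.10 | 11.4 | 8.2.4 (Linux), … (Windows) | libcudart 11.4.43, libcufft, libcurand, libcublasLt, libcublas, libcudnn 8.2.4 |
    | 1.9 | 11.4 | 8.2.4 (Linux), … (Windows) | libcudart 11.4.43, libcufft, libcurand, libcublasLt, libcublas, libcudnn 8.2.4 |
    | 1.8 | 11.0 | 8.0.4 (Linux), … (Windows) | libcudart 11.0.221, libcufft, libcurand, libcublasLt, libcublas, libcudnn 8.0.4 |
    | 1.7 | 11.0 | 8.0.4 (Linux), … (Windows) | libcudart 11.0.221, libcufft, libcurand, libcublasLt, libcublas, libcudnn 8.0.4 |
    | 1.5-1.6 | 10.2 | 8.0.3 | CUDA 11 can be built from source |
    | 1.2-1.4 | 10.1 | 7.6.5 | Requires cublas10-…; cublas 10.1.x will not work |
    | 1.0-… | … | … | CUDA versions from 9.1 up to 10.1, and cuDNN versions from 7.1 up to 7.4 should also work with Visual Studio 2017 |

  • My CUDA is 10.0, so I had to downgrade onnxruntime-gpu to 1.2. If that still fails, install onnxruntime-gpu==1.1; note that the table lists 1.2-1.4 as built against CUDA 10.1, so on CUDA 10.0 the 1.1 fallback may be the one that works. With a mismatched version the import fails with: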

    from onnxruntime.capi._pybind_state import get_all_providers, get_available_providers, get_device, set_seed, RunOptions, SessionOptions, set_default_logger_severity, NodeArg, ModelMetadata, GraphOptimizationLevel, ExecutionMode, OrtDevice, SessionIOBinding
    ImportError: cannot import name 'get_all_providers'
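Part of the version check and table lookup above can be automated. The sketch below is hypothetical and rests on a few assumptions. The rows are transcribed from the table above, and only rows whose CUDA version is legible are included. An exact CUDA major.minor match is treated as "compatible". The Notes column shows that some rows tolerate other versions, so an empty result means "check the official docs", not "unsupported".

```python
import re

# (ONNX Runtime GPU versions, CUDA major.minor they target), transcribed from the table above
ORT_CUDA_TABLE = [
    ("1.10", "11.4"),
    ("1.9", "11.4"),
    ("1.8", "11.0"),
    ("1.7", "11.0"),
    ("1.5-1.6", "10.2"),
    ("1.2-1.4", "10.1"),
]


def cuda_version_from_nvcc(nvcc_output):
    """Extract 'major.minor' from `nvcc --version` output, e.g. '10.0'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return f"{match.group(1)}.{match.group(2)}"


def ort_versions_for_cuda(cuda_version):
    """Return the table rows whose CUDA major.minor matches exactly."""
    return [ort for ort, cuda in ORT_CUDA_TABLE if cuda == cuda_version]
```

The nvcc text can be captured with `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout`. For the output shown earlier in this post (`release 10.0`), the lookup returns an empty list, because the CUDA 10.0 row is not legible in the table.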





    print(onnxruntime.get_device())    # check which device the installed onnxruntime build targets
    self.ort_sess = onnxruntime.InferenceSession(rootPath + landmark_model_path, providers=['CUDAExecutionProvider'])  # Create an inference session with onnxruntime.InferenceSession

However, print(onnxruntime.get_device()) kept printing CPU.
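`onnxruntime.get_device()` reports which build of the package is being imported: 'GPU' for the onnxruntime-gpu build, 'CPU' for the plain package. It does not say whether a particular session runs on the GPU. A common cause of it printing CPU is having both onnxruntime and onnxruntime-gpu installed; this is an assumption here and was not confirmed in this post. The two packages share the `onnxruntime` import name, so the usual fix is to uninstall both and reinstall only onnxruntime-gpu. The hypothetical helper below gathers the three signals worth checking.

```python
def describe_ort_setup(build_device, available_providers, session_providers=None):
    """Summarize an ONNX Runtime setup from values reported by onnxruntime itself.

    build_device:        result of onnxruntime.get_device(), "CPU" or "GPU"
    available_providers: result of onnxruntime.get_available_providers()
    session_providers:   optional result of InferenceSession.get_providers()
    """
    report = {
        "gpu_build": build_device == "GPU",
        "cuda_available": "CUDAExecutionProvider" in available_providers,
    }
    if session_providers is not None:
        # The session lists the providers it actually registered, highest priority first
        report["session_on_gpu"] = bool(session_providers) and session_providers[0] == "CUDAExecutionProvider"
    return report
```

Usage: `describe_ort_setup(onnxruntime.get_device(), onnxruntime.get_available_providers(), self.ort_sess.get_providers())`.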



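onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Error in Node:Conv_0 : No Op registered for Conv with domain_version of 13

This means the older ONNX Runtime installed above has no Conv registered for opset 13, the opset the model was exported with. Re-exporting with a lower opset_version (9 here) solves it:
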
    # Export the model
    torch.onnx.export(landmark_detector,  # model being run
                      input,  # model input (or a tuple for multiple inputs)
                      "ImageClassifier.onnx",  # where to save the model
                      export_params=True,  # store the trained parameter weights inside the model file
                      opset_version=9,  # the ONNX version to export the model to
                      do_constant_folding=True,  # whether to execute constant folding for optimization
                      input_names=['modelInput'],  # the model's input names
                      output_names=['modelOutput'],  # the model's output names
                      dynamic_axes={'modelInput': {0: 'batch_size'},  # variable length axes
                                    'modelOutput': {0: 'batch_size'}})
    print(" ")
    print('Model has been converted to ONNX')
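Before loading a file into an older runtime, you can check which opset it declares. That information is stored in `model.opset_import` of an `onnx.load`-ed model, as entries with `domain` and `version`, where the default ONNX domain is the empty string. The sketch below uses hypothetical helper names. The `max_supported_opset` value is an assumption that you must look up for your ONNX Runtime version. As an alternative to re-exporting, `onnx.version_converter.convert_version(model, 9)` can down-convert an existing file, although it may not handle every operator.

```python
def default_opset_version(opset_entries):
    """Return the opset version declared for the default ONNX domain.

    `opset_entries` is an iterable of (domain, version) pairs, e.g.
    [(e.domain, e.version) for e in model.opset_import]. The default domain
    is written as "" (some tools use "ai.onnx"). Returns None if absent.
    """
    for domain, version in opset_entries:
        if domain in ("", "ai.onnx"):
            return version
    return None


def fits_runtime(opset_entries, max_supported_opset):
    """True if the model's default-domain opset is <= what the target runtime supports."""
    version = default_opset_version(opset_entries)
    return version is not None and version <= max_supported_opset
```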



import onnx
from onnx import helper

if __name__ == '__main__':

    # References: <https://blog.csdn.net/CFH1021/article/details/108732114>, https://onnxruntime.ai/docs/get-started/with-python.html

    # Load the model
    # model = onnx.load('./weights/mbv2_ID_recognition.onnx')  # Load the onnx model with onnx.load
    model = onnx.load('./weights/scrfd_500m_kps.onnx')
    # model = onnx.load('./weights/mbv3_fire_classifier.onnx')

    # Check that the model is complete and well-formed (raises onnx.checker.ValidationError otherwise)
    onnx.checker.check_model(model)

    # ref: https://github.com/onnx/onnx/blob/main/docs/IR.md
    # Graphs have the following properties:
    #   name         (string)       The name of the model graph.
    #   node         (Node[])       A list of nodes, forming a partially ordered computation graph based on
    #                               input/output data dependencies. It is in topological order.
    #   initializer  (Tensor[])     A list of named tensor values. When an initializer has the same name as a
    #                               graph input, it specifies a default value for that input. When an initializer
    #                               has a name different from all graph inputs, it specifies a constant value.
    #                               The order of the list is unspecified.
    #   doc_string   (string)       Human-readable documentation for this model. Markdown is allowed.
    #   input        (ValueInfo[])  The input parameters of the graph, possibly initialized by a default value
    #                               found in 'initializer'.
    #   output       (ValueInfo[])  The output parameters of the graph. Once all output parameters have been
    #                               written to by a graph execution, the execution is complete.
    #   value_info   (ValueInfo[])  Used to store the type and shape information of values that are not
    #                               inputs or outputs.
    inputs = model.graph.input           # graph inputs: names and shape information
    outputs = model.graph.output         # graph outputs: names and shape information
    depth = len(model.graph.node)        # number of nodes
    doc_string = model.graph.doc_string  # documentation string (e.g. where the model was converted)

    print(f"input = {inputs}")
    print(f"output = {outputs}")
    print(f"depth = {depth}")
    print(f"doc_string = {doc_string}")

    # Reference: https://www.jianshu.com/p/476478c17b8e
    # Print a human readable representation of the graph
    print(helper.printable_graph(model.graph))


Output (the input/output protobuf messages are condensed to name, element type and shape; elem_type 1 is FLOAT):

input = [name: "images", elem_type: 1, shape: [1, 3, 640, 640]]
output = [
  name: "out0", elem_type: 1, shape: [1, 12800, 1]
  name: "out1", elem_type: 1, shape: [1, 3200, 1]
  name: "out2", elem_type: 1, shape: [1, 800, 1]
  name: "out3", elem_type: 1, shape: [1, 12800, 4]
  name: "out4", elem_type: 1, shape: [1, 3200, 4]
  name: "out5", elem_type: 1, shape: [1, 800, 4]
  name: "out6", elem_type: 1, shape: [1, 12800, 10]
  name: "out7", elem_type: 1, shape: [1, 3200, 10]
  name: "out8", elem_type: 1, shape: [1, 800, 10]
]
depth = 191
doc_string = 

graph torch-jit-export (
  %images[FLOAT, 1x3x640x640]
) initializers (
  %neck.lateral_convs.0.conv.weight[FLOAT, 16x72x1x1]
  %neck.lateral_convs.0.conv.bias[FLOAT, 16]
  %neck.lateral_convs.1.conv.weight[FLOAT, 16x152x1x1]
  %neck.lateral_convs.1.conv.bias[FLOAT, 16]
  %neck.lateral_convs.2.conv.weight[FLOAT, 16x288x1x1]
  %neck.lateral_convs.2.conv.bias[FLOAT, 16]
  %neck.fpn_convs.0.conv.weight[FLOAT, 16x16x3x3]
  %neck.fpn_convs.0.conv.bias[FLOAT, 16]
  %neck.fpn_convs.1.conv.weight[FLOAT, 16x16x3x3]
  %neck.fpn_convs.1.conv.bias[FLOAT, 16]
  %neck.fpn_convs.2.conv.weight[FLOAT, 16x16x3x3]
  %neck.fpn_convs.2.conv.bias[FLOAT, 16]
  %neck.downsample_convs.0.conv.weight[FLOAT, 16x16x3x3]
  %neck.downsample_convs.0.conv.bias[FLOAT, 16]
  %neck.downsample_convs.1.conv.weight[FLOAT, 16x16x3x3]
  %neck.downsample_convs.1.conv.bias[FLOAT, 16]
  %neck.pafpn_convs.0.conv.weight[FLOAT, 16x16x3x3]
  %neck.pafpn_convs.0.conv.bias[FLOAT, 16]
  %neck.pafpn_convs.1.conv.weight[FLOAT, 16x16x3x3]
  %neck.pafpn_convs.1.conv.bias[FLOAT, 16]
  %bbox_head.stride_cls.(8, 8).weight[FLOAT, 2x64x3x3]
  %bbox_head.stride_cls.(8, 8).bias[FLOAT, 2]
  %bbox_head.stride_cls.(16, 16).weight[FLOAT, 2x64x3x3]
  %bbox_head.stride_cls.(16, 16).bias[FLOAT, 2]
  %bbox_head.stride_cls.(32, 32).weight[FLOAT, 2x64x3x3]
  %bbox_head.stride_cls.(32, 32).bias[FLOAT, 2]
  %bbox_head.stride_reg.(8, 8).weight[FLOAT, 8x64x3x3]
  %bbox_head.stride_reg.(8, 8).bias[FLOAT, 8]
  %bbox_head.stride_reg.(16, 16).weight[FLOAT, 8x64x3x3]
  %bbox_head.stride_reg.(16, 16).bias[FLOAT, 8]
  %bbox_head.stride_reg.(32, 32).weight[FLOAT, 8x64x3x3]
  %bbox_head.stride_reg.(32, 32).bias[FLOAT, 8]
  %bbox_head.stride_kps.(8, 8).weight[FLOAT, 20x64x3x3]
  %bbox_head.stride_kps.(8, 8).bias[FLOAT, 20]
  %bbox_head.stride_kps.(16, 16).weight[FLOAT, 20x64x3x3]
  %bbox_head.stride_kps.(16, 16).bias[FLOAT, 20]
  %bbox_head.stride_kps.(32, 32).weight[FLOAT, 20x64x3x3]
  %bbox_head.stride_kps.(32, 32).bias[FLOAT, 20]
  %555[FLOAT, 16x3x3x3]
  %556[FLOAT, 16]
  %558[FLOAT, 16x1x3x3]
  %559[FLOAT, 16]
  %561[FLOAT, 16x16x1x1]
  %562[FLOAT, 16]
  %564[FLOAT, 16x1x3x3]
  %565[FLOAT, 16]
  %567[FLOAT, 40x16x1x1]
  %568[FLOAT, 40]
  %570[FLOAT, 40x1x3x3]
  %571[FLOAT, 40]
  %573[FLOAT, 40x40x1x1]
  %574[FLOAT, 40]
  %576[FLOAT, 40x1x3x3]
  %577[FLOAT, 40]
  %579[FLOAT, 72x40x1x1]
  %580[FLOAT, 72]
  %582[FLOAT, 72x1x3x3]
  %583[FLOAT, 72]
  %585[FLOAT, 72x72x1x1]
  %586[FLOAT, 72]
  %588[FLOAT, 72x1x3x3]
  %589[FLOAT, 72]
  %591[FLOAT, 72x72x1x1]
  %592[FLOAT, 72]
  %594[FLOAT, 72x1x3x3]
  %595[FLOAT, 72]
  %597[FLOAT, 152x72x1x1]
  %598[FLOAT, 152]
  %600[FLOAT, 152x1x3x3]
  %601[FLOAT, 152]
  %603[FLOAT, 152x152x1x1]
  %604[FLOAT, 152]
  %606[FLOAT, 152x1x3x3]
  %607[FLOAT, 152]
  %609[FLOAT, 288x152x1x1]
  %610[FLOAT, 288]
  %612[FLOAT, 288x1x3x3]
  %613[FLOAT, 288]
  %615[FLOAT, 288x288x1x1]
  %616[FLOAT, 288]
  %618[FLOAT, 288x1x3x3]
  %619[FLOAT, 288]
  %621[FLOAT, 288x288x1x1]
  %622[FLOAT, 288]
  %624[FLOAT, 288x1x3x3]
  %625[FLOAT, 288]
  %627[FLOAT, 288x288x1x1]
  %628[FLOAT, 288]
  %630[FLOAT, 288x1x3x3]
  %631[FLOAT, 288]
  %633[FLOAT, 288x288x1x1]
  %634[FLOAT, 288]
  %636[FLOAT, 288x1x3x3]
  %637[FLOAT, 288]
  %639[FLOAT, 288x288x1x1]
  %640[FLOAT, 288]
  %642[FLOAT, 16x1x3x3]
  %643[FLOAT, 16]
  %645[FLOAT, 64x16x1x1]
  %646[FLOAT, 64]
  %648[FLOAT, 64x1x3x3]
  %649[FLOAT, 64]
  %651[FLOAT, 64x64x1x1]
  %652[FLOAT, 64]
  %654[FLOAT, 16x1x3x3]
  %655[FLOAT, 16]
  %657[FLOAT, 64x16x1x1]
  %658[FLOAT, 64]
  %660[FLOAT, 64x1x3x3]
  %661[FLOAT, 64]
  %663[FLOAT, 64x64x1x1]
  %664[FLOAT, 64]
  %666[FLOAT, 16x1x3x3]
  %667[FLOAT, 16]
  %669[FLOAT, 64x16x1x1]
  %670[FLOAT, 64]
  %672[FLOAT, 64x1x3x3]
  %673[FLOAT, 64]
  %675[FLOAT, 64x64x1x1]
  %676[FLOAT, 64]
  %677[INT64, 1]
  %678[INT64, 1]
  %679[INT64, 1]
  %680[INT64, 1]
  %681[INT64, 1]
  %682[INT64, 1]
  %683[INT64, 1]
  %684[INT64, 1]
  %685[INT64, 1]
  %686[INT64, 1]
  %687[INT64, 1]
  %688[INT64, 1]
  %689[INT64, 1]
  %690[INT64, 1]
  %691[INT64, 1]
  %692[INT64, 1]
  %693[INT64, 1]
  %694[INT64, 1]
) {
  %554 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [2, 2]](%images, %555, %556)
  %288 = Relu(%554)
  %557 = Conv[dilations = [1, 1], group = 16, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%288, %558, %559)
  %291 = Relu(%557)
  %560 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%291, %561, %562)
  %294 = Relu(%560)
  %563 = Conv[dilations = [1, 1], group = 16, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [2, 2]](%294, %564, %565)
  %297 = Relu(%563)
  %566 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%297, %567, %568)
  %300 = Relu(%566)
  %569 = Conv[dilations = [1, 1], group = 40, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%300, %570, %571)
  %303 = Relu(%569)
  %572 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%303, %573, %574)
  %306 = Relu(%572)
  %575 = Conv[dilations = [1, 1], group = 40, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [2, 2]](%306, %576, %577)
  %309 = Relu(%575)
  %578 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%309, %579, %580)
  %312 = Relu(%578)
  %581 = Conv[dilations = [1, 1], group = 72, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%312, %582, %583)
  %315 = Relu(%581)
  %584 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%315, %585, %586)
  %318 = Relu(%584)
  %587 = Conv[dilations = [1, 1], group = 72, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%318, %588, %589)
  %321 = Relu(%587)
  %590 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%321, %591, %592)
  %324 = Relu(%590)
  %593 = Conv[dilations = [1, 1], group = 72, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [2, 2]](%324, %594, %595)
  %327 = Relu(%593)
  %596 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%327, %597, %598)
  %330 = Relu(%596)
  %599 = Conv[dilations = [1, 1], group = 152, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%330, %600, %601)
  %333 = Relu(%599)
  %602 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%333, %603, %604)
  %336 = Relu(%602)
  %605 = Conv[dilations = [1, 1], group = 152, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [2, 2]](%336, %606, %607)
  %339 = Relu(%605)
  %608 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%339, %609, %610)
  %342 = Relu(%608)
  %611 = Conv[dilations = [1, 1], group = 288, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%342, %612, %613)
  %345 = Relu(%611)
  %614 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%345, %615, %616)
  %348 = Relu(%614)
  %617 = Conv[dilations = [1, 1], group = 288, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%348, %618, %619)
  %351 = Relu(%617)
  %620 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%351, %621, %622)
  %354 = Relu(%620)
  %623 = Conv[dilations = [1, 1], group = 288, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%354, %624, %625)
  %357 = Relu(%623)
  %626 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%357, %627, %628)
  %360 = Relu(%626)
  %629 = Conv[dilations = [1, 1], group = 288, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%360, %630, %631)
  %363 = Relu(%629)
  %632 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%363, %633, %634)
  %366 = Relu(%632)
  %635 = Conv[dilations = [1, 1], group = 288, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%366, %636, %637)
  %369 = Relu(%635)
  %638 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%369, %639, %640)
  %372 = Relu(%638)
  %373 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%324, %neck.lateral_convs.0.conv.weight, %neck.lateral_convs.0.conv.bias)
  %374 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%336, %neck.lateral_convs.1.conv.weight, %neck.lateral_convs.1.conv.bias)
  %375 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%372, %neck.lateral_convs.2.conv.weight, %neck.lateral_convs.2.conv.bias)
  %376 = Shape(%374)
  %377 = Constant[value = <Scalar Tensor []>]()
  %378 = Gather[axis = 0](%376, %377)
  %379 = Shape(%374)
  %380 = Constant[value = <Scalar Tensor []>]()
  %381 = Gather[axis = 0](%379, %380)
  %382 = Unsqueeze[axes = [0]](%378)
  %383 = Unsqueeze[axes = [0]](%381)
  %384 = Concat[axis = 0](%382, %383)
  %385 = Shape(%375)
  %386 = Constant[value = <Tensor>]()
  %387 = Constant[value = <Tensor>]()
  %388 = Constant[value = <Tensor>]()
  %389 = Slice(%385, %387, %388, %386)
  %390 = Cast[to = 7](%384)
  %391 = Concat[axis = 0](%389, %390)
  %392 = Constant[value = <Tensor>]()
  %393 = Constant[value = <Tensor>]()
  %394 = Resize[coordinate_transformation_mode = 'asymmetric', cubic_coeff_a = -0.75, mode = 'nearest', nearest_mode = 'floor'](%375, %392, %393, %391)
  %395 = Add(%374, %394)
  %396 = Shape(%373)
  %397 = Constant[value = <Scalar Tensor []>]()
  %398 = Gather[axis = 0](%396, %397)
  %399 = Shape(%373)
  %400 = Constant[value = <Scalar Tensor []>]()
  %401 = Gather[axis = 0](%399, %400)
  %402 = Unsqueeze[axes = [0]](%398)
  %403 = Unsqueeze[axes = [0]](%401)
  %404 = Concat[axis = 0](%402, %403)
  %405 = Shape(%395)
  %406 = Constant[value = <Tensor>]()
  %407 = Constant[value = <Tensor>]()
  %408 = Constant[value = <Tensor>]()
  %409 = Slice(%405, %407, %408, %406)
  %410 = Cast[to = 7](%404)
  %411 = Concat[axis = 0](%409, %410)
  %412 = Constant[value = <Tensor>]()
  %413 = Constant[value = <Tensor>]()
  %414 = Resize[coordinate_transformation_mode = 'asymmetric', cubic_coeff_a = -0.75, mode = 'nearest', nearest_mode = 'floor'](%395, %412, %413, %411)
  %415 = Add(%373, %414)
  %416 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%415, %neck.fpn_convs.0.conv.weight, %neck.fpn_convs.0.conv.bias)
  %417 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%395, %neck.fpn_convs.1.conv.weight, %neck.fpn_convs.1.conv.bias)
  %418 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%375, %neck.fpn_convs.2.conv.weight, %neck.fpn_convs.2.conv.bias)
  %419 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [2, 2]](%416, %neck.downsample_convs.0.conv.weight, %neck.downsample_convs.0.conv.bias)
  %420 = Add(%417, %419)
  %421 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [2, 2]](%420, %neck.downsample_convs.1.conv.weight, %neck.downsample_convs.1.conv.bias)
  %422 = Add(%418, %421)
  %423 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%420, %neck.pafpn_convs.0.conv.weight, %neck.pafpn_convs.0.conv.bias)
  %424 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%422, %neck.pafpn_convs.1.conv.weight, %neck.pafpn_convs.1.conv.bias)
  %641 = Conv[dilations = [1, 1], group = 16, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%416, %642, %643)
  %427 = Relu(%641)
  %644 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%427, %645, %646)
  %430 = Relu(%644)
  %647 = Conv[dilations = [1, 1], group = 64, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%430, %648, %649)
  %433 = Relu(%647)
  %650 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%433, %651, %652)
  %436 = Relu(%650)
  %437 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%436, %bbox_head.stride_cls.(8, 8).weight, %bbox_head.stride_cls.(8, 8).bias)
  %438 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%436, %bbox_head.stride_reg.(8, 8).weight, %bbox_head.stride_reg.(8, 8).bias)
  %439 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%436, %bbox_head.stride_kps.(8, 8).weight, %bbox_head.stride_kps.(8, 8).bias)
  %440 = Shape(%437)
  %441 = Constant[value = <Scalar Tensor []>]()
  %442 = Gather[axis = 0](%440, %441)
  %443 = Transpose[perm = [0, 2, 3, 1]](%437)
  %446 = Unsqueeze[axes = [0]](%442)
  %449 = Concat[axis = 0](%446, %677, %678)
  %450 = Reshape(%443, %449)
  %out0 = Sigmoid(%450)
  %452 = Transpose[perm = [0, 2, 3, 1]](%438)
  %455 = Unsqueeze[axes = [0]](%442)
  %458 = Concat[axis = 0](%455, %679, %680)
  %out3 = Reshape(%452, %458)
  %460 = Transpose[perm = [0, 2, 3, 1]](%439)
  %463 = Unsqueeze[axes = [0]](%442)
  %466 = Concat[axis = 0](%463, %681, %682)
  %out6 = Reshape(%460, %466)
  %653 = Conv[dilations = [1, 1], group = 16, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%423, %654, %655)
  %470 = Relu(%653)
  %656 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%470, %657, %658)
  %473 = Relu(%656)
  %659 = Conv[dilations = [1, 1], group = 64, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%473, %660, %661)
  %476 = Relu(%659)
  %662 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%476, %663, %664)
  %479 = Relu(%662)
  %480 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%479, %bbox_head.stride_cls.(16, 16).weight, %bbox_head.stride_cls.(16, 16).bias)
  %481 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%479, %bbox_head.stride_reg.(16, 16).weight, %bbox_head.stride_reg.(16, 16).bias)
  %482 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%479, %bbox_head.stride_kps.(16, 16).weight, %bbox_head.stride_kps.(16, 16).bias)
  %483 = Shape(%480)
  %484 = Constant[value = <Scalar Tensor []>]()
  %485 = Gather[axis = 0](%483, %484)
  %486 = Transpose[perm = [0, 2, 3, 1]](%480)
  %489 = Unsqueeze[axes = [0]](%485)
  %492 = Concat[axis = 0](%489, %683, %684)
  %493 = Reshape(%486, %492)
  %out1 = Sigmoid(%493)
  %495 = Transpose[perm = [0, 2, 3, 1]](%481)
  %498 = Unsqueeze[axes = [0]](%485)
  %501 = Concat[axis = 0](%498, %685, %686)
  %out4 = Reshape(%495, %501)
  %503 = Transpose[perm = [0, 2, 3, 1]](%482)
  %506 = Unsqueeze[axes = [0]](%485)
  %509 = Concat[axis = 0](%506, %687, %688)
  %out7 = Reshape(%503, %509)
  %665 = Conv[dilations = [1, 1], group = 16, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%424, %666, %667)
  %513 = Relu(%665)
  %668 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%513, %669, %670)
  %516 = Relu(%668)
  %671 = Conv[dilations = [1, 1], group = 64, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%516, %672, %673)
  %519 = Relu(%671)
  %674 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%519, %675, %676)
  %522 = Relu(%674)
  %523 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%522, %bbox_head.stride_cls.(32, 32).weight, %bbox_head.stride_cls.(32, 32).bias)
  %524 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%522, %bbox_head.stride_reg.(32, 32).weight, %bbox_head.stride_reg.(32, 32).bias)
  %525 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%522, %bbox_head.stride_kps.(32, 32).weight, %bbox_head.stride_kps.(32, 32).bias)
  %526 = Shape(%523)
  %527 = Constant[value = <Scalar Tensor []>]()
  %528 = Gather[axis = 0](%526, %527)
  %529 = Transpose[perm = [0, 2, 3, 1]](%523)
  %532 = Unsqueeze[axes = [0]](%528)
  %535 = Concat[axis = 0](%532, %689, %690)
  %536 = Reshape(%529, %535)
  %out2 = Sigmoid(%536)
  %538 = Transpose[perm = [0, 2, 3, 1]](%524)
  %541 = Unsqueeze[axes = [0]](%528)
  %544 = Concat[axis = 0](%541, %691, %692)
  %out5 = Reshape(%538, %544)
  %546 = Transpose[perm = [0, 2, 3, 1]](%525)
  %549 = Unsqueeze[axes = [0]](%528)
  %552 = Concat[axis = 0](%549, %693, %694)
  %out8 = Reshape(%546, %552)
  return %out0, %out1, %out2, %out3, %out4, %out5, %out6, %out7, %out8

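The printed graph has three parts: the inputs (%images), the initializers (the weights), and the operator list. The operator list applies the operators to the inputs and initializers, much like a forward pass. Its final line returns %out0 to %out8, the nine outputs of the detection heads.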

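Use the online tool Netron (https://netron.app/) to view the SCRFD model structure. In SCRFD, each FPN level feeds three heads: classification, box regression and keypoints.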
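The output shapes printed earlier follow directly from the input size and the three FPN strides. The sketch below reproduces that arithmetic under one assumption: 2 anchors per feature-map location. This figure is inferred from the head weights in the dump. stride_cls is 2x64x3x3, i.e. 2 anchors x 1 score. stride_reg has 8 = 2 x 4 box values. stride_kps has 20 = 2 x 10, i.e. 5 keypoints x 2 coordinates. `scrfd_output_shapes` is a hypothetical helper.

```python
def scrfd_output_shapes(input_size=640, strides=(8, 16, 32), num_anchors=2, batch=1):
    """Expected (cls, bbox, kps) output shapes per FPN stride for an SCRFD-style head.

    A stride-s feature map has (input_size // s)^2 locations, each with
    `num_anchors` anchors; every anchor predicts 1 face score, 4 box values
    and 5 keypoints x 2 coordinates.
    """
    shapes = {}
    for s in strides:
        locations = (input_size // s) ** 2 * num_anchors
        shapes[s] = {
            "cls": (batch, locations, 1),
            "bbox": (batch, locations, 4),
            "kps": (batch, locations, 10),
        }
    return shapes
```

For the 640x640 input, strides 8/16/32 give 12800/3200/800 rows. This matches out0-out2 (scores), out3-out5 (boxes) and out6-out8 (keypoints).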



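import cv2
import numpy as np
import onnxruntime

if __name__ == '__main__':

    # Create an inference session with onnxruntime.InferenceSession
    ort_sess = onnxruntime.InferenceSession('./weights/scrfd_500m_kps.onnx')
    input_name = ort_sess.get_inputs()[0].name  # 'images' for this model

    # Load and preprocess the image
    img = cv2.imread("./img/calibrate_glasses.jpg")
    img = cv2.resize(img, (640, 640))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # Scaled to [0, 1] as in the original; note that the insightface SCRFD reference code
    # normalizes with (x - 127.5) / 128 instead, which matters for real detections
    img = img.astype(np.float32) / 255.
    # HWC -> CHW: transpose, not reshape (reshape would scramble pixels across channels)
    img = np.ascontiguousarray(img.transpose(2, 0, 1))
    img = np.expand_dims(img, 0)  # add the batch dimension -> 1x3x640x640

    outputs = ort_sess.run(None, {input_name: img})  # run inference with the session's run method
    print(f"length of outputs = {len(outputs)}")

Output:

length of outputs = 9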
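The layout conversion matters because OpenCV returns images as HxWxC, while this model expects 1xCxHxW. The tiny example below, which is not from the original post, shows that `transpose` keeps each channel intact whereas a bare `reshape` mixes values from different channels. `cv2.dnn.blobFromImage` is another way to do this conversion (with `swapRB=True` for BGR to RGB).

```python
import numpy as np


def to_nchw(img_hwc):
    """Convert an HxWxC image to a contiguous 1xCxHxW float32 array."""
    chw = np.transpose(img_hwc, (2, 0, 1))  # move channels to the front
    return np.ascontiguousarray(chw[np.newaxis], dtype=np.float32)


# Tiny 2x2 RGB image: red channel all 1s, green all 2s, blue all 3s
img = np.stack([np.full((2, 2), c, dtype=np.float32) for c in (1, 2, 3)], axis=-1)

good = to_nchw(img)
bad = img.reshape((1, 3, 2, 2))  # what reshape alone produces
print(good[0, 0])  # red plane: all 1s
print(bad[0, 0])   # mixes values from different channels
```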


5. ONNX inference efficiency: comparison with Module & DataParallel

Jump to: ONNX efficiency issues: comparison with Module & DataParallel
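Speed comparisons, such as the earlier observation that ONNX Runtime is about twice as fast as cv2.dnn, depend heavily on how timing is done. Below is a minimal, framework-agnostic timing sketch; it is not the benchmark used in the linked post. It runs warm-up calls first, then reports the median and minimum of many timed runs. The `warmup` and `runs` defaults are arbitrary assumptions.

```python
import statistics
import time


def benchmark(fn, warmup=5, runs=50):
    """Time a zero-argument callable; return (median_ms, min_ms) over `runs` calls.

    Warm-up calls are excluded so one-time costs (lazy initialization,
    CUDA context creation, memory allocation) do not skew the result.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples), min(samples)
```

For example, time `benchmark(lambda: ort_sess.run(None, {'images': img}))`. For OpenCV, wrap `net.setInput(blob)` followed by `net.forward()` in a small function and time that, using the same input in both cases.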


