In 2017, Microsoft, Facebook, and others introduced ONNX, a format standard for deep learning and machine learning models, aiming to unify model formats and make model deployment easier. Most deep learning frameworks now support exporting to ONNX and provide the corresponding export interfaces.

ONNX Runtime is an inference framework from Microsoft for models in the ONNX (Open Neural Network Exchange) format, and it makes running an onnx model very convenient. ONNX Runtime supports multiple execution backends, including CPU, CUDA, TensorRT, and DirectML. It can be regarded as the most native runtime for ONNX models: once you know how to export a model, you can deploy models from different frameworks in the same way, which improves development efficiency.

With onnx and onnxruntime, a model trained in a deep learning framework such as PyTorch can be served from C++ on a server, where inference is usually much faster than in Python.

This article walks through deploying your own model with the C++ ONNX Runtime API. As an example, a network built with Keras is converted to an onnx file and deployed in C++; TensorRT can additionally be used for acceleration.

GitHub repository: https://github.com/zouyuelin/SLAM_Learning_notes/tree/main/PoseEstimation

1. Download

GitHub download page:

Releases · microsoft/onnxruntime · GitHub

2. Unpack

The onnxruntime download is a prebuilt library, so just unpack it into a folder of your choice, then add the onnxruntime headers and libraries in CMakeLists.txt:

# add the header files
include_directories(......../onnxruntime/include)
# add the library files
link_directories(......../onnxruntime/lib)

I. Preparing the Model

1. Exporting a .onnx model from PyTorch

First, export a .onnx model file with PyTorch's built-in torch.onnx module (see that part of the PyTorch documentation for details). The main flow is:

import torch
checkpoint = torch.load(model_path)
model = ModelNet(params)
model.load_state_dict(checkpoint['model'])
model.eval()

input_x_1 = torch.randn(10,20)
input_x_2 = torch.randn(1,20,5)
output, mask = model(input_x_1, input_x_2)

torch.onnx.export(model,
                 (input_x_1, input_x_2),
                 'model.onnx',
                 input_names = ['input','input_mask'],
                 output_names = ['output','output_mask'],
                 opset_version=11,
                 verbose = True,
                 dynamic_axes={'input':{1:'seqlen'}, 'input_mask':{1:'seqlen',2:'time'},'output_mask':{0:'time'}})

The torch.onnx.export parameters are all in the documentation. The opset_version you choose matters. dynamic_axes lets you mark input and output dimensions as dynamic; without it, the shapes of the input and output tensors are fixed to those used at export time, so if your inputs always have a fixed shape you can omit it.

Whether the exported model works can first be checked from Python:

import onnxruntime as ort
import numpy as np
ort_session = ort.InferenceSession('model.onnx')
outputs = ort_session.run(None,{'input':np.random.randn(10,20).astype(np.float32),'input_mask':np.random.randn(1,20,5).astype(np.float32)})
# because dynamic_axes was set, the corresponding dimensions may vary
outputs = ort_session.run(None,{'input':np.random.randn(10,5).astype(np.float32),'input_mask':np.random.randn(1,26,2).astype(np.float32)})
# outputs is a list containing 'output' and 'output_mask'

import onnx
model = onnx.load('model.onnx')
onnx.checker.check_model(model)

If no exception is raised, the exported model is fine. Note that torch.onnx.export currently recognizes only a subset of tensor operations (see the Supported operators page). Common architectures, transformers included, are no problem, but if you hit ATen or similar errors you will need to rework the unsupported tensor operations, otherwise the model cannot be used from C++.
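Beyond the checker, it is worth comparing the PyTorch and ONNX Runtime outputs numerically. A minimal sketch, reusing model, input_x_1 and input_x_2 from the export example above:

import numpy as np
import onnxruntime as ort

# outputs of the original PyTorch model (already in eval mode)
torch_out, torch_mask = model(input_x_1, input_x_2)

# outputs of the exported ONNX model on the same inputs
sess = ort.InferenceSession('model.onnx')
ort_out = sess.run(None, {'input': input_x_1.numpy(),
                          'input_mask': input_x_2.numpy()})

# the two should agree up to floating-point tolerance
np.testing.assert_allclose(torch_out.detach().numpy(), ort_out[0],
                           rtol=1e-3, atol=1e-5)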

2. Exporting a .onnx model from TensorFlow Keras

Building and training the network is covered in the post "tensorflow keras 搭建相机位姿估计网络--例" (see the references at the end). The network inputs and outputs are:

network input: [image_ref , image_cur]
network output: [tx , ty , tz , roll , pitch , yaw]

The trained model lives at kerasTempModel\. Be sure to save it with model.save(); model.save_model() will not work.
onnxruntime needs an onnx model, so the Keras model has to be converted to onnx:

Install the conversion tool:

pip install tf2onnx

After installation, run:

python -m tf2onnx.convert --saved-model kerasTempModel --output "model.onnx" --opset 14

Tip: in my tests, setting opset to 14 currently optimizes best; inference is faster than with opset 11 or 12.
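You can measure this yourself. A rough timing sketch, assuming the 512x512 two-input model converted above:

import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('model.onnx')
x = np.random.randn(1, 512, 512, 3).astype(np.float32)
feed = {'input1': x, 'input2': x}

sess.run(None, feed)                      # warm-up run
t0 = time.perf_counter()
for _ in range(100):
    sess.run(None, feed)
print((time.perf_counter() - t0) / 100)   # average seconds per inference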

When the conversion finishes, the terminal reports the model's inputs and outputs:

2022-01-21 15:48:00,766 - INFO - 
2022-01-21 15:48:00,766 - INFO - Successfully converted TensorFlow model kerasTempModel to ONNX
2022-01-21 15:48:00,766 - INFO - Model inputs: ['input1', 'input2']
2022-01-21 15:48:00,766 - INFO - Model outputs: ['Output']
2022-01-21 15:48:00,766 - INFO - ONNX model is saved at model.onnx

The model has two inputs, with node names ['input1', 'input2'], and one output, with node name ['Output'].

You do not actually need to know the node names in advance; they can be printed from onnxruntime.
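For example, with the onnxruntime Python API (a quick sketch; for this model the printed names and shapes should match the tf2onnx log above):

import onnxruntime as ort

sess = ort.InferenceSession('model.onnx')
for node in sess.get_inputs():
    print('input :', node.name, node.shape, node.type)
for node in sess.get_outputs():
    print('output:', node.name, node.shape, node.type)

The printModelInfo function in section IV does the same from C++.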

2.1 Dataset preparation

Dataset format:

# index  image_ref  image_cur  tx  ty  tz  roll(x)  pitch(y)  yaw(z)
0 images/0.jpg images/1.jpg 0.000999509 -0.00102794 0.00987293 0.00473228 -0.0160252 -0.0222079 
1 images/1.jpg images/2.jpg -0.00544488 -0.00282174 0.00828871 -0.00271557 -0.00770117 -0.0195182 
2 images/2.jpg images/3.jpg -0.0074375 -0.00368121 0.0114751 -0.00721246 -0.0103843 -0.0171883 
3 images/3.jpg images/4.jpg -0.00238111 -0.00371362 0.0120466 -0.0081171 -0.0149111 -0.0198595 
4 images/4.jpg images/5.jpg 0.000965841 -0.00520437 0.0135452 -0.0141721 -0.0126401 -0.0182697 
5 images/5.jpg images/6.jpg -0.00295753 -0.00340146 0.0144557 -0.013633 -0.00463747 -0.0143332 

load_image maps each dataset entry to TF tensors:

class datasets:
    def __init__(self, datasetsPath:str):
        self.dataPath = datasetsPath
        self.dim = 512
        self.epochs = 40
        self.batch_size = 8
        self.train_percent = 0.92
        self.learning_rate = 2e-4
        self.model_path = 'kerasTempModel/'
        self.posetxt = os.path.join(self.dataPath,'pose.txt') 

        self.GetTheImagesAndPose()
        self.buildTrainData()

    def GetTheImagesAndPose(self):
        self.poselist = []
        with open(self.posetxt,'r') as f:
            for line in f.readlines():
                line = line.strip()
                line = line.split(' ')
                line.remove(line[0])
                self.poselist.append(line)
                # im1 im2 tx,ty,tz,roll,pitch,yaw
        #shuffle the dataset
        length = np.shape(self.poselist)[0]
        train_num =int(length * self.train_percent)
        test_num = length - train_num
        randomPoses = np.array(random.sample(self.poselist,length)) #draw all samples in random order
        self.train_pose_list = randomPoses[0:train_num,:]
        self.test_pose_list = randomPoses[train_num:length+1,:]
        print(f"The size of train pose list is : {np.shape(self.train_pose_list)[0]}")
        print(f"The size of test pose list is : {np.shape(self.test_pose_list)[0]}")

    def load_image(self,index:tf.Tensor):

        img_ref = tf.io.read_file(index[0])
        img_ref = tf.image.decode_jpeg(img_ref) #jpeg images
        img_ref = tf.image.resize(img_ref,(self.dim,self.dim))/255.0
        #img = tf.reshape(img,[self.dim,self.dim,3])
        img_ref = tf.cast(img_ref,tf.float32)

        img_cur = tf.io.read_file(index[1])
        img_cur = tf.image.decode_jpeg(img_cur) #jpeg images
        img_cur = tf.image.resize(img_cur,(self.dim,self.dim))/255.0
        #img = tf.reshape(img,[self.dim,self.dim,3])
        img_cur = tf.cast(img_cur,tf.float32)
        
        pose = tf.strings.to_number(index[2:8],tf.float32)
        
        return (img_ref,img_cur),(pose)
        
    def buildTrainData(self):
        '''
        for example:\\
        >>> poses = dataset.y_train.take(20)\\
        >>> imgs = dataset.x1_train.take(40)\\
        >>> print(np.array(list(imgs.as_numpy_iterator()))[39]) \\
        >>> imgs = dataset.x2_train.take(40)\\
        >>> print(np.array(list(imgs.as_numpy_iterator()))[39]) \\
        >>> print(np.array(list(poses.as_numpy_iterator()))[19]) \\
        '''
        self.traindata = tf.data.Dataset.from_tensor_slices(self.train_pose_list) \
           .map(self.load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
           .shuffle(500)\
           .repeat(10)\
           .batch(self.batch_size) \
           .prefetch(tf.data.experimental.AUTOTUNE)#.cache() 
           
        self.testdata = tf.data.Dataset.from_tensor_slices(self.test_pose_list) \
           .map(self.load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
           .shuffle(500)\
           .repeat(10)\
           .batch(self.batch_size) \
           .prefetch(tf.data.experimental.AUTOTUNE)

2.2 Building the network model

A simple model:

# the model
def model(dim):
    First = K.layers.Input(shape=(dim,dim,3),name="input1")
    Second = K.layers.Input(shape=(dim,dim,3),name="input2")

    x1 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(First)
    x1 = K.layers.Conv2D(512,kernel_size=(3,3), strides=2,padding='same')(x1)
    x1 = K.layers.BatchNormalization()(x1)
    x1 = K.layers.ReLU()(x1)
    x1 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x1)
    x1 = K.layers.BatchNormalization()(x1)
    x1 = K.layers.ReLU()(x1)
    x1 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(x1)

    x2 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(Second)
    x2 = K.layers.Conv2D(512,kernel_size=(3,3), strides=2,padding='same')(x2)
    x2 = K.layers.BatchNormalization()(x2)
    x2 = K.layers.ReLU()(x2)
    x2 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x2)
    x2 = K.layers.BatchNormalization()(x2)
    x2 = K.layers.ReLU()(x2)
    x2 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(x2)

    x = K.layers.concatenate([x1,x2])
    x = K.layers.Conv2D(256,kernel_size=(3,3), strides=1,padding='same',
                        activation='relu')(x)
    x = K.layers.BatchNormalization()(x)
    x = K.layers.ReLU()(x)
    x = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(x)
    x = K.layers.Conv2D(128,kernel_size=(3,3), strides=1,padding='same',
                        activation='relu')(x)
    x = K.layers.BatchNormalization()(x)
    x = K.layers.ReLU()(x)
    x = K.layers.Flatten()(x)
    x = K.layers.Dense(1024)(x)
    x = K.layers.Dense(6,name='Output')(x)
    poseModel = K.Model([First,Second],x)

    return poseModel
# loss function
def loss_fn(y_true,y_pre):
    loss_value = K.backend.mean(K.backend.square(y_true-y_pre))
    return loss_value

# learning-rate decay callback
class learningDecay(K.callbacks.Callback):
    def __init__(self,schedule=None,alpha=1,verbose=0):
        super().__init__()
        self.schedule = schedule
        self.verbose = verbose
        self.alpha = alpha
    def on_epoch_begin(self, epoch, logs=None):
        lr = float(K.backend.get_value(self.model.optimizer.lr))
        if self.schedule != None:
            lr = self.schedule(epoch,lr)
        else:
            if epoch != 0:
                lr = lr*self.alpha
        K.backend.set_value(self.model.optimizer.lr,K.backend.get_value(lr))
        if self.verbose > 0:
            print(f"Current learning rate is {lr}")
# learning-rate schedule
def scheduler(epoch, lr):
    if epoch < 10:
        return lr
    else:
        return lr * tf.math.exp(-0.1) 

ResNet-34 as the backbone:

#-------resnet 34-------------
def conv_block(inputs, 
        neuron_num, 
        kernel_size,  
        use_bias, 
        padding= 'same',
        strides= (1, 1),
        with_conv_short_cut = False):
    conv1 = K.layers.Conv2D(
        neuron_num,
        kernel_size = kernel_size,
        activation= 'relu',
        strides= strides,
        use_bias= use_bias,
        padding= padding
    )(inputs)
    conv1 = K.layers.BatchNormalization(axis = 1)(conv1)

    conv2 = K.layers.Conv2D(
        neuron_num,
        kernel_size= kernel_size,
        activation= 'relu',
        use_bias= use_bias,
        padding= padding)(conv1)
    conv2 = K.layers.BatchNormalization(axis = 1)(conv2)

    if with_conv_short_cut:
        inputs = K.layers.Conv2D(
            neuron_num, 
            kernel_size= kernel_size,
            strides= strides,
            use_bias= use_bias,
            padding= padding
            )(inputs)
        return K.layers.add([inputs, conv2])

    else:
        return K.layers.add([inputs, conv2])

def ResNet34(inputs,namescope = ""):
    x = K.layers.ZeroPadding2D((3, 3))(inputs)

    # Define the convolutional block 1
    x = K.layers.Conv2D(64, kernel_size= (7, 7), strides= (2, 2), padding= 'valid')(x)
    x = K.layers.BatchNormalization(axis= 1)(x)
    x = K.layers.MaxPooling2D(pool_size= (3, 3), strides= (2, 2), padding= 'same')(x)

    # Define the convolutional block 2
    x = conv_block(x, neuron_num= 64, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 64, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 64, kernel_size= (3, 3), use_bias= True)

    # Define the convolutional block 3
    x = conv_block(x, neuron_num= 128, kernel_size= (3, 3), use_bias= True, strides= (2, 2), with_conv_short_cut= True)
    x = conv_block(x, neuron_num= 128, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 128, kernel_size= (3, 3), use_bias= True)

    # Define the convolutional block 4
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True, strides= (2, 2), with_conv_short_cut= True)
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)

    # Define the convolutional block 5
    x = conv_block(x, neuron_num= 512, kernel_size= (3, 3), use_bias= True, strides= (2, 2), with_conv_short_cut= True)
    x = conv_block(x, neuron_num= 512, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 512, kernel_size= (3, 3), use_bias= True)
    x = K.layers.AveragePooling2D(pool_size=(7, 7))(x)
    return x


def model(dim_w,dim_h):
    First = K.layers.Input(shape=(dim_w,dim_h,3),name="input1")
    Second = K.layers.Input(shape=(dim_w,dim_h,3),name="input2")

    # x1 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(First)
    x1 = K.layers.Conv2D(128,kernel_size=(3,3), strides=2,padding='same')(First)
    x1 = K.layers.BatchNormalization()(x1)
    x1 = K.layers.LeakyReLU()(x1)
    # x1 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x1)
    # x1 = K.layers.BatchNormalization()(x1)
    # x1 = K.layers.ReLU()(x1)
    x1 = ResNet34(x1,"x1")

    # x2 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(Second)
    x2 = K.layers.Conv2D(128,kernel_size=(3,3), strides=2,padding='same')(Second)
    x2 = K.layers.BatchNormalization()(x2)
    x2 = K.layers.LeakyReLU()(x2)
    # x2 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x2)
    # x2 = K.layers.BatchNormalization()(x2)
    # x2 = K.layers.ReLU()(x2)
    x2 = ResNet34(x2,"x2")

    x = K.layers.concatenate([x1,x2])

    x = K.layers.Flatten()(x)
    x = K.layers.Dense(6,name='Output')(x)
    poseModel = K.Model([First,Second],x)

    return poseModel

def loss_fn(y_true,y_pre):
    loss_value_translation = K.backend.square(y_true[-1,0:3]-y_pre[-1,0:3])
    loss_value_rotation = 1/5.7*K.backend.square(y_true[-1,3:6]-y_pre[-1,3:6])
    loss_value = K.backend.mean(loss_value_translation + loss_value_rotation)

    # loss_value = K.backend.mean(K.backend.square(y_true-y_pre))
    # tf.print(y_pre)
    return loss_value

2.3 Training the model

build() compiles the model and sets up the callbacks;

train_fit() trains with Keras's fit() function;

train_gradient() trains with apply_gradients(), which lets you monitor the loss and the gradients as you go and is more flexible;

save_model() saves the model; several formats are possible, and model.save() can write either an h5 file or a TF SavedModel.

class Posenet:
    def __init__(self,dataset:datasets):
        self.dataset = dataset
        self.build()

    def build(self):
        self.poseModel = model(self.dataset.dim)
        self.poseModel.summary()
        self.optm = K.optimizers.RMSprop(1e-4,momentum=0.9) #,decay=1e-5/self.dataset.epochs
        self.decayCallback = learningDecay(schedule = None,alpha = 0.99,verbose = 1)
        decayCallbackScheduler = K.callbacks.LearningRateScheduler(scheduler)
        self.callbacks = [decayCallbackScheduler]

        try:
            print("************************loading the model weights***********************************")
            self.poseModel.load_weights("model.h5")
        except:
            pass

    def train_fit(self):
        self.poseModel.compile(optimizer=self.optm,loss=loss_fn,metrics=['accuracy'])
        self.poseModel.fit(self.dataset.traindata,
                            validation_data=self.dataset.testdata,
                            epochs=self.dataset.epochs,
                            callbacks=[self.decayCallback],
                            verbose=1)
    
    def train_gradient(self):
        for step in range(self.dataset.epochs):
            loss = 0
            val_loss = 0
            lr = float(self.optm.lr)
            tf.print(">>> [Epoch is %s/%s]"%(step,self.dataset.epochs))
            for (x1,x2),y in self.dataset.traindata:
                with tf.GradientTape() as tape:
                    prediction = self.poseModel([x1,x2])
                    # y = tf.cast(y,dtype=prediction.dtype)
                    loss = loss_fn(y,prediction)
                gradients = tape.gradient(loss,self.poseModel.trainable_variables)
                self.optm.apply_gradients(zip(gradients,self.poseModel.trainable_variables))
            # validation
            for (x1,x2),y in self.dataset.testdata:
                prediction = self.poseModel([x1,x2])
                val_loss = loss_fn(y,prediction)
            tf.print("The loss is %s, the learning rate is %s, the test loss is %s"%(np.array(loss),lr,val_loss))
            K.backend.set_value(self.optm.lr,K.backend.get_value(lr*0.99))

    def save_model(self):
        '''
        Save with model.save(). It can write an h5 file or a SavedModel directory; the latter is recommended, followed by tf2onnx:
        >>> python -m tf2onnx.convert --saved-model kerasTempModel --output "model.onnx" --opset 14
        '''
        self.poseModel.save("model.h5")
        self.poseModel.save(self.dataset.model_path)
        # self.poseModel.save_weights("model.h5") # saves only the weights, not the structure
        # tf.saved_model.save(self.poseModel,'tf2TempModel') # this saving style is no longer used

The main entry point:

if __name__ == "__main__":
    dataset = datasets("images")
    posenet = Posenet(dataset)
    posenet.train_fit()
    # posenet.train_gradient() # train with apply_gradients instead
    posenet.save_model()

2.4 Loading the saved model

The saved model directory is kerasTempModel\. Load it with:

model = K.models.load_model(dataset.model_path,compile=False)

Test the model:

output = model([img_ref,img_cur])

2.5 Complete code

import argparse
import tensorflow as tf
import tensorflow.keras as K
import numpy as np
import cv2 as cv
import os
import time
import random
from tensorflow.keras import optimizers
from tensorflow.keras import callbacks

class datasets:
    def __init__(self, datasetsPath:str):
        self.dataPath = datasetsPath
        self.dim = 512
        self.epochs = 40
        self.batch_size = 8
        self.train_percent = 0.92
        self.learning_rate = 2e-4
        self.model_path = 'kerasTempModel/'
        self.posetxt = os.path.join(self.dataPath,'pose.txt') 

        self.GetTheImagesAndPose()
        self.buildTrainData()

    def GetTheImagesAndPose(self):
        self.poselist = []
        with open(self.posetxt,'r') as f:
            for line in f.readlines():
                line = line.strip()
                line = line.split(' ')
                line.remove(line[0])
                self.poselist.append(line)
                # im1 im2 tx,ty,tz,roll,pitch,yaw
        #shuffle the dataset
        length = np.shape(self.poselist)[0]
        train_num =int(length * self.train_percent)
        test_num = length - train_num
        randomPoses = np.array(random.sample(self.poselist,length)) #draw all samples in random order
        self.train_pose_list = randomPoses[0:train_num,:]
        self.test_pose_list = randomPoses[train_num:length+1,:]
        print(f"The size of train pose list is : {np.shape(self.train_pose_list)[0]}")
        print(f"The size of test pose list is : {np.shape(self.test_pose_list)[0]}")

    def load_image(self,index:tf.Tensor):

        img_ref = tf.io.read_file(index[0])
        img_ref = tf.image.decode_jpeg(img_ref) #jpeg images
        img_ref = tf.image.resize(img_ref,(self.dim,self.dim))/255.0
        #img = tf.reshape(img,[self.dim,self.dim,3])
        img_ref = tf.cast(img_ref,tf.float32)

        img_cur = tf.io.read_file(index[1])
        img_cur = tf.image.decode_jpeg(img_cur) #jpeg images
        img_cur = tf.image.resize(img_cur,(self.dim,self.dim))/255.0
        #img = tf.reshape(img,[self.dim,self.dim,3])
        img_cur = tf.cast(img_cur,tf.float32)
        pose = tf.strings.to_number(index[2:8],tf.float32)
        return (img_ref,img_cur),(pose)

    def buildTrainData(self):
        '''
        for example:\\
        >>> poses = dataset.y_train.take(20)\\
        >>> imgs = dataset.x1_train.take(40)\\
        >>> print(np.array(list(imgs.as_numpy_iterator()))[39]) \\
        >>> imgs = dataset.x2_train.take(40)\\
        >>> print(np.array(list(imgs.as_numpy_iterator()))[39]) \\
        >>> print(np.array(list(poses.as_numpy_iterator()))[19]) \\
        '''
        self.traindata = tf.data.Dataset.from_tensor_slices(self.train_pose_list) \
           .map(self.load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
           .shuffle(500)\
           .repeat(10)\
           .batch(self.batch_size) \
           .prefetch(tf.data.experimental.AUTOTUNE)#.cache() 
        self.testdata = tf.data.Dataset.from_tensor_slices(self.test_pose_list) \
           .map(self.load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
           .shuffle(500)\
           .repeat(10)\
           .batch(self.batch_size) \
           .prefetch(tf.data.experimental.AUTOTUNE)

def model(dim):
    First = K.layers.Input(shape=(dim,dim,3),name="input1")
    Second = K.layers.Input(shape=(dim,dim,3),name="input2")

    x1 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(First)
    x1 = K.layers.Conv2D(512,kernel_size=(3,3), strides=2,padding='same')(x1)
    x1 = K.layers.BatchNormalization()(x1)
    x1 = K.layers.ReLU()(x1)
    x1 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x1)
    x1 = K.layers.BatchNormalization()(x1)
    x1 = K.layers.ReLU()(x1)
    x1 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(x1)

    x2 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(Second)
    x2 = K.layers.Conv2D(512,kernel_size=(3,3), strides=2,padding='same')(x2)
    x2 = K.layers.BatchNormalization()(x2)
    x2 = K.layers.ReLU()(x2)
    x2 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x2)
    x2 = K.layers.BatchNormalization()(x2)
    x2 = K.layers.ReLU()(x2)
    x2 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(x2)

    x = K.layers.concatenate([x1,x2])
    x = K.layers.Conv2D(256,kernel_size=(3,3), strides=1,padding='same',
                        activation='relu')(x)
    x = K.layers.BatchNormalization()(x)
    x = K.layers.ReLU()(x)
    x = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(x)
    x = K.layers.Conv2D(128,kernel_size=(3,3), strides=1,padding='same',
                        activation='relu')(x)
    x = K.layers.BatchNormalization()(x)
    x = K.layers.ReLU()(x)
    x = K.layers.Flatten()(x)
    x = K.layers.Dense(1024)(x)
    x = K.layers.Dense(6,name='Output')(x)
    poseModel = K.Model([First,Second],x)

    return poseModel

def loss_fn(y_true,y_pre):
    loss_value = K.backend.mean(K.backend.square(y_true-y_pre))
    return loss_value


class learningDecay(K.callbacks.Callback):
    def __init__(self,schedule=None,alpha=1,verbose=0):
        super().__init__()
        self.schedule = schedule
        self.verbose = verbose
        self.alpha = alpha
    def on_epoch_begin(self, epoch, logs=None):
        lr = float(K.backend.get_value(self.model.optimizer.lr))
        if self.schedule != None:
            lr = self.schedule(epoch,lr)
        else:
            if epoch != 0:
                lr = lr*self.alpha
        K.backend.set_value(self.model.optimizer.lr,K.backend.get_value(lr))
        if self.verbose > 0:
            print(f"Current learning rate is {lr}")

def scheduler(epoch, lr):
    if epoch < 10:
        return lr
    else:
        return lr * tf.math.exp(-0.1) 

class Posenet:
    def __init__(self,dataset:datasets):
        self.dataset = dataset
        self.build()

    def build(self):
        self.poseModel = model(self.dataset.dim)
        self.poseModel.summary()
        self.optm = K.optimizers.RMSprop(1e-4,momentum=0.9) #,decay=1e-5/self.dataset.epochs
        self.decayCallback = learningDecay(schedule = None,alpha = 0.99,verbose = 1)
        decayCallbackScheduler = K.callbacks.LearningRateScheduler(scheduler)
        self.callbacks = [decayCallbackScheduler]

        try:
            print("************************loading the model weights***********************************")
            self.poseModel.load_weights("model.h5")
        except:
            pass

    def train_fit(self):
        self.poseModel.compile(optimizer=self.optm,loss=loss_fn,metrics=['accuracy'])
        self.poseModel.fit(self.dataset.traindata,
                            validation_data=self.dataset.testdata,
                            epochs=self.dataset.epochs,
                            callbacks=[self.decayCallback],
                            verbose=1)
    
    def train_gradient(self):
        for step in range(self.dataset.epochs):
            loss = 0
            val_loss = 0
            lr = float(self.optm.lr)
            tf.print(">>> [Epoch is %s/%s]"%(step,self.dataset.epochs))
            for (x1,x2),y in self.dataset.traindata:
                with tf.GradientTape() as tape:
                    prediction = self.poseModel([x1,x2])
                    # y = tf.cast(y,dtype=prediction.dtype)
                    loss = loss_fn(y,prediction)
                gradients = tape.gradient(loss,self.poseModel.trainable_variables)
                self.optm.apply_gradients(zip(gradients,self.poseModel.trainable_variables))
            # validation
            for (x1,x2),y in self.dataset.testdata:
                prediction = self.poseModel([x1,x2])
                val_loss = loss_fn(y,prediction)
            tf.print("The loss is %s, the learning rate is %s, the test loss is %s"%(np.array(loss),lr,val_loss))
            K.backend.set_value(self.optm.lr,K.backend.get_value(lr*0.99))

    def save_model(self):
        '''
        Save with model.save(). It can write an h5 file or a SavedModel directory; the latter is recommended, followed by tf2onnx:
        >>> python -m tf2onnx.convert --saved-model kerasTempModel --output "model.onnx" --opset 14
        '''
        self.poseModel.save("model.h5")
        self.poseModel.save(self.dataset.model_path)
        # self.poseModel.save_weights("model.h5") # saves only the weights, not the structure
        # tf.saved_model.save(self.poseModel,'tf2TempModel') # this saving style is no longer used

if __name__ == "__main__":
    dataset = datasets("images")
    posenet = Posenet(dataset)
    posenet.train_fit()
    # posenet.train_gradient() # train with apply_gradients instead
    posenet.save_model()
    

An improved version:

import argparse
import tensorflow as tf
import tensorflow.keras as K
import numpy as np
import cv2 as cv
import os
import time
import sys
import random
from tensorflow.keras import optimizers
from tensorflow.keras import callbacks
from tensorflow.python.keras.saving.save import save_model

class datasets:
    def __init__(self, datasetsPath:str):
        self.dataPath = datasetsPath
        self.dim_w = 512
        self.dim_h = 512
        self.epochs = 200
        self.batch_size = 8
        self.train_percent = 0.92
        self.learning_rate = 2e-4
        self.model_path = 'kerasTempModel/'
        self.posetxt = os.path.join(self.dataPath,'pose.txt') 

        self.GetTheImagesAndPose()
        self.buildTrainData()

    def GetTheImagesAndPose(self):
        self.poselist = []
        with open(self.posetxt,'r') as f:
            for line in f.readlines():
                line = line.strip()
                line = line.split(' ')
                line.remove(line[0])
                self.poselist.append(line)
                # im1 im2 tx,ty,tz,roll,pitch,yaw
        #shuffle the dataset
        length = np.shape(self.poselist)[0]
        train_num =int(length * self.train_percent)
        test_num = length - train_num
        randomPoses = np.array(random.sample(self.poselist,length)) #draw all samples in random order
        self.train_pose_list = randomPoses[0:train_num,:]
        self.test_pose_list = randomPoses[train_num:length+1,:]
        print(f"The size of train pose list is : {np.shape(self.train_pose_list)[0]}")
        print(f"The size of test pose list is : {np.shape(self.test_pose_list)[0]}")

    def load_image(self,index:tf.Tensor):

        img_ref = tf.io.read_file(index[0])
        img_ref = tf.image.decode_jpeg(img_ref) #jpeg images
        #img = tf.reshape(img,[self.dim,self.dim,3])
        img_ref = tf.image.resize(img_ref,(self.dim_w,self.dim_h))/255.0
        img_ref = tf.cast(img_ref,tf.float32)

        img_cur = tf.io.read_file(index[1])
        img_cur = tf.image.decode_jpeg(img_cur) #jpeg images
        img_cur = tf.image.resize(img_cur,(self.dim_w,self.dim_h))/255.0
        #img = tf.reshape(img,[self.dim,self.dim,3])
        img_cur = tf.cast(img_cur,tf.float32)
        pose = tf.strings.to_number(index[2:8],tf.float32)
        return (img_ref,img_cur),(pose)

    def buildTrainData(self):
        '''
        for example:\\
        >>> poses = dataset.y_train.take(20)\\
        >>> imgs = dataset.x1_train.take(40)\\
        >>> print(np.array(list(imgs.as_numpy_iterator()))[39]) \\
        >>> imgs = dataset.x2_train.take(40)\\
        >>> print(np.array(list(imgs.as_numpy_iterator()))[39]) \\
        >>> print(np.array(list(poses.as_numpy_iterator()))[19]) \\
        '''
        self.traindata = tf.data.Dataset.from_tensor_slices(self.train_pose_list) \
           .map(self.load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
           .shuffle(500)\
           .repeat(10)\
           .batch(self.batch_size) \
           .prefetch(tf.data.experimental.AUTOTUNE)#.cache() 
        self.testdata = tf.data.Dataset.from_tensor_slices(self.test_pose_list) \
           .map(self.load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
           .shuffle(500)\
           .repeat(10)\
           .batch(self.batch_size) \
           .prefetch(tf.data.experimental.AUTOTUNE)
#-------resnet 34-------------
def conv_block(inputs, 
        neuron_num, 
        kernel_size,  
        use_bias, 
        padding= 'same',
        strides= (1, 1),
        with_conv_short_cut = False):
    conv1 = K.layers.Conv2D(
        neuron_num,
        kernel_size = kernel_size,
        activation= 'relu',
        strides= strides,
        use_bias= use_bias,
        padding= padding
    )(inputs)
    conv1 = K.layers.BatchNormalization(axis = 1)(conv1)

    conv2 = K.layers.Conv2D(
        neuron_num,
        kernel_size= kernel_size,
        activation= 'relu',
        use_bias= use_bias,
        padding= padding)(conv1)
    conv2 = K.layers.BatchNormalization(axis = 1)(conv2)

    if with_conv_short_cut:
        inputs = K.layers.Conv2D(
            neuron_num, 
            kernel_size= kernel_size,
            strides= strides,
            use_bias= use_bias,
            padding= padding
            )(inputs)
        return K.layers.add([inputs, conv2])

    else:
        return K.layers.add([inputs, conv2])

def ResNet34(inputs,namescope = ""):
    x = K.layers.ZeroPadding2D((3, 3))(inputs)

    # Define the convolutional block 1
    x = K.layers.Conv2D(64, kernel_size= (7, 7), strides= (2, 2), padding= 'valid')(x)
    x = K.layers.BatchNormalization(axis= 1)(x)
    x = K.layers.MaxPooling2D(pool_size= (3, 3), strides= (2, 2), padding= 'same')(x)

    # Define the convolutional block 2
    x = conv_block(x, neuron_num= 64, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 64, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 64, kernel_size= (3, 3), use_bias= True)

    # Define the convolutional block 3
    x = conv_block(x, neuron_num= 128, kernel_size= (3, 3), use_bias= True, strides= (2, 2), with_conv_short_cut= True)
    x = conv_block(x, neuron_num= 128, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 128, kernel_size= (3, 3), use_bias= True)

    # Define the convolutional block 4
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True, strides= (2, 2), with_conv_short_cut= True)
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 256, kernel_size= (3, 3), use_bias= True)

    # Define the convolutional block 5
    x = conv_block(x, neuron_num= 512, kernel_size= (3, 3), use_bias= True, strides= (2, 2), with_conv_short_cut= True)
    x = conv_block(x, neuron_num= 512, kernel_size= (3, 3), use_bias= True)
    x = conv_block(x, neuron_num= 512, kernel_size= (3, 3), use_bias= True)
    x = K.layers.AveragePooling2D(pool_size=(7, 7))(x)
    return x


def model(dim_w,dim_h):
    First = K.layers.Input(shape=(dim_w,dim_h,3),name="input1")
    Second = K.layers.Input(shape=(dim_w,dim_h,3),name="input2")

    # x1 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(First)
    x1 = K.layers.Conv2D(128,kernel_size=(3,3), strides=2,padding='same')(First)
    x1 = K.layers.BatchNormalization()(x1)
    x1 = K.layers.LeakyReLU()(x1)
    # x1 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x1)
    # x1 = K.layers.BatchNormalization()(x1)
    # x1 = K.layers.ReLU()(x1)
    x1 = ResNet34(x1,"x1")

    # x2 = K.layers.MaxPool2D(pool_size=(2,2),strides=2)(Second)
    x2 = K.layers.Conv2D(128,kernel_size=(3,3), strides=2,padding='same')(Second)
    x2 = K.layers.BatchNormalization()(x2)
    x2 = K.layers.LeakyReLU()(x2)
    # x2 = K.layers.Conv2D(256,kernel_size=(3,3), strides=2,padding='same')(x2)
    # x2 = K.layers.BatchNormalization()(x2)
    # x2 = K.layers.ReLU()(x2)
    x2 = ResNet34(x2,"x2")

    x = K.layers.concatenate([x1,x2])

    x = K.layers.Flatten()(x)
    x = K.layers.Dense(6,name='Output')(x)
    poseModel = K.Model([First,Second],x)

    return poseModel

def loss_fn(y_true,y_pre):
    loss_value_translation = K.backend.square(y_true[-1,0:3]-y_pre[-1,0:3])
    loss_value_rotation = 1/5.7*K.backend.square(y_true[-1,3:6]-y_pre[-1,3:6])
    loss_value = K.backend.mean(loss_value_translation + loss_value_rotation)

    # loss_value = K.backend.mean(K.backend.square(y_true-y_pre))
    # tf.print(y_pre)
    return loss_value


class learningDecay(K.callbacks.Callback):
    def __init__(self,schedule=None,alpha=1,verbose=0):
        super().__init__()
        self.schedule = schedule
        self.verbose = verbose
        self.alpha = alpha
    def on_epoch_begin(self, epoch, logs=None):
        lr = float(K.backend.get_value(self.model.optimizer.lr))
        if self.schedule != None:
            lr = self.schedule(epoch,lr)
        else:
            if epoch >= 30:
                lr = lr*self.alpha
        K.backend.set_value(self.model.optimizer.lr,K.backend.get_value(lr))
        if self.verbose > 0:
            print(f"Current learning rate is {lr}")
        #save the model
        if epoch % 20 == 0 and epoch != 0:
            self.model.save("model.h5")

def scheduler(epoch, lr):
    if epoch < 10:
        return lr
    else:
        return lr * tf.math.exp(-0.1) 

class Posenet:
    def __init__(self,dataset:datasets):
        self.dataset = dataset
        self.build()

    def build(self):
        self.poseModel = model(self.dataset.dim_w,self.dataset.dim_h)
        self.poseModel.summary()
        self.optm = K.optimizers.RMSprop(1e-4,momentum=0.9) #,decay=1e-5/self.dataset.epochs
        # self.optm = K.optimizers.Adam(1e-4)
        self.decayCallback = learningDecay(schedule = None,alpha = 0.99,verbose = 1)
        decayCallbackScheduler = K.callbacks.LearningRateScheduler(scheduler)
        self.callbacks = [decayCallbackScheduler]

        try:
            print("************************loading the model weights***********************************")
            self.poseModel.load_weights("model.h5")
        except:
            pass

    def train_fit(self):
        self.poseModel.compile(optimizer=self.optm,loss=loss_fn,metrics=['accuracy'])
        self.poseModel.fit(self.dataset.traindata,
                            validation_data=self.dataset.testdata,
                            epochs=self.dataset.epochs,
                            callbacks=[self.decayCallback],
                            verbose=1)
    
    def train_gradient(self):
        for step in range(self.dataset.epochs):
            loss = 0
            val_loss = 0
            index = 0
            lr = float(self.optm.lr)
            tf.print(">>> [Epoch is %s/%s]"%(step,self.dataset.epochs))
            for (x1,x2),y in self.dataset.traindata:
                with tf.GradientTape() as tape:
                    prediction = self.poseModel([x1,x2])
                    # y = tf.cast(y,dtype=prediction.dtype)
                    loss = loss + loss_fn(y,prediction)
                gradients = tape.gradient(loss,self.poseModel.trainable_variables)
                self.optm.apply_gradients(zip(gradients,self.poseModel.trainable_variables))

                index = index + 1
                sys.stdout.write('--------train loss is %.5f-----'%(loss/float(index)))
                sys.stdout.write('\r')
                sys.stdout.flush()

            index_val = 0
            # validation
            for (x1,x2),y in self.dataset.testdata:
                prediction = self.poseModel([x1,x2])
                val_loss = val_loss + loss_fn(y,prediction)
                index_val = index_val + 1
            tf.print("The loss is %s, the learning rate is %s, the test loss is %s"%(np.array(loss/float(index)),lr,val_loss/float(index_val)))
            K.backend.set_value(self.optm.lr,K.backend.get_value(lr*0.99))
            if step%40==0:
                self.save_model()

    def save_model(self):
        '''
        Save with model.save(). It can write an h5 file or a SavedModel directory; the latter is recommended, followed by tf2onnx:
        >>> python -m tf2onnx.convert --saved-model kerasTempModel --output "model.onnx" --opset 14
        '''
        self.poseModel.save("model.h5")
        self.poseModel.save(self.dataset.model_path)
        # self.poseModel.save_weights("model.h5") # saves only the weights, not the structure
        # tf.saved_model.save(self.poseModel,'tf2TempModel') # this saving style is no longer used

        
def test(dataset):
    im1 = cv.imread("imagesNDI/0.jpg")
    im1 = cv.resize(im1,(512,512))
    im1 = np.array(im1,np.float32).reshape((1,512,512,3))/255.0
    im2 = cv.imread("imagesNDI/1.jpg")
    im2 = cv.resize(im2,(512,512))
    im2 = np.array(im2,np.float32).reshape((1,512,512,3))/255.0
    posemodel = K.models.load_model(dataset.model_path,compile=False)
    pose = posemodel([im1,im2])
    print(np.array(pose))

if __name__ == "__main__":
    dataset = datasets("images")
    posenet = Posenet(dataset)
    posenet.train_fit()
    # posenet.train_gradient() # train with apply_gradients instead
    posenet.save_model()
    test(dataset)
    

II. Configuring ONNX Runtime

CMakeLists.txt:
First, set the location of your ONNX Runtime installation:

#******onnxruntime*****
set(ONNXRUNTIME_ROOT_PATH /path to your onnxruntime-master)
set(ONNXRUNTIME_INCLUDE_DIRS ${ONNXRUNTIME_ROOT_PATH}/include/onnxruntime
                             ${ONNXRUNTIME_ROOT_PATH}/onnxruntime
                             ${ONNXRUNTIME_ROOT_PATH}/include/onnxruntime/core/session/)
set(ONNXRUNTIME_LIB ${ONNXRUNTIME_ROOT_PATH}/build/Linux/Release/libonnxruntime.so)

In the C++ main.cpp, include the headers:

#include <core/session/onnxruntime_cxx_api.h>
#include <core/providers/cuda/cuda_provider_factory.h>
#include <core/session/onnxruntime_c_api.h>
#include <core/providers/tensorrt/tensorrt_provider_factory.h>

III. The Inference Pipeline

Overall, a run of ONNX Runtime breaks down into three stages:

  • Session construction;
  • model loading and initialization;
  • running.

1. Stage 1: Session construction

The construction stage creates an InferenceSession object. When a Session object is built from the Python front end, the Python side calls the C++ InferenceSession constructor through onnxruntime_pybind_state.cc, producing an InferenceSession object.

During construction, the InferenceSession initializes its members: a KernelRegistryManager responsible for OpKernel management, a SessionOptions object holding the session configuration, a GraphTransformerManager responsible for graph partitioning, a LoggingManager for logging, and so on. At this point, of course, the InferenceSession is still an empty shell; only the initial construction of its members has been done.
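From the Python front end this is all hidden behind a single constructor call. A minimal sketch of what triggers the construction described above:

import onnxruntime as ort

so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

# this call goes through onnxruntime_pybind_state.cc into the C++
# InferenceSession constructor, followed by Load() and Initialize()
sess = ort.InferenceSession('model.onnx', sess_options=so,
                            providers=['CPUExecutionProvider'])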

2. Stage 2: Model loading and initialization

Once the InferenceSession object has been constructed, the onnx model is loaded into it and further initialized.

2.1. Model loading

At load time, the C++ backend calls the corresponding Load() function; InferenceSession provides eight Load overloads in total, reading the ModelProto from a URL, a ModelProto object, void* model data, a model istream, and so on. InferenceSession parses the ModelProto and then holds the corresponding Model member.

2.2. Provider registration

After Load() finishes, InferenceSession calls two functions: RegisterExecutionProviders() and sess->Initialize().

RegisterExecutionProviders() registers the ExecutionProviders. A word on the term: ONNX Runtime uses a Provider to represent an execution device, e.g. the CUDAProvider. As of ONNX Runtime v1.0 there are seven supported providers, including CPU, CUDA, TensorRT, and MKL. Through sess->RegisterExecutionProvider(), the InferenceSession keeps a list of the ExecutionProviders supported in the current environment.
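From Python you can check which execution providers your onnxruntime build supports, and which ones a session has actually registered (a small sketch; the provider list depends on your build):

import onnxruntime as ort

print(ort.get_available_providers())
# e.g. ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']

sess = ort.InferenceSession('model.onnx',
                            providers=['CUDAExecutionProvider',
                                       'CPUExecutionProvider'])
print(sess.get_providers())  # the providers this session registered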

2.3. InferenceSession initialization

This is sess->Initialize(). The InferenceSession now initializes itself further from the Model and execution providers it holds (in stage 1 it held only empty-shell members). This step is the heart of initialization: a series of key operations such as memory allocation, model partitioning, and kernel registration all happen here.

  1. First, the session registers the graph optimization transformers for the chosen level, held by the GraphTransformerManager member.
  2. Next, the session registers OpKernels, i.e. the compute logic of each node on each kind of device. All node kernels defined by every held ExecutionProvider are registered into the session, which holds and manages them through the KernelRegistryManager member.
  3. The session then applies graph transformations, such as inserting copy nodes and cast nodes.
  4. Next comes model partitioning: the graph is split according to the execution devices, deciding which provider each node runs on.
  5. Finally, an ExecutePlan is created for the nodes; the execution plan mainly covers the op execution order, memory allocation management, memory reuse management, and similar concerns.

3. Stage 3: Running the model

Running the model means the InferenceSession reads in a batch of data and computes the model's final outputs. In truth, most of the work was already done during initialization; a look at the source shows that the run stage mostly just calls each node's OpKernel in order.

IV. Deploying the Model

As with every other mainstream framework, ONNX Runtime's most common language is Python, while the part that actually executes the framework is written in C++.

What follows is how C++ uses a .onnx model through onnxruntime: a multi-input, multi-output setup written from the official samples and FAQ. For some of the parameters, consult the samples or the official API documentation.

1. Initialization

    //model path
    string model_path = "../model.onnx";
    //initialize the ONNX Runtime environment
    Ort::Env env(OrtLoggingLevel::ORT_LOGGING_LEVEL_WARNING, "PoseEstimate");
    Ort::SessionOptions session_options;
    //enable TensorRT and CUDA acceleration
    OrtSessionOptionsAppendExecutionProvider_Tensorrt(session_options, 0); //tensorRT
    OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, 0);
    session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
    Ort::AllocatorWithDefaultOptions allocator;
    //load the ONNX model
    Ort::Session session(env, model_path.c_str(), session_options);
    Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu(OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);

Printing the model information: the printModelInfo function

void printModelInfo(Ort::Session &session, Ort::AllocatorWithDefaultOptions &allocator)
{
    //number of input and output nodes
    size_t num_input_nodes = session.GetInputCount();
    size_t num_output_nodes = session.GetOutputCount();
    cout<<"Number of input node is:"<<num_input_nodes<<endl;
    cout<<"Number of output node is:"<<num_output_nodes<<endl;

    //input and output dimensions
    for(size_t i = 0; i<num_input_nodes;i++)
    {
        std::vector<int64_t> input_dims = session.GetInputTypeInfo(i).GetTensorTypeAndShapeInfo().GetShape();
        cout<<endl<<"input "<<i<<" dim is: ";
        for(size_t j=0; j<input_dims.size();j++)
            cout<<input_dims[j]<<" ";
    }
    for(size_t i = 0; i<num_output_nodes;i++)
    {
        std::vector<int64_t> output_dims = session.GetOutputTypeInfo(i).GetTensorTypeAndShapeInfo().GetShape();
        cout<<endl<<"output "<<i<<" dim is: ";
        for(size_t j=0; j<output_dims.size();j++)
            cout<<output_dims[j]<<" ";
    }
    //input and output node names
    cout<<endl; //newline
    for(size_t i = 0; i<num_input_nodes;i++)
        cout<<"The input op-name "<<i<<" is:"<<session.GetInputName(i, allocator)<<endl;
    for(size_t i = 0; i<num_output_nodes;i++)
        cout<<"The output op-name "<<i<<" is:"<<session.GetOutputName(i, allocator)<<endl;

    //input_dims_2[0] = input_dims_1[0] = output_dims[0] = 1; //batch size = 1
}

Calling the function:

//print the model information
printModelInfo(session,allocator);

Output:

Number of input node is:2
Number of output node is:1

input 0 dim is: -1 512 512 3 
input 1 dim is: -1 512 512 3 
output 0 dim is: -1 6 
The input op-name 0 is:input1
The input op-name 1 is:input2
The output op-name 0 is:Output

If you do not know the network in advance, you can define the globals once this information has been printed:

//network input dimensions
static constexpr const int width = 512;
static constexpr const int height = 512;
static constexpr const int channel = 3;
std::array<int64_t, 4> input_shape_{ 1,height, width,channel};

2. Building the inference call

2.1 Steps of the inference function computePoseDNN()

  1. Resize the OpenCV Mat images that will be fed to the network

        Mat Input_1,Input_2;
        resize(img_1,Input_1,Size(512,512));
        resize(img_2,Input_2,Size(512,512));
    
  2. Specify the input and output node names. They could also be global variables; here they are placed inside the function for convenience

        std::vector<const char*> input_node_names = {"input1","input2"};
        std::vector<const char*> output_node_names = {"Output"};
    
  3. Allocate memory for image_ref and image_cur as flat arrays, here of length 512 * 512 * 3. A Mat cannot be fed in directly, so the image data is copied into arrays and then wrapped in ONNX Runtime's own tensor type:

        std::array<float, width * height *channel> input_image_1{};
        std::array<float, width * height *channel> input_image_2{};
        float* input_1 =  input_image_1.data();
        float* input_2 =  input_image_2.data();
    

    The float type here depends on your network; it might also be double. The network's element type can be printed with:

    cout<<session.GetInputTypeInfo(i).GetTensorTypeAndShapeInfo().GetElementType();
    

    The C++ line above prints an index into the following enum of data types:

    typedef enum ONNXTensorElementDataType {
      ONNX_TENSOR_ELEMENT_DATA_TYPE_UNDEFINED,
      ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT,   // maps to c type float
      ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT8,   // maps to c type uint8_t
      ONNX_TENSOR_ELEMENT_DATA_TYPE_INT8,    // maps to c type int8_t
      ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT16,  // maps to c type uint16_t
      ONNX_TENSOR_ELEMENT_DATA_TYPE_INT16,   // maps to c type int16_t
      ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32,   // maps to c type int32_t
      ONNX_TENSOR_ELEMENT_DATA_TYPE_INT64,   // maps to c type int64_t
      ONNX_TENSOR_ELEMENT_DATA_TYPE_STRING,  // maps to c++ type std::string
      ONNX_TENSOR_ELEMENT_DATA_TYPE_BOOL,
      ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT16,
      ONNX_TENSOR_ELEMENT_DATA_TYPE_DOUBLE,      // maps to c type double
      ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT32,      // maps to c type uint32_t
      ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT64,      // maps to c type uint64_t
      ONNX_TENSOR_ELEMENT_DATA_TYPE_COMPLEX64,   // complex with float32 real and imaginary components
      ONNX_TENSOR_ELEMENT_DATA_TYPE_COMPLEX128,  // complex with float64 real and imaginary components
      ONNX_TENSOR_ELEMENT_DATA_TYPE_BFLOAT16     // Non-IEEE floating-point format based on IEEE754 single-precision
    } ONNXTensorElementDataType;
    

    For example, if cout prints 1, the network's element type is float.

  4. Fill the float arrays in a loop, in either CHW or HWC layout. If you normalized the data during training, for example dividing by 255.0, the same scaling must be applied here:

        for (int i = 0; i < Input_1.rows; i++) {
            for (int j = 0; j < Input_1.cols; j++) {
                for (int c = 0; c < 3; c++)
                {
                    //NHWC layout
                    if(c==0)
                        input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+2]/255.0;
                    if(c==1)
                        input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+1]/255.0;
                    if(c==2)
                        input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+0]/255.0;
                    //NCHW layout
    //                if (c == 0)
    //                     input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 2]/255.0;
    //                if (c == 1)
    //                     input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 1]/255.0;
    //                if (c == 2)
    //                     input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 0]/255.0;
    
    
                }
            }
        }
        for (int i = 0; i < Input_2.rows; i++) {
            for (int j = 0; j < Input_2.cols; j++) {
                for (int c = 0; c < 3; c++)
                {
                    //NHWC layout
                    if(c==0)
                        input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+2]/255.0;
                    if(c==1)
                        input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+1]/255.0;
                    if(c==2)
                        input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+0]/255.0;
                }
            }
        }
    
  5. A network may have several input nodes and several output nodes, so the Ort tensors are kept in a std::vector; create one tensor from each of the two input arrays:
        std::vector<Ort::Value> input_tensors;
        input_tensors.push_back(Ort::Value::CreateTensor<float>(
                memory_info, input_1, input_image_1.size(), input_shape_.data(), input_shape_.size()));
        input_tensors.push_back(Ort::Value::CreateTensor<float>(
                memory_info, input_2, input_image_2.size(), input_shape_.data(), input_shape_.size()));
    
    where input_shape_ is the input shape:
    std::array<int64_t, 4> input_shape_{ 1,512, 512,3};
    
  6. Forward inference. Define the output tensors as a vector too, for generality:

        std::vector<Ort::Value> output_tensors;
    
        output_tensors = session.Run(Ort::RunOptions { nullptr },
                                        input_node_names.data(),   //input node names
                                        input_tensors.data(),     //input tensors
                                        input_tensors.size(),     //2
                                        output_node_names.data(), //output node names
                                        output_node_names.size()); //1
    
    
  7. Fetch the results. This example has only one output node, so output_tensors[0] is enough to take out the result:
    float* output = output_tensors[0].GetTensorMutableData<float>();
    

    Then reconstruct the pose:

        Eigen::Vector3d t(output[0],output[1],output[2]);
        Eigen::Vector3d r(output[3],output[4],output[5]);
    
        // build the axis-angle rotations
        Eigen::AngleAxisd R_z(r[2], Eigen::Vector3d(0,0,1));
        Eigen::AngleAxisd R_y(r[1], Eigen::Vector3d(0,1,0));
        Eigen::AngleAxisd R_x(r[0], Eigen::Vector3d(1,0,0));
        // compose into a rotation matrix, applying x, then y, then z
        Eigen::Matrix3d R_matrix_xyz  = R_z.toRotationMatrix()*R_y.toRotationMatrix()*R_x.toRotationMatrix();
        return Sophus::SE3(R_matrix_xyz,t);
    

2.2 Full function code

Sophus::SE3 computePoseDNN(Mat img_1, Mat img_2, Ort::Session &session,Ort::MemoryInfo &memory_info)
{
    Mat Input_1,Input_2;
    resize(img_1,Input_1,Size(512,512));
    resize(img_2,Input_2,Size(512,512));
    
    std::vector<const char*> input_node_names = {"input1","input2"};
    std::vector<const char*> output_node_names = {"Output"};

    //copy the images into the arrays, BGR ---> RGB
    std::array<float, width * height *channel> input_image_1{};
    std::array<float, width * height *channel> input_image_2{};

    float* input_1 =  input_image_1.data();
    float* input_2 =  input_image_2.data();

    for (int i = 0; i < Input_1.rows; i++) {
        for (int j = 0; j < Input_1.cols; j++) {
            for (int c = 0; c < 3; c++)
            {
                //NHWC layout
                if(c==0)
                    input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+2]/255.0;
                if(c==1)
                    input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+1]/255.0;
                if(c==2)
                    input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+0]/255.0;
                //NCHW layout
//                if (c == 0)
//                     input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 2]/255.0;
//                if (c == 1)
//                     input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 1]/255.0;
//                if (c == 2)
//                     input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 0]/255.0;


            }
        }
    }
    for (int i = 0; i < Input_2.rows; i++) {
        for (int j = 0; j < Input_2.cols; j++) {
            for (int c = 0; c < 3; c++)
            {
                //NHWC layout
                if(c==0)
                    input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+2]/255.0;
                if(c==1)
                    input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+1]/255.0;
                if(c==2)
                    input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+0]/255.0;
            }
        }
    }

    std::vector<Ort::Value> input_tensors;
    input_tensors.push_back(Ort::Value::CreateTensor<float>(
            memory_info, input_1, input_image_1.size(), input_shape_.data(), input_shape_.size()));
    input_tensors.push_back(Ort::Value::CreateTensor<float>(
            memory_info, input_2, input_image_2.size(), input_shape_.data(), input_shape_.size()));

    std::vector<Ort::Value> output_tensors;

    output_tensors = session.Run(Ort::RunOptions { nullptr },
                                    input_node_names.data(),   //input node names
                                    input_tensors.data(),     //input tensors
                                    input_tensors.size(),     //2
                                    output_node_names.data(), //output node names
                                    output_node_names.size()); //1

//    cout<<output_tensors.size()<<endl; //number of output tensors
    float* output = output_tensors[0].GetTensorMutableData<float>();
    Eigen::Vector3d t(output[0],output[1],output[2]);
    Eigen::Vector3d r(output[3],output[4],output[5]);

    // build the axis-angle rotations
    Eigen::AngleAxisd R_z(r[2], Eigen::Vector3d(0,0,1));
    Eigen::AngleAxisd R_y(r[1], Eigen::Vector3d(0,1,0));
    Eigen::AngleAxisd R_x(r[0], Eigen::Vector3d(1,0,0));
    // compose into a rotation matrix
    Eigen::Matrix3d R_matrix_xyz  = R_z.toRotationMatrix()*R_y.toRotationMatrix()*R_x.toRotationMatrix();

    return Sophus::SE3(R_matrix_xyz,t);
}

V. A Complete Example

#include <core/session/onnxruntime_cxx_api.h>
#include <core/providers/cuda/cuda_provider_factory.h>
#include <core/session/onnxruntime_c_api.h>
#include <core/providers/tensorrt/tensorrt_provider_factory.h>

#include <opencv2/opencv.hpp>
#include <sophus/se3.h>

#include <iostream>

using namespace cv;
using namespace std;

Sophus::SE3 computePoseDNN(Mat img_1, Mat img_2, Ort::Session &session, Ort::MemoryInfo &memory_info);
void printModelInfo(Ort::Session &session, Ort::AllocatorWithDefaultOptions &allocator);

//network input dimensions
static constexpr const int width = 512;
static constexpr const int height = 512;
static constexpr const int channel = 3;
std::array<int64_t, 4> input_shape_{ 1,height, width,channel};

int main()
{
    //model path
    string model_path = "../model.onnx";

    Ort::Env env(OrtLoggingLevel::ORT_LOGGING_LEVEL_WARNING, "PoseEstimate");
    Ort::SessionOptions session_options;
    //enable TensorRT and CUDA acceleration
    OrtSessionOptionsAppendExecutionProvider_Tensorrt(session_options, 0); //tensorRT
    OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, 0);
    session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
    Ort::AllocatorWithDefaultOptions allocator;
    //load the ONNX model
    Ort::Session session(env, model_path.c_str(), session_options);
    Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu(OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
    //print the model information
    printModelInfo(session,allocator);

	Mat img_1 = imread("/path_to_your_img1",IMREAD_COLOR);
    Mat img_2 = imread("/path_to_your_img2",IMREAD_COLOR);
    Sophus::SE3 pose = computePoseDNN(img_1,img_2,session,memory_info);
    
}
Sophus::SE3 computePoseDNN(Mat img_1, Mat img_2, Ort::Session &session,Ort::MemoryInfo &memory_info)
{
    Mat Input_1,Input_2;
    resize(img_1,Input_1,Size(512,512));
    resize(img_2,Input_2,Size(512,512));
    std::vector<const char*> input_node_names = {"input1","input2"};
    std::vector<const char*> output_node_names = {"Output"};

    //copy the images into the float arrays, BGR ---> RGB
    std::array<float, width * height *channel> input_image_1{};
    std::array<float, width * height *channel> input_image_2{};

    float* input_1 =  input_image_1.data();
    float* input_2 =  input_image_2.data();

    for (int i = 0; i < Input_1.rows; i++) {
        for (int j = 0; j < Input_1.cols; j++) {
            for (int c = 0; c < 3; c++)
            {
                //NHWC layout
                if(c==0)
                    input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+2]/255.0;
                if(c==1)
                    input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+1]/255.0;
                if(c==2)
                    input_1[i*Input_1.cols*3+j*3+c] = Input_1.ptr<uchar>(i)[j*3+0]/255.0;
                //NCHW layout
//                if (c == 0)
//                     input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 2]/255.0;
//                if (c == 1)
//                     input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 1]/255.0;
//                if (c == 2)
//                     input_1[c*imgSource.rows*imgSource.cols + i * imgSource.cols + j] = imgSource.ptr<uchar>(i)[j * 3 + 0]/255.0;


            }
        }
    }
    for (int i = 0; i < Input_2.rows; i++) {
        for (int j = 0; j < Input_2.cols; j++) {
            for (int c = 0; c < 3; c++)
            {
                //NHWC layout
                if(c==0)
                    input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+2]/255.0;
                if(c==1)
                    input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+1]/255.0;
                if(c==2)
                    input_2[i*Input_2.cols*3+j*3+c] = Input_2.ptr<uchar>(i)[j*3+0]/255.0;
            }
        }
    }

    std::vector<Ort::Value> input_tensors;
    input_tensors.push_back(Ort::Value::CreateTensor<float>(
            memory_info, input_1, input_image_1.size(), input_shape_.data(), input_shape_.size()));
    input_tensors.push_back(Ort::Value::CreateTensor<float>(
            memory_info, input_2, input_image_2.size(), input_shape_.data(), input_shape_.size()));
            
    std::vector<Ort::Value> output_tensors;
    output_tensors = session.Run(Ort::RunOptions { nullptr },
                                    input_node_names.data(),   //input node names
                                    input_tensors.data(),     //input tensors
                                    input_tensors.size(),     //2
                                    output_node_names.data(), //output node names
                                    output_node_names.size()); //1

//    cout<<output_tensors.size()<<endl; //number of output tensors
    float* output = output_tensors[0].GetTensorMutableData<float>();
    Eigen::Vector3d t(output[0],output[1],output[2]);
    Eigen::Vector3d r(output[3],output[4],output[5]);

    // build the axis-angle rotations about the z, y, and x axes
    Eigen::AngleAxisd R_z(r[2], Eigen::Vector3d(0,0,1));
    Eigen::AngleAxisd R_y(r[1], Eigen::Vector3d(0,1,0));
    Eigen::AngleAxisd R_x(r[0], Eigen::Vector3d(1,0,0));
    // compose into a rotation matrix
    Eigen::Matrix3d R_matrix_xyz  = R_z.toRotationMatrix()*R_y.toRotationMatrix()*R_x.toRotationMatrix();

    return Sophus::SE3(R_matrix_xyz,t);
}

void printModelInfo(Ort::Session &session, Ort::AllocatorWithDefaultOptions &allocator)
{
    //number of input and output nodes
    size_t num_input_nodes = session.GetInputCount();
    size_t num_output_nodes = session.GetOutputCount();
    cout<<"Number of input node is:"<<num_input_nodes<<endl;
    cout<<"Number of output node is:"<<num_output_nodes<<endl;

    //input and output dimensions
    for(size_t i = 0; i<num_input_nodes;i++)
    {
        std::vector<int64_t> input_dims = session.GetInputTypeInfo(i).GetTensorTypeAndShapeInfo().GetShape();
        cout<<endl<<"input "<<i<<" dim is: ";
        for(size_t j=0; j<input_dims.size();j++)
            cout<<input_dims[j]<<" ";
    }
    for(size_t i = 0; i<num_output_nodes;i++)
    {
        std::vector<int64_t> output_dims = session.GetOutputTypeInfo(i).GetTensorTypeAndShapeInfo().GetShape();
        cout<<endl<<"output "<<i<<" dim is: ";
        for(size_t j=0; j<output_dims.size();j++)
            cout<<output_dims[j]<<" ";
    }
    //input and output node names
    cout<<endl; //newline
    for(size_t i = 0; i<num_input_nodes;i++)
        cout<<"The input op-name "<<i<<" is:"<<session.GetInputName(i, allocator)<<endl;
    for(size_t i = 0; i<num_output_nodes;i++)
        cout<<"The output op-name "<<i<<" is:"<<session.GetOutputName(i, allocator)<<endl;
}

References:

C++ 上用 ONNXruntime 部署自己的模型_机器人学渣的博客-CSDN博客

tensorflow keras 搭建相机位姿估计网络--例_机器人学渣的博客-CSDN博客

ONNXRuntime学习笔记(四) - Lee-zq - 博客园

onnxruntime c++ 代码搜集_落花逐流水的博客-CSDN博客

onnxruntime调用AI模型的python和C++编程_Arnold-FY-Chen的博客-CSDN博客
