
I Finally Drew the Structure of a TensorRT Engine Model!

By 老潘 · Originally published 2021-10-13

It looks roughly like this (this is a crop of the input portion of the final diagram); take a close look:

You can see that many layers have been fused, for example the conv1.weight + QuantizeLinear_7_quantize_scale_node + Conv_9 + Relu_11 block. Others are left untouched, such as MaxPool_12. Also, some readers may not have seen the QuantizeLinear quantization operator before; for now, just treat it as another layer.
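If you're curious what that operator actually does: the ONNX QuantizeLinear op simply maps a float tensor to int8 using a scale (and an optional zero point). A minimal NumPy sketch of its semantics, purely for illustration:

import numpy as np

def quantize_linear(x, scale, zero_point=0):
    # ONNX QuantizeLinear: divide by the scale, round to nearest even,
    # add the zero point, then saturate to the int8 range [-128, 127].
    q = np.rint(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

print(quantize_linear(np.array([0.5, -1.0, 3.2]), scale=0.02))  # [25 -50 127]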

You can also see that the input to the model above is Float, which the very first node quantizes to Int8. This model was produced by quantizing a PyTorch model with the official pytorch-quantization toolkit that ships with TensorRT, exporting it to ONNX, and then converting that with TensorRT 8 into an engine; the engine's precision is INT8.
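For context, that export pipeline looks roughly like the following. This is a heavily abridged sketch under my own assumptions (a plain torchvision ResNet stands in for the real model, and calibration is skipped entirely); it is not the author's actual script:

import torch
import torchvision
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Monkey-patch torch.nn layers with quantized equivalents before building the model.
quant_modules.initialize()
model = torchvision.models.resnet34(pretrained=True).eval()

# ... calibrate the quantizers on real data here (omitted) ...

# Export the fake-quant nodes as ONNX QuantizeLinear/DequantizeLinear pairs.
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy = torch.randn(1, 3, 512, 512)
torch.onnx.export(model, dummy, "debug.onnx", opset_version=13)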

PS: I'll get into the details of TensorRT quantization in follow-up posts, so no rush.

TensorRT Optimizations

As everyone knows, TensorRT applies a lot of optimizations to a model: vertical layer fusion (CONV + BN + RELU), horizontal layer fusion, removing concat ops by writing results directly into the destination buffer, and so on.
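To make the vertical case concrete, here is a small sketch of my own (not TensorRT's internal code) showing why Conv + BN can fuse into a single convolution: the BN statistics fold into the conv weights and bias, so one conv does the work of both at inference time:

import numpy as np

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    # BN(conv(x)) == conv'(x): rescale the weights per output channel
    # and shift the bias, then drop the BN layer entirely.
    scale = gamma / np.sqrt(var + eps)           # shape: [out_channels]
    w_folded = w * scale[:, None, None, None]    # w shape: [out_ch, in_ch, kH, kW]
    b_folded = (b - mean) * scale + beta
    return w_folded, b_folded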

For more details, you can revisit my earlier article 《内卷成啥了还不知道TensorRT?超详细入门指北,来看看吧!》.

In short, a model that has been through TensorRT's optimizer comes out basically unrecognizable. TensorRT can fuse a great many layer patterns, so when your model comes back, you'll find that plenty of layers have been merged. The goal, of course, is to optimize memory access and cut the cost of moving data between layers.

That said, this isn't always trouble-free; every now and then it produces strange bugs, so we need to stay alert.

Once its layers have been fused, a model generally can't be loaded into Netron for inspection. TensorRT is closed source, after all, and the engine structures it generates are far too complex to guess at. TensorRT is aware of this downside, though, and provides a logging interface: to see what the fused model looks like, we only need to enable verbose mode while building the engine.

Inspecting the Engine Structure with Verbose Logging

It's very simple. Taking TensorRT's official trtexec tool as an example, we just add the --verbose flag when building:

./trtexec --explicitBatch --onnx=debug.onnx --saveEngine=debug.trt  --verbose

This prints a large amount of information during conversion. From the build options, for example, we can see that this model is built at FP32+INT8 precision:

[08/25/2021-17:30:04] [I] === Build Options ===
[08/25/2021-17:30:04] [I] Max batch: explicit
[08/25/2021-17:30:04] [I] Workspace: 4096 MiB
[08/25/2021-17:30:04] [I] minTiming: 1
[08/25/2021-17:30:04] [I] avgTiming: 8
[08/25/2021-17:30:04] [I] Precision: FP32+INT8
[08/25/2021-17:30:04] [I] Calibration: Dynamic
[08/25/2021-17:30:04] [I] Refit: Disabled
[08/25/2021-17:30:04] [I] Sparsity: Disabled
[08/25/2021-17:30:04] [I] Safe mode: Disabled
[08/25/2021-17:30:04] [I] Restricted mode: Disabled
[08/25/2021-17:30:04] [I] Save engine: debug_int8.trt
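Incidentally, if you build engines through the Python API rather than trtexec, you get the same output by handing the builder a VERBOSE logger. A rough sketch, assuming the TensorRT 8 Python bindings (the INT8 flag mirrors the build above; error handling and calibration details are omitted):

import tensorrt as trt

# A VERBOSE logger makes the builder print layer fusions, tactic choices,
# and the final "Engine Layer Information" section shown below.
logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("debug.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)  # the ONNX model carries Q/DQ nodes
engine_bytes = builder.build_serialized_network(network, config)
with open("debug.trt", "wb") as f:
    f.write(engine_bytes)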

After a long and rather arcane series of optimization steps, we finally get to see the structure of the final engine:

[V] [TRT] Engine Layer Information:
Layer(Scale): QuantizeLinear_2_quantize_scale_node, Tactic: 0, input[Float(1,3,-17,-18)] -> 255[Int8(1,3,-17,-18)]
Layer(CaskConvolution): conv1.weight + QuantizeLinear_7_quantize_scale_node + Conv_9 + Relu_11, Tactic: 4438325421691896755, 255[Int8(1,3,-17,-18)] -> 267[Int8(1,64,-40,-44)]
Layer(CudaPooling): MaxPool_12, Tactic: -3, 267[Int8(1,64,-40,-44)] -> Reformatted Output Tensor 0 to MaxPool_12[Int8(1,64,-21,-24)]
Layer(Reformat): Reformatting CopyNode for Output Tensor 0 to MaxPool_12, Tactic: 0, Reformatted Output Tensor 0 to MaxPool_12[Int8(1,64,-21,-24)] -> 270[Int8(1,64,-21,-24)]
Layer(CaskConvolution): layer1.0.conv1.weight + QuantizeLinear_20_quantize_scale_node + Conv_22 + Relu_24, Tactic: 4871133328510103657, 270[Int8(1,64,-21,-24)] -> 284[Int8(1,64,-21,-24)]
Layer(CaskConvolution): layer1.0.conv2.weight + QuantizeLinear_32_quantize_scale_node + Conv_34 + Add_42 + Relu_43, Tactic: 4871133328510103657, 284[Int8(1,64,-21,-24)], 270[Int8(1,64,-21,-24)] -> 305[Int8(1,64,-21,-24)]
Layer(CaskConvolution): layer1.1.conv1.weight + QuantizeLinear_51_quantize_scale_node + Conv_53 + Relu_55, Tactic: 4871133328510103657, 305[Int8(1,64,-21,-24)] -> 319[Int8(1,64,-21,-24)]
Layer(CaskConvolution): layer1.1.conv2.weight + QuantizeLinear_63_quantize_scale_node + Conv_65 + Add_73 + Relu_74, Tactic: 4871133328510103657, 319[Int8(1,64,-21,-24)], 305[Int8(1,64,-21,-24)] -> 340[Int8(1,64,-21,-24)]
Layer(CaskConvolution): layer1.2.conv1.weight + QuantizeLinear_82_quantize_scale_node + Conv_84 + Relu_86, Tactic: 4871133328510103657, 340[Int8(1,64,-21,-24)] -> 354[Int8(1,64,-21,-24)]
Layer(CaskConvolution): layer1.2.conv2.weight + QuantizeLinear_94_quantize_scale_node + Conv_96 + Add_104 + Relu_105, Tactic: 4871133328510103657, 354[Int8(1,64,-21,-24)], 340[Int8(1,64,-21,-24)] -> 375[Int8(1,64,-21,-24)]
Layer(CaskConvolution): layer2.0.conv1.weight + QuantizeLinear_113_quantize_scale_node + Conv_115 + Relu_117, Tactic: -1841683966837205309, 375[Int8(1,64,-21,-24)] -> 389[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.0.downsample.0.weight + QuantizeLinear_136_quantize_scale_node + Conv_138, Tactic: -1494157908358500249, 375[Int8(1,64,-21,-24)] -> 415[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.0.conv2.weight + QuantizeLinear_125_quantize_scale_node + Conv_127 + Add_146 + Relu_147, Tactic: -1841683966837205309, 389[Int8(1,128,-52,-37)], 415[Int8(1,128,-52,-37)] -> 423[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.1.conv1.weight + QuantizeLinear_155_quantize_scale_node + Conv_157 + Relu_159, Tactic: -1841683966837205309, 423[Int8(1,128,-52,-37)] -> 437[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.1.conv2.weight + QuantizeLinear_167_quantize_scale_node + Conv_169 + Add_177 + Relu_178, Tactic: -1841683966837205309, 437[Int8(1,128,-52,-37)], 423[Int8(1,128,-52,-37)] -> 458[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.2.conv1.weight + QuantizeLinear_186_quantize_scale_node + Conv_188 + Relu_190, Tactic: -1841683966837205309, 458[Int8(1,128,-52,-37)] -> 472[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.2.conv2.weight + QuantizeLinear_198_quantize_scale_node + Conv_200 + Add_208 + Relu_209, Tactic: -1841683966837205309, 472[Int8(1,128,-52,-37)], 458[Int8(1,128,-52,-37)] -> 493[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.3.conv1.weight + QuantizeLinear_217_quantize_scale_node + Conv_219 + Relu_221, Tactic: -1841683966837205309, 493[Int8(1,128,-52,-37)] -> 507[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer2.3.conv2.weight + QuantizeLinear_229_quantize_scale_node + Conv_231 + Add_239 + Relu_240, Tactic: -1841683966837205309, 507[Int8(1,128,-52,-37)], 493[Int8(1,128,-52,-37)] -> 528[Int8(1,128,-52,-37)]
Layer(CaskConvolution): layer3.0.conv1.weight + QuantizeLinear_248_quantize_scale_node + Conv_250 + Relu_252, Tactic: -8431788508843860955, 528[Int8(1,128,-52,-37)] -> 542[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.0.downsample.0.weight + QuantizeLinear_271_quantize_scale_node + Conv_273, Tactic: -5697614955743334137, 528[Int8(1,128,-52,-37)] -> 568[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.0.conv2.weight + QuantizeLinear_260_quantize_scale_node + Conv_262 + Add_281 + Relu_282, Tactic: -496455309852654971, 542[Int8(1,256,-59,-62)], 568[Int8(1,256,-59,-62)] -> 576[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.1.conv1.weight + QuantizeLinear_290_quantize_scale_node + Conv_292 + Relu_294, Tactic: -8431788508843860955, 576[Int8(1,256,-59,-62)] -> 590[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.1.conv2.weight + QuantizeLinear_302_quantize_scale_node + Conv_304 + Add_312 + Relu_313, Tactic: -496455309852654971, 590[Int8(1,256,-59,-62)], 576[Int8(1,256,-59,-62)] -> 611[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.2.conv1.weight + QuantizeLinear_321_quantize_scale_node + Conv_323 + Relu_325, Tactic: -8431788508843860955, 611[Int8(1,256,-59,-62)] -> 625[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.2.conv2.weight + QuantizeLinear_333_quantize_scale_node + Conv_335 + Add_343 + Relu_344, Tactic: -496455309852654971, 625[Int8(1,256,-59,-62)], 611[Int8(1,256,-59,-62)] -> 646[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.3.conv1.weight + QuantizeLinear_352_quantize_scale_node + Conv_354 + Relu_356, Tactic: -8431788508843860955, 646[Int8(1,256,-59,-62)] -> 660[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.3.conv2.weight + QuantizeLinear_364_quantize_scale_node + Conv_366 + Add_374 + Relu_375, Tactic: -496455309852654971, 660[Int8(1,256,-59,-62)], 646[Int8(1,256,-59,-62)] -> 681[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.4.conv1.weight + QuantizeLinear_383_quantize_scale_node + Conv_385 + Relu_387, Tactic: -8431788508843860955, 681[Int8(1,256,-59,-62)] -> 695[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.4.conv2.weight + QuantizeLinear_395_quantize_scale_node + Conv_397 + Add_405 + Relu_406, Tactic: -496455309852654971, 695[Int8(1,256,-59,-62)], 681[Int8(1,256,-59,-62)] -> 716[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.5.conv1.weight + QuantizeLinear_414_quantize_scale_node + Conv_416 + Relu_418, Tactic: -8431788508843860955, 716[Int8(1,256,-59,-62)] -> 730[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer3.5.conv2.weight + QuantizeLinear_426_quantize_scale_node + Conv_428 + Add_436 + Relu_437, Tactic: -496455309852654971, 730[Int8(1,256,-59,-62)], 716[Int8(1,256,-59,-62)] -> 751[Int8(1,256,-59,-62)]
Layer(CaskConvolution): layer4.0.conv1.weight + QuantizeLinear_445_quantize_scale_node + Conv_447 + Relu_449, Tactic: -6371781333659293809, 751[Int8(1,256,-59,-62)] -> 765[Int8(1,512,-71,-72)]
Layer(CaskConvolution): layer4.0.downsample.0.weight + QuantizeLinear_468_quantize_scale_node + Conv_470, Tactic: -1494157908358500249, 751[Int8(1,256,-59,-62)] -> 791[Int8(1,512,-71,-72)]
Layer(CaskConvolution): layer4.0.conv2.weight + QuantizeLinear_457_quantize_scale_node + Conv_459 + Add_478 + Relu_479, Tactic: -2328318099174473157, 765[Int8(1,512,-71,-72)], 791[Int8(1,512,-71,-72)] -> 799[Int8(1,512,-71,-72)]
Layer(CaskConvolution): layer4.1.conv1.weight + QuantizeLinear_487_quantize_scale_node + Conv_489 + Relu_491, Tactic: -2328318099174473157, 799[Int8(1,512,-71,-72)] -> 813[Int8(1,512,-71,-72)]
Layer(CaskConvolution): layer4.1.conv2.weight + QuantizeLinear_499_quantize_scale_node + Conv_501 + Add_509 + Relu_510, Tactic: -2328318099174473157, 813[Int8(1,512,-71,-72)], 799[Int8(1,512,-71,-72)] -> 834[Int8(1,512,-71,-72)]
Layer(CaskConvolution): layer4.2.conv1.weight + QuantizeLinear_518_quantize_scale_node + Conv_520 + Relu_522, Tactic: -2328318099174473157, 834[Int8(1,512,-71,-72)] -> 848[Int8(1,512,-71,-72)]
Layer(CaskConvolution): layer4.2.conv2.weight + QuantizeLinear_530_quantize_scale_node + Conv_532 + Add_540 + Relu_541, Tactic: -2328318099174473157, 848[Int8(1,512,-71,-72)], 834[Int8(1,512,-71,-72)] -> 869[Int8(1,512,-71,-72)]
Layer(CaskDeconvolution): deconv_layers.0.weight + QuantizeLinear_549_quantize_scale_node + ConvTranspose_551, Tactic: -3784829056659735491, 869[Int8(1,512,-71,-72)] -> 881[Int8(1,512,-46,-47)]
Layer(CaskConvolution): deconv_layers.1.weight + QuantizeLinear_559_quantize_scale_node + Conv_561 + Relu_563, Tactic: -496455309852654971, 881[Int8(1,512,-46,-47)] -> 895[Int8(1,256,-46,-47)]
Layer(CaskDeconvolution): deconv_layers.4.weight + QuantizeLinear_571_quantize_scale_node + ConvTranspose_573, Tactic: -3784829056659735491, 895[Int8(1,256,-46,-47)] -> 907[Int8(1,256,-68,-55)]
Layer(CaskConvolution): deconv_layers.5.weight + QuantizeLinear_581_quantize_scale_node + Conv_583 + Relu_585, Tactic: -8431788508843860955, 907[Int8(1,256,-68,-55)] -> 921[Int8(1,256,-68,-55)]
Layer(CaskDeconvolution): deconv_layers.8.weight + QuantizeLinear_593_quantize_scale_node + ConvTranspose_595, Tactic: -2621193268472024213, 921[Int8(1,256,-68,-55)] -> 933[Int8(1,256,-29,-32)]
Layer(CaskConvolution): deconv_layers.9.weight + QuantizeLinear_603_quantize_scale_node + Conv_605 + Relu_607, Tactic: -8431788508843860955, 933[Int8(1,256,-29,-32)] -> 947[Int8(1,256,-29,-32)]
Layer(CaskConvolution): hm.0.weight + QuantizeLinear_615_quantize_scale_node + Conv_617 + Relu_618, Tactic: 4871133328510103657, 947[Int8(1,256,-29,-32)] -> 960[Int8(1,64,-29,-32)]
Layer(CaskConvolution): wh.0.weight + QuantizeLinear_636_quantize_scale_node + Conv_638 + Relu_639, Tactic: 4871133328510103657, 947[Int8(1,256,-29,-32)] -> 985[Int8(1,64,-29,-32)]
Layer(CaskConvolution): reg.0.weight + QuantizeLinear_657_quantize_scale_node + Conv_659 + Relu_660, Tactic: 4871133328510103657, 947[Int8(1,256,-29,-32)] -> 1010[Int8(1,64,-29,-32)]
Layer(CaskConvolution): hm.2.weight + QuantizeLinear_626_quantize_scale_node + Conv_628, Tactic: -7185527339793611699, 960[Int8(1,64,-29,-32)] -> Reformatted Output Tensor 0 to hm.2.weight + QuantizeLinear_626_quantize_scale_node + Conv_628[Float(1,2,-29,-32)]
Layer(Reformat): Reformatting CopyNode for Output Tensor 0 to hm.2.weight + QuantizeLinear_626_quantize_scale_node + Conv_628, Tactic: 0, Reformatted Output Tensor 0 to hm.2.weight + QuantizeLinear_626_quantize_scale_node + Conv_628[Float(1,2,-29,-32)] -> hm[Float(1,2,-29,-32)]
Layer(CaskConvolution): wh.2.weight + QuantizeLinear_647_quantize_scale_node + Conv_649, Tactic: -7185527339793611699, 985[Int8(1,64,-29,-32)] -> Reformatted Output Tensor 0 to wh.2.weight + QuantizeLinear_647_quantize_scale_node + Conv_649[Float(1,2,-29,-32)]
Layer(Reformat): Reformatting CopyNode for Output Tensor 0 to wh.2.weight + QuantizeLinear_647_quantize_scale_node + Conv_649, Tactic: 0, Reformatted Output Tensor 0 to wh.2.weight + QuantizeLinear_647_quantize_scale_node + Conv_649[Float(1,2,-29,-32)] -> wh[Float(1,2,-29,-32)]
Layer(CaskConvolution): reg.2.weight + QuantizeLinear_668_quantize_scale_node + Conv_670, Tactic: -7185527339793611699, 1010[Int8(1,64,-29,-32)] -> Reformatted Output Tensor 0 to reg.2.weight + QuantizeLinear_668_quantize_scale_node + Conv_670[Float(1,2,-29,-32)]
Layer(Reformat): Reformatting CopyNode for Output Tensor 0 to reg.2.weight + QuantizeLinear_668_quantize_scale_node + Conv_670, Tactic: 0, Reformatted Output Tensor 0 to reg.2.weight + QuantizeLinear_668_quantize_scale_node + Conv_670[Float(1,2,-29,-32)] -> reg[Float(1,2,-29,-32)]

Can you guess what this model's backbone is?

Without drawing it, there's really no way to tell.

Since I've been combing through PyTorch's pull request history rather diligently these past few days, I stumbled onto a gem: engine_layer_visualize.py. Its commit is in this PR: https://github.com/pytorch/pytorch/pull/66431

This is a tool for inspecting engines that jerryzh168 open-sourced from Facebook's internal tooling. It uses pydot and Graphviz to draw the network structure; a quick check shows that Keras also used this library for its model plots.

Drawing the TensorRT Engine Graph with pydot and Graphviz

Usage is simple. First, install the dependencies:

pip install pydot
conda install python-graphviz

PS: Don't ask me why it's pip install first and then conda install; that's the only combination that doesn't error out on my machine... otherwise I get [Errno 2] "dot" not found in path. (The conda python-graphviz package pulls in the actual Graphviz dot binary, which pip's pydot does not include.)
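If you do hit that error, you can first check whether the Graphviz dot binary is actually on your PATH (this prints the Graphviz version if it is):

dot -V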

Then use the following code:

# (c) Facebook, Inc. and its affiliates. Confidential and proprietary.

import argparse
import re
from typing import NamedTuple, List, Optional

import pydot


"""
log_file is generated by tensorrt verbose logger during building engine.
profile_file is generated by tensorrt profiler.

Currently we support processing multiple logs in one log_file, which
would generate multiple dot graphs. However, multiple engine profiles are not
supported.

Usage:
    python torch/fx/experimental/fx2trt/tools/engine_layer_visualize.py --log_file aaa --profile_file bbb

Usage(Facebook):
    buck run //caffe2/torch/fx/experimental/fx2trt/tools:engine_layer_visualize -- --log_file aaa --profile_file bbb
"""


parser = argparse.ArgumentParser()
parser.add_argument(
    "--log_file",
    type=str,
    default="",
    help="TensorRT VERBOSE logging when building engines.",
)
parser.add_argument(
    "--profile_file",
    type=str,
    default="",
    help="TensorRT execution context profiler output.",
)
args = parser.parse_args()

...

The full code is here: https://github.com/pytorch/pytorch/pull/66431/files, so I won't paste it all.

Note the two inputs: log_file is the verbose build output we just enabled (redirect trtexec's console output to a file to capture it), and profile_file is TensorRT profiling output. The simplest way to obtain the latter is again through trtexec:

./trtexec --loadEngine=debug_int8.trt --dumpProfile --shapes=input:1x3x512x512 --exportProfile=debug_profile

This produces profile information like the following, listing in detail each fused layer's total runtime, average runtime per run, and share of overall runtime:

[
  { "count" : 961 }
, { "name" : "QuantizeLinear_2_quantize_scale_node", "timeMs" : 19.9954, "averageMs" : 0.0208069, "percentage" : 0.801597 }
, { "name" : "conv1.weight + QuantizeLinear_7_quantize_scale_node + Conv_9 + Relu_11", "timeMs" : 86.6105, "averageMs" : 0.0901253, "percentage" : 3.47213 }
, { "name" : "MaxPool_12", "timeMs" : 28.0466, "averageMs" : 0.0291848, "percentage" : 1.12436 }
, { "name" : "Reformatting CopyNode for Output Tensor 0 to MaxPool_12", "timeMs" : 12.9771, "averageMs" : 0.0135037, "percentage" : 0.520239 }
, { "name" : "layer1.0.conv1.weight + QuantizeLinear_20_quantize_scale_node + Conv_22 + Relu_24", "timeMs" : 28.8356, "averageMs" : 0.0300059, "percentage" : 1.15599 }
, { "name" : "layer1.0.conv2.weight + QuantizeLinear_32_quantize_scale_node + Conv_34 + Add_42 + Relu_43", "timeMs" : 31.3897, "averageMs" : 0.0326635, "percentage" : 1.25838 }
, { "name" : "layer1.1.conv1.weight + QuantizeLinear_51_quantize_scale_node + Conv_53 + Relu_55", "timeMs" : 28.788, "averageMs" : 0.0299563, "percentage" : 1.15408 }
, { "name" : "layer1.1.conv2.weight + QuantizeLinear_63_quantize_scale_node + Conv_65 + Add_73 + Relu_74", "timeMs" : 31.1857, "averageMs" : 0.0324513, "percentage" : 1.25021 }
, { "name" : "layer1.2.conv1.weight + QuantizeLinear_82_quantize_scale_node + Conv_84 + Relu_86", "timeMs" : 28.7898, "averageMs" : 0.0299581, "percentage" : 1.15415 }
, { "name" : "layer1.2.conv2.weight + QuantizeLinear_94_quantize_scale_node + Conv_96 + Add_104 + Relu_105", "timeMs" : 31.1666, "averageMs" : 0.0324314, "percentage" : 1.24944 }
, { "name" : "layer2.0.conv1.weight + QuantizeLinear_113_quantize_scale_node + Conv_115 + Relu_117", "timeMs" : 20.9996, "averageMs" : 0.0218519, "percentage" : 0.841856 }
, { "name" : "layer2.0.downsample.0.weight + QuantizeLinear_136_quantize_scale_node + Conv_138", "timeMs" : 10.1555, "averageMs" : 0.0105677, "percentage" : 0.407126 }
, { "name" : "layer2.0.conv2.weight + QuantizeLinear_125_quantize_scale_node + Conv_127 + Add_146 + Relu_147", "timeMs" : 31.8969, "averageMs" : 0.0331914, "percentage" : 1.27872 }
, { "name" : "layer2.1.conv1.weight + QuantizeLinear_155_quantize_scale_node + Conv_157 + Relu_159", "timeMs" : 30.5402, "averageMs" : 0.0317796, "percentage" : 1.22433 }
, { "name" : "layer2.1.conv2.weight + QuantizeLinear_167_quantize_scale_node + Conv_169 + Add_177 + Relu_178", "timeMs" : 32.0256, "averageMs" : 0.0333253, "percentage" : 1.28388 }
, { "name" : "layer2.2.conv1.weight + QuantizeLinear_186_quantize_scale_node + Conv_188 + Relu_190", "timeMs" : 30.5798, "averageMs" : 0.0318208, "percentage" : 1.22591 }
, { "name" : "layer2.2.conv2.weight + QuantizeLinear_198_quantize_scale_node + Conv_200 + Add_208 + Relu_209", "timeMs" : 31.813, "averageMs" : 0.0331041, "percentage" : 1.27536 }
, { "name" : "layer2.3.conv1.weight + QuantizeLinear_217_quantize_scale_node + Conv_219 + Relu_221", "timeMs" : 30.6143, "averageMs" : 0.0318568, "percentage" : 1.2273 }
, { "name" : "layer2.3.conv2.weight + QuantizeLinear_229_quantize_scale_node + Conv_231 + Add_239 + Relu_240", "timeMs" : 32.123, "averageMs" : 0.0334266, "percentage" : 1.28778 }
, { "name" : "layer3.0.conv1.weight + QuantizeLinear_248_quantize_scale_node + Conv_250 + Relu_252", "timeMs" : 21.1744, "averageMs" : 0.0220337, "percentage" : 0.848863 }
, { "name" : "layer3.0.downsample.0.weight + QuantizeLinear_271_quantize_scale_node + Conv_273", "timeMs" : 12.0922, "averageMs" : 0.0125829, "percentage" : 0.484765 }
, { "name" : "layer3.0.conv2.weight + QuantizeLinear_260_quantize_scale_node + Conv_262 + Add_281 + Relu_282", "timeMs" : 34.8428, "averageMs" : 0.0362568, "percentage" : 1.39682 }
, { "name" : "layer3.1.conv1.weight + QuantizeLinear_290_quantize_scale_node + Conv_292 + Relu_294", "timeMs" : 31.9807, "averageMs" : 0.0332785, "percentage" : 1.28207 }
, { "name" : "layer3.1.conv2.weight + QuantizeLinear_302_quantize_scale_node + Conv_304 + Add_312 + Relu_313", "timeMs" : 34.4399, "averageMs" : 0.0358375, "percentage" : 1.38066 }
, { "name" : "layer3.2.conv1.weight + QuantizeLinear_321_quantize_scale_node + Conv_323 + Relu_325", "timeMs" : 31.7602, "averageMs" : 0.0330491, "percentage" : 1.27324 }
, { "name" : "layer3.2.conv2.weight + QuantizeLinear_333_quantize_scale_node + Conv_335 + Add_343 + Relu_344", "timeMs" : 35.1158, "averageMs" : 0.0365409, "percentage" : 1.40776 }
, { "name" : "layer3.3.conv1.weight + QuantizeLinear_352_quantize_scale_node + Conv_354 + Relu_356", "timeMs" : 32.027, "averageMs" : 0.0333267, "percentage" : 1.28393 }
, { "name" : "layer3.3.conv2.weight + QuantizeLinear_364_quantize_scale_node + Conv_366 + Add_374 + Relu_375", "timeMs" : 34.6465, "averageMs" : 0.0360526, "percentage" : 1.38895 }
, { "name" : "layer3.4.conv1.weight + QuantizeLinear_383_quantize_scale_node + Conv_385 + Relu_387", "timeMs" : 31.7624, "averageMs" : 0.0330514, "percentage" : 1.27332 }
, { "name" : "layer3.4.conv2.weight + QuantizeLinear_395_quantize_scale_node + Conv_397 + Add_405 + Relu_406", "timeMs" : 34.3392, "averageMs" : 0.0357328, "percentage" : 1.37663 }
, { "name" : "layer3.5.conv1.weight + QuantizeLinear_414_quantize_scale_node + Conv_416 + Relu_418", "timeMs" : 31.728, "averageMs" : 0.0330156, "percentage" : 1.27195 }
, { "name" : "layer3.5.conv2.weight + QuantizeLinear_426_quantize_scale_node + Conv_428 + Add_436 + Relu_437", "timeMs" : 34.2101, "averageMs" : 0.0355985, "percentage" : 1.37145 }
, { "name" : "layer4.0.conv1.weight + QuantizeLinear_445_quantize_scale_node + Conv_447 + Relu_449", "timeMs" : 25.4399, "averageMs" : 0.0264723, "percentage" : 1.01986 }
, { "name" : "layer4.0.downsample.0.weight + QuantizeLinear_468_quantize_scale_node + Conv_470", "timeMs" : 8.88198, "averageMs" : 0.00924243, "percentage" : 0.35607 }
, { "name" : "layer4.0.conv2.weight + QuantizeLinear_457_quantize_scale_node + Conv_459 + Add_478 + Relu_479", "timeMs" : 44.1804, "averageMs" : 0.0459734, "percentage" : 1.77115 }
, { "name" : "layer4.1.conv1.weight + QuantizeLinear_487_quantize_scale_node + Conv_489 + Relu_491", "timeMs" : 44.3623, "averageMs" : 0.0461627, "percentage" : 1.77844 }
, { "name" : "layer4.1.conv2.weight + QuantizeLinear_499_quantize_scale_node + Conv_501 + Add_509 + Relu_510", "timeMs" : 44.3341, "averageMs" : 0.0461333, "percentage" : 1.77731 }
, { "name" : "layer4.2.conv1.weight + QuantizeLinear_518_quantize_scale_node + Conv_520 + Relu_522", "timeMs" : 42.4246, "averageMs" : 0.0441463, "percentage" : 1.70076 }
, { "name" : "layer4.2.conv2.weight + QuantizeLinear_530_quantize_scale_node + Conv_532 + Add_540 + Relu_541", "timeMs" : 43.7076, "averageMs" : 0.0454813, "percentage" : 1.75219 }
, { "name" : "deconv_layers.0.weight + QuantizeLinear_549_quantize_scale_node + ConvTranspose_551", "timeMs" : 77.9405, "averageMs" : 0.0811035, "percentage" : 3.12456 }
, { "name" : "deconv_layers.1.weight + QuantizeLinear_559_quantize_scale_node + Conv_561 + Relu_563", "timeMs" : 60.049, "averageMs" : 0.0624859, "percentage" : 2.40731 }
, { "name" : "deconv_layers.4.weight + QuantizeLinear_571_quantize_scale_node + ConvTranspose_573", "timeMs" : 107.53, "averageMs" : 0.111894, "percentage" : 4.31079 }
, { "name" : "deconv_layers.5.weight + QuantizeLinear_581_quantize_scale_node + Conv_583 + Relu_585", "timeMs" : 80.9985, "averageMs" : 0.0842856, "percentage" : 3.24715 }
, { "name" : "deconv_layers.8.weight + QuantizeLinear_593_quantize_scale_node + ConvTranspose_595", "timeMs" : 381.204, "averageMs" : 0.396674, "percentage" : 15.2821 }
, { "name" : "deconv_layers.9.weight + QuantizeLinear_603_quantize_scale_node + Conv_605 + Relu_607", "timeMs" : 221.925, "averageMs" : 0.230931, "percentage" : 8.89675 }
, { "name" : "hm.0.weight + QuantizeLinear_615_quantize_scale_node + Conv_617 + Relu_618", "timeMs" : 84.4777, "averageMs" : 0.087906, "percentage" : 3.38663 }
, { "name" : "wh.0.weight + QuantizeLinear_636_quantize_scale_node + Conv_638 + Relu_639", "timeMs" : 85.658, "averageMs" : 0.0891342, "percentage" : 3.43395 }
, { "name" : "reg.0.weight + QuantizeLinear_657_quantize_scale_node + Conv_659 + Relu_660", "timeMs" : 85.4159, "averageMs" : 0.0888823, "percentage" : 3.42424 }
, { "name" : "hm.2.weight + QuantizeLinear_626_quantize_scale_node + Conv_628", "timeMs" : 19.5074, "averageMs" : 0.0202991, "percentage" : 0.782035 }
, { "name" : "Reformatting CopyNode for Output Tensor 0 to hm.2.weight + QuantizeLinear_626_quantize_scale_node + Conv_628", "timeMs" : 6.52869, "averageMs" : 0.00679364, "percentage" : 0.261729 }
, { "name" : "wh.2.weight + QuantizeLinear_647_quantize_scale_node + Conv_649", "timeMs" : 18.7298, "averageMs" : 0.0194899, "percentage" : 0.750862 }
, { "name" : "Reformatting CopyNode for Output Tensor 0 to wh.2.weight + QuantizeLinear_647_quantize_scale_node + Conv_649", "timeMs" : 6.69421, "averageMs" : 0.00696588, "percentage" : 0.268364 }
, { "name" : "reg.2.weight + QuantizeLinear_668_quantize_scale_node + Conv_670", "timeMs" : 18.7625, "averageMs" : 0.0195239, "percentage" : 0.752172 }
, { "name" : "Reformatting CopyNode for Output Tensor 0 to reg.2.weight + QuantizeLinear_668_quantize_scale_node + Conv_670", "timeMs" : 7.04306, "averageMs" : 0.00732889, "percentage" : 0.28235 }
]
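As an aside, since the profile is plain JSON, it's easy to pick out the most expensive layers programmatically. A small sketch of my own (debug_profile is the file exported above):

import json

with open("debug_profile") as f:
    entries = json.load(f)

# The first element only records the number of timed iterations.
layers = [e for e in entries if "name" in e]
for e in sorted(layers, key=lambda x: x["percentage"], reverse=True)[:5]:
    print(f'{e["percentage"]:6.2f}%  avg {e["averageMs"]:.4f} ms  {e["name"]}')

On the dump above, this puts the ConvTranspose in deconv_layers.8 at the top, at over 15% of total runtime.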

Then we feed both files to engine_layer_visualize.py, which generates EngineLayers_0.dot.
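For example (build.log here is just whatever name you gave the redirected verbose build output):

python engine_layer_visualize.py --log_file build.log --profile_file debug_profile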

This .dot file contains the computation graph information: the nodes, the edges, and so on.

Finally, draw the picture with the following code and you're done!

import pydot

# A .dot file may contain multiple graphs; ours holds just one.
graphs = pydot.graph_from_dot_file("EngineLayers_0.dot")
graph = graphs[0]
# Rendering to PNG requires the Graphviz "dot" binary to be on PATH.
graph.write_png("trt_engine.png")
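Equivalently, if you prefer the Graphviz command line, the same render is:

dot -Tpng EngineLayers_0.dot -o trt_engine.png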

A Quick Comparison

Here's a quick side-by-side of the original model and the model after engine building:

  • The input portion:
  • The output portion:

As for the details of TensorRT model quantization, I'll devote separate posts to that later, so I won't go deep here.

Closing Remarks

If the graph you end up drawing looks like this:

Congratulations! Your computer is a one-in-a-million peerless master! The fix is simple: just switch to another computer (runs away)!
