Models produced by edgeai-benchmark in a PC-side Docker image (python3.6 + onnx1.8) fail to run with edgeai-benchmark on a TDA4VM board with SDK-08_05_00_11

Jay Meng

Other Parts Discussed in Thread: TDA4VM

1. Problem: models produced by edgeai-benchmark in a PC-side Docker image (python3.6 + onnx1.8) fail to run with edgeai-benchmark on a TDA4VM board with SDK-08_05_00_11.

2. Error log:

INFO:20221218-130103: starting - cl-6110_onnxrt_imagenet1k_torchvision_resnet50_onnx
INFO:20221218-130103: model_path - /opt/edgeai-modelzoo/models/vision/classification/imagenet1k/torchvision/resnet50.onnx
INFO:20221218-130103: model_file - /opt/edgeai-benchmark/work_dirs/modelartifacts/TDA4VM/8bits/cl-6110_onnxrt_imagenet1k_torchvision_resnet50_onnx/model/resnet50.onnx

INFO:20221218-130103: running - cl-6110_onnxrt_imagenet1k_torchvision_resnet50_onnx
INFO:20221218-130103: pipeline_config - {'task_type': 'classification', 'dataset_category': 'imagenet', 'calibration_dataset': <edgeai_benchmark.datasets.imagenet.ImageNetCls object at 0xffff697ec910>, 'input_dataset': <edgeai_benchmark.datasets.imagenet.ImageNetCls object at 0xffff697ec370>, 'postprocess': <edgeai_benchmark.postprocess.PostProcessTransforms object at 0xffff697ec5b0>, 'preprocess': <edgeai_benchmark.preprocess.PreProcessTransforms object at 0xffff697ec6a0>, 'session': <edgeai_benchmark.sessions.onnxrt_session.ONNXRTSession object at 0xffff697ec490>, 'model_info': {'metric_reference': {'accuracy_top1%': 76.15}, 'model_shortlist': 30}}libtidl_onnxrt_EP loaded 0x35e11f80
Final number of subgraphs created are : 1, - Offloaded Nodes - 125, Total Nodes - 125
APP: Init ... !!!
MEM: Init ... !!!
MEM: Initialized DMA HEAP (fd=5) !!!
MEM: Init ... Done !!!
IPC: Init ... !!!
IPC: Init ... Done !!!
REMOTE_SERVICE: Init ... !!!
REMOTE_SERVICE: Init ... Done !!!
594557.953562 s: GTC Frequency = 200 MHz
APP: Init ... Done !!!
594557.953591 s: VX_ZONE_INIT:Enabled
594557.953599 s: VX_ZONE_ERROR:Enabled
594557.953606 s: VX_ZONE_WARNING:Enabled
594557.953975 s: VX_ZONE_INIT:[tivxInitLocal:145] Initialization Done !!!
594557.954018 s: VX_ZONE_INIT:[tivxHostInitLocal:93] Initialization Done for HOST !!!
594558.025423 s: VX_ZONE_ERROR:[ownContextSendCmd:802] Command ack message returned failure cmd_status: -1
594558.025451 s: VX_ZONE_ERROR:[ownContextSendCmd:838] tivxEventWait() failed.
594558.025477 s: VX_ZONE_ERROR:[ownNodeKernelInit:525] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
594558.025495 s: VX_ZONE_ERROR:[ownNodeKernelInit:526] Please be sure the target callbacks have been registered for this core
594558.025511 s: VX_ZONE_ERROR:[ownNodeKernelInit:527] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
594558.025530 s: VX_ZONE_ERROR:[ownGraphNodeKernelInit:583] kernel init for node 0, kernel com.ti.tidl:1:1 ... failed !!!
594558.025553 s: VX_ZONE_ERROR:[vxVerifyGraph:2055] Node kernel init failed
594558.025569 s: VX_ZONE_ERROR:[vxVerifyGraph:2109] Graph verify failed
TIDL_RT_OVX: ERROR: Verifying TIDL graph ... Failed !!!
TIDL_RT_OVX: ERROR: Verify OpenVX graph failed
infer 2/2: cl-6110_onnxrt_imagenet1k_torchvision_resnet50_on| | 0% 0/1| [< ]594558.079894 s: VX_ZONE_ERROR:[ownContextSendCmd:802] Command ack message returned failure cmd_status: -1
594558.079924 s: VX_ZONE_ERROR:[ownContextSendCmd:838] tivxEventWait() failed.
594558.079938 s: VX_ZONE_ERROR:[ownNodeKernelInit:525] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
594558.079948 s: VX_ZONE_ERROR:[ownNodeKernelInit:526] Please be sure the target callbacks have been registered for this core
594558.079957 s: VX_ZONE_ERROR:[ownNodeKernelInit:527] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
594558.079968 s: VX_ZONE_ERROR:[ownGraphNodeKernelInit:583] kernel init for node 0, kernel com.ti.tidl:1:1 ... failed !!!
594558.079981 s: VX_ZONE_ERROR:[vxVerifyGraph:2055] Node kernel init failed
594558.079990 s: VX_ZONE_ERROR:[vxVerifyGraph:2109] Graph verify failed
594558.080101 s: VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:799] graph is not in a state required to be scheduled
594558.080111 s: VX_ZONE_ERROR:[vxProcessGraph:734] schedule graph failed
594558.080116 s: VX_ZONE_ERROR:[vxProcessGraph:739] wait graph failed
ERROR: Running TIDL graph ... Failed !!!

infer 2/2: cl-6110_onnxrt_imagenet1k_torchvision_resnet50_on| 100%|##########|| 1/1 [00:00<00:00, 18.44it/s]
*** mgg *** description= 2/2 run_dir_base= cl-6110_onnxrt_imagenet1k_torchvision_resnet50_onnx elapsed_time= 1303.8575649261475 ms

SUCCESS:20221218-130105: benchmark results - {'infer_path': 'cl-6110_onnxrt_imagenet1k_torchvision_resnet50_onnx', 'accuracy_top1%': 0.0, 'num_subgraphs': 1, 'infer_time_core_ms': 16129.723087, 'infer_time_subgraph_ms': 41.80643, 'ddr_transfer_mb': 82.945088, 'perfsim_time_ms': 0.0, 'perfsim_ddr_transfer_mb': 0.0, 'perfsim_gmacs': 0.0}
594558.119588 s: VX_ZONE_INIT:[tivxHostDeInitLocal:107] De-Initialization Done for HOST !!!
594558.123975 s: VX_ZONE_INIT:[tivxDeInitLocal:223] De-Initialization Done !!!
APP: Deinit ... !!!
REMOTE_SERVICE: Deinit ... !!!
REMOTE_SERVICE: Deinit ... Done !!!
IPC: Deinit ... !!!
IPC: DeInit ... Done !!!
MEM: Deinit ... !!!
DDR_SHARED_MEM: Alloc's: 7 alloc's of 26958100 bytes
DDR_SHARED_MEM: Free's : 7 free's of 26958100 bytes
DDR_SHARED_MEM: Open's : 0 allocs of 0 bytes
DDR_SHARED_MEM: Total size: 536870912 bytes
MEM: Deinit ... Done !!!
APP: Deinit ... Done !!!
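
Note that the log above ends with a SUCCESS line and 'accuracy_top1%': 0.0 even though graph verification failed. A minimal sketch for flagging such runs follows. It assumes only the message strings visible in the log; the function names are hypothetical and not part of edgeai-benchmark.

```python
import re

# Patterns copied from the log above; the helper names are hypothetical.
ERROR_PATTERNS = (
    re.compile(r"VX_ZONE_ERROR"),
    re.compile(r"TIDL_RT_OVX: ERROR"),
    re.compile(r"ERROR: Running TIDL graph"),
)


def summarize_tidl_errors(log_text):
    """Count the lines that match each known TIDL/OpenVX error pattern."""
    counts = {p.pattern: 0 for p in ERROR_PATTERNS}
    for line in log_text.splitlines():
        for pattern in ERROR_PATTERNS:
            if pattern.search(line):
                counts[pattern.pattern] += 1
    return counts


def run_is_suspect(log_text):
    """A run is suspect if it logged SUCCESS but also logged graph errors."""
    has_success = "SUCCESS:" in log_text
    has_errors = any(summarize_tidl_errors(log_text).values())
    return has_success and has_errors


sample = """\
594558.025569 s: VX_ZONE_ERROR:[vxVerifyGraph:2109] Graph verify failed
TIDL_RT_OVX: ERROR: Verify OpenVX graph failed
ERROR: Running TIDL graph ... Failed !!!
SUCCESS:20221218-130105: benchmark results - {'accuracy_top1%': 0.0}
"""
print(summarize_tidl_errors(sample))
print(run_is_suspect(sample))
```

Running this over a saved console log gives a quick indication of whether a reported SUCCESS can be trusted; it does not diagnose why TIVX_CMD_NODE_CREATE failed.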

Over 2 years ago

Nancy Wang, over 2 years ago

TI Guru, 110395 points

I have escalated your question to the English forum, where a product-line expert will support you. Please follow up there.

e2e.ti.com/.../tda4vm-the-edgeai-benchmark-production-model-cannot-run-on-tda4vm

Nancy Wang, over 2 years ago

TI Guru, 110395 points

There is now a reply on the English forum. Please follow up there.

Jay Meng, over 2 years ago, in reply to Nancy Wang

Prodigy 40 points

OK, thanks!

I changed setup_pc.sh according to your answer. Running cl-6110_onnxrt_imagenet1k_torchvision_resnet50_on now succeeds, but other models still fail, for example:

1> running od-8020_onnxrt_coco_edgeai-mmdet_ssd_mobilenetv2_lite_512x512_20201214_model_onnx error log:

INFO:20221218-081136: starting - od-8020_onnxrt_coco_edgeai-mmdet_ssd_mobilenetv2_lite_512x512_20201214_model_onnx
INFO:20221218-081136: model_path - /opt/edgeai-modelzoo/models/vision/detection/coco/edgeai-mmdet/ssd_mobilenetv2_lite_512x512_20201214_model.onnx
INFO:20221218-081136: model_file - /opt/edgeai-benchmark/work_dirs/modelartifacts/TDA4VM/8bits/od-8020_onnxrt_coco_edgeai-mmdet_ssd_mobilenetv2_lite_512x512_20201214_model_onnx/model/ssd_mobilenetv2_lite_512x512_20201214_model.onnx
Downloading 1/1: /opt/edgeai-modelzoo/models/vision/detection/coco/edgeai-mmdet/ssd_mobilenetv2_lite_512x512_20201214_model.onnx
Downloading software-dl.ti.com/.../ssd_mobilenetv2_lite_512x512_20201214_model.onnx to /opt/edgeai-benchmark/work_dirs/modelartifacts/TDA4VM/8bits/od-8020_onnxrt_coco_edgeai-mmdet_ssd_mobilenetv2_lite_512x512_20201214_model_onnx/model/ssd_mobilenetv2_lite_512x512_20201214_model.onnx
12795904it [00:53, 240803.44it/s]
Download done for /opt/edgeai-modelzoo/models/vision/detection/coco/edgeai-mmdet/ssd_mobilenetv2_lite_512x512_20201214_model.onnx
Traceback (most recent call last):
File "/opt/edgeai-benchmark/edgeai_benchmark/pipelines/pipeline_runner.py", line 154, in _run_pipeline
    result = cls._run_pipeline_impl(settings, pipeline_config, description)
File "/opt/edgeai-benchmark/edgeai_benchmark/pipelines/pipeline_runner.py", line 125, in _run_pipeline_impl
    accuracy_result = accuracy_pipeline(description)
File "/opt/edgeai-benchmark/edgeai_benchmark/pipelines/accuracy_pipeline.py", line 103, in __call__
    self.session.start()
File "/opt/edgeai-benchmark/edgeai_benchmark/sessions/onnxrt_session.py", line 47, in start
    super().start()
File "/opt/edgeai-benchmark/edgeai_benchmark/sessions/basert_session.py", line 140, in start
    self.get_model()
File "/opt/edgeai-benchmark/edgeai_benchmark/sessions/basert_session.py", line 402, in get_model
    optimization_done = self._optimize_model(is_new_file=(not model_file_exists))
File "/opt/edgeai-benchmark/edgeai_benchmark/sessions/basert_session.py", line 443, in _optimize_model
    from osrt_model_tools.onnx_tools import onnx_model_opt as onnxopt
ModuleNotFoundError: No module named 'osrt_model_tools'
No module named 'osrt_model_tools'

2> running od-8000_onnxrt_coco_mlperf_ssd_resnet34-ssd1200_onnx error log:

Final number of subgraphs created are : 1, - Offloaded Nodes - 186, Total Nodes - 186
APP: Init ... !!!
MEM: Init ... !!!
MEM: Initialized DMA HEAP (fd=5) !!!
MEM: Init ... Done !!!
IPC: Init ... !!!
IPC: Init ... Done !!!
REMOTE_SERVICE: Init ... !!!
REMOTE_SERVICE: Init ... Done !!!
748472.194157 s: GTC Frequency = 200 MHz
APP: Init ... Done !!!
748472.194182 s: VX_ZONE_INIT:Enabled
748472.194190 s: VX_ZONE_ERROR:Enabled
748472.194197 s: VX_ZONE_WARNING:Enabled
748472.194537 s: VX_ZONE_INIT:[tivxInitLocal:145] Initialization Done !!!
748472.194579 s: VX_ZONE_INIT:[tivxHostInitLocal:93] Initialization Done for HOST !!!
748472.255784 s: VX_ZONE_ERROR:[ownContextSendCmd:802] Command ack message returned failure cmd_status: -1
748472.255811 s: VX_ZONE_ERROR:[ownContextSendCmd:838] tivxEventWait() failed.
748472.255824 s: VX_ZONE_ERROR:[ownNodeKernelInit:525] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
748472.255834 s: VX_ZONE_ERROR:[ownNodeKernelInit:526] Please be sure the target callbacks have been registered for this core
748472.255843 s: VX_ZONE_ERROR:[ownNodeKernelInit:527] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
748472.255853 s: VX_ZONE_ERROR:[ownGraphNodeKernelInit:583] kernel init for node 0, kernel com.ti.tidl:1:3 ... failed !!!
748472.255865 s: VX_ZONE_ERROR:[vxVerifyGraph:2055] Node kernel init failed
748472.255874 s: VX_ZONE_ERROR:[vxVerifyGraph:2109] Graph verify failed
TIDL_RT_OVX: ERROR: Verifying TIDL graph ... Failed !!!
TIDL_RT_OVX: ERROR: Verify OpenVX graph failed
infer 3/50: od-8000_onnxrt_coco_mlperf_ssd_resnet34-ssd1200_| | 0% 0/1| [< ]748472.369270 s: VX_ZONE_ERROR:[ownContextSendCmd:802] Command ack message returned failure cmd_status: -1
748472.369299 s: VX_ZONE_ERROR:[ownContextSendCmd:838] tivxEventWait() failed.
748472.369313 s: VX_ZONE_ERROR:[ownNodeKernelInit:525] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
748472.369322 s: VX_ZONE_ERROR:[ownNodeKernelInit:526] Please be sure the target callbacks have been registered for this core
748472.369331 s: VX_ZONE_ERROR:[ownNodeKernelInit:527] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
748472.369341 s: VX_ZONE_ERROR:[ownGraphNodeKernelInit:583] kernel init for node 0, kernel com.ti.tidl:1:3 ... failed !!!
748472.369354 s: VX_ZONE_ERROR:[vxVerifyGraph:2055] Node kernel init failed
748472.369363 s: VX_ZONE_ERROR:[vxVerifyGraph:2109] Graph verify failed
748472.369496 s: VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:799] graph is not in a state required to be scheduled
748472.369507 s: VX_ZONE_ERROR:[vxProcessGraph:734] schedule graph failed
748472.369517 s: VX_ZONE_ERROR:[vxProcessGraph:739] wait graph failed
ERROR: Running TIDL graph ... Failed !!!
infer 3/50: od-8000_onnxrt_coco_mlperf_ssd_resnet34-ssd1200_| 100%|##########|| 1/1 [00:00<00:00, 8.53it/s]

*** mgg *** description= 3/50 run_dir_base= od-8000_onnxrt_coco_mlperf_ssd_resnet34-ssd1200_onnx elapsed_time= 3147.7415561676025 ms

SUCCESS:20221218-075935: benchmark results - {'infer_path': 'od-8000_onnxrt_coco_mlperf_ssd_resnet34-ssd1200_onnx', 'accuracy_ap[.5:.95]%': 0.0, 'accuracy_ap50%': 0.0, 'num_subgraphs': 1, 'infer_time_core_ms': 7310027617237.317, 'infer_time_subgraph_ms': 34.935245, 'ddr_transfer_mb': 74.958144, 'perfsim_time_ms': 0.0, 'perfsim_ddr_transfer_mb': 0.0, 'perfsim_gmacs': 0.0}
748472.899576 s: VX_ZONE_INIT:[tivxHostDeInitLocal:107] De-Initialization Done for HOST !!!
748472.903047 s: VX_ZONE_INIT:[tivxDeInitLocal:223] De-Initialization Done !!!
APP: Deinit ... !!!
REMOTE_SERVICE: Deinit ... !!!
REMOTE_SERVICE: Deinit ... Done !!!
IPC: Deinit ... !!!
IPC: DeInit ... Done !!!
MEM: Deinit ... !!!
DDR_SHARED_MEM: Alloc's: 9 alloc's of 25549012 bytes
DDR_SHARED_MEM: Free's : 9 free's of 25549012 bytes
DDR_SHARED_MEM: Open's : 0 allocs of 0 bytes
DDR_SHARED_MEM: Total size: 536870912 bytes
MEM: Deinit ... Done !!!
APP: Deinit ... Done !!!
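
For the ModuleNotFoundError in the first log of this post, a quick way to see which Python modules the current environment can actually import is sketched below. This is an assumption-laden check, not a fix: the helper name is hypothetical, and the thread does not say which install step provides osrt_model_tools.

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


# 'osrt_model_tools' is the module named in the traceback; 'onnx' and
# 'onnxruntime' are the packages mentioned elsewhere in this thread.
for name in ("onnx", "onnxruntime", "osrt_model_tools"):
    status = "found" if module_available(name) else "MISSING"
    print(f"{name}: {status}")
```

Run it with the same interpreter that runs the benchmark (for example inside the same Docker container or pyenv environment), since a module installed for a different interpreter will still show as missing.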

Nancy Wang, over 2 years ago, in reply to Jay Meng

TI Guru, 110395 points

Followed up.

Jay Meng, over 2 years ago, in reply to Nancy Wang

Prodigy 40 points

OK, thanks!

Now there is a new error. Detailed log:

INFO:20230518-081053: starting process on parallel_device - 0

INFO:20230518-081053: starting - cl-0016_onnxrt_imagenet1k_torchvision_mobilenetv2_onnx
INFO:20230518-081053: model_path - /home/cambricon/work/ai/edgeai-modelzoo/models/vision/classification/imagenet1k/torchvision/mobilenetv2.onnx
INFO:20230518-081053: model_file - /home/cambricon/work/ai/edgeai-benchmark/work_dirs/modelartifacts/TDA4VM/8bits/cl-0016_onnxrt_imagenet1k_torchvision_mobilenetv2_onnx/model/mobilenetv2.onnx

INFO:20230518-081053: running - cl-0016_onnxrt_imagenet1k_torchvision_mobilenetv2_onnx
INFO:20230518-081053: pipeline_config - {'task_type': 'classification', 'dataset_category': 'imagenet', 'calibration_dataset': <edgeai_benchmark.datasets.imagenet.ImageNetCls object at 0x7f81afe78dd0>, 'input_dataset': <edgeai_benchmark.datasets.imagenet.ImageNetCls object at 0x7f81afe78810>, 'postprocess': <edgeai_benchmark.postprocess.PostProcessTransforms object at 0x7f81ad429d10>, 'preprocess': <edgeai_benchmark.preprocess.PreProcessTransforms object at 0x7f81ad429c50>, 'session': <edgeai_benchmark.sessions.onnxrt_session.ONNXRTSession object at 0x7f816a32b0d0>, 'model_info': {'metric_reference': {'accuracy_top1%': 69.76}, 'model_shortlist': 30}}
INFO:20230518-081053: infer - cl-0016_onnxrt_imagenet1k_torchvision_mobilenetv2_onnx - this may take some time...libtidl_onnxrt_EP loaded 0x55f6367c7060

******** WARNING ******* : Could not open /home/cambricon/work/ai/edgeai-benchmark/work_dirs/modelartifacts/TDA4VM/8bits/cl-0016_onnxrt_imagenet1k_torchvision_mobilenetv2_onnx/artifacts/allowedNode.txt for reading... Entire model will run on ARM without any delegation to TIDL !
Final number of subgraphs created are : 1, - Offloaded Nodes - 0, Total Nodes - 0
infer : cl-0016_onnxrt_imagenet1k_torchvision_mobilenetv2_on|   0%|          || 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/cambricon/work/ai/edgeai-benchmark/edgeai_benchmark/pipelines/pipeline_runner.py", line 154, in _run_pipeline
    result = cls._run_pipeline_impl(settings, pipeline_config, description)
File "/home/cambricon/work/ai/edgeai-benchmark/edgeai_benchmark/pipelines/pipeline_runner.py", line 125, in _run_pipeline_impl
    accuracy_result = accuracy_pipeline(description)
File "/home/cambricon/work/ai/edgeai-benchmark/edgeai_benchmark/pipelines/accuracy_pipeline.py", line 122, in __call__
    param_result = self._run(description=description)
File "/home/cambricon/work/ai/edgeai-benchmark/edgeai_benchmark/pipelines/accuracy_pipeline.py", line 164, in _run
    output_list = self._infer_frames(description)
File "/home/cambricon/work/ai/edgeai-benchmark/edgeai_benchmark/pipelines/accuracy_pipeline.py", line 229, in _infer_frames
    output, info_dict = self._run_with_log(session.infer_frame, data, info_dict)
File "/home/cambricon/work/ai/edgeai-benchmark/edgeai_benchmark/pipelines/accuracy_pipeline.py", line 302, in _run_with_log
    return func(*args, **kwargs)
File "/home/cambricon/work/ai/edgeai-benchmark/edgeai_benchmark/sessions/onnxrt_session.py", line 105, in infer_frame
    outputs = self.interpreter.run(output_keys, input_dict)
File "/home/edgeai/.pyenv/versions/py36/lib/python3.6/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 188, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type. Actual: (tensor(uint8)) , expected: (tensor(float))
[ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type. Actual: (tensor(uint8)) , expected: (tensor(float))
TASKS                                                       | 100%|##########|| 2/2 [00:06<00:00, 3.10s/it]
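
The warning in the log above says that artifacts/allowedNode.txt could not be opened, so nothing was offloaded to TIDL (Offloaded Nodes - 0). The input type error that follows may be a consequence of that, although the thread does not confirm it. A minimal sketch for listing run directories that lack this file is shown below; it assumes the directory layout seen in the logged paths (one run directory per model, each with an artifacts subfolder), and the function name is hypothetical.

```python
import tempfile
from pathlib import Path


def missing_allowed_node(artifacts_root):
    """List run directories whose artifacts/allowedNode.txt is absent."""
    root = Path(artifacts_root)
    missing = []
    for run_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        if not (run_dir / "artifacts" / "allowedNode.txt").is_file():
            missing.append(run_dir.name)
    return missing


# Self-contained demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    complete = Path(tmp) / "run_a" / "artifacts"
    complete.mkdir(parents=True)
    (complete / "allowedNode.txt").write_text("")
    (Path(tmp) / "run_b" / "artifacts").mkdir(parents=True)
    print(missing_allowed_node(tmp))  # ['run_b']
```

In practice the argument would be the modelartifacts/TDA4VM/8bits folder from the log, and any directory it reports is a candidate for recompilation.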

Nancy Wang, over 2 years ago, in reply to Jay Meng

TI Guru, 110395 points

Followed up.

Nancy Wang, over 2 years ago, in reply to Jay Meng

TI Guru, 110395 points

Since you hit the error earlier, some leftover artifacts from that run may be causing the issue. Please delete the modelartifacts/TDA4VM/8bits folder and try again.
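
The cleanup suggested here can be sketched in Python as follows. It assumes the folder sits under the work_dirs directory shown in the earlier log paths; the function name and the dry_run flag are illustrative and not part of edgeai-benchmark.

```python
import shutil
from pathlib import Path


def clear_artifacts(work_dir, dry_run=True):
    """Locate <work_dir>/modelartifacts/TDA4VM/8bits and optionally delete it.

    Returns the folder path if it exists, otherwise None. The folder is
    only removed when dry_run is False.
    """
    target = Path(work_dir) / "modelartifacts" / "TDA4VM" / "8bits"
    if not target.is_dir():
        return None
    if not dry_run:
        shutil.rmtree(target)
    return target


# Dry run first: report what would be removed without deleting anything.
# The path is the one from the log above; adjust it to your own checkout.
print(clear_artifacts("/home/cambricon/work/ai/edgeai-benchmark/work_dirs"))
```

After checking the reported path, call the function again with dry_run=False and rerun the benchmark so the artifacts are regenerated from scratch.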
