PROCESSOR-SDK-J721S2: Runtime error occurs from OpenACC-based TIDL

https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1495252/processor-sdk-j721s2-runtime-error-occurs-from-openacc-based-tidl

Part Number: PROCESSOR-SDK-J721S2

Tool/software:

This is Seunghun from StradVision.
I am getting a runtime error from an executable built with build_with_OPENACC.

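Environment details
  1. HPC SDK 23.7
  2. PSDK 9.2 for J721S2
  3. Docker image: nvidia/cuda:11.8.0-devel-ubuntu22.04, provided by NVIDIA
  4. NVIDIA-related environment (tested on two different PCs):
    • (4-1) RTX 4080
      • NVIDIA graphics driver: 535.183
      • CUDA driver: pre-installed with the HPC SDK (12.2)
    • (4-2) RTX 3070, Titan X (two GPUs in the same machine)
      • NVIDIA graphics driver: 530.41
      • CUDA driver: pre-installed with the HPC SDK (12.2)
  5. Modified build settings: in the build configuration for executables that use the TIDL library, we added the following link information: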

LDIRS += /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/lib
LDIRS += /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/cuda/12.2/lib64
SHARED_LIBS += acccuda acchost accdevice accdevaux cudart

Observed runtime errors
- RTX 4080
Accelerator Fatal Error: No CUDA device code available
 File: /home/seunghun/strad/svnet3/src_tda4x/platforms/92_j721s2/c7x-mma-tidl/ti_dl/algo/src/tidl_conv2d_base.c
 Function: _Z24TIDL_refConv2dKernelFastILi3EaaiiEvPT0_PT1_PT2_PT3_S7_S7_iiiiiiiiiiiiiiiiiiiiiiiiii:463
 Line: 473
- RTX 3070
(NVC++ build option changed from -gpu=ccall to -gpu=cc86)
Accelerator Fatal Error: No CUDA device code available
 File: /home/seunghun/strad/svnet3/src_tda4x/platforms/92_j721s2/c7x-mma-tidl/ti_dl/algo/src/tidl_conv2d_base.c
 Function: _Z24TIDL_refConv2dKernelFastILi3EaaiiEvPT0_PT1_PT2_PT3_S7_S7_iiiiiiiiiiiiiiiiiiiiiiiiii:463
 Line: 473
 
- Titan X (same executable as for the RTX 3070)
(NVC++ build option changed from -gpu=ccall to -gpu=cc86)
Accelerator Fatal Error: This file was compiled: -acc=gpu -gpu=cc80 -gpu=cc86 -acc=host or -acc=multicore
Rebuild this file with -gpu=cc61 to use NVIDIA Tesla GPU 0
 File: /home/seunghun/strad/svnet3/src_tda4x/platforms/92_j721s2/c7x-mma-tidl/ti_dl/algo/src/tidl_conv2d_base.c
 Function: _Z24TIDL_refConv2dKernelFastILi3EaaiiEvPT0_PT1_PT2_PT3_S7_S7_iiiiiiiiiiiiiiiiiiiiiiiiii:463
 Line: 473
       
Analysis
We are using the CUDA 12.2 version that comes pre-installed with the HPC SDK.
Running ldd on the executable shows the following linked OpenACC and CUDA-related libraries:
libacccuda.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/lib/libacccuda.so (0x00007fe543400000)
libacchost.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/lib/libacchost.so (0x00007fe543000000)
libaccdevice.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/lib/libaccdevice.so (0x00007fe542800000)
libaccdevaux.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/lib/libaccdevaux.so (0x00007fe542400000)
libcudart.so.12 => /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/cuda/12.2/lib64/libcudart.so.12 (0x00007f19bac00000)
The build output (using nvc++ -v) confirms:
Export PGI_CURR_CUDA_HOME=/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/cuda/12.2
Export NVHPC_CURRENT_CUDA_HOME=/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/cuda/12.2
Export NVHPC_CURRENT_CUDA_VERSION=12.2.53
Export NVCOMPILER=/opt/nvidia/hpc_sdk/Linux_x86_64/23.7
Export PGI=/opt/nvidia/hpc_sdk
The runtime error occurs in the function TIDL_refConv2dKernelFast, even though the build log shows "Generating NVIDIA GPU code" and the generation of the .ptx and fat binary files.
void TIDL_refConv2dKernelFast<1, unsigned short, signed char, int, int>(unsigned short*, signed char*, int*, int*, int*, int*, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int):
    473, Generating present(pCoeffs[:((numInChannels-1)*(coeffsWidth*coeffsHeight))+((coeffsWidth*(coeffsHeight*(numInChannels*(numOutChannels-1))))+(numOutChannels*(coeffsWidth*((numInChannels*(numGroups-1))*coeffsHeight))))+1],pInChannel[:((width%strideWidth)+(width-strideWidth))+((inImPitch*((height%strideHeight)+(height-strideHeight)))+((inChPitch*(numInChannels-1))+((inBatchPitch*(numBatches-1))+(inChPitch*(numInChannels*(numGroups-1))))))+1],pBias[:numOutChannels+((numGroups-1)*numOutChannels)],accPtr[:(((width%strideWidth)+(width-strideWidth))/strideWidth)+((((height%strideHeight)+(height-strideHeight))*outImPitch)+(((numOutChannels-1)*outChPitch)+(((numBatches-1)*outBatchPitch)+(((numGroups-1)*numOutChannels)*outChPitch))))+1])
         Generating implicit firstprivate(numGroups,strideHeight,topPad,width,pInChannel,numInChannels,numBatches,leftPad,inWidth,isOTFpad,inHeight,strideWidth,inImPitch,height,numOutChannels)
         Generating NVIDIA GPU code
        496, #pragma acc loop gang, vector(128) collapse(5) /* blockIdx.x threadIdx.x */
        498,   /* blockIdx.x threadIdx.x collapsed */
        500,   /* blockIdx.x threadIdx.x collapsed */
        502,   /* blockIdx.x threadIdx.x collapsed */
        504,   /* blockIdx.x threadIdx.x collapsed */
             Generating reduction(min:_min)
             Generating reduction(max:_max)
        519, #pragma acc loop seq
        524, #pragma acc loop seq
        527, #pragma acc loop seq
    504, Generating implicit firstprivate(enableBias,inBatchPitch,inChPitch,outBatchPitch,outImPitch,outChPitch)
    519, Generating implicit firstprivate(coeffsHeight,coeffsWidth)
    527, Generating implicit firstprivate(dilationHeight,startRowNumberInTensor,padVal,dilationWidth)
    
    
    
    ...................
    
 /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/bin/tools/nvdd -dcuda /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/cuda/12.2 -usenvvm -nvvm70 -reloc /tmp/nvacceWgemkX4NEtn.gpu -computecap 86 -ptx /tmp/nvacceWgem5n6NRPb.ptx -o /tmp/nvaccKWgeSc99ei3K.bin -ftz -cuda12020
 /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/bin/tools/nvdd -dcuda /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/cuda/12.2 -reloc -cuda12020 -fat src/tidl_conv2d_base.c -sm 86 /tmp/nvaccKWgeSc99ei3K.bin -compute 86 /tmp/nvacceWgem5n6NRPb.ptx -o /tmp/nvacceWgemyLVfjWr.fat
NVC++/x86-64 Linux 23.7-0: compilation successful
    
    

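I have attached the full build log and would appreciate your help in diagnosing what triggers the runtime error ("No CUDA device code available") in these cases.
Any help or advice would be greatly appreciated.
Thank you in advance for your support.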
e2e.ti.com/.../TIDL_5F00_build_5F00_log.txt