Tool/software:
This is Seunghun from StradVision.
I am encountering a runtime error in the executable produced by build_with_OPENACC.
- HPC SDK 23.7
- PSDK 9.2 for J721S2
- Docker image: nvidia/cuda:11.8.0-devel-ubuntu22.04, provided by NVIDIA
- NVIDIA-related environment (tested on two different PCs):
  - (4-1) RTX 4080
    - NVIDIA graphics driver: 535.183
    - CUDA driver: pre-installed with the HPC SDK (12.2)
  - (4-2) RTX 3070 and Titan X (two GPUs in the same machine)
    - NVIDIA graphics driver: 530.41
    - CUDA driver: pre-installed with the HPC SDK (12.2)
- Modified build settings: in the build configuration for the executables that use the TIDL library, we added the following link information:
  LDIRS += /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/lib
  LDIRS += /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/cuda/12.2/lib64
  SHARED_LIBS += acccuda acchost accdevice accdevaux cudart
The following errors are reported at run time:

Accelerator Fatal Error: No CUDA device code available
  File: /home/seunghun/strad/svnet3/src_tda4x/platforms/92_j721s2/c7x-mma-tidl/ti_dl/algo/src/tidl_conv2d_base.c
  Function: _Z24TIDL_refConv2dKernelFastILi3EaaiiEvPT0_PT1_PT2_PT3_S7_S7_iiiiiiiiiiiiiiiiiiiiiiiiii:463
  Line: 473

(with -gpu=ccall changed to -gpu=cc86)
Accelerator Fatal Error: No CUDA device code available
  File: /home/seunghun/strad/svnet3/src_tda4x/platforms/92_j721s2/c7x-mma-tidl/ti_dl/algo/src/tidl_conv2d_base.c
  Function: _Z24TIDL_refConv2dKernelFastILi3EaaiiEvPT0_PT1_PT2_PT3_S7_S7_iiiiiiiiiiiiiiiiiiiiiiiiii:463
  Line: 473

(with -gpu=ccall changed to -gpu=cc86)
Accelerator Fatal Error: This file was compiled: -acc=gpu -gpu=cc80 -gpu=cc86 -acc=host or -acc=multicore
Rebuild this file with -gpu=cc61 to use NVIDIA Tesla GPU 0
  File: /home/seunghun/strad/svnet3/src_tda4x/platforms/92_j721s2/c7x-mma-tidl/ti_dl/algo/src/tidl_conv2d_base.c
  Function: _Z24TIDL_refConv2dKernelFastILi3EaaiiEvPT0_PT1_PT2_PT3_S7_S7_iiiiiiiiiiiiiiiiiiiiiiiiii:463
  Line: 473
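The last message asks for a rebuild with -gpu=cc61, which suggests that the set of -gpu targets should cover every GPU the binary may run on. As a rough sketch only, the helper below assembles a comma-separated -gpu=cc... option from a list of compute capabilities. The function name gpu_flag is hypothetical, and the capabilities used in the usage note (8.9 for the RTX 4080, 8.6 for the RTX 3070, 6.1 for the Titan X) are assumptions that are not stated in the post; only cc61, cc80 and cc86 appear in the messages above.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: build a "-gpu=ccXY[,ccXY...]" option from
// (major, minor) compute capabilities. Values are sorted and duplicates
// are dropped. An empty input yields an empty string.
std::string gpu_flag(std::vector<std::pair<int, int>> caps) {
    if (caps.empty()) {
        return "";
    }
    std::sort(caps.begin(), caps.end());
    caps.erase(std::unique(caps.begin(), caps.end()), caps.end());

    std::string flag = "-gpu=";
    for (std::size_t i = 0; i < caps.size(); ++i) {
        if (i > 0) {
            flag += ",";
        }
        // 6.1 becomes "cc61", 8.6 becomes "cc86", and so on.
        flag += "cc" + std::to_string(caps[i].first) +
                std::to_string(caps[i].second);
    }
    return flag;
}
```

For example, gpu_flag({{8, 9}, {8, 6}, {6, 1}}) returns -gpu=cc61,cc86,cc89. Whether HPC SDK 23.7 accepts each of these values should be checked against the compiler documentation.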
Running ldd on the executable shows the following linked OpenACC and CUDA libraries:
  libacccuda.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/lib/libacccuda.so (0x00007fe543400000)
  libacchost.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/lib/libacchost.so (0x00007fe543000000)
  libaccdevice.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/lib/libaccdevice.so (0x00007fe542800000)
  libaccdevaux.so => /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/lib/libaccdevaux.so (0x00007fe542400000)
  libcudart.so.12 => /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/cuda/12.2/lib64/libcudart.so.12 (0x00007f19bac00000)
The following was confirmed with nvc++ -v:
  Export PGI_CURR_CUDA_HOME=/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/cuda/12.2
  Export NVHPC_CURRENT_CUDA_HOME=/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/cuda/12.2
  Export NVHPC_CURRENT_CUDA_VERSION=12.2.53
  Export NVCOMPILER=/opt/nvidia/hpc_sdk/Linux_x86_64/23.7
  Export PGI=/opt/nvidia/hpc_sdk
The error is reported for TIDL_refConv2dKernelFast, even though the build log shows "Generating NVIDIA GPU code" and the generation of the .ptx and fat binary files.

void TIDL_refConv2dKernelFast<1, unsigned short, signed char, int, int>(unsigned short*, signed char*, int*, int*, int*, int*, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int):
    473, Generating present(pCoeffs[:((numInChannels-1)*(coeffsWidth*coeffsHeight))+((coeffsWidth*(coeffsHeight*(numInChannels*(numOutChannels-1))))+(numOutChannels*(coeffsWidth*((numInChannels*(numGroups-1))*coeffsHeight))))+1],pInChannel[:((width%strideWidth)+(width-strideWidth))+((inImPitch*((height%strideHeight)+(height-strideHeight)))+((inChPitch*(numInChannels-1))+((inBatchPitch*(numBatches-1))+(inChPitch*(numInChannels*(numGroups-1))))))+1],pBias[:numOutChannels+((numGroups-1)*numOutChannels)],accPtr[:(((width%strideWidth)+(width-strideWidth))/strideWidth)+((((height%strideHeight)+(height-strideHeight))*outImPitch)+(((numOutChannels-1)*outChPitch)+(((numBatches-1)*outBatchPitch)+(((numGroups-1)*numOutChannels)*outChPitch))))+1])
         Generating implicit firstprivate(numGroups,strideHeight,topPad,width,pInChannel,numInChannels,numBatches,leftPad,inWidth,isOTFpad,inHeight,strideWidth,inImPitch,height,numOutChannels)
         Generating NVIDIA GPU code
        496, #pragma acc loop gang, vector(128) collapse(5) /* blockIdx.x threadIdx.x */
        498,   /* blockIdx.x threadIdx.x collapsed */
        500,   /* blockIdx.x threadIdx.x collapsed */
        502,   /* blockIdx.x threadIdx.x collapsed */
        504,   /* blockIdx.x threadIdx.x collapsed */
             Generating reduction(min:_min)
             Generating reduction(max:_max)
        519, #pragma acc loop seq
        524, #pragma acc loop seq
        527, #pragma acc loop seq
    504, Generating implicit firstprivate(enableBias,inBatchPitch,inChPitch,outBatchPitch,outImPitch,outChPitch)
    519, Generating implicit firstprivate(coeffsHeight,coeffsWidth)
    527, Generating implicit firstprivate(dilationHeight,startRowNumberInTensor,padVal,dilationWidth)
...................
/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/bin/tools/nvdd -dcuda /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/cuda/12.2 -usenvvm -nvvm70 -reloc /tmp/nvacceWgemkX4NEtn.gpu -computecap 86 -ptx /tmp/nvacceWgem5n6NRPb.ptx -o /tmp/nvaccKWgeSc99ei3K.bin -ftz -cuda12020
/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/bin/tools/nvdd -dcuda /opt/nvidia/hpc_sdk/Linux_x86_64/23.7/cuda/12.2 -reloc -cuda12020 -fat src/tidl_conv2d_base.c -sm 86 /tmp/nvaccKWgeSc99ei3K.bin -compute 86 /tmp/nvacceWgem5n6NRPb.ptx -o /tmp/nvacceWgemyLVfjWr.fat
NVC++/x86-64 Linux 23.7-0: compilation successful
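The nvdd command lines above carry the values -computecap 86, -sm 86 and -compute 86, which suggests that this build produced device code for a single target. As a hedged sketch, the snippet below scans such a command line and collects those values so they can be compared with the capability a GPU reports. The function names compiled_targets and has_exact_target are hypothetical, and treating the three options as target compute capabilities is an assumption based on how they appear in the log, not on documented nvdd behaviour.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <sstream>
#include <string>

// True when s is a short, non-empty run of decimal digits.
static bool is_small_number(const std::string& s) {
    return !s.empty() && s.size() <= 4 &&
           std::all_of(s.begin(), s.end(),
                       [](unsigned char c) { return std::isdigit(c) != 0; });
}

// Hypothetical helper: collect the numbers that follow "-computecap",
// "-sm" or "-compute" in a whitespace-separated command line.
std::set<int> compiled_targets(const std::string& command) {
    std::set<int> targets;
    std::istringstream words(command);
    std::string word;
    while (words >> word) {
        if (word == "-computecap" || word == "-sm" || word == "-compute") {
            std::string value;
            if (words >> value && is_small_number(value)) {
                targets.insert(std::stoi(value));
            }
        }
    }
    return targets;
}

// True when the device capability (for example 61 or 86) is listed exactly.
bool has_exact_target(const std::set<int>& targets, int device_cc) {
    return targets.count(device_cc) > 0;
}
```

Applied to the two nvdd lines in the log, compiled_targets would return only 86, so has_exact_target is true for 86 and false for 61, which is consistent with the message asking for a rebuild with -gpu=cc61. An exact match is a deliberately strict check; it says nothing about why "No CUDA device code available" is reported when the target does match.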
Any help or advice would be greatly appreciated.