TMS320C6678: C6000 compiler v8.3.2 generates less efficient code than v7.4.1

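Note: this content originates from a machine translation and may contain grammatical or other translation errors; it is for reference only. For the accurate content, refer to the English original at the link below.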

https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1081021/tms320c6678-c6000-compiler-v8-3-2-generates-less-efficient-code-than-v7-4-1

Part Number: TMS320C6678

Hi,

A function built with C6000 v8.3.2 performs worse than the same source code built with v7.4.21. The function source code is below.

/*
* test_adap_emph.cpp
*
* Created on: Feb 28, 2022
* Author: Wai.Kwok.Law
*/
#include "image_process_be_filter_aniso2d.h"
#include "nam_tool.h"
#include "image_process_be.h"
#include "image_process_be_main.h"
#include <c6x.h>
using XtNamApi::RangeClip;
#pragma CODE_SECTION(".sect_DDR2_code")
void adap_emph(ip_int* restrict adap,
const ip_int* restrict lineBufOrig,
const ip_int* restrict lineBufFilter,
const ip_int* restrict range,
const tbl_int* restrict wghtTblS,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

#ifndef NAM_TOOL_H
#define NAM_TOOL_H
namespace XtNamApi
{
/*!
* zero or not decision
* @param[in] value evaluated value
* @retval true zero
* @retval false not zero
*/
template <typename T>
static inline bool IsZero(T value)
{
return ( value == static_cast<T>(0) );
}
/*!
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

The assembly code generated by the two compiler versions is:

;******************************************************************************
;* TMS320C6x C/C++ Codegen PC v7.4.21 *
;* Date/Time created: Mon Feb 28 22:34:52 2022 *
;******************************************************************************
.compiler_opts --abi=eabi --c64p_l1d_workaround=off --endian=little --hll_source=on --long_precision_bits=32 --mem_model:code=near --mem_model:const=data --mem_model:data=far --object_format=elf --silicon_version=6600 --symdebug:dwarf --symdebug:dwarf_version=3
;******************************************************************************
;* GLOBAL FILE PARAMETERS *
;* *
;* Architecture : TMS320C66xx *
;* Optimization : Enabled at level 3 *
;* Optimizing for : Speed *
;* Based on options: -o3, no -ms *
;* Endian : Little *
;* Interrupt Thrshld : Disabled *
;* Data Access Model : Far *
;* Pipelining : Enabled *
;* Speculate Loads : Enabled with threshold = 0 *
;* Memory Aliases : Presume are aliases (pessimistic) *
;* Debug Info : DWARF Debug w/Optimization *
;* *
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

;******************************************************************************
;* G3 TMS320C6x C/C++ Codegen PC v8.3.2 *
;* Date/Time created: Mon Feb 28 22:33:50 2022 *
;******************************************************************************
.compiler_opts --abi=eabi --array_alignment=8 --c64p_l1d_workaround=off --endian=little --hll_source=on --long_precision_bits=32 --mem_model:code=near --mem_model:const=data --mem_model:data=far --object_format=elf --silicon_version=6600 --symdebug:dwarf --symdebug:dwarf_version=4
;******************************************************************************
;* GLOBAL FILE PARAMETERS *
;* *
;* Architecture : TMS320C66xx *
;* Optimization : Enabled at level 3 *
;* Optimizing for : Speed *
;* Based on options: -o3, no -ms *
;* Endian : Little *
;* Interrupt Thrshld : Disabled *
;* Data Access Model : Far *
;* Pipelining : Enabled *
;* Speculate Loads : Enabled with threshold = 0 *
;* Memory Aliases : Presume are aliases (pessimistic) *
;* Debug Info : DWARF Debug *
;* *
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

The compile commands are:

"C:/ti/c6000_7.4.21/bin/cl6x" -mv6600 --abi=eabi -O3 -g --optimize_with_debug=on
--include_path="C:/ti/c6000_7.4.21/include" --include_path="C:/ti/c6000_7.4.21/dsplib_c66x_3_4_0_0/packages" --include_path="C:/ti/c6000_7.4.21/dsplib_c66x_3_4_0_0/packages/ti/dsplib" --include_path="C:/git/phx_main_1/DevProjects/App/Lib/MexImport/DSP_euclidNorm_mex" --include_path="C:/git/phx_main_1/DevProjects/App/Lib/MexImport/DSP_IQMath_IQNcmpy1xN_mex" --include_path="C:/git/phx_main_1/DevProjects/App/Lib/MexImport/DSP_IQMath_IQNmpy1xcNIQx_mex" --include_path="C:/git/phx_main_1/DevProjects/App/Lib/MexImport/DSP_maxAbs16_mex" --include_path="C:/git/phx_main_1/DevProjects/App/Lib/MexImport/DSP_norm64_mex" --include_path="C:/git/phx_main_1/DevProjects/App/Lib/MexImport/DSP_shiftAndRndInt32_mex" --include_path="C:/git/phx_main_1/DevProjects/App/Lib/MexImport/DSP_unpack16WithShift_mex" --include_path="C:/git/phx_main_1/DevProjects/App/Lib/MexImport/IirFilterMex" --include_path="C:/git/phx_main_1/DevProjects/App/Lib/MexImport/DSP_IQMath_IQNdiv_mex" --include_path="C:/git/phx_main_1/DevProjects/App/Lib/MexImport/DSP_dotProd_mex" --include_path="C:/git/phx_main_1/DevProjects/App/Lib/MexImport/DSP_IQMath_IQNtoIQx_mex" --include_path="C:/git/phx_main_1/DevProjects/App/include" --include_path="C:/git/phx_main_1/DevProjects/App/Test/include" --include_path="C:/git/phx_main_1/DevProjects/App/Framework/CoreMgr/include" --include_path="C:/git/phx_main_1/DevProjects/App/Framework/Integ/test" --include_path="C:/git/phx_main_1/DevProjects/App/Framework/InputMgr/include" --include_path="C:/git/phx_main_1/DevProjects/App/Framework/OutputMgr/include" --include_path="C:/git/phx_main_1/DevProjects/App/BMode/Test" --include_path="C:/git/phx_main_1/DevProjects/App/Lib/IQMath/include" --include_path="C:/git/phx_main_1/DevProjects/App/Lib/ImgProcess/include" --include_path="C:/git/phx_main_1/DevProjects/App/UlsLib/RSC/include" --include_path="C:/git/phx_main_1/DevProjects/App/UlsLib/RSC/test" --include_path="C:/git/phx_main_1/DevProjects/App/HwMgr/EDMA/include" 
--include_path="C:/git/phx_main_1/DevProjects/App/HwMgr/Misc/include" --include_path="C:/git/phx_main_1/DevProjects/App/HwMgr/PCIe/include" --include_path="C:/git/phx_main_1/DevProjects/App/HwMgr/SPCTL/include" --include_path="C:/git/phx_main_1/DevProjects/App/HwMgr/SRIO/include" --include_path="C:/git/phx_main_1/DevProjects/App/BMode/Integ/include" --include_path="C:/git/phx_main_1/DevProjects/App/BMode/MultiBeam/include" --include_path="C:/git/phx_main_1/DevProjects/App/Color/include" --include_path="C:/git/phx_main_1/DevProjects/App/Lib/Math/include" --include_path="C:/git/phx_main_1/DevProjects/App/Components/include" --include_path="C:/git/phx_main_1/DevProjects/App/Components/CBufMgr/include" --include_path="C:/git/phx_main_1/DevProjects/App/Components/CineDataPack/include" --include_path="C:/git/phx_main_1/DevProjects/App/Components/Diagnostics/include" --include_path="C:/git/phx_main_1/DevProjects/App/Components/EDMA_API/include" --include_path="C:/git/phx_main_1/DevProjects/App/Components/Fifo/include" --include_path="C:/git/phx_main_1/DevProjects/App/Components/CounterExtender/include" --include_path="C:/git/phx_main_1/DevProjects/App/Components/FrameHistory/include" --include_path="C:/git/phx_main_1/DevProjects/App/Framework/include" --include_path="C:/git/phx_main_1/DevProjects/App/Framework/Integ/include" --include_path="C:/git/phx_main_1/DevProjects/App/Framework/CineMgr/include" --include_path="C:/git/phx_main_1/DevProjects/App/BMode/include" --include_path="C:/git/phx_main_1/DevProjects/App/BMode/MultiBeam/test" --include_path="C:/git/phx_main_1/DevProjects/App/Color/Integ/include" --include_path="C:/git/phx_main_1/DevProjects/App/Color/Integ/test" --include_path="C:/git/phx_main_1/DevProjects/App/Color/CFMP_Fir/include" --include_path="C:/git/phx_main_1/DevProjects/App/Doppler/include" --include_path="C:/git/phx_main_1/DevProjects/App/Doppler/Hilbert/include" --include_path="C:/git/phx_main_1/DevProjects/App/Doppler/WinFFT/include" 
--include_path="C:/git/phx_main_1/DevProjects/App/Doppler/GapFill/include" --include_path="C:/git/phx_main_1/DevProjects/App/Doppler/Integ/Test" --include_path="C:/git/phx_main_1/DevProjects/App/MMode/Integ/include" --include_path="C:/git/phx_main_1/DevProjects/App/MMode/Integ/test" --include_path="C:/git/phx_main_1/DevProjects/App/ECG/Integ/include" --include_path="C:/git/phx_main_1/DevProjects/App/UlsLib/AutoGain/include" --include_path="C:/git/phx_main_1/DevProjects/App/UlsLib/NeedleVis/include" --include_path="C:/git/phx_main_1/DevProjects/App/BMode/ClearVisualization/common/include" --include_path="C:/git/phx_main_1/DevProjects/App/BMode/ClearVisualization/ImageProcess/ImageProcessBe/include" --include_path="C:/git/phx_main_1/DevProjects/App/BMode/ClearVisualization/param/include" --include_path="C:/git/phx_main_1/DevProjects/App/BMode/ClearVisualization/ImgProcessLib/include" --include_path="C:/git/phx_main_1/DevProjects/App/Lib/UlsMath/include" --include_path="C:/git/phx_main_1/DevProjects/App/Lib/Utils/include"
--relaxed_ansi
--gcc
--define=_DEBUG
--define=DOPPLER_USE_STP_FUNCTIONS
--define=SOC_C6678
--define=VERSION_FOR_DSP
--define=_OS_SUPPORT
--define=__TMS320C6X__
--define=_TMS320C6600
--define=_TMS320C6700
--display_error_number
--diag_wrap=off
--diag_warning=225
--debug_software_pipeline
--mem_model:data=far
--optimizer_interlist
--strip_coff_underscore
--advice:performance=all
--preproc_with_compile
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

"C:/ti/ti-cgt-c6000_8.3.2/bin/cl6x" -mv6600 -O3
--include_path="C:/ti/ti-cgt-c6000_8.3.2/include" --include_path="C:/git/phx_main/DevProjects/App/Lib/MexImport/DSP_euclidNorm_mex" --include_path="C:/git/phx_main/DevProjects/App/Lib/MexImport/DSP_IQMath_IQNcmpy1xN_mex" --include_path="C:/git/phx_main/DevProjects/App/Lib/MexImport/DSP_IQMath_IQNmpy1xcNIQx_mex" --include_path="C:/git/phx_main/DevProjects/App/Lib/MexImport/DSP_maxAbs16_mex" --include_path="C:/git/phx_main/DevProjects/App/Lib/MexImport/DSP_norm64_mex" --include_path="C:/git/phx_main/DevProjects/App/Lib/MexImport/DSP_shiftAndRndInt32_mex" --include_path="C:/git/phx_main/DevProjects/App/Lib/MexImport/DSP_unpack16WithShift_mex" --include_path="C:/git/phx_main/DevProjects/App/Lib/MexImport/IirFilterMex" --include_path="C:/git/phx_main/DevProjects/App/Lib/MexImport/DSP_IQMath_IQNdiv_mex" --include_path="C:/git/phx_main/DevProjects/App/Lib/MexImport/DSP_dotProd_mex" --include_path="C:/git/phx_main/DevProjects/App/Lib/MexImport/DSP_IQMath_IQNtoIQx_mex" --include_path="C:/git/phx_main/DevProjects/App/include" --include_path="C:/git/phx_main/DevProjects/App/Test/include" --include_path="C:/git/phx_main/DevProjects/App/Framework/CoreMgr/include" --include_path="C:/git/phx_main/DevProjects/App/Framework/Integ/test" --include_path="C:/git/phx_main/DevProjects/App/Framework/InputMgr/include" --include_path="C:/git/phx_main/DevProjects/App/Framework/OutputMgr/include" --include_path="C:/git/phx_main/DevProjects/App/BMode/Test" --include_path="C:/git/phx_main/DevProjects/App/Lib/IQMath/include" --include_path="C:/git/phx_main/DevProjects/App/Lib/ImgProcess/include" --include_path="C:/git/phx_main/DevProjects/App/UlsLib/RSC/include" --include_path="C:/git/phx_main/DevProjects/App/UlsLib/RSC/test" --include_path="C:/git/phx_main/DevProjects/App/HwMgr/EDMA/include" --include_path="C:/git/phx_main/DevProjects/App/HwMgr/Misc/include" --include_path="C:/git/phx_main/DevProjects/App/HwMgr/PCIe/include" --include_path="C:/git/phx_main/DevProjects/App/HwMgr/SPCTL/include" 
--include_path="C:/git/phx_main/DevProjects/App/HwMgr/SRIO/include" --include_path="C:/git/phx_main/DevProjects/App/BMode/Integ/include" --include_path="C:/git/phx_main/DevProjects/App/BMode/MultiBeam/include" --include_path="C:/git/phx_main/DevProjects/App/Color/include" --include_path="C:/git/phx_main/DevProjects/App/Lib/Math/include" --include_path="C:/git/phx_main/DevProjects/App/Components/include" --include_path="C:/git/phx_main/DevProjects/App/Components/CBufMgr/include" --include_path="C:/git/phx_main/DevProjects/App/Components/CineDataPack/include" --include_path="C:/git/phx_main/DevProjects/App/Components/Diagnostics/include" --include_path="C:/git/phx_main/DevProjects/App/Components/EDMA_API/include" --include_path="C:/git/phx_main/DevProjects/App/Components/Fifo/include" --include_path="C:/git/phx_main/DevProjects/App/Components/CounterExtender/include" --include_path="C:/git/phx_main/DevProjects/App/Components/FrameHistory/include" --include_path="C:/git/phx_main/DevProjects/App/Framework/include" --include_path="C:/git/phx_main/DevProjects/App/Framework/Integ/include" --include_path="C:/git/phx_main/DevProjects/App/Framework/CineMgr/include" --include_path="C:/git/phx_main/DevProjects/App/BMode/include" --include_path="C:/git/phx_main/DevProjects/App/BMode/MultiBeam/test" --include_path="C:/git/phx_main/DevProjects/App/Color/Integ/include" --include_path="C:/git/phx_main/DevProjects/App/Color/Integ/test" --include_path="C:/git/phx_main/DevProjects/App/Color/CFMP_Fir/include" --include_path="C:/git/phx_main/DevProjects/App/Doppler/include" --include_path="C:/git/phx_main/DevProjects/App/Doppler/Hilbert/include" --include_path="C:/git/phx_main/DevProjects/App/Doppler/WinFFT/include" --include_path="C:/git/phx_main/DevProjects/App/Doppler/GapFill/include" --include_path="C:/git/phx_main/DevProjects/App/Doppler/Integ/Test" --include_path="C:/git/phx_main/DevProjects/App/MMode/Integ/include" --include_path="C:/git/phx_main/DevProjects/App/MMode/Integ/test" 
--include_path="C:/git/phx_main/DevProjects/App/UlsLib/AutoGain/include" --include_path="C:/git/phx_main/DevProjects/App/UlsLib/NeedleVis/include" --include_path="C:/git/phx_main/DevProjects/App/BMode/ClearVisualization/common/include" --include_path="C:/git/phx_main/DevProjects/App/BMode/ClearVisualization/ImageProcess/ImageProcessBe/include" --include_path="C:/git/phx_main/DevProjects/App/BMode/ClearVisualization/param/include" --include_path="C:/git/phx_main/DevProjects/App/BMode/ClearVisualization/ImgProcessLib/include" --include_path="C:/git/phx_main/DevProjects/App/Lib/UlsMath/include" --include_path="C:/git/phx_main/DevProjects/App/Lib/Utils/include"
--advice:performance=all
--define=_DEBUG
--define=DOPPLER_USE_STP_FUNCTIONS
--define=SOC_C6678
--define=VERSION_FOR_DSP
--define=_OS_SUPPORT
--define=__TMS320C6X__
--define=_TMS320C6600
--define=_TMS320C6700
-g
--symdebug:dwarf_version=4
--relaxed_ansi
--diag_warning=225
--debug_software_pipeline
--mem_model:data=far
--asm_listing
--c_src_interlist
--strip_coff_underscore
--preproc_with_compile
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

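The estimated cycle counts in the assembly files are 4400 (v7.4.21) and 5268 (v8.3.2). They closely match the cycles measured on the target processor, the C6678. The newer compiler (v8.3.2) takes 868 more cycles than the older one (v7.4.21). This is a kernel function that is called many times per second.

What could cause v8.3.2 to generate less efficient code? Any help is much appreciated.

Thanks!

Wai Kwok Law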

  • Hi Wai Kwok Law,

    I have forwarded your query to the compiler experts. They will get back to you soon.

    --

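    Meanwhile, please take a look at these documents (not sure whether you have already referred to them):

    1. Optimizing Loops on the C66x DSP - https://www.ti.com/lit/an/sprabg7/sprabg7.pdf

    2. TMS320C6000 Optimizing C/C++ Compiler v8.3.x User's Guide (Rev. D) - https://www.ti.com/lit/ug/sprui04d/sprui04d.pdf

    3. TMS320C6000 Assembly Language Tools v8.3.x User's Guide (Rev. D) - https://www.ti.com/lit/ug/sprui03d/sprui03d.pdf

    Regards,

    Shankari G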

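  • Unfortunately, I cannot reproduce the problem because I do not have the header files. For the source file test_adap_emph.cpp, please follow the directions in the article "How to Submit a Compiler Test Case".

    Meanwhile, please upgrade to the latest 8.3.x version of the compiler, which is currently version 8.3.12. Some performance problems have been fixed, and that may resolve the issue. If it does not, add the compiler option --legacy. This option can help when a legacy code base tuned for C6000 CGT v7.4.x or earlier shows a performance regression with newer compiler versions.

    Thanks and regards,

    George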

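  • Hi George,

    The preprocessed file is below. The compiler options can be found in my first message in this thread.

    e2e.ti.com/.../test_5F00_adap_5F00_emph.pp.txt

    Upgrading to v8.3.12 still generates assembly code with a cycle estimate of 5268. However, adding the option --legacy generates assembly code with a cycle estimate of 4400, the same as the code generated by v7.4.21.

    Besides the function posted in this thread, 5 other functions were found to perform worse when upgrading the compiler from v7.4.21 to v8.3.12. I will check whether --legacy fixes the other functions.

    Is there any reason why the legacy optimizer does better than the latest compiler? Is this a common issue reported by others as well? I would still like to understand what causes the difference between the two compiler versions, so that I can pay more attention to it in future code development.

    Thanks!

    Wai Kwok Law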

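  • Thank you for the test case. I can reproduce the same results you see.

    [quote userid="314351" url="~/support/processors-group/processors/f/processors-forum/1081021/tms320c6678-c6000-compiler-v8-3-2-generates-less-efficient-code-than-v7-4-1/4002637#4002637"]Any reason why the legacy optimizer does better than the latest compiler[/quote]

    This happens sometimes. That is why --legacy is available. Scheduling instructions in a software-pipelined loop is a hard problem. In solving such problems, the compiler uses heuristics. A heuristic is an algorithm that solves a particular problem but may not give the best solution in all cases. In some cases, heuristics are used because the problem cannot be solved optimally for every case in a reasonable amount of time (or space). The infrastructure of the compiler changed dramatically between versions 7.4.x and 8.3.x. How the heuristics work, and how they interact with each other, changed as well. In many cases, 8.3.x performs as well or better. But, as you have seen, there are cases where 7.4.x does better. In those cases, using --legacy usually causes 8.3.x to generate the same code as 7.4.x.

    Thanks and regards,

    George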
