[Reference translation] TMS320F28335: Calculation optimization for tanhf() and matrix multiply

admin

Other Parts Discussed in Thread: CONTROLSUITE

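Please note that this content is machine-translated and may contain grammatical or other translation errors; it is provided for reference only. For the accurate content, refer to the original English post at the link below or translate it yourself.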

https://e2e.ti.com/support/microcontrollers/c2000-microcontrollers-group/c2000/f/c2000-microcontrollers-forum/616630/tms320f28335-calculation-optimization-for-tanhf-and-matrix-multiply

Part Number: TMS320F28335

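Hi all,

I am currently working on a motor control application. I want to replace the PI controller with a fuzzy neural network controller. In each interrupt I need to perform 3 matrix multiplications and use tanhf() to compute the result.

In the code I multiply element by element and use tanhf() from math.lib, but the interrupt cannot finish within 0.1 ms. Is there any way I can optimize the code, or do I need to move to a better microcontroller?

Thanks,

Yang Sun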

// match_multiply(*W1, inputlayer, hiddenlayer1, 6, 5);
p = &W1[0][0];
q = &inputlayer[0];
hiddenlayer1[0] = *p * *q + *(p+1) * *(q+1) + *(p+2) * *(q+2) + *(p+3) * *(q+3) + *(p+4) * *(q+4);
hiddenlayer1[1] = *(p+5) * *q + *(p+6) * *(q+1) + *(p+7) * *(q+2) + *(p+8) * *(q+3) + *(p+9) * *(q+4);
hiddenlayer1[2] = *(p+10) * *q + *(p+11) * *(q+1) + *(p+12) * *(q+2) + *(p+13) * *(q+3) + *(p+14) * *(q+4);
hiddenlayer1[3] = *(p+15) * *q + *(p+16) * *(q+1) + *(p+17) * *(q+2) + *(p+18) * *(q+3) + *(p+19) * *(q+4);
hiddenlayer1[4] = *(p+20) * *q + *(p+21) * *(q+1) + *(p+22) * *(q+2) + *(p+23) * *(q+3) + *(p+24) * *(q+4);
hiddenlayer1[5] = *(p+25) * *q + *(p+26) * *(q+1) + *(p+27) * *(q+2) + *(p+28) * *(q+3) + *(p+29) * *(q+4);

hiddenlayer1[0] = (float)tanhf(hiddenlayer1[0]);
hiddenlayer1[1] = (float)tanhf(hiddenlayer1[1]);
hiddenlayer1[2] = (float)tanhf(hiddenlayer1[2]);
hiddenlayer1[3] = (float)tanhf(hiddenlayer1[3]);
hiddenlayer1[4] = (float)tanhf(hiddenlayer1[4]);
hiddenlayer1[5] = (float)tanhf(hiddenlayer1[5]);

// Second hidden layer
// match_multiply(*W2, hiddenlayer1, hiddenlayer2, 6, 7);
// q+6 reads hiddenlayer1[6], which is not assigned in this snippet (presumably a constant input such as a bias)
p = &W2[0][0];
q = &hiddenlayer1[0];
hiddenlayer2[0] = *p * *q + *(p+1) * *(q+1) + *(p+2) * *(q+2) + *(p+3) * *(q+3) + *(p+4) * *(q+4) + *(p+5) * *(q+5) + *(p+6) * *(q+6);
hiddenlayer2[1] = *(p+7) * *q + *(p+8) * *(q+1) + *(p+9) * *(q+2) + *(p+10) * *(q+3) + *(p+11) * *(q+4) + *(p+12) * *(q+5) + *(p+13) * *(q+6);
hiddenlayer2[2] = *(p+14) * *q + *(p+15) * *(q+1) + *(p+16) * *(q+2) + *(p+17) * *(q+3) + *(p+18) * *(q+4) + *(p+19) * *(q+5) + *(p+20) * *(q+6);
hiddenlayer2[3] = *(p+21) * *q + *(p+22) * *(q+1) + *(p+23) * *(q+2) + *(p+24) * *(q+3) + *(p+25) * *(q+4) + *(p+26) * *(q+5) + *(p+27) * *(q+6);
hiddenlayer2[4] = *(p+28) * *q + *(p+29) * *(q+1) + *(p+30) * *(q+2) + *(p+31) * *(q+3) + *(p+32) * *(q+4) + *(p+33) * *(q+5) + *(p+34) * *(q+6);
hiddenlayer2[5] = *(p+35) * *q + *(p+36) * *(q+1) + *(p+37) * *(q+2) + *(p+38) * *(q+3) + *(p+39) * *(q+4) + *(p+40) * *(q+5) + *(p+41) * *(q+6);

hiddenlayer2[0] = (float)tanhf(hiddenlayer2[0]);
hiddenlayer2[1] = (float)tanhf(hiddenlayer2[1]);
hiddenlayer2[2] = (float)tanhf(hiddenlayer2[2]);
hiddenlayer2[3] = (float)tanhf(hiddenlayer2[3]);
hiddenlayer2[4] = (float)tanhf(hiddenlayer2[4]);
hiddenlayer2[5] = (float)tanhf(hiddenlayer2[5]);

// Third hidden layer
p = &W3[0][0];
q = &hiddenlayer2[0];
outputlayer[0] = *p * *q + *(p+1) * *(q+1) + *(p+2) * *(q+2) + *(p+3) * *(q+3) + *(p+4) * *(q+4) + *(p+5) * *(q+5) + *(p+6) * *(q+6);
outputlayer[1] = *(p+7) * *q + *(p+8) * *(q+1) + *(p+9) * *(q+2) + *(p+10) * *(q+3) + *(p+11) * *(q+4) + *(p+12) * *(q+5) + *(p+13) * *(q+6);
outputlayer[0] = (float)tanhf(outputlayer[0]);
outputlayer[1] = (float)tanhf(outputlayer[1]);

Over 8 years ago

admin, over 8 years ago

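Hello Yang Sun,

I don't think this is achievable on the F28335. Using the RTS library, those 14 tanh() calls take about 726 cycles each, which is about 68 µs in total at 150 MHz, so a large part of your time budget goes there.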
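As a rough check of those figures, here is a small sketch of the arithmetic. It assumes the 726-cycle figure is the cost of one tanh() call and that the 0.1 ms interrupt period from the original question is the whole budget. The helper names are illustrative and not part of any TI library.

```c
#include <stdint.h>

/* Convert a cycle count to microseconds.
 * cpu_mhz is the CPU clock in MHz, i.e. cycles per microsecond. */
static float cycles_to_us(uint32_t cycles, float cpu_mhz)
{
    return (float)cycles / cpu_mhz;
}

/* Number of CPU cycles available in a time window given in microseconds. */
static uint32_t us_to_cycles(float us, float cpu_mhz)
{
    return (uint32_t)(us * cpu_mhz);
}
```

With these, `us_to_cycles(100.0f, 150.0f)` gives the 15000 cycles available per interrupt, and `cycles_to_us(14u * 726u, 150.0f)` gives a little under 68 µs for the tanh() calls alone, which leaves roughly a third of the budget for everything else.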

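If you could change to the F28377D (for example), you would get 200 MHz on each core plus the benefit of the TMU. You could replace tanh() with its exponential form. Instead of:

x = (float)tanhf(v);

you would do this:

y = expf(-2*v);
x = (1 - y)/(1 + y);

The exponential is still expensive, but division is cheap with the TMU, so you can do this in about 398 cycles instead of 726. That comes to about 28 µs for the 14 calls at 200 MHz.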

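The matrix multiplications will be expensive in C. Your second equation looks like a (6x7)(7x1) product, which on my machine takes 13,826 cycles (about 92 µs at 150 MHz). Since you know the matrix dimensions in advance, there are several ways to speed this up. The fastest is hand-coded assembly, but you could first try the C compiler optimizer, if you haven't already.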
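Because the dimensions are fixed, one option to try before assembly is to write the product as plain loops with compile-time bounds and build with optimization enabled, so the compiler is free to unroll them. A minimal sketch for the 6x7 case, assuming row-major weights as in the original post; `layer2_matvec`, `L2_ROWS` and `L2_COLS` are illustrative names, and no cycle count is claimed for it.

```c
#define L2_ROWS 6   /* outputs of the second layer */
#define L2_COLS 7   /* inputs to the second layer  */

/* y = W * x for a fixed 6x7 matrix stored row-major in W[0..41]. */
static void layer2_matvec(const float *W, const float *x, float *y)
{
    int r, c;

    for (r = 0; r < L2_ROWS; r++) {
        const float *w = W + r * L2_COLS;  /* start of row r */
        float acc = 0.0f;                  /* local accumulator for this row */

        for (c = 0; c < L2_COLS; c++) {
            acc += w[c] * x[c];
        }
        y[r] = acc;
    }
}
```

A call such as `layer2_matvec(&W2[0][0], hiddenlayer1, hiddenlayer2);` would replace the six hand-written sums for the second layer, provided `hiddenlayer1` really holds 7 valid elements. The same pattern with different constants covers the 6x5 and 2x7 products.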

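We don't have a matrix library at the moment, but it is something we have in the works. Besides multiplication, what other matrix operations are you performing?

Regards,

Richard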

admin, over 8 years ago

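Hi Richard,

Thanks for your response. I only need to handle 3 multiplications: (6*5)*(5*1), (6*7)*(7*1) and (2*7)*(7*1). The dimensions are known in advance. In my code they are carried out as 30, 42 and 14 multiplications. I'm not very familiar with writing assembly. Are there any examples I can find in controlSUITE?

Regards,

Yang Sun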

admin, over 8 years ago

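Hello Yang Sun,

Yes, and you will also have 24, 36 and 12 additions respectively. I'm sorry to say there are no matrix multiplication functions in controlSUITE. However, the floating-point DSP library does contain some assembly-coded vector multiplication functions, which you can find in this directory:

...\controlSUITE\libs\DSP\FPU\v1_50_00_00\sources\vector

I haven't tried using them this way, but you may be able to treat each matrix term as a real vector product.

As I said, we are working on optimized matrix operation support, but no release date has been set. I'm sorry I can't offer you more.

Regards,

Richard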

admin, over 8 years ago

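Hi Richard,
I replaced the tanh function with an approximation, and now the code works. It would be great if the DSP could support matrix computations in the future.

Thanks!
Yang Sun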
