My DSP is a DM8168, whose DSP core is a C674x. I have the following piece of code:
for (int i = 0; i < nHeight; i++)
m_ppRecIntImgBuffer[i] = &m_pRecIntImgBuffer[i * nWidth];
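For readers without the surrounding class, here is a minimal self-contained sketch of what this loop does: it builds a table of row pointers into a flat, row-major buffer. The element type `Pixel` (taken here to be `int`) and the helper name `buildRowPointers` are assumptions for illustration; the original declarations are not shown in this post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: 32-bit elements. The real buffer type is not shown in the post.
using Pixel = int;

// Hypothetical helper mirroring the loop above: rows[i] receives the address
// of the first element of row i inside the flat, row-major buffer.
std::vector<Pixel*> buildRowPointers(std::vector<Pixel>& flat, int nWidth, int nHeight) {
    assert(nWidth >= 0 && nHeight >= 0);
    const std::size_t width = static_cast<std::size_t>(nWidth);
    const std::size_t height = static_cast<std::size_t>(nHeight);
    assert(flat.size() >= width * height);  // the buffer must hold every row

    std::vector<Pixel*> rows(height);
    for (std::size_t i = 0; i < height; i++) {
        rows[i] = flat.data() + i * width;  // same as &flat[i * nWidth]
    }
    return rows;
}
```

After this, `rows[y][x]` addresses the same element as `flat[y * nWidth + x]`.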
With software pipelining enabled, the code the compiler generated for this loop is as follows:
;* ii = 1 Schedule found with 6 iterations in parallel
;*
;* Register Usage Table:
;*     +-----------------------------------------------------------------+
;*     |AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA|BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB|
;*     |00000000001111111111222222222233|00000000001111111111222222222233|
;*     |01234567890123456789012345678901|01234567890123456789012345678901|
;*     |--------------------------------+--------------------------------|
;*  0: |   *****                        |    *                           |
;*     +-----------------------------------------------------------------+
;*
;* Done
;*
;* Loop will be splooped
;* Collapsed epilog stages : 0
;* Collapsed prolog stages : 0
;* Minimum required memory pad : 0 bytes
;*
;* Minimum safe trip count : 1
;* Min. prof. trip count (est.) : 2
;*
;* Mem bank conflicts/iter(est.) : { min 0.000, est 0.000, max 0.000 }
;* Mem bank perf. penalty (est.) : 0.0%
;*
;*
;* Total cycles (est.) : 5 + trip_cnt * 1
;*----------------------------------------------------------------------------*
...
00000a0a $C$L64: ; PIPED LOOP KERNEL
00000a0a $C$DW$L$_ZNK14CIntegralImage15ResizeGrayImageERS_f$13$B:
.dwpsn file "../IntegralImage.cpp",line 137,column 0,is_stmt,isa 0
; EXCLUSIVE CPU CYCLES: 1

00000a0a 6ce6 SPMASK D1,L2
00000a0c 9247 || MV .L2X A4,B4
00000a10 02306275 || STW .D1T1 A4,*+A12(12) ; |134|
00000a0e 2760 || ADD .L1 1,A6,A6 ; |136| (P) <0,0> ^
00000a14 0218e800 || MPY32 .M1 A7,A6,A4 ; |137| (P) <0,0> ^

00000a18 00004000 NOP 3
00000a20 01949c40 ADDAW .D1 A5,A4,A3 ; |137| (P) <0,4>

00000a24 5c67 SPKERNEL 5,0
00000a26 0c35 || STW .D2T1 A3,*B4++ ; |137| <0,5>
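I find it hard to believe that software-pipelined code would still contain a NOP. My guess is that this code has not actually been unrolled by software pipelining; instead, it is copied into the SPLOOP buffer as-is and then unrolled inside that buffer. Is that correct? If so, how can I tell how efficient this code is, and does it need further optimization?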
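The only efficiency figure in the listing itself is the compiler's estimate, `Total cycles (est.) : 5 + trip_cnt * 1`, together with `ii = 1` (a new iteration is started every cycle). The sketch below only evaluates that arithmetic. The function name `estimatedCycles` and its default arguments are hypothetical, and the result is the compiler's static estimate, not a measurement on the device.

```cpp
// Hypothetical helper: evaluates the compiler's estimate from the listing,
//   Total cycles (est.) : 5 + trip_cnt * 1
// where 5 is the fixed overhead and 1 is the iteration interval (ii).
constexpr long estimatedCycles(long tripCount, long ii = 1, long overhead = 5) {
    return overhead + tripCount * ii;
}
```

For example, with `nHeight` equal to 480 the estimate is `estimatedCycles(480)`, which is 485 cycles, i.e. about one cycle per row once the pipeline is full.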