MSP430FR5994: Matrix multiplication on the MSP430 with and without LEA

Other Parts Discussed in Thread: MSP430FR5994
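Note: this text is derived from a machine translation and may contain grammatical or other translation errors; it is provided for reference only. For the exact wording, see the English original at the link below.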

https://e2e.ti.com/support/microcontrollers/msp-low-power-microcontrollers-group/msp430/f/msp-low-power-microcontroller-forum/1065973/msp430fr5994-matrix-multiplication-on-the-msp430-with-and-without-lea

Part Number: MSP430FR5994

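I am trying to implement matrix multiplication for multiple matrices on the MSP430FR5994. After going through a few older questions on the forum, I used the answers given there to write my implementation. The goal is to replicate a neural network layer, so the computation multiplies an input matrix by a second matrix holding the network weights, then adds a third matrix holding the bias values. While doing this I realized that the values need to be quantized, so I quantize the inputs, weights, and biases before populating the matrices. The problem I have now is that the result of the matrix computation is right-shifted by 15 bits before it is stored in the result matrix. I understand that this behavior is consistent with how "_q15" parameters are treated, and I have looked at the code that performs the shift. A plausible way to eliminate the shift was given in the following question, https://e2e.ti.com/support/microcontrollers/msp-low-power-microcontrollers-group/msp430/f/msp-low-power-microcontroller-forum/716353/msp430fr5992-msp-dsplib-msp_matrix_mpy_q15, but it does not cover a solution that uses the MSP LEA. I tried a few ways of altering the multiplication function so that it uses int16_t/uint16_t values instead of _q15 parameters. The modified matrix multiplication function, including the changes from the question above, is shown below: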

/* --COPYRIGHT--,BSD
 * Copyright (c) 2016, Texas Instruments Incorporated
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * *  Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *
 * *  Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * *  Neither the name of Texas Instruments Incorporated nor the names of
 *    its contributors may be used to endorse or promote products derived
 *    from this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
 * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
 * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
 * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
 * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
 * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
 * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
 * EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 * --/COPYRIGHT--*/

#include "../../include/DSPLib.h"

#if defined(MSP_USE_LEA)

msp_status msp_matrix_mpy_q15(const msp_matrix_mpy_q15_params *params, const uint16_t *srcA, const uint16_t *srcB, uint16_t *dst)
{
    uint16_t srcARows;
    uint16_t srcACols;
    uint16_t srcBRows;
    uint16_t srcBCols;
    msp_status status;
    MSP_LEA_MPYMATRIXROW_PARAMS *leaParams;

    /* Initialize the row and column sizes. */
    srcARows = params->srcARows;
    srcACols = params->srcACols;
    srcBRows = params->srcBRows;
    srcBCols = params->srcBCols;

#ifndef MSP_DISABLE_DIAGNOSTICS
    /* Check that column of A equals rows of B */
    if (srcACols != srcBRows) {
        return MSP_SIZE_ERROR;
    }

    /* Check that the data arrays are aligned and in a valid memory segment. */
    if (!(MSP_LEA_VALID_ADDRESS(srcA, 4) &
          MSP_LEA_VALID_ADDRESS(srcB, 4) &
          MSP_LEA_VALID_ADDRESS(dst, 4))) {
        return MSP_LEA_INVALID_ADDRESS;
    }

    /* Acquire lock for LEA module. */
    if (!msp_lea_acquireLock()) {
        return MSP_LEA_BUSY;
    }
#endif //MSP_DISABLE_DIAGNOSTICS

    /* Initialize LEA if it is not enabled. */
    if (!(LEAPMCTL & LEACMDEN)) {
        msp_lea_init();
    }

    /* Allocate MSP_LEA_MPYMATRIXROW_PARAMS structure. */
    leaParams = (MSP_LEA_MPYMATRIXROW_PARAMS *)msp_lea_allocMemory(sizeof(MSP_LEA_MPYMATRIXROW_PARAMS)/sizeof(uint32_t));

    /* Set status flag. */
    status = MSP_SUCCESS;

    /* Iterate through each row of srcA */
    while (srcARows--) {
        /* Set MSP_LEA_MPYMATRIXROW_PARAMS structure. */
        leaParams->rowSize = srcBRows;
        leaParams->colSize = srcBCols;
        leaParams->colVector = MSP_LEA_CONVERT_ADDRESS(srcB);
        leaParams->output = MSP_LEA_CONVERT_ADDRESS(dst);

        /* Load source arguments to LEA. */
        LEAPMS0 = MSP_LEA_CONVERT_ADDRESS(srcA);
        LEAPMS1 = MSP_LEA_CONVERT_ADDRESS(leaParams);

        /* Invoke the LEACMD__MPYMATRIXROW command with interrupts enabled. */
        LEAPMCB = LEACMD__MPYMATRIXROW | LEAITFLG1;

        /* Clear DSPLib flags, restore interrupts and enter LPM0. */
        msp_lea_ifg = 0;
        msp_lea_enterLPM();

#ifndef MSP_DISABLE_DIAGNOSTICS
        /* Check LEA interrupt flags for any errors. */
        if (msp_lea_ifg & LEACOVLIFG) {
            status = MSP_LEA_COMMAND_OVERFLOW;
            break;
        }
        else if (msp_lea_ifg & LEAOORIFG) {
            status = MSP_LEA_OUT_OF_RANGE;
            break;
        }
        else if (msp_lea_ifg & LEASDIIFG) {
            status = MSP_LEA_SCALAR_INCONSISTENCY;
            break;
        }
#endif //MSP_DISABLE_DIAGNOSTICS

        /* Increment srcA and dst pointers. */
        srcA += srcACols;
        dst += srcBCols;
    }

    /* Free MSP_LEA_MPYMATRIXROW_PARAMS structure. */
    msp_lea_freeMemory(sizeof(MSP_LEA_MPYMATRIXROW_PARAMS)/sizeof(uint32_t));

    /* Free lock for LEA module and return status. */
    msp_lea_freeLock();
    return status;
}

#else //MSP_USE_LEA

msp_status msp_matrix_mpy_q15(const msp_matrix_mpy_q15_params *params, const uint16_t *srcA, const uint16_t *srcB, uint16_t *dst)
{
    uint16_t cntr;
    uint16_t srcARows;
    uint16_t srcACols;
    uint16_t srcBRows;
    uint16_t srcBCols;
    uint16_t dst_row;
    uint16_t dst_col;
    uint16_t row_offset;
    uint16_t col_offset;
    uint16_t dst_row_offset;

    /* Initialize the row and column sizes. */
    srcARows = params->srcARows;
    srcACols = params->srcACols;
    srcBRows = params->srcBRows;
    srcBCols = params->srcBCols;

#ifndef MSP_DISABLE_DIAGNOSTICS
    /* Check that column of A equals rows of B */
    if (srcACols != srcBRows) {
        return MSP_SIZE_ERROR;
    }
#endif //MSP_DISABLE_DIAGNOSTICS

    /* Initialize loop counters. */
    cntr = 0;
    dst_row = 0;
    dst_col = 0;
    row_offset = 0;
    col_offset = 0;
    dst_row_offset = 0;

#if defined(__MSP430_HAS_MPY32__)
    /* If MPY32 is available save control context, set to fractional mode, set saturation mode. */
    uint16_t ui16MPYState = MPY32CTL0;
    MPY32CTL0 = MPYFRAC | MPYDLYWRTEN | MPYSAT;

    /* Loop through all srcA rows. */
    while(srcARows--) {
        /* Loop through all srcB columns. */
        while (dst_col < srcBCols) {
            /* Reset result accumulator. */
            MPY32CTL0 &= ~MPYC;
            RESLO = 0; RESHI = 0;
            
            /* Loop through all elements in srcA column and srcB row. */
            while(cntr < srcACols) {
                MACS = srcA[row_offset + cntr];
                OP2 = srcB[col_offset + dst_col];
                col_offset += srcBCols;
                cntr++;
            }
            
            /* Store the result */
            dst[dst_row_offset + dst_col] = RESHI * 32768 + RESLO;

            /* Update pointers. */
            dst_col++;
            cntr = 0;
            col_offset = 0;
        }

        /* Update pointers. */
        dst_row++;
        dst_col = 0;
        row_offset += srcACols;
        dst_row_offset += srcBCols;
    }

    /* Restore MPY32 control context, previous saturation state. */
    MPY32CTL0 = ui16MPYState;

#else //__MSP430_HAS_MPY32__
    uint32_t result;

    /* Loop through all srcA rows. */
    while(srcARows--) {
        /* Loop through all srcB columns. */
        while (dst_col < srcBCols) {
            /* Initialize accumulator. */
            result = 0;
            
            /* Loop through all elements in srcA column and srcB row. */
            while(cntr < srcACols) {
                result += (int32_t)srcA[row_offset + cntr] * (int32_t)srcB[col_offset + dst_col];
                col_offset += srcBCols;
                cntr++;
            }

            /* Saturate and store the result */
            dst[dst_row_offset + dst_col] = (int32_t)__saturate(result, INT32_MIN, INT32_MAX);

            /* Update pointers. */
            dst_col++;
            cntr = 0;
            col_offset = 0;
        }

        /* Update pointers. */
        dst_row++;
        dst_col = 0;
        row_offset += srcACols;
        dst_row_offset += srcBCols;
    }
#endif //__MSP430_HAS_MPY32__

    return MSP_SUCCESS;
}

#endif //MSP_USE_LEA

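Even after changing the matrix input type to 'uint16_t' and changing how the result is stored by removing the shift by 15, the code still does not compute the matrix values correctly in integer format. The complete code for the matrix multiplication is as follows: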

#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
#include <msp430.h>
#include "DSPLib.h"
#include "math.h"

#pragma DATA_SECTION(lea1, ".leaRAM")
#pragma DATA_SECTION(lea2, ".leaRAM")
#pragma DATA_SECTION(leadest, ".leaRAM")

DSPLIB_DATA(lea1, 4)
uint16_t lea1[2][2] = {{7, 2}, {1, 2}};
DSPLIB_DATA(lea2, 4)
uint16_t lea2[2][2] = {{4, 5}, {2,3}};
DSPLIB_DATA(leadest, 4)
uint16_t leadest[2][2];


volatile uint32_t cycleCount = 0;
int main()
{
        msp_status status;
        msp_matrix_mpy_q15_params mpyParams;

        WDTCTL = WDTPW + WDTHOLD;

        mpyParams.srcARows = 2;
        mpyParams.srcACols = 2;
        mpyParams.srcBRows = 2;
        mpyParams.srcBCols = 2;

        status = msp_matrix_mpy_q15(&mpyParams, *lea1, *lea2, *leadest);
        cycleCount = msp_benchmarkStop(MSP_BENCHMARK_BASE);
        msp_checkStatus(status);
        return 0;

}

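I am not sure how to handle the bit shifts correctly: whether to remove them, or to change the function so that the matrix multiplication returns the raw integer values that ordinary arithmetic would give. It would be very helpful if someone could suggest some possible solutions that I can try while observing the behavior of the MSP430. Please let me know if any other information is needed to make this clearer. Thank you.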
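For reference when checking any modified routine, the result the question is after is the ordinary integer product. The sketch below is a plain C reference, not DSPLib code; the function name `ref_matrix_mpy_i16` and the choice of an `int32_t` destination are assumptions made for illustration only.

```c
#include <stdint.h>

/* Hypothetical reference helper (not part of DSPLib).
 * Multiplies a (rows x inner) matrix by an (inner x cols) matrix,
 * both stored row-major, using plain integer arithmetic.
 * The destination is 32-bit so that sums of 16-bit products are not
 * truncated when they are stored. */
static void ref_matrix_mpy_i16(const int16_t *a, const int16_t *b,
                               int32_t *dst, uint16_t rows,
                               uint16_t inner, uint16_t cols)
{
    uint16_t r, c, k;

    for (r = 0; r < rows; r++) {
        for (c = 0; c < cols; c++) {
            int32_t acc = 0;

            /* Dot product of row r of a with column c of b. */
            for (k = 0; k < inner; k++) {
                acc += (int32_t)a[r * inner + k] * (int32_t)b[k * cols + c];
            }
            dst[r * cols + c] = acc;
        }
    }
}
```

With the 2x2 matrices from the test program, `{{7, 2}, {1, 2}}` and `{{4, 5}, {2, 3}}`, this gives `{{32, 41}, {8, 11}}`. A Q15 routine that shifts each sum right by 15 bits would store 0 for all four entries, since every sum is below 32768.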

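  • There is nothing mysterious about fixed-point multiplication. The result has as many fractional bits as the two inputs have combined. So if you multiply two numbers with 15 fractional bits each, the result has 30. That is why you have to shift the result right by 15 bits to get back to 15 fractional bits.

    After performing that shift, you need to check for overflow before converting the 32-bit integer to 16 bits. Think about what to do in that case, because it will almost certainly happen, probably somewhere in the middle of the multiply-accumulate where you will not notice it.

    For example, this part of your code: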

                /* Loop through all elements in srcA column and srcB row. */
                while(cntr < srcACols) {
                    MACS = srcA[row_offset + cntr];
                    OP2 = srcB[col_offset + dst_col];
                    col_offset += srcBCols;
                    cntr++;
                }
                
                /* Store the result */
                dst[dst_row_offset + dst_col] = RESHI * 32768 + RESLO;
    

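    This tries (badly) to stuff a 32-bit value into a 16-bit hole, without even shifting the binary point, which means that nearly all of the bits you are interested in get thrown away.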
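The shift-then-saturate sequence described in this reply can be sketched in portable C. This is only an illustration and does not use the MPY32 registers or DSPLib; the helper names `sat_i16` and `q15_dot` are hypothetical, and the 64-bit accumulator is an assumption chosen so that the sum itself cannot overflow in the sketch.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a wide value to the int16_t range. */
static int16_t sat_i16(int64_t x)
{
    if (x > INT16_MAX) {
        return INT16_MAX;
    }
    if (x < INT16_MIN) {
        return INT16_MIN;
    }
    return (int16_t)x;
}

/* Hypothetical helper: dot product of two Q15 vectors, returned as Q15.
 * Each product of two Q15 values has 30 fractional bits (Q30). */
static int16_t q15_dot(const int16_t *a, const int16_t *b, uint16_t n)
{
    int64_t acc = 0;   /* wide accumulator: an assumption of this sketch */
    uint16_t i;

    for (i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];   /* Q30 product */
    }

    /* Move the binary point back from Q30 to Q15. Right-shifting a
     * negative value is implementation-defined in C; common compilers
     * perform an arithmetic shift. */
    acc >>= 15;

    /* Check for overflow before narrowing to 16 bits. */
    return sat_i16(acc);
}
```

For instance, 0.5 times 0.5 in Q15 is 16384 times 16384, which comes back as 8192 (0.25). The integer inputs from the question, 7*4 + 2*2 = 32, come back as 0 after the shift, which is why plain integers do not survive a Q15 routine.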

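  • Thank you for the clarification on fixed-point multiplication. However, I have been using integer values in the matrix multiplication, and I am not sure how to cancel this shift by modifying the "msp_matrix_mpy_q15.c" file that defines the matrix multiplication function. I am hesitant to shift the result afterwards, because several different values give the same result once they are right-shifted by 15 bits, so shifting every integer result back by 15 bits may not recover the original value. For that reason I wanted to modify the multiplication function itself, so that the shift is cancelled from the first call. The link I mentioned only covers the case where the LEA is not used, though, and since my matrices are large I would like to use the LEA. Are there any ideas for changing the multiplication in this case, given that I only work with integer values and not with fractional ones?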

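  • I have never used the LEA, but after looking at the command reference (sla850), it only supports fixed-point types. The closest thing I can see to what you need is LEACMD_MAC, which takes Q15 inputs and produces a Q31 output.

    The natural result of multiplying two Q15 values is Q30, so you will have to shift the Q31 result right by one bit.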
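The one-bit adjustment can be illustrated with a small software model. This is not LEA code and does not call any DSPLib or LEA API: `model_mac_q31` and `q31_to_integer` are hypothetical names, and the model assumes, following the reply above, that the Q31 output equals the Q30 sum of raw products shifted left by one bit. Saturation is not modeled, and the actual behavior should be checked against the command reference.

```c
#include <stdint.h>

/* Hypothetical software model of a Q15 x Q15 -> Q31 multiply-accumulate.
 * ASSUMPTION: the Q31 result is the sum of the raw 16-bit products
 * (a Q30 quantity) shifted left by one bit. Saturation is not modeled,
 * so the caller must keep the sum within the 32-bit range. */
static int32_t model_mac_q31(const int16_t *a, const int16_t *b, uint16_t n)
{
    int64_t acc = 0;
    uint16_t i;

    for (i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];   /* raw product, Q30 */
    }
    return (int32_t)(acc * 2);                  /* Q30 -> Q31 */
}

/* Undo the extra bit: if the inputs are read as plain integers, the
 * Q30 sum is exactly the integer dot product. The Q31 value is always
 * even under the assumption above, so the division is exact. */
static int32_t q31_to_integer(int32_t q31)
{
    return q31 / 2;
}
```

Under that assumption, the first row and first column of the question's matrices, `{7, 2}` and `{4, 2}`, give a Q31 value of 64, which maps back to the integer dot product 32.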

  • OK, I will take a look at the command reference and check what I get with this command. If I have further questions, I will open a new question for them. I will mark this question as resolved. Thank you for your replies.