This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

AM5716: DSP和M4核启动失败问题

Part Number: AM5716
  1. 开发环境描述:

内核版本:linux-rt-4.19.94

SDK版本:processor_sdk_rtos_am57xx_08_01_00_09

  1. 出现问题场景:

最近一段时间里,我们遇到过两次AM5716启动过程中DSP和M4核启动失败问题。

第一次遇到问题时,通过应用程序的日志发现M4核IPC通讯不上,尝试从trace日志中寻找错误信息,结果发现没有生成trace日志。

怀疑M4核启动失败,查看M4核固件文件大小是否正常。

Kill掉lad_dra7xx进程,手动执行unbind和bind重新尝试拉起M4核。发现M4核可以正常启动。

  

  1. 分析过程:

出问题时的部分dmesg信息如下:

 

故障设备重启后的部分dmesg信息如下:

       DSP和M4核固件均存放在linux文件系统 /lib/firmware目录下。通过对比两次启动日志,怀疑出问题时,根文件系统尚未挂载导致实时核加载固件失败。

      

第二次遇到问题时,现象也是trace日志没有生成,初步判断是DSP核启动失败。区别是现场当时尝试重新拉起DSP核,仍然拉不起来。由于设备已经重启,未保留dmesg信息,仅保留了lad_dra7xx进程生成的lad.txt。

[0.446170]
Initializing LAD... [0.446524] NameServer_setup: entered, refCount=0
[0.446582] NameServer_setup: creating listener thread
[0.446699] NameServer_setup: exiting, refCount=1
[0.446774] NameServer_create(): 'GateMP'
[0.446799]
opening FIFO: /tmp/LAD/LADCMDS
[0.449104] listener_cb: Entered Listener thread.
[0.449124] NameServer: waiting for unblockFd: 0, and socks: maxfd: 0
[5.467336] Retrieving command...
[5.467621]
LAD_CONNECT:
[5.467634] client FIFO name = /tmp/LAD/689
[5.467641] client PID = 689
[5.467648] assigned client handle = 0
[5.467680] FIFO /tmp/LAD/689 created
[5.467724]
WARN: Client /tmp/LAD/689 has not yet opened, will retry
[5.467965] FIFO /tmp/LAD/689 opened for writing
[5.468005] sent response
[5.468013] DONE
[5.468019] Retrieving command...
[5.468133] Sending response...
[5.468153] Retrieving command...
[5.468208] LAD_MULTIPROC_GETCONFIG: calling MultiProc_getConfig()...
[5.468221] MultiProc_getConfig() - 5 procs
[5.468227] # processors in cluster: 5
[5.468233] cluster baseId: 0
[5.468238] ProcId 0 - "HOST"
[5.468244] ProcId 1 - "IPU2"
[5.468250] ProcId 2 - "IPU1"
[5.468256] ProcId 3 - "DSP2"
[5.468261] ProcId 4 - "DSP1"
[5.468267] status = 0
[5.468273] DONE
[5.468278] Sending response...
[5.468292] Retrieving command...
[5.468347] LAD_NAMESERVER_SETUP: calling NameServer_setup()...
[5.468357] NameServer_setup: entered, refCount=1
[5.468364] NameServer_setup: already setup
[5.468369] NameServer_setup: exiting, refCount=2
[5.468375] status = 1
[5.468381] DONE
[5.468386] Sending response...
[5.468400] Retrieving command...
[5.468451] LAD_MESSAGEQ_GETCONFIG: calling MessageQ_getConfig()...
[5.468461] status = 0
[5.468467] DONE
[5.468472] Sending response...
[5.468486] Retrieving command...
[5.468536] LAD_MESSAGEQ_SETUP: calling MessageQ_setup()...
[5.468546] MessageQ_setup: entered, refCount=0
[5.468553] NameServer_create(): 'MessageQ'
[5.468564] MessageQ_setup: exiting, refCount=1
[5.468571] status = 0
[5.468576] DONE
[5.468581] Sending response...
[5.468596] Retrieving command...
[5.468876] NameServer_attach: --> procId=1, refCount=0
[5.468905] NameServer_attach: created send socket: 5, procId 1
[5.468935] NameServer_attach: connect failed: procId=1, errno=22 (Invalid argument)
[5.468953] closing send socket: 5
[5.468979] NameServer_attach: <-- refCount=0, status=-1
[5.468987] Sending response...
[5.469005] Retrieving command...
[5.469055] NameServer_attach: --> procId=2, refCount=0
[5.469074] NameServer_attach: created send socket: 5, procId 2
[5.469085] NameServer_attach: connect failed: procId=2, errno=22 (Invalid argument)
[5.469094] closing send socket: 5
[5.469113] NameServer_attach: <-- refCount=0, status=-1
[5.469121] Sending response...
[5.469137] Retrieving command...
[5.469184] NameServer_attach: --> procId=3, refCount=0
[5.469202] NameServer_attach: created send socket: 5, procId 3
[5.469212] NameServer_attach: connect failed: procId=3, errno=22 (Invalid argument)
[5.469221] closing send socket: 5
[5.469240] NameServer_attach: <-- refCount=0, status=-1
[5.469248] Sending response...
[5.469263] Retrieving command...
[5.469310] NameServer_attach: --> procId=4, refCount=0
[5.469327] NameServer_attach: created send socket: 5, procId 4
[5.469337] NameServer_attach: connect failed: procId=4, errno=22 (Invalid argument)
[5.469346] closing send socket: 5
[5.469364] NameServer_attach: <-- refCount=0, status=-1
[5.469373] Sending response...
[5.469387] Retrieving command...
[6.111141] LAD_MESSAGEQ_CREATE: calling MessageQ_create(0x2d7b0, 0x2d7d0)...
[6.111162] MessageQ_create: creating 'Master:MsgQ:00'
[6.111173] MessageQ_create: returning obj=0x32640, qid=0x80
[6.111180] status = 0
[6.111187] DONE
[6.111192] Sending response...
[6.111210] Retrieving command...
[6.688838] LAD_MESSAGEQ_ANNOUNCE: calling MessageQ_announce(0x2d7b0, 0x32640)...
[6.688856] MessageQ_announce: announcing 0x32640
[6.688868] NameServer_add: Entered key: 'Master:MsgQ:00', data: 0x80
[6.688877] status = 0
[6.688883] DONE
[6.688888] Sending response...
[6.688927] Retrieving command...
[6.689214] LAD_MESSAGEQ_CREATE: calling MessageQ_create(0x2d7b0, 0x2d7d0)...
[6.689227] MessageQ_create: creating 'Goose:MsgQ:01'
[6.689279] MessageQ_create: returning obj=0x326b0, qid=0x81
[6.689288] status = 0
[6.689294] DONE
[6.689299] Sending response...
[6.689314] Retrieving command...
[6.689586] LAD_MESSAGEQ_ANNOUNCE: calling MessageQ_announce(0x2d7b0, 0x326b0)...
[6.689597] MessageQ_announce: announcing 0x326b0
[6.689605] NameServer_add: Entered key: 'Goose:MsgQ:01', data: 0x81
[6.689612] status = 0
[6.689618] DONE
[6.689623] Sending response...
[6.689638] Retrieving command...
[6.689734] LAD_NAMESERVER_GETUINT32: calling NameServer_getUInt32(0x32558, 'DSP1:Master:00')...
[6.689747] NameServer_getLocal: entry key: 'DSP1:Master:00' not found!
[6.689755] NameServer_getRemote: no socket connection to processor 1
[6.689761] NameServer_getRemote: no socket connection to processor 2
[6.689767] NameServer_getRemote: no socket connection to processor 3
[6.689773] NameServer_getRemote: no socket connection to processor 4
[6.689779] value = 0x81
[6.689806] status = -5
[6.689813] DONE
[6.689818] Sending response...
[6.689832] Retrieving command...
[7.690062] LAD_NAMESERVER_GETUINT32: calling NameServer_getUInt32(0x32558, 'DSP1:Master:00')...
[7.690088] NameServer_getLocal: entry key: 'DSP1:Master:00' not found!
[7.690096] NameServer_getRemote: no socket connection to processor 1
[7.690103] NameServer_getRemote: no socket connection to processor 2
[7.690109] NameServer_getRemote: no socket connection to processor 3
[7.690115] NameServer_getRemote: no socket connection to processor 4
[7.690120] value = 0x81
[7.690126] status = -5
[7.690132] DONE
[7.690137] Sending response...
[7.690155] Retrieving command...
[8.690389] LAD_NAMESERVER_GETUINT32: calling NameServer_getUInt32(0x32558, 'DSP1:Master:00')...
[8.690425] NameServer_getLocal: entry key: 'DSP1:Master:00' not found!
[8.690434] NameServer_getRemote: no socket connection to processor 1
[8.690441] NameServer_getRemote: no socket connection to processor 2
[8.690447] NameServer_getRemote: no socket connection to processor 3
[8.690453] NameServer_getRemote: no socket connection to processor 4
[8.690459] value = 0x81
[8.690465] status = -5
[8.690470] DONE
[8.690476] Sending response...
[8.690492] Retrieving command...
[9.690703] LAD_NAMESERVER_GETUINT32: calling NameServer_getUInt32(0x32558, 'DSP1:Master:00')...
[9.690728] NameServer_getLocal: entry key: 'DSP1:Master:00' not found!
[9.690737] NameServer_getRemote: no socket connection to processor 1
[9.690744] NameServer_getRemote: no socket connection to processor 2
[9.690751] NameServer_getRemote: no socket connection to processor 3
[9.690757] NameServer_getRemote: no socket connection to processor 4
[9.690762] value = 0x81
[9.690768] status = -5
[9.690774] DONE
[9.690779] Sending response...
[9.690818] Retrieving command...
[10.691047] LAD_NAMESERVER_GETUINT32: calling NameServer_getUInt32(0x32558, 'DSP1:Master:00')...
[10.691073] NameServer_getLocal: entry key: 'DSP1:Master:00' not found!
[10.691082] NameServer_getRemote: no socket connection to processor 1
[10.691089] NameServer_getRemote: no socket connection to processor 2
[10.691096] NameServer_getRemote: no socket connection to processor 3
[10.691102] NameServer_getRemote: no socket connection to processor 4
[10.691108] value = 0x81
[10.691114] status = -5
[10.691119] DONE
[10.691124] Sending response...
[10.691142] Retrieving command...
[11.691368] LAD_NAMESERVER_GETUINT32: calling NameServer_getUInt32(0x32558, 'DSP1:Master:00')...
[11.691393] NameServer_getLocal: entry key: 'DSP1:Master:00' not found!
[11.691401] NameServer_getRemote: no socket connection to processor 1
[11.691409] NameServer_getRemote: no socket connection to processor 2
[11.691415] NameServer_getRemote: no socket connection to processor 3
[11.691421] NameServer_getRemote: no socket connection to processor 4
[11.691427] value = 0x81
[11.691432] status = -5
[11.691438] DONE
[11.691443] Sending response...
[11.691461] Retrieving command...
[12.691749] LAD_NAMESERVER_GETUINT32: calling NameServer_getUInt32(0x32558, 'DSP1:Master:00')...
[12.691776] NameServer_getLocal: entry key: 'DSP1:Master:00' not found!
[12.691808] NameServer_getRemote: no socket connection to processor 1
[12.691818] NameServer_getRemote: no socket connection to processor 2
[12.691824] NameServer_getRemote: no socket connection to processor 3
[12.691831] NameServer_getRemote: no socket connection to processor 4
[12.691837] value = 0x81
[12.691844] status = -5
[12.691850] DONE
[12.691856] Sending response...
[12.691873] Retrieving command...
[13.692110] LAD_NAMESERVER_GETUINT32: calling NameServer_getUInt32(0x32558, 'DSP1:Master:00')...
[13.692135] NameServer_getLocal: entry key: 'DSP1:Master:00' not found!
[13.692143] NameServer_getRemote: no socket connection to processor 1
[13.692151] NameServer_getRemote: no socket connection to processor 2
[13.692157] NameServer_getRemote: no socket connection to processor 3
[13.692162] NameServer_getRemote: no socket connection to processor 4
[13.692169] value = 0x81
[13.692175] status = -5
[13.692180] DONE

  1. 疑问点

第一次出现实时核启动失败的背景是开发人员替换应用程序,操作步骤是:(1)先kill运行的应用程序;(2)vscode下通过scp将应用拷贝到设备,替换应用程序;(3)reboot。

第二次出现实时核启动失败的背景是测试人员进行应用升级,操作步骤是:(1)在web页面上通过http进行升级包上传;(2)升级包解压;(3)重启。

两次的操作有相似的地方,即进行应用程序的替换,怀疑是同一个问题。疑问点是,相比于正常情况下,为何根文件系统延迟了接近10秒的时间才开始挂载?且该问题不容易复现,TI的工程师之前是否处理过类似的问题?