
AM62L: CAN init kernel panics


Other Parts Discussed in Thread: AM62L

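Please note: this content was machine-translated and may contain grammatical or other translation errors. It is provided for reference only; for accurate content, refer to the original English thread at the link or translate it yourself.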

https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1586568/am62l-can-init-kernel-panics

Part Number: AM62L

Hello,

We are seeing kernel panics on the AM62L32 on our custom SoM/devboard when CAN is initialized at boot.


Kernel:

https://github.com/phytec/linux-phytec-ti/blob/v6.12.49-11.02.02-phy/arch/arm64/boot/dts/ti/k3-am62l3-phyflex-libra-rdk.dts

U-Boot:

https://github.com/phytec/u-boot-phytec-ti/tree/v2025.01-11.02.02-phy/board/phytec/am62lx-phyflex-fpsc-g

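TF-A is the TI fork with only our DDR4 configuration added.

(MT40A1G16TB-062E 2 GB DDR4 or AS4C512M16D4A-62BIN 1 GB DDR4)

We use MAIN_MCAN0 and MAIN_MCAN1 with TCAN1042 transceivers.

Since the following TI TF-A commit, we occasionally see a kernel panic at boot while systemd is initializing CAN (roughly 1 in 100 to 500 boots). The system then hangs and has to be power-cycled.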

https://github.com/TexasInstruments/arm-trusted-firmware/pull/34/commits/e3500f2bb713ea8044c5943f9bbf1486ec7a16e8

[    9.711664] Internal error: synchronous external abort: 0000000096000010 [#1] PREEMPT SMP
[    9.719870] Modules linked in: ti_am335x_adc(+) kfifo_buf crct10dif_ce phy_can_transceiver tps65219_pwrbutton rtc_rv3028 tmp102 rtc_ti_k3(+) k3_j72xx_bandgap leds_pca9532 dthev2 md5 crypto_engine m_can_platform m_can ti_am335x_tscadc can_dev lm75 at24 cfg80211 rfkill cryptodev(O) fuse ipv6
[    9.745576] CPU: 1 UID: 0 PID: 173 Comm: (udev-worker) Tainted: G   M       O       6.12.49-g5c3b6790-g5c3b67907416 #1
[    9.756275] Tainted: [M]=MACHINE_CHECK, [O]=OOT_MODULE
[    9.761401] Hardware name: phyFLEX-AM62L Libra Rapid Development Kit (DT)
[    9.768172] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    9.775119] pc : iomap_read_reg+0x8/0x20 [m_can_platform]
[    9.780526] lr : m_can_get_berr_counter+0x3c/0xe4 [m_can]
[    9.785927] sp : ffff80008272b2b0
[    9.789229] x29: ffff80008272b2b0 x28: 0000000000000224 x27: ffff8000797f3058
[    9.796360] x26: 0000000000000000 x25: ffff00000549f800 x24: 0000000000000000
[    9.803487] x23: ffff00000549fa24 x22: ffff00000386dc10 x21: ffff0000089c0000
[    9.810614] x20: ffff0000089c0980 x19: ffff80008272b2ec x18: ffffffffffffffff
[    9.817741] x17: 0000000000000000 x16: 0000000000000000 x15: 2f74656e2f6e6163
[    9.824868] x14: 0000000000000000 x13: 0000000100000200 x12: 0000000100000080
[    9.831994] x11: 0000000000000040 x10: ffff000003b32138 x9 : ffff000003b32130
[    9.839121] x8 : ffff00000404bdd0 x7 : 0000000000000000 x6 : 0000000000000080
[    9.846247] x5 : ffff000003b4aef8 x4 : 0000000000000000 x3 : 0000000000000001
[    9.853374] x2 : ffff80007981210c x1 : 0000000000000040 x0 : ffff80008211d040
[    9.860501] Call trace:
[    9.862939]  iomap_read_reg+0x8/0x20 [m_can_platform]
[    9.867989]  can_fill_info+0x108/0x52c [can_dev]
[    9.872620]  rtnl_fill_ifinfo.isra.0+0xac8/0x121c
[    9.877330]  rtmsg_ifinfo_build_skb+0xc4/0x140
[    9.881767]  rtnetlink_event+0xb0/0xd8
[    9.885509]  raw_notifier_call_chain+0x54/0x74
[    9.889946]  call_netdevice_notifiers_info+0x58/0xa4
[    9.894907]  dev_change_name+0x17c/0x348
[    9.898824]  do_setlink+0xc18/0xec8
[    9.902307]  rtnl_setlink+0x120/0x1d8
[    9.905962]  rtnetlink_rcv_msg+0x128/0x390
[    9.910051]  netlink_rcv_skb+0x60/0x130
[    9.913882]  rtnetlink_rcv+0x18/0x24
[    9.917453]  netlink_unicast+0x324/0x3a8
[    9.921367]  netlink_sendmsg+0x17c/0x3cc
[    9.925282]  __sys_sendto+0x110/0x178
[    9.928940]  __arm64_sys_sendto+0x28/0x38
[    9.932946]  invoke_syscall+0x48/0x10c
[    9.936690]  el0_svc_common.constprop.0+0xc0/0xe0
[    9.941385]  do_el0_svc+0x1c/0x28
[    9.944693]  el0_svc+0x28/0x98
[    9.947746]  el0t_64_sync_handler+0x120/0x12c
[    9.952093]  el0t_64_sync+0x190/0x194
[    9.955754] Code: 52800000 d65f03c0 f942fc00 8b21c000 (b9400000)
[    9.961835] ---[ end trace 0000000000000000 ]---

Another variant looks like this:

[    9.948162] SError Interrupt on CPU1, code 0x00000000bf000000 -- SError
[    9.948193] CPU: 1 UID: 996 PID: 207 Comm: systemd-network Tainted: G           O       6.12.35-gb9b94b26-01037-gb9b94b267b88 #1
[    9.948204] Tainted: [O]=OOT_MODULE
[    9.948207] Hardware name: PHYTEC Libra AM62L RDK FPSC (DT)
[    9.948212] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    9.948219] pc : iomap_write_fifo+0x0/0x38 [m_can_platform]
[    9.948243] lr : m_can_init_ram+0x78/0xb8 [m_can]
[    9.948261] sp : ffff8000827bb480
[    9.948264] x29: ffff8000827bb490 x28: ffff0000049f0000 x27: 0000000000000003
[    9.948281] x26: ffff8000797e6ec8 x25: ffff0000056402d0 x24: 0000000000000001
[    9.948291] x23: ffff0000081f0344 x22: 0000000000040080 x21: 0000000000004400
[    9.948301] x20: ffff0000081f0980 x19: 0000000000000008 x18: ffffffffffffffff
[    9.948311] x17: 0000000000000000 x16: 0000000000000000 x15: ffff8000827bb270
[    9.948320] x14: ffff8001027bb3fd x13: 007473696c5f7974 x12: 696e696666615f65
[    9.948333] x11: 0000000000000040 x10: ffff800080ed6210 x9 : 0000000000000000
[    9.948342] x8 : ffff000003c88d80 x7 : 0000000000000000 x6 : ffff800081566290
[    9.948352] x5 : ffff000003c88c88 x4 : ffff80007981a0bc x3 : 0000000000000001
[    9.948361] x2 : ffff8000827bb484 x1 : 0000000000000008 x0 : ffff0000081f0980
[    9.948373] Kernel panic - not syncing: Asynchronous SError Interrupt
[    9.948378] CPU: 1 UID: 996 PID: 207 Comm: systemd-network Tainted: G           O       6.12.35-gb9b94b26-01037-gb9b94b267b88 #1
[    9.948387] Tainted: [O]=OOT_MODULE
[    9.948390] Hardware name: PHYTEC Libra AM62L RDK FPSC (DT)
[    9.948394] Call trace:
[    9.948398]  dump_backtrace+0x90/0xe8
[    9.948417]  show_stack+0x18/0x24
[    9.948426]  dump_stack_lvl+0x34/0x8c
[    9.948438]  dump_stack+0x18/0x24
[    9.948446]  panic+0x390/0x3a4
[    9.948457]  nmi_panic+0x40/0x8c
[    9.948465]  arm64_serror_panic+0x64/0x70
[    9.948475]  do_serror+0x3c/0x70
[    9.948484]  el1h_64_error_handler+0x30/0x48
[    9.948495]  el1h_64_error+0x64/0x68
[    9.948502]  iomap_write_fifo+0x0/0x38 [m_can_platform]
[    9.948512]  m_can_start+0x24/0x580 [m_can]
[    9.948521]  m_can_open+0x6c/0x264 [m_can]
[    9.948531]  __dev_open+0x120/0x1dc
[    9.948542]  __dev_change_flags+0x194/0x20c
[    9.948551]  dev_change_flags+0x24/0x6c
[    9.948559]  do_setlink+0x27c/0xec8
[    9.948569]  rtnl_setlink+0x120/0x1d8
[    9.948577]  rtnetlink_rcv_msg+0x128/0x390
[    9.948585]  netlink_rcv_skb+0x60/0x130
[    9.948597]  rtnetlink_rcv+0x18/0x24
[    9.948605]  netlink_unicast+0x318/0x380
[    9.948612]  netlink_sendmsg+0x17c/0x3c8
[    9.948620]  __sys_sendto+0x110/0x178
[    9.948630]  __arm64_sys_sendto+0x28/0x38
[    9.948642]  invoke_syscall+0x48/0x10c
[    9.948652]  el0_svc_common.constprop.0+0xc0/0xe0
[    9.948660]  do_el0_svc+0x1c/0x28
[    9.948668]  el0_svc+0x28/0x98
[    9.948676]  el0t_64_sync_handler+0x120/0x12c
[    9.948687]  el0t_64_sync+0x190/0x194
[    9.948695] SMP: stopping secondary CPUs
[    9.948711] Kernel Offset: disabled
[    9.948714] CPU features: 0x00,00000080,00200000,4200420b
[    9.948720] Memory Limit: none
[   10.229502] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---

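Apart from this occasional boot panic, we have no problems, and CAN (FD) works without any functional issues. The system also passed all tests at ambient temperatures from –40°C to +85°C.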


However, with the following TI TF-A commit, we see these panics on every boot:

https://github.com/TexasInstruments/arm-trusted-firmware/commit/58bfb476c908d4c220d8f0bae88536f457452f06

The system still boots to the Linux prompt, but everything network-related is broken.


Although the SoC affected there is the AM62x, which runs DM on a separate core, the issues in these two E2E threads look similar to ours.

https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1309832/am625-crash-internal-error-synchronous-external-abort-in-mcan-driver-at-low-temperatures-20-degrees-celsius

https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1354528/am625-boot-stall-and-crashes?pifragment-323307=2#pifragment-323307=1

However, we do not see any correlation with temperature. I will check whether adding a delay to the clock handling also affects our issue.


Best regards,

Dominik

  • Hi Dominik,

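    The TF-A patch you linked is for low-power mode support, so I would not expect it to affect MCAN.

    Is the MCAN driver built into the kernel or built as a module?

    If you remove MCAN from the device tree, do you still see the error? If so, it may be related to interrupt routing rather than specific to MCAN.

    Thanks,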

    Anshu

  • Hi Anshu,

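    I don't think so either.

    I think this is some odd race condition in which power or clock management affects CAN.

    Oddly, the error does not occur when BL31 is built outside of Yocto with a different compiler.

    MCAN is built as a loadable module. However, configuring everything involved as built-in only moves the kernel panic a few seconds earlier in the boot.

    With CAN disabled in the DTS, the error goes away.

    Adding a delay to the m_can runtime PM resume, as was done in another thread involving the AM62x [1], also eliminates the error on our AM62L setup: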

    +++ b/drivers/net/can/m_can/m_can_platform.c
    @@ -8,6 +8,7 @@
     #include <linux/hrtimer.h>
     #include <linux/phy/phy.h>
     #include <linux/platform_device.h>
    +#include <linux/delay.h>
     
     #include "m_can.h"
     
    @@ -209,6 +210,9 @@ static int __maybe_unused m_can_runtime_resume(struct device *dev)
            if (err)
                    clk_disable_unprepare(mcan_class->hclk);
     
    +       printk("delay 50ms....");
    +       mdelay(50);
    +       printk("end delay....");
            return err;
     }

    [   10.863746] delay 50ms....
    [   10.969583] end delay....
    [   10.977588] m_can_platform 20701000.can: m_can device registered (irq=223, version=32)
    [  OK  ] Finished OpenSSH Key Generation.
    [   11.003345] delay 50ms....
    [   11.096743] end delay....
    [   11.105388] m_can_platform 20711000.can: m_can device registered (irq=224, version=32)
    [   11.412264] am65-cpsw-nuss 8000000.ethernet end0: PHY [8000f00.mdio:00] driver [TI DP83867] (irq=41)
    [   11.430688] am65-cpsw-nuss 8000000.ethernet end0: configuring for phy/rgmii-rxid link mode
    [  OK  ] Listening on Load/Save RF Kill Switch Status /dev/rfkill Watch.
    [   11.527976] am65-cpsw-nuss 8000000.ethernet end1: PHY [8000f00.mdio:01] driver [TI DP83867] (irq=POLL)
    [   11.545146] am65-cpsw-nuss 8000000.ethernet end1: configuring for phy/rgmii-rxid link mode
    [  OK  ] Started User Manager for UID 1000.
    [   11.635960] delay 50ms....
    [   11.737645] end delay....
    [  OK  ] Started Session c1 of User weston.
    [   11.742295] delay 50ms....
    [   11.802674] end delay....
    [   11.805920] delay 50ms....
    [   11.863017] end delay....
    [   11.867096] delay 50ms....
    [   11.920428] end delay....
    [   11.920777] delay 50ms....
    [   11.997853] end delay....
    [   12.001194] m_can_platform 20701000.can main_mcan0: renamed from can0
    [   12.015206] delay 50ms....
    [   12.065265] end delay....
    [   12.071850] delay 50ms....
    [   12.161617] end delay....
    [   12.164722] delay 50ms....
    [   12.222836] end delay....
    [   12.226648] m_can_platform 20711000.can main_mcan1: renamed from can1
    [   12.237783] delay 50ms....
    [   12.296198] end delay....
    [  OK  ] Started Weston, a Wayland compositor, as a system service.
    [   12.303554] delay 50ms....
    [   12.402396] rtc-ti-k3 2b1f0000.rtc: registered as rtc1
    [  OK  ] Started PHYTEC's Qt6 reference demo implementation.
    [  OK  ] Reached target Graphical Interface.
    [   12.444527] end delay....
    [   12.498888] delay 50ms....
    [   12.683504] end delay....
    [   12.692096] delay 50ms....
    [   12.761608] end delay....
    [   12.765175] delay 50ms....
    [   12.823269] end delay....
    [   12.828588] delay 50ms....
    [   12.925726] end delay....
    [   12.931201] delay 50ms....
    [   13.055909] end delay....
    [   13.074119] delay 50ms....
    [   13.172501] end delay....
    [   13.197690] delay 50ms....
    [   13.259366] end delay....
    [   13.270486] delay 50ms....
    

    Best regards,

    Dominik

    [1] e2e.ti.com/.../am625-boot-stall-and-crashes

  • Hi Dominik,

    Glad you found a solution.

    "Oddly, the error does not occur when BL31 is built outside of Yocto with a different compiler."

    Could you explain this further?

    Thanks,

    Anshu

  • Hi Anshu,

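    While trying to find out which commit introduced the kernel panic, I built tispl.bin and all its components, including bl31, manually outside of Yocto.

    In the end, the bl31 binary built with the toolchain from my local Ubuntu 24.04 does not trigger the kernel panic.

    The bl31 binary built with the toolchain used in our Yocto Scarthgap setup does trigger it.

    So, apart from some compiler flags, the two builds are identical.

    Best regards,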

    Dominik

  • Thanks for the update, Dominik.