AM6442: How to debug a kernel panic? (Asynchronous SError Interrupt)

Other Parts Discussed in Thread: AM6442
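Note: this content was machine translated and may contain grammatical or other translation errors; it is provided for reference only. For the accurate content, refer to the English original at the link below.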

https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1428363/am6442-how-to-debug-a-kernel-panic-error-asynchronous-serror-interrupt

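Part Number: AM6442

Tool/software:

Hello,

On a custom board using the AM6442, I am running into kernel panics. They are always related to an "Asynchronous SError Interrupt".

Examples of the kernel panic logs:

Log 1: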

[ 1199.521390] SError Interrupt on CPU0, code 0x00000000bf000002 -- SError
[ 1199.521419] CPU: 0 PID: 12 Comm: ktimers/0 Not tainted 6.1.80-rt26 #1
[ 1199.521430] Hardware name: ---
[ 1199.521436] pstate: 000000c5 (nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 1199.521445] pc : 0xffff8000080ac250
[ 1199.521452] lr : 0xffff8000080ac450
[ 1199.521455] sp : ffff0000000d3c60
[ 1199.521458] x29: ffff0000000d3c60 x28: 0000000000000020 x27: ffff0000000d3d28
[ 1199.521474] x26: ffff00001bf7ca10 x25: 00000001000db9c1 x24: 0000000000000000
[ 1199.521484] x23: dead000000000122 x22: ffff00001bf7ca00 x21: ffff00001bf7ca50
[ 1199.521495] x20: 00000001400db9c0 x19: ffff800008b68000 x18: 0000000000000000
[ 1199.521506] x17: 0000000000000000 x16: 0000000000000000 x15: 000000000000016d
[ 1199.521515] x14: 00000000b123f581 x13: 0000000000000000 x12: 0000000000000000
[ 1199.521525] x11: ffff00001bf7ca98 x10: 0000000000000001 x9 : 00000000000000a7
[ 1199.521535] x8 : 0000100000000000 x7 : ffff0000000d3d30 x6 : ffff00001bf7ca50
[ 1199.521545] x5 : ffff0000000d3d30 x4 : 0000000000000002 x3 : 0000100000000000
[ 1199.521555] x2 : 00000000000000c0 x1 : 00000001000db9c1 x0 : ffff00001bf7ca00
[ 1199.521570] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 1199.521575] CPU: 0 PID: 12 Comm: ktimers/0 Not tainted 6.1.80-rt26 #1
[ 1199.521583] Hardware name: ---
[ 1199.521587] Call trace:
[ 1199.521592]  0xffff800008018154
[ 1199.521595]  0xffff8000080181a4
[ 1199.521598]  0xffff80000881731c
[ 1199.521601]  0xffff800008817348
[ 1199.521604]  0xffff800008809160
[ 1199.521607]  0xffff80000803647c
[ 1199.521610]  0xffff800008019228
[ 1199.521613]  0xffff800008019300
[ 1199.521616]  0xffff800008818f5c
[ 1199.521618]  0xffff80000801133c
[ 1199.521621]  0xffff8000080ac250
[ 1199.521624]  0xffff8000080ac450
[ 1199.521627]  0xffff8000080accb8
[ 1199.521630]  0xffff800008010104
[ 1199.521633]  0xffff80000803b720
[ 1199.521636]  0xffff80000805dad8
[ 1199.521639]  0xffff800008057c04
[ 1199.521642]  0xffff800008014cb0
[ 1199.707695] SMP: stopping secondary CPUs
[ 1199.707706] Kernel Offset: disabled
[ 1199.707709] CPU features: 0x00000,00800004,0000400b
[ 1199.707715] Memory Limit: none

Log 2:

[ 1371.051212] SError Interrupt on CPU0, code 0x00000000bf000002 -- SError
[ 1371.051242] CPU: 0 PID: 32 Comm: kcompactd0 Not tainted 6.1.80-rt26 #1
[ 1371.051253] Hardware name: ---
[ 1371.051259] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 1371.051269] pc : 0xffff800008825d18
[ 1371.051275] lr : 0xffff8000080acef4
[ 1371.051278] sp : ffff0000002e7d00
[ 1371.051281] x29: ffff0000002e7d00 x28: 0000000000000000 x27: ffff800008c44c00
[ 1371.051296] x26: ffff00001bf7ca00 x25: 0000000000000000 x24: ffff800008b68000
[ 1371.051307] x23: 00000000ffffffff x22: 00000001001059bc x21: ffff00001bf7ca00
[ 1371.051317] x20: ffff00001bf7ca00 x19: ffff0000002e7da0 x18: 0000000000000000
[ 1371.051328] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000132
[ 1371.051338] x14: 0000000000000009 x13: 0000000000000000 x12: 0000000000000000
[ 1371.051348] x11: 0000000000000002 x10: 00000000000008f0 x9 : ffff0000002e7cf0
[ 1371.051358] x8 : 0100000000000000 x7 : 0000000000000001 x6 : ffff00001bf7ca50
[ 1371.051367] x5 : 0000000000000001 x4 : 000000001e000000 x3 : 00000001001059c0
[ 1371.051377] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff00001bf7ca00
[ 1371.051391] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 1371.051397] CPU: 0 PID: 32 Comm: kcompactd0 Not tainted 6.1.80-rt26 #1
[ 1371.051403] Hardware name: ---
[ 1371.051408] Call trace:
[ 1371.051412]  0xffff800008018154
[ 1371.051416]  0xffff8000080181a4
[ 1371.051418]  0xffff80000881731c
[ 1371.051421]  0xffff800008817348
[ 1371.051424]  0xffff800008809160
[ 1371.051427]  0xffff80000803647c
[ 1371.051430]  0xffff800008019228
[ 1371.051433]  0xffff800008019300
[ 1371.051436]  0xffff800008818f5c
[ 1371.051439]  0xffff80000801133c
[ 1371.051442]  0xffff800008825d18
[ 1371.051444]  0xffff800008824fc8
[ 1371.051447]  0xffff80000810a1b4
[ 1371.051450]  0xffff800008057c04
[ 1371.051453]  0xffff800008014cb0
[ 1371.228305] SMP: stopping secondary CPUs
[ 1371.228317] Kernel Offset: disabled
[ 1371.228320] CPU features: 0x00000,00800004,0000400b
[ 1371.228328] Memory Limit: none

Log 3:

[ 3962.368990] SError Interrupt on CPU1, code 0x00000000bf000002 -- SError
[ 3962.369024] CPU: 1 PID: 145 Comm: systemd-journal Not tainted 6.1.80-rt26 #1
[ 3962.369034] Hardware name: ---
[ 3962.369040] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3962.369050] pc : 0xffff8000087e0f60
[ 3962.369055] lr : 0xffff8000083277d0
[ 3962.369057] sp : ffff000001a07960
[ 3962.369060] x29: ffff000001a07960 x28: 0001000000000000 x27: 0000000000000000
[ 3962.369076] x26: ffff000001a07d20 x25: ffff000001a07d10 x24: ffff000001a07d20
[ 3962.369087] x23: 00000000000007d8 x22: ffff000001a07dd8 x21: ffff00000194e000
[ 3962.369097] x20: 00000000000007d8 x19: 0000000000000000 x18: 0000000000000000
[ 3962.369107] x17: 0000000000000000 x16: 0000000000000000 x15: ffff00000194e000
[ 3962.369116] x14: 20656c62616c6961 x13: 7661206f6e203a5d x12: 322072656375646f
[ 3962.369127] x11: 72505b205d013838 x10: 355b017265767265 x9 : 732d3335612d636d
[ 3962.369137] x8 : 34303a323230320a x7 : 6174616420425355 x6 : 0000000022d819df
[ 3962.369147] x5 : 0000000022d81fe7 x4 : 0000000000000000 x3 : 00000000000007d8
[ 3962.369157] x2 : 0000000000000598 x1 : ffff00000194e210 x0 : 0000000022d8180f
[ 3962.369171] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 3962.369176] CPU: 1 PID: 145 Comm: systemd-journal Not tainted 6.1.80-rt26 #1
[ 3962.369183] Hardware name: ---
[ 3962.369187] Call trace:
[ 3962.369192]  0xffff800008018154
[ 3962.369195]  0xffff8000080181a4
[ 3962.369198]  0xffff80000881731c
[ 3962.369201]  0xffff800008817348
[ 3962.369204]  0xffff800008809160
[ 3962.369207]  0xffff80000803647c
[ 3962.369210]  0xffff800008019228
[ 3962.369213]  0xffff800008019300
[ 3962.369216]  0xffff800008818f5c
[ 3962.369219]  0xffff80000801133c
[ 3962.369222]  0xffff8000087e0f60
[ 3962.369226]  0xffff8000086aad50
[ 3962.369229]  0xffff8000086aabec
[ 3962.369232]  0xffff8000086aad2c
[ 3962.369234]  0xffff8000087b8098
[ 3962.369238]  0xffff8000087bc2f0
[ 3962.369241]  0xffff8000087bca58
[ 3962.369244]  0xffff8000086950d4
[ 3962.369246]  0xffff800008697d78
[ 3962.369250]  0xffff800008698490
[ 3962.369253]  0xffff800008698500
[ 3962.369255]  0xffff80000801dcac
[ 3962.369258]  0xffff80000801dda0
[ 3962.369261]  0xffff800008817b70
[ 3962.369264]  0xffff800008819024
[ 3962.369267]  0xffff800008011488

How can I debug this and determine which part of the kernel is causing the problem?

Thanks!

Stephane

PS: this system runs the TI kernel 09.02.00.009-RT.

  • Hello Stephane,

    First, please enable CONFIG_KALLSYMS in the kernel configuration. It adds symbols to the kernel trace logs, which may give some hints about the crash.
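The traces already captured without CONFIG_KALLSYMS may still be readable. Since the logs report "Kernel Offset: disabled", the raw addresses should match the `System.map` of the exact kernel build that crashed. The following is a minimal sketch of that lookup, not a definitive tool; the function names and the map excerpt are hypothetical and only illustrate the idea.

```python
import bisect


def load_system_map(text):
    """Parse System.map text into a sorted list of (address, name) pairs."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip anything that is not "address type name"
        addr, _kind, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def symbolize(symbols, addr):
    """Return 'name+0xoffset' for the closest symbol at or below addr."""
    starts = [a for a, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return hex(addr)  # address lies below every known symbol
    base, name = symbols[i]
    return f"{name}+{hex(addr - base)}"


# Hypothetical excerpt; a real System.map comes from the kernel build tree.
EXAMPLE_MAP = """\
ffff800008010000 T example_func_a
ffff800008010100 T example_func_b
"""
symbols = load_system_map(EXAMPLE_MAP)
print(symbolize(symbols, 0xFFFF800008010104))
```

With a real map, each address from a "Call trace" section can be passed to `symbolize` in turn. This only works if the map and the running kernel come from the same build.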

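  • Hello,

    Thank you for the suggestion; I have added this option to the kernel build. Unfortunately, in the last test I ran, the kernel still crashed (the console no longer responds), but nothing was printed on the debug UART... This week I will try to repeat the test several times with and without CONFIG_KALLSYMS, to see whether it makes a difference in the debug messages.

    Regards,

    Stephane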

  • Hello Stephane,

    CONFIG_KALLSYMS does not fix the kernel problem. It adds more information to any kernel crash log, which may give hints for debugging.

    Looking forward to the next test results/logs.

  • Hello,

    This afternoon I finally got new results, with kernel panics:

    Log 1 (a new type of kernel panic: "Internal error: synchronous external abort"):

    [  906.730994] Internal error: synchronous external abort: 0000000096000210 [#1] PREEMPT_RT SMP
    [  906.731024] Modules linked in: rpmsg_ctrl rpmsg_char bulk_1_usb cdc_acm bulk_2_usb bulk_3_usb bulk_4_usb bulk_5_usb bulk_6_usb xhci_plat_hcd cdns3 cdns_usb_common at25 crct10dif_ce ti_k3_r5_remoteproc cdns3_ti virtio_rpmsg_bus rti_wdt rpmsg_ns sa2ul spi_omap2_mcspi at24 overlay fuse ipv6
    [  906.731112] CPU: 0 PID: 1209 Comm: systemd-tmpfile Not tainted 6.1.80-rt26 #1
    [  906.731122] Hardware name: ---
    [  906.731128] pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    [  906.731138] pc : copy_page+0x64/0xd0
    [  906.731163] lr : copy_user_highpage+0x34/0x4c
    [  906.731180] sp : ffff0000016d7c20
    [  906.731184] x29: ffff0000016d7c20 x28: ffff0000015d3770 x27: 0000000000000000
    [  906.731195] x26: ffff0000011c8000 x25: ffff0000015d3770 x24: fffffc00001313c0
    [  906.731206] x23: 0000000000000a55 x22: ffff000003f5a800 x21: fffffc00001f6140
    [  906.731216] x20: 0000ffff89862000 x19: fffffc00001f6140 x18: 000000000000ef96
    [  906.731226] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
    [  906.731235] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
    [  906.731245] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
    [  906.731254] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
    [  906.731263] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 000000000022b7e0
    [  906.731272] x2 : 0000000000000000 x1 : ffff000004c4f700 x0 : ffff000007d85780
    [  906.731283] Call trace:
    [  906.731288]  copy_page+0x64/0xd0
    [  906.731295]  wp_page_copy+0x8c/0x660
    [  906.731311]  do_wp_page+0xc4/0x4a0
    [  906.731321]  __handle_mm_fault+0x59c/0x95c
    [  906.731328]  handle_mm_fault+0xa0/0x154
    [  906.731333]  do_page_fault+0x118/0x3ac
    [  906.731343]  do_mem_abort+0x40/0x90
    [  906.731352]  el0_da+0x38/0x110
    [  906.731364]  el0t_64_sync_handler+0xf0/0x130
    [  906.731374]  el0t_64_sync+0x148/0x14c
    [  906.731388] Code: a9421c26 a8332408 a9432428 a8342c0a (a9442c2a)
    [  906.914987] ---[ end trace 0000000000000000 ]---
    [  911.692563] Internal error: synchronous external abort: 0000000086000210 [#2] PREEMPT_RT SMP
    [  911.692592] Modules linked in: rpmsg_ctrl rpmsg_char bulk_1_usb cdc_acm bulk_2_usb bulk_3_usb bulk_4_usb bulk_5_usb bulk_6_usb xhci_plat_hcd cdns3 cdns_usb_common at25 crct10dif_ce ti_k3_r5_remoteproc cdns3_ti virtio_rpmsg_bus rti_wdt rpmsg_ns sa2ul spi_omap2_mcspi at24 overlay fuse ipv6
    [  911.692676] CPU: 0 PID: 781 Comm: myapp-a53-server Tainted: G      D            6.1.80-rt26 #1
    [  911.692688] Hardware name: ---
    [  911.692694] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    [  911.692704] pc : cgroup_post_fork+0x2c/0x27c
    [  911.692727] lr : copy_process+0xea8/0x11e0
    [  911.692742] sp : ffff000001bc3be0
    [  911.692745] x29: ffff000001bc3be0 x28: ffff0000017b4800 x27: ffff000001bc3db0
    [  911.692756] x26: ffff000000f15800 x25: ffff800008d95f70 x24: 0000000000004100
    [  911.692766] x23: ffff00000165b400 x22: ffff800008d6d038 x21: ffff000000f15400
    [  911.692777] x20: ffff000001bc3db0 x19: ffff000000f15400 x18: 000000000000ef96
    [  911.692787] x17: 0000000000000000 x16: 0000000000000000 x15: 00000000000000dc
    [  911.692796] x14: 0000000000004111 x13: 0000000080000000 x12: 0000ffff93c07c38
    [  911.692806] x11: 0000000000000040 x10: 00000003a3145385 x9 : ffff000000f154c0
    [  911.692816] x8 : ffff0000017b4c00 x7 : ffff0000017b0430 x6 : 000000361f9512d7
    [  911.692826] x5 : ffff000003c21b40 x4 : ffff800008c750a0 x3 : 0000000000000000
    [  911.692836] x2 : ffff0000017b4800 x1 : ffff000001bc3db0 x0 : ffff000000f15400
    [  911.692847] Call trace:
    [  911.692851]  cgroup_post_fork+0x2c/0x27c
    [  911.692859]  copy_process+0xea8/0x11e0
    [  911.692867]  kernel_clone+0x78/0x330
    [  911.692876]  __do_sys_clone+0x5c/0x74
    [  911.692885]  __arm64_sys_clone+0x1c/0x24
    [  911.692895]  el0_svc_common.constprop.0+0x5c/0x134
    [  911.692905]  do_el0_svc+0x1c/0x2c
    [  911.692911]  el0_svc+0x20/0x100
    [  911.692927]  el0t_64_sync_handler+0xb4/0x130
    [  911.692937]  el0t_64_sync+0x148/0x14c
    [  927.204110] xhci-hcd xhci-hcd.0.auto: xHCI host not responding to stop endpoint command
    [  927.397948] xhci-hcd xhci-hcd.0.auto: xHCI host controller not responding, assume dead
    [  927.405945] xhci-hcd xhci-hcd.0.auto: HC died; cleaning up
    [  927.408428] Bulk message returned -110
    [  927.408527] Bulk message returned -110
    [  927.421535] usb 1-1: USB disconnect, device number 2
    [  927.426547] usb 1-1.1: USB disconnect, device number 3
    

    Log 2 (the same "Asynchronous SError Interrupt" as last week):

    [ 1659.516725] SError Interrupt on CPU0, code 0x00000000bf000002 -- SError
    [ 1659.516752] CPU: 0 PID: 145 Comm: systemd-journal Not tainted 6.1.80-rt26 #1
    [ 1659.516762] Hardware name: ---
    [ 1659.516768] pstate: 80000000 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    [ 1659.516777] pc : 0000ffff9e5116c8
    [ 1659.516779] lr : 0000ffff9e5116a4
    [ 1659.516782] sp : 0000ffffe7df6db0
    [ 1659.516785] x29: 0000ffffe7df6df0 x28: 0000ffffe7df6fe0 x27: 0000ffffe7df6e60
    [ 1659.516801] x26: 0000ffff9d6a8620 x25: 0000ffffe7df72e0 x24: 0000000000000000
    [ 1659.516812] x23: 0000000000000000 x22: 62a1255053723ec2 x21: 0000000000000000
    [ 1659.516822] x20: 00000000000001c0 x19: 00000000136a26b0 x18: 0000000000000000
    [ 1659.516831] x17: 0000ffff9e21aa40 x16: 0000ffff9e60ee00 x15: ea0ba3ff2dd83ac4
    [ 1659.516842] x14: 00000000000cb778 x13: ea0ba3ff2dd83ac4 x12: 00000000000cb778
    [ 1659.516851] x11: 48cbbb06f8f33437 x10: 00000000001285b0 x9 : d4959e675cb764a0
    [ 1659.516861] x8 : 0000000000128540 x7 : 074271864aac32a7 x6 : 00000000001284d8
    [ 1659.516871] x5 : 0000ffff9d6a87e0 x4 : 0000ffffe7df6fe0 x3 : 0000ffff9d6a8760
    [ 1659.516881] x2 : fffffffffffffff0 x1 : 0000ffffe7df6fa0 x0 : 0000000062ea25ee
    [ 1659.516895] Kernel panic - not syncing: Asynchronous SError Interrupt
    [ 1659.516900] CPU: 0 PID: 145 Comm: systemd-journal Not tainted 6.1.80-rt26 #1
    [ 1659.516907] Hardware name: ---
    [ 1659.516911] Call trace:
    [ 1659.516916]  dump_backtrace.part.0+0xb4/0xc0
    [ 1659.516942]  show_stack+0x14/0x20
    [ 1659.516951]  dump_stack_lvl+0x64/0x7c
    [ 1659.516965]  dump_stack+0x14/0x2c
    [ 1659.516974]  panic+0x14c/0x324
    [ 1659.516986]  nmi_panic+0x6c/0x70
    [ 1659.516998]  arm64_serror_panic+0x68/0x74
    [ 1659.517008]  is_valid_bugaddr+0x0/0x20
    [ 1659.517017]  __el0_error_handler_common+0x38/0x114
    [ 1659.517028]  el0t_64_error_handler+0xc/0x14
    [ 1659.517038]  el0t_64_error+0x148/0x14c
    [ 1659.692301] SMP: stopping secondary CPUs
    [ 1659.692316] Kernel Offset: disabled
    [ 1659.692319] CPU features: 0x00000,00800004,0000400b
    [ 1659.692327] Memory Limit: none
    

    In the first case the call trace leads to el0t_64_sync, and in the second case we see el0t_64_error. Does this help to start tracking down the crash?

    Regards,

    Stephane
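The hexadecimal codes printed in these logs (0xbf000002, 0x96000210, 0x86000210) are exception syndrome (ESR) values, and splitting them into fields may help classify each crash. The sketch below assumes the standard AArch64 ESR layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0) and only names the exception classes that appear in this thread; `decode_esr` is a hypothetical helper, not a kernel API.

```python
# Exception classes relevant to the logs in this thread (AArch64 ESR_ELx.EC).
EC_NAMES = {
    0x20: "Instruction abort from a lower exception level",
    0x21: "Instruction abort taken without a change in exception level",
    0x24: "Data abort from a lower exception level",
    0x25: "Data abort taken without a change in exception level",
    0x2F: "SError interrupt",
}
ABORT_CLASSES = (0x20, 0x21, 0x24, 0x25)


def decode_esr(esr):
    """Split an ESR value into EC, IL and ISS, plus a few ISS sub-fields."""
    ec = (esr >> 26) & 0x3F
    iss = esr & 0x1FFFFFF
    fields = {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "not decoded by this sketch"),
        "il": (esr >> 25) & 1,
        "iss": iss,
    }
    if ec in ABORT_CLASSES:
        fields["fsc"] = iss & 0x3F       # fault status code
        fields["ea"] = (iss >> 9) & 1    # external abort type bit
    elif ec == 0x2F:
        fields["ids"] = (iss >> 24) & 1  # 1: ISS[23:0] is implementation defined
    return fields


for code in (0xBF000002, 0x96000210, 0x86000210):
    print(hex(code), decode_esr(code))
```

Under these assumptions, both "Internal error" codes carry fault status code 0x10, which the architecture defines as a synchronous external abort not on a translation table walk, matching the text the kernel printed. The SError code has the IDS bit set, so its low ISS bits are implementation defined and need the CPU's technical reference manual to interpret.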

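  • Hello Stephane,

    Thanks for the new logs. The two logs appear to be unrelated, which basically means the crashes seem to be "random". A common cause of this kind of behavior on custom boards is unstable DDR or an unstable power supply. To narrow it down, could you first stop your application and run the "memtester" test in Linux on the board, to see whether it reports any DDR errors?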
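For unattended runs, the memtester invocation can be wrapped so the result of each run is recorded. This is a rough sketch, assuming memtester is installed on the target and that its exit status follows the bit layout described in its man page; the helper names and the size and loop values are hypothetical and should be adapted to the board's free memory.

```python
import shutil
import subprocess

# Exit-status bits as described in the memtester man page (0 means success).
EXIT_BITS = {
    0x01: "error allocating or locking memory, or invocation error",
    0x02: "error during stuck address test",
    0x04: "error during one of the other tests",
}


def memtester_cmd(size_mb, loops):
    """Build the command line: memtester <size>M <loops>."""
    return ["memtester", f"{size_mb}M", str(loops)]


def describe_exit(code):
    """Translate a memtester exit status into a list of failure descriptions."""
    return [msg for bit, msg in EXIT_BITS.items() if code & bit]


def run_memtester(size_mb, loops):
    """Run memtester once and return the decoded failures (empty list if clean)."""
    if shutil.which("memtester") is None:
        raise FileNotFoundError("memtester is not installed on this system")
    result = subprocess.run(memtester_cmd(size_mb, loops))
    return describe_exit(result.returncode)
```

A call such as `run_memtester(256, 3)` would test 256 MB for three loops. memtester usually needs to run as root so that it can lock the memory it tests, and the size must leave enough free memory for the rest of the system.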