Tool/software:
Hi TI experts,
While running arecord, we observed the system hitting the following kernel IRQ crash:
[ 5637.696595] ti-udma 485c0100.dma-controller: chan2 teardown timeout!
Recording WAVE 'stdin' : Signed 16 bit Little Endian, Rate 48000 Hz, Channels 4
[ 5660.061570] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 5660.061607] rcu: 0-....: (5457 ticks this GP) idle=7acc/1/0x4000000000000000 softirq=0/0 fqs=2574 rcuc=5455 jiffies(starved)
[ 5660.061623] (t=5251 jiffies g=35081 q=283 ncpus=4)
[ 5660.061644] CPU: 0 PID: 901 Comm: irq/95-485c0100 Tainted: G O 6.1.33-rt11-g685e771524 #1
[ 5660.061654] Hardware name: Texas Instruments AM625 SK (DT)
[ 5660.061661] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 5660.061669] pc : _raw_spin_unlock_irq+0x18/0x70
[ 5660.061716] lr : irq_finalize_oneshot.part.0+0x68/0x110
[ 5660.061755] sp : ffff800009b5bd90
[ 5660.061760] x29: ffff800009b5bd90 x28: 0000000000000000 x27: 0000000000000000
[ 5660.061792] x26: ffff8000080ac380 x25: ffff8000080ac5d0 x24: ffff000004397980
[ 5660.061805] x23: ffff000001344c00 x22: ffff000001344c60 x21: ffff000001344cdc
[ 5660.061816] x20: ffff000004397980 x19: ffff000001344c00 x18: 0000000000000000
[ 5660.061827] x17: 0000000000000001 x16: 0000000000000001 x15: 0000b6762e2321a2
[ 5660.061838] x14: 02da0938abd6dca0 x13: 000058fa38cd37d2 x12: 01641952b170bc26
[ 5660.061850] x11: 00000000000002ef x10: 000000000000b67e x9 : 0000000000000001
[ 5660.061861] x8 : ffff00000719cc58 x7 : ffff00000719cc68 x6 : ffffffffffffffe0
[ 5660.061871] x5 : ffff000001372680 x4 : 0000000000000000 x3 : ffffffffffffffe0
[ 5660.061883] x2 : ffff800009f00000 x1 : ffff00000719c880 x0 : 0000000100000001
[ 5660.061900] Call trace:
[ 5660.061916] _raw_spin_unlock_irq+0x18/0x70
[ 5660.061930] irq_forced_thread_fn+0x70/0xd0
[ 5660.061945] irq_thread+0x178/0x23c
[ 5660.061953] kthread+0x124/0x12c
[ 5660.061975] ret_from_fork+0x10/0x20
Operating environment: SK-AM62x EVM
SDK version: TI-PROCESSOR-SDK-LINUX-RT-am62xx-evm-09.00.00.03.tgz
Boot method: SD card created from tisdk-default-image-am62xx-evm.wic.xz in the SDK, used to boot the EVM
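Steps to reproduce the kernel crash:
1. To simplify the test conditions, after the first boot from the SD card we stopped and disabled the following services, then rebooted.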
systemctl stop irqbalanced.service
systemctl disable irqbalanced.service
systemctl stop weston.service
systemctl disable weston.service
systemctl stop docker.service
systemctl disable docker.service
systemctl stop startwlanap.service
systemctl stop startwlansta.service
systemctl stop strongswan-starter.service
systemctl disable strongswan-starter.service
systemctl disable startwlansta.service
systemctl disable startwlanap.service
systemctl stop atd.service
systemctl disable atd.service
systemctl stop bluetooth.service
systemctl stop bt-enable.service
systemctl disable bluetooth.service
systemctl disable bt-enable.service
systemctl stop ti-apps-launcher.service
systemctl disable ti-apps-launcher.service
2. Created an audio test script, audio.sh, with the following contents:
#!/bin/bash
while :
do
    arecord -f S16_LE -r 48000 -c 4 > /dev/null
done
3. Created a crash test script, test.sh, with the following contents:
#!/bin/bash
modprobe -r tidss
./audio.sh &
sleep 3
memtester 256M > /dev/null &
sleep 3
memtester 256M > /dev/null &
sleep 3
memtester 256M > /dev/null &
sleep 3
memtester 256M > /dev/null &
sleep 3
memtester 256M > /dev/null &
4. Ran the following commands:
chmod 777 test.sh audio.sh
./test.sh &
5. The crash occurred after about three hours.
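Step 1 can also be written as a loop over the same units. This is a minimal sketch, not the exact commands we ran: it assumes systemd's "systemctl disable --now", which stops a unit and disables it in one call, and the names services and disable_services are hypothetical.

```shell
#!/bin/bash
# Sketch of step 1: the same units, stopped and disabled in a loop.
# "services" and "disable_services" are hypothetical names for illustration.
services=(
    irqbalanced.service
    weston.service
    docker.service
    startwlanap.service
    startwlansta.service
    strongswan-starter.service
    atd.service
    bluetooth.service
    bt-enable.service
    ti-apps-launcher.service
)

# disable_services UNIT...: stop and disable each unit; return 1 if any failed.
disable_services() {
    local svc rc=0
    for svc in "$@"; do
        # "disable --now" stops the unit and removes it from the boot sequence.
        if ! systemctl disable --now "$svc"; then
            echo "warning: could not stop/disable $svc" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Source the file as root, run disable_services "${services[@]}", then reboot. A unit that does not exist on a given image only prints a warning, and the non-zero return value makes that visible.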
Full log:
e2e.ti.com/.../am62x_5F00_evm_5F00_kernel_5F00_irq_5F00_crash.log
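We run five memtester instances to make the problem reproduce faster, and we remove the tidss driver to rule out the following known issues, which could otherwise complicate the analysis: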
AM625: Question about tidss rcu_preempt self-detected stall on CPU
AM625: rcu_preempt self-detected stall on CPU error at runtime
We noticed that a recent patch appears to fix the TIDSS IRQ flooding issue mentioned above:
However, we do not use any tidss functionality and do not have the tidss driver loaded, yet we still hit a similar problem.
In our testing, this IRQ crash does not occur when arecord is not running.
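To compare runs with and without arecord, the interrupt count of the DMA IRQ can be sampled over time. This is a minimal sketch under assumptions: it reads the standard /proc/interrupts layout (a header line with one CPUn column per CPU, then one row per IRQ with one count per CPU), and it assumes the IRQ of interest is 95, taken from the irq/95-485c0100 thread name in the trace above. The helper names irq_count and watch_irq_rate are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for watching one IRQ line in /proc/interrupts.

# irq_count IRQ [FILE]: print the sum of the per-CPU counts for IRQ
# (e.g. 95 or IPI0). FILE defaults to /proc/interrupts.
# Returns 1 if the IRQ is not listed.
irq_count() {
    local irq="$1" file="${2:-/proc/interrupts}" ncpu
    # The header line has one "CPUn" column per CPU.
    ncpu=$(head -n 1 "$file" | wc -w)
    awk -v irq="$irq:" -v n="$ncpu" '
        $1 == irq {
            s = 0
            for (i = 2; i <= n + 1; i++) s += $i
            print s
            found = 1
        }
        END { if (!found) exit 1 }
    ' "$file"
}

# watch_irq_rate IRQ: print how many times IRQ fired in each one-second interval.
watch_irq_rate() {
    local irq="$1" prev cur
    prev=$(irq_count "$irq") || return 1
    while sleep 1; do
        cur=$(irq_count "$irq") || return 1
        echo "$(date +%T) IRQ $irq: $((cur - prev))/s"
        prev=$cur
    done
}
```

For example, run watch_irq_rate 95 in a second shell while test.sh is running, and again with audio.sh left out; the per-second deltas show whether the rate climbs before the stall.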
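In addition, when the crash occurs and we manage to run cat /proc/interrupts successfully, the interrupt count of the IRQ involved in the crash has increased significantly, as shown below: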
Could you help us find the cause of this crash? Thank you.