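Kernel exception during WeChat network provisioning

When provisioning the network over WeChat, a kernel exception sometimes occurs. The terminal output is as follows: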
//////////////////////////////////////////////////////////////////////////////
/etc/config# aac
[ 2006.360000] init queue success
[ 2009.470000] Reserved instruction in kernel code[#3]:
[ 2009.470000] CPU: 0 PID: 2375 Comm: iwpriv Tainted: G D 3.18.29 #12
[ 2009.470000] task: 879321e8 ti: 8784a000 task.ti: 8784a000
[ 2009.470000] $ 0 : 00000000 00000001 00000000 00000000
[ 2009.470000] $ 4 : 775ea000 00000000 00000000 00000001
[ 2009.470000] $ 8 : 00000014 80008f3c 00000200 00000000
[ 2009.470000] $12 : 00000000 fffffffc 00000000 00000000
[ 2009.470000] $16 : 775d2000 87390900 87375200 00000000
[ 2009.470000] $20 : 87385960 873756e0 87390800 87375500
[ 2009.470000] $24 : 00000000 8008d84c
[ 2009.470000] $28 : 8784a000 8784be28 8784be28 800dce10
[ 2009.470000] Hi : 00000000
[ 2009.470000] Lo : 00000010
[ 2009.470000] epc : 800dc044 padzero+0x6c/0x70
[ 2009.470000] Tainted: G D
[ 2009.470000] ra : 800dce10 load_elf_binary+0x9ec/0x11a8
[ 2009.470000] Status: 1100e403 KERNEL EXL IE
[ 2009.470000] Cause : 10800028
[ 2009.470000] PrId : 00019655 (MIPS 24KEc)
[ 2009.470000] Modules linked in: pppoe ppp_async iptable_nat pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_id xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT slhc nf_reject_ipv4 nf_nat_masquerade_ipv4 nf_nat_ftp nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack_ftp nf_conntrack iptable_raw iptable_mangle iptable_filter ip_tables crc_ccitt mt_wifi ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables ipv6 leds_gpio ohci_platform ohci_hcd ehci_platform ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
[ 2009.470000] Process iwpriv (pid: 2375, threadinfo=8784a000, task=879321e8, tls=771e4440)
[ 2009.470000] Stack : 00000000 87afdf20 7fff7f41 00000017 00002012 00000006 00000010 00000001
00000000 00000011 775e912a 775e91e0 87375600 00016ffc 00000001 00000007
00400000 00400000 0040ba90 0041ba90 0041bbe7 87375200 87375200 fffffff8
80320be8 00000001 80320000 803208c0 fffffff8 802c1608 0080a44c 800a56f4
810d9480 879db900 86ca4000 00000f2d 87375200 87375200 879321e8 00000947

[ 2009.470000] Call Trace:
[ 2009.470000] [<800dc044>] padzero+0x6c/0x70
[ 2009.470000]
[ 2009.470000]
Code: 8fbf0004 27bd0008 03e00008 <00000000> 27bdffe8 afbf0014 afb00010 00808021 0c04e4a4
[ 2009.690000] ---[ end trace c3643a3c4c1c62e8 ]---
Segmentation fault
[ 2043.620000] Reserved instruction in kernel code[#4]:
[ 2043.620000] CPU: 0 PID: 0 Comm: swapper Tainted: G D 3.18.29 #12
[ 2043.620000] task: 80312ad0 ti: 8030c000 task.ti: 8030c000
[ 2043.620000] $ 0 : 00000000 00000000 000000b0 00000000
[ 2043.620000] $ 4 : c039d4cb 87baa84a 00000006 000000b0
[ 2043.620000] $ 8 : 00060000 000001f8 000001f8 00000000
[ 2043.620000] $12 : 00000000 77a783a0 00000000 00000000
[ 2043.620000] $16 : 87baa84a c039d4a8 c029e000 87baa840
[ 2043.620000] $20 : 8030dcd4 87baa84a 87b036c0 872c0000
[ 2043.620000] $24 : 00000018 8001ff4c
[ 2043.620000] $28 : 8030c000 8030dc50 872c0000 8723bd38
[ 2043.620000] Hi : 0000001f
[ 2043.620000] Lo : 00000000
[ 2043.620000] epc : 8013cf80 memcmp+0x1c/0x30
[ 2043.620000] Tainted: G D
[ 2043.620000] ra : 8723bd38 MacTableLookup+0x7c/0xa0 [mt_wifi]
[ 2043.620000] Status: 1100e403 KERNEL EXL IE
[ 2043.620000] Cause : 10800028
[ 2043.620000] PrId : 00019655 (MIPS 24KEc)
[ 2043.620000] Modules linked in: pppoe ppp_async iptable_nat pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_id xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT slhc nf_reject_ipv4 nf_nat_masquerade_ipv4 nf_nat_ftp nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack_ftp nf_conntrack iptable_raw iptable_mangle iptable_filter ip_tables crc_ccitt mt_wifi ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables ipv6 leds_gpio ohci_platform ohci_hcd ehci_platform ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
[ 2043.620000] Process swapper (pid: 0, threadinfo=8030c000, task=80312ad0, tls=00000000)
[ 2043.620000] Stack : 87b036c0 a7154240 c029e000 87217770 8736d840 8030dcc8 c03de000 8723a4c8
8030dc7c 8030dc78 06a31040 86a31040 00000000 0000000f c029e000 c029e000
c03de000 87b036c0 87baa840 8030dcd4 00000101 10020180 872c0000 8723afac
d1d71780 8005c4a4 8030dd2c 86bfdeb0 00000000 803119b0 07baa800 c1af0000
00000000 000000c2 8030dcd4 00000000 87baa800 87baa840 87b036c0 87baa840

[ 2043.620000] Call Trace:
[ 2043.620000] [<8013cf80>] memcmp+0x1c/0x30
[ 2043.620000] [<8723bd38>] MacTableLookup+0x7c/0xa0 [mt_wifi]
[ 2043.620000] [<8723a4c8>] dev_rx_data_frm+0x88/0x698 [mt_wifi]
[ 2043.620000] [<8723afac>] rtmp_rx_done_handle+0x4d4/0x4f8 [mt_wifi]
[ 2043.620000] [<87272da8>] mt_mac_int_4_tasklet+0xfcc/0x10ac [mt_wifi]
[ 2043.620000]
[ 2043.620000]
Code: 90470000 00a31021 90420000 <00e21023> 1040fff8 24630001 03e00008 00000000 27bdfff8
[ 2043.870000] ---[ end trace c3643a3c4c1c62e9 ]---
[ 2043.880000] Kernel panic - not syncing: Fatal exception in interrupt
[ 2043.880000] Rebooting in 3 seconds…

Widora by mango,V1.0.8

Board: Ralink APSoC DRAM: 64 MB
relocate_code Pointer at: 83fb4000


Software System Reset Occurred


flash manufacture id: ef, device id 40 19
find flash: W25Q256FV
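
A note on reading the two dumps above. Both report `Cause : 10800028`, and in both the word in angle brackets on the `Code:` line is the one at `epc`. The sketch below decodes the exception code from the Cause register and the two marked words. The exception-code table follows the MIPS32 privileged architecture; the tiny instruction decoder only covers the few opcodes needed here and is an illustration, not a full disassembler. The function names are made up for this example.

```python
# MIPS32 Cause register: ExcCode is bits 6..2, BD (branch delay) is bit 31.
EXC_CODES = {
    0: "Int", 1: "Mod", 2: "TLBL", 3: "TLBS", 4: "AdEL", 5: "AdES",
    6: "IBE", 7: "DBE", 8: "Sys", 9: "Bp", 10: "RI", 11: "CpU",
    12: "Ov", 13: "Tr",
}

# A handful of SPECIAL (opcode 0) function codes; deliberately incomplete.
SPECIAL_FUNCTS = {0x21: "addu", 0x23: "subu", 0x25: "or"}


def exc_code(cause):
    """Return (numeric ExcCode, mnemonic) for a Cause register value."""
    code = (cause >> 2) & 0x1F
    return code, EXC_CODES.get(code, "unknown")


def in_branch_delay(cause):
    """True if the exception was taken in a branch delay slot (BD bit set)."""
    return bool((cause >> 31) & 1)


def decode_word(word):
    """Decode a few three-register SPECIAL instructions, or return None."""
    if word == 0:
        return "nop"
    if (word >> 26) != 0:
        return None  # not a SPECIAL-opcode instruction
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    shamt = (word >> 6) & 0x1F
    name = SPECIAL_FUNCTS.get(word & 0x3F)
    if name is None or shamt != 0:
        return None
    return f"{name} ${rd}, ${rs}, ${rt}"


if __name__ == "__main__":
    print(exc_code(0x10800028), in_branch_delay(0x10800028))
    print(decode_word(0x00000000))  # word at epc in padzero
    print(decode_word(0x00E21023))  # word at epc in memcmp
```

With these values the exception code comes out as RI (reserved instruction) with BD clear, while the two words read back from memory decode as `nop` and `subu $2, $7, $2`, both ordinary instructions. One possible reading, and only that, is that what the CPU fetched differed from what is in memory at dump time, which would fit the observation further down that only one board misbehaves.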

@mangogeek
Any analysis results yet?

@mangogeek
Awaiting the results.

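Indeed, only that one board has the problem!
After the Spring Festival I will ask the assembly house to replace the main SoC and try again, and have them help analyze what exactly is wrong with the chip.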

@mangogeek
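My feeling is that some hardware operating state happens to hit a tiny bug in the driver,
and the problem board just happens to run in exactly that triggering state.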

So at root it would still be a subtle software defect.

@jansin_shaw Yes, the open-source MT76 driver reads the factory partition automatically.

@mangogeek
After switching to Openwrt18.06 and its driver, the driver calibration parameters already stored in flash should still be usable, right?

@jansin_shaw I'll sort out my thoughts when I get back tonight and then run comparison tests!
I may have fallen into a trap myself.

@mangogeek

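Besides sending you the problem board to experiment with, I bought several more boards to test here. So far the crash has only shown up on the problem board I sent you; the others have not shown anything similar!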

@jansin_shaw
Wait, the board I have been testing all along is the one you sent back?!?

@mangogeek
We are not using audio at the moment.
Please also run Openwrt18.06 on the problem board and see whether it misbehaves there!

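Also:
We later bought 2 more BIT5 boards of the same model, and the earlier system ran on them for a long time without showing the same exception!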

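@jansin_shaw So far I really have not found a pattern. :white_frowning_face:
My tentative view is that the MediaTek proprietary driver tends to be unstable when dumping low-level packets up to the application layer (we added this mechanism to the code ourselves; MediaTek provides no support for it).
Have you considered using Openwrt18.06 together with the open-source MT76 driver instead?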

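  1. I have tested before that the open-source driver can open a monitor interface directly and capture packets over the air.
  2. MT76-master is also already quite stable; for details see: https://widora.io/topic/533/openwrt18-06-mt76-master-driver
  3. This involves modifying the airkiss packet-capture code, changing it from the current iwpriv interface to capture on a standard monitor network interface.
  4. We have not yet got audio working with Openwrt18.06 on the 7688. If you don't need audio, everything else is about the same, and the LuCI web interface is fully supported.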
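
To give an idea of what point 3 could look like, here is a minimal sketch of reading frames from a standard monitor interface through a Linux `AF_PACKET` raw socket. It is not the airkiss code from this thread. The interface name `mon0` and the function names are assumptions; with mt76 the monitor interface would first have to be created (for example with `iw`). Frames captured this way start with a radiotap header, whose length is stored little-endian at byte offset 2, followed by the 802.11 frame. Whether the reported length includes the FCS depends on the driver, so any length-based decoding would need calibrating against real captures.

```python
import socket
import struct

ETH_P_ALL = 0x0003  # Linux constant: receive frames of every protocol


def split_radiotap(frame):
    """Return (radiotap_length, 802.11 bytes), or None if the header is bad."""
    if len(frame) < 8:
        return None
    version, _pad, rt_len, _present = struct.unpack_from("<BBHI", frame, 0)
    if version != 0 or rt_len < 8 or rt_len > len(frame):
        return None
    return rt_len, frame[rt_len:]


def describe(frame):
    """Return length and transmitter of an 802.11 data frame, else None."""
    parts = split_radiotap(frame)
    if parts is None:
        return None
    _rt_len, dot11 = parts
    if len(dot11) < 24:  # shorter than a basic 3-address header
        return None
    frame_type = (dot11[0] >> 2) & 0x3
    if frame_type != 2:  # 2 = data frame
        return None
    transmitter = ":".join("%02x" % b for b in dot11[10:16])  # address 2
    return {"length": len(dot11), "transmitter": transmitter}


def capture(ifname="mon0", count=10):
    """Print `count` data frames seen on a monitor interface (needs root)."""
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                         socket.htons(ETH_P_ALL))
    try:
        sock.bind((ifname, 0))
        seen = 0
        while seen < count:
            info = describe(sock.recv(4096))
            if info is not None:
                print(info)
                seen += 1
    finally:
        sock.close()
```

The parsing functions are kept separate from `capture()` so they can be checked against saved frames on a PC before anything is run on the board.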

Has this been solved?

@jansin_shaw No clear pattern found yet.

Any new progress?

@jansin_shaw OK, I'll continue testing this afternoon.

!

I only changed lan to dhcp.

The exception still occurs even after changing it to a static IP.

Still, I think that even if these settings are wrong, they should not be able to crash the kernel!

Also, I found earlier that with WIFI turned off, no error appeared for a long time!

My first impression is that it is related to the dhcp setting in
config interface 'lan'
option ifname 'eth0.1'
option force_link '1'
option macaddr '0c:ef:af:d1:dc:07'
option type 'bstatic'
option proto 'dhcp'
Changing this part to dhcp really makes no sense.

Board received. It came with the root@Rise firmware installed. Some notes follow:

/etc/config/network

config interface 'lan'
        option ifname 'eth0.1'
        option force_link '1'
        option macaddr '0c:ef:af:d1:dc:07'
        option type 'bridge'
        option proto 'dhcp'
        option ipaddr '192.168.8.18'
        option netmask '255.255.255.0'
        option ip6assign '60'

config interface 'wan'
        option ifname 'eth0.2'
        option force_link '1'
        option macaddr '0c:ef:af:d1:dc:06'
        option proto 'dhcp'

config interface 'wan6'
        option ifname 'eth0.2'
        option proto 'dhcpv6'

config switch
        option name 'switch0'
        option reset '1'
        option enable_vlan '1'

config switch_vlan
        option device 'switch0'
        option vlan '1'
        option ports '1 2 3 4 6t'

config switch_vlan
        option device 'switch0'
        option vlan '2'
        option ports '0 6t'

config interface 'wwan'
        option proto 'dhcp'
        option ifname 'apcli0'
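
The 'lan' section above combines `proto 'dhcp'` with `ipaddr` and `netmask`, options normally used with `proto 'static'`. A small check like the following could flag that combination when comparing configs between firmwares. It is a rough sketch: the parser only understands `config` and `option` lines, the list of "static-style" options is my own choice, and it makes no claim about how netifd actually treats these options.

```python
import shlex

# Options that normally accompany proto 'static' (an illustrative list).
STATIC_STYLE = ("ipaddr", "netmask", "gateway")


def parse_uci(text):
    """Parse 'config'/'option' lines of a UCI file into a list of sections."""
    sections = []
    current = None
    for raw in text.splitlines():
        tokens = shlex.split(raw, comments=True)
        if not tokens:
            continue
        if tokens[0] == "config" and len(tokens) >= 2:
            current = {
                "type": tokens[1],
                "name": tokens[2] if len(tokens) > 2 else None,
                "options": {},
            }
            sections.append(current)
        elif tokens[0] == "option" and current is not None and len(tokens) >= 3:
            current["options"][tokens[1]] = tokens[2]
    return sections


def find_proto_conflicts(text):
    """List (interface name, static-style options) for proto 'dhcp' sections."""
    problems = []
    for section in parse_uci(text):
        if section["type"] != "interface":
            continue
        options = section["options"]
        if options.get("proto") == "dhcp":
            leftover = [key for key in STATIC_STYLE if key in options]
            if leftover:
                problems.append((section["name"], leftover))
    return problems


if __name__ == "__main__":
    with open("/etc/config/network") as handle:
        for name, keys in find_proto_conflicts(handle.read()):
            print("interface %s: proto dhcp together with %s" % (name, keys))
```

Run against the file above, it would report the 'lan' interface and leave 'wan' and 'wwan' alone.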

/etc/config/wireless

config wifi-device 'radio0'
        option type 'ralink'
        option variant 'mt7628'
        option country 'CN'
        option hwmode '11bgn'
        option htmode 'HT40'
        option channel 'auto'

config wifi-iface 'ap'
        option device 'radio0'
        option mode 'ap'
        option network 'lan'
        option ifname 'ra0'
        option ssid 'RSCTn_D1DC06'
        option encryption 'psk2'
        option key '123456789'
        option hidden '0'

config wifi-iface 'sta'
        option device 'radio0'
        option mode 'sta'
        option network 'wwan'
        option ifname 'apcli0'
        option ssid 'sssssssss'
        option key 'sstststsstst'

Wait until the connection to the AP is established, then run the aac command:

root@Rise:/etc/config# aac
[  531.920000] [MSC] enter monitor mode: filter:0x0, chan_id:1, width:2, chan_flags:0x0, mon0
[  531.940000] init queue success
[  532.690000] ApCliIfMonitor: IF(apcli0) - no Beacon is received from Root-AP.
[  532.690000] APCLI LINK DOWN - IF(apcli0)
[  532.700000] WLAN:STA d4:5f:25:fd:07:34(dev:ra0 rate:135Mbps singnal:-34dBm) disconnect 
[  537.690000] AP-Client probe response: SSID=ziroom304, BSSID=d4:5f:25:fd:07:34
[  537.990000] APCLI LINK UP - IF(apcli0) AuthMode(7)=WPA2PSK, WepStatus(6)=AES!
[  538.100000] WLAN:STA d4:5f:25:fd:07:34(dev:ra0 rate:135Mbps singnal:-36dBm) disconnect 
[  538.330000] APCLI LINK DOWN - IF(apcli0)
[  556.090000] AP-Client probe response: SSID=ziroom304, BSSID=d4:5f:25:fd:07:34
[  556.100000] APCLI LINK UP - IF(apcli0) AuthMode(7)=WPA2PSK, WepStatus(6)=AES!
[  570.950000] BUG: Bad page state in process sh  pfn:028a6
[  570.950000] page:810514c0 count:8560 mapcount:0 mapping:  (null) index:0x0
[  570.960000] flags: 0x0()
[  570.960000] page dumped because: nonzero _count
[  570.970000] Modules linked in: pppoe ppp_async iptable_nat pppox ppp_generic nf_nat_ipv4 n
[  571.030000] CPU: 0 PID: 3404 Comm: sh Tainted: G    B          3.18.29 #23
[  571.040000] Stack : 00000000 00000000 00000000 00000000 803541f2 0000003e 00000000 0000000
          00000001 8108b4b0 802b7694 803129e3 00000d4c 80353420 8297ede0 8108b4b0
          00020200 00467000 810514d4 800476c0 00000003 80024410 802be600 8108b4b0
          802bab98 828e5b24 00000000 00000000 00000000 00000000 00000000 00000000
          00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
          ...
[  571.080000] Call Trace:
[  571.080000] [<80014240>] show_stack+0x50/0x84
[  571.080000] [<8006f6c8>] bad_page+0xe8/0x118
[  571.090000] [<80071cb4>] get_page_from_freelist+0x41c/0x5c0
[  571.090000] [<80071f60>] __alloc_pages_nodemask+0x108/0x68c
[  571.100000] [<800724fc>] __get_free_pages+0x18/0x4c
[  571.110000] [<800886c4>] __tlb_remove_page+0x64/0xbc
[  571.110000] [<800896dc>] unmap_single_vma+0x4b8/0x710
[  571.120000] [<8008aa74>] unmap_vmas+0x54/0x74
[  571.120000] [<8008f870>] exit_mmap+0x70/0x16c
[  571.120000] [<800223b4>] mmput+0x3c/0xd4
[  571.130000] [<800a6188>] flush_old_exec+0x4b8/0x5ec
[  571.130000] [<800dc734>] load_elf_binary+0x310/0x11a8
[  571.140000] [<800a56f4>] search_binary_handler+0x88/0x1c8
[  571.140000] [<800a69b4>] do_execve+0x32c/0x4c0
[  571.150000] [<80006b5c>] handle_sys+0x11c/0x140
[  571.150000] 
^C^C[  577.380000] [MSC] leave monitor mode.
[  577.400000] deinit queue success

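Current status: running aac after connecting to the upstream AP crashes very easily.

Next, I cleared all configuration and ran the aac test without connecting to the upstream AP (the error still occurs!):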

[  293.680000] |--------------------------------------------------------|
[  294.320000] BUG: Bad page state in process ralink.sh  pfn:02f96
[  294.330000] page:8105f2c0 count:37940 mapcount:-65535 mapping:  (null) index:0xffff
[  294.330000] flags: 0x0()
[  294.340000] page dumped because: nonzero _count
[  294.340000] Modules linked in: pppoe ppp_async iptable_nat pppox ppp_generic nf_nat_ipv4 n
[  294.410000] CPU: 0 PID: 2001 Comm: ralink.sh Not tainted 3.18.29 #23
[  294.410000] Stack : 00000000 00000000 00000000 00000000 803541f2 00000038 00000000 0000000
          00000001 8108b4b0 802b7694 803129e3 000007d1 80353420 82cf4720 8108b4b0
          000204d0 00989000 8105f2d4 800476c0 00000003 80024484 802be600 8108b4b0
          802bab98 82dcbc6c 00000000 00000000 00000000 00000000 00000000 00000000
          00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
          ...
[  294.450000] Call Trace:
[  294.450000] [<80014240>] show_stack+0x50/0x84
[  294.460000] [<8006f6c8>] bad_page+0xe8/0x118
[  294.460000] [<80071cb4>] get_page_from_freelist+0x41c/0x5c0
[  294.470000] [<80071f60>] __alloc_pages_nodemask+0x108/0x68c
[  294.470000] [<80088afc>] __pte_alloc+0x34/0x184
[  294.480000] [<8008a620>] copy_page_range+0x108/0x508
[  294.480000] [<80023034>] copy_process.part.77+0x9ac/0x111c
[  294.490000] [<8002387c>] do_fork+0xc0/0x2c0
[  294.490000] [<80006b5c>] handle_sys+0x11c/0x140
[  294.500000] 
[  294.500000] Disabling lock debugging due to kernel taint
[  294.510000] CPU 0 Unable to handle kernel paging request at virtual address 00100104, epc0
[  294.510000] Oops[#1]:
[  294.510000] CPU: 0 PID: 2059 Comm: dhcpv6.script Tainted: G    B          3.18.29 #23
[  294.510000] task: 82cf4000 ti: 82d56000 task.ti: 82d56000
[  294.510000] $ 0   : 00000000 0041d21c 8108b4bc 00100100
[  294.510000] $ 4   : 00000000 8108b4b8 00000134 00000024
[  294.510000] $ 8   : 00000002 00000000 00000000 80320be8
[  294.510000] $12   : 00000001 77b5c068 00000000 000033e5
[  294.510000] $16   : 80312100 00000141 00000000 00000001
[  294.510000] $20   : 81059da0 8108b4b0 000200d0 0045da30
[  294.510000] $24   : 00000000 77abb8f0                  
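
One detail in the oops above may be worth noting. The faulting address is 00100104 and register $3 holds 00100100. In a 3.18 kernel without a pointer-poison offset configured, 0x00100100 is `LIST_POISON1`, the value `list_del()` writes into the `next` pointer of a removed entry, and offset 4 is where `prev` sits in a 32-bit `struct list_head`. The helper below just classifies a fault address against those poison values. The function name and the offset window are illustrative choices.

```python
# Poison values from include/linux/poison.h in 3.18 (POISON_POINTER_DELTA = 0).
LIST_POISON1 = 0x00100100  # written to entry->next by list_del()
LIST_POISON2 = 0x00200200  # written to entry->prev by list_del()

MAX_OFFSET = 0x100  # treat small offsets past a poison value as a field access


def classify_fault(address):
    """Return (label, offset) if the address looks like a known pattern."""
    for label, base in (("LIST_POISON1", LIST_POISON1),
                        ("LIST_POISON2", LIST_POISON2)):
        if base <= address < base + MAX_OFFSET:
            return label, address - base
    if address < 0x1000:
        return "NULL", address  # near-NULL pointer dereference
    return None


if __name__ == "__main__":
    print(classify_fault(0x00100104))
```

If this reading is right, some code touched a list entry that had already been deleted, which would be consistent with the "Bad page state" reports just before it: both point toward corrupted page or free-list bookkeeping rather than anything specific to the process that happened to crash. That is an interpretation, not a confirmed diagnosis.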

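My initial suspicion was that the firmware might be faulty, so I flashed

Ver:0.1.8-20180813 1546691381958-openwrt-ramips-mt7688-widora3264-squashfs-sysupgrade.bin (6 MB) and tested. With the default, unconfigured settings, five consecutive aac runs showed no problem.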

I then configured sta and ran aac again, 5 attempts: no crash, but on one attempt it never obtained the ssid and key.

Flashed the firmware from the wiki, Ver:0.1.8-20180430 by WIDORA

With default settings, I ran aac 5 times each without sta configured and with sta configured. No crash occurred.

I then changed the wiki firmware's network configuration to match root@Rise and tested:

It crashed twice, and after that it stubbornly refused to reproduce. No pattern found yet.
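
Since the crash is intermittent, it may help to save the serial log of every aac run and count the failures per firmware and config combination instead of relying on memory. Below is a small sketch that scans captured log text for the crash signatures seen in this thread. The signature list and function names are my own, and it is meant to run on a PC against saved logs, not on the board.

```python
import re

# Crash markers taken from the kernel messages quoted in this thread.
SIGNATURES = {
    "reserved_instruction": re.compile(r"Reserved instruction in kernel code"),
    "bad_page_state": re.compile(r"BUG: Bad page state"),
    "oops": re.compile(r"Oops\[#\d+\]"),
    "panic": re.compile(r"Kernel panic"),
}

# Leading printk timestamp such as "[  570.950000]".
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def scan_log(text):
    """Return a list of (timestamp or None, signature name) found in a log."""
    hits = []
    for line in text.splitlines():
        match = TIMESTAMP.match(line)
        stamp = float(match.group(1)) if match else None
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.append((stamp, name))
    return hits


def summarize(runs):
    """Map each test label to the number of runs whose log shows a crash."""
    return {
        label: sum(1 for log in logs if scan_log(log))
        for label, logs in runs.items()
    }
```

For example, `summarize({"wiki firmware + Rise network": [log1, log2, ...]})` returns how many of those runs crashed, which makes the combinations tested above directly comparable.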

@jansin_shaw OK, email sent. I'll take a look.