A group of servers keeps rebooting for no apparent reason: the logs contain nothing about the cause of the reboot, nor anything at all for the 20 minutes before it, even though the rsyslog configs look fine. In 2 weeks of running netconsole I got a single record, although there were about 20 reboots. It's definitely not the hardware: all the other servers run on the same hardware. The only difference is that on this group the LA (load average) jumps up to 30 (heavy CPU load), which the developers say is normal. Googling bnx2 turns up plenty of similar errors, but all of them are for 2.6 kernels. What could be causing the reboots?
------------[ cut here ]------------
WARNING: CPU: 0 PID: 142 at /build/buildd/linux-lts-utopic-3.16.0/net/sched/sch_generic.c:264 dev_watchdog+0x276/0x280()
NETDEV WATCHDOG: eth0 (bnx2): transmit queue 1 timed out
Modules linked in:
CPU: 0 PID: 142 Comm: khugepaged Not tainted 3.16.0-43-generic #58~14.04.1-Ubuntu
Hardware name: IBM System x3550 M3 -[7944AC1]-/00D4062, BIOS -[D6E162AUS-1.20]- 05/07/2014
0000000000000009
ffff88046fc03dd0
ffff8804699d5940
Call Trace:
<IRQ>
dump_stack+0x45/0x56
[<ffffffff8106de3d>] warn_slowpath_common+0x7d/0xa0
[<ffffffff8106deac>] warn_slowpath_fmt+0x4c/0x50
[<ffffffff81686616>] dev_watchdog+0x276/0x280
[<ffffffff816863a0>] ? dev_graft_qdisc+0x80/0x80
[<ffffffff8107a616>] call_timer_fn+0x36/0x100
[<ffffffff816863a0>] ? dev_graft_qdisc+0x80/0x80
[<ffffffff8107bdcf>] run_timer_softirq+0x20f/0x310
[<ffffffff81073185>] __do_softirq+0xf5/0x2e0
[<ffffffff810614b0>] ? leave_mm+0x80/0x80
[<ffffffff81073645>] irq_exit+0x105/0x110
[<ffffffff81771015>] smp_apic_timer_interrupt+0x45/0x60
[<ffffffff8176f0bd>] apic_timer_interrupt+0x6d/0x80
<EOI>
? smp_call_function_many+0x20e/0x270
[<ffffffff810615ee>] native_flush_tlb_others+0x2e/0x30
[<ffffffff810616ab>] flush_tlb_mm_range+0x5b/0x180
[<ffffffff8119ec9e>] pmdp_clear_flush+0x3e/0x50
[<ffffffff811bfeb5>] khugepaged_scan_mm_slot+0x725/0xcd0
[<ffffffff811c06af>] khugepaged+0x24f/0x460
[<ffffffff810b50b0>] ? prepare_to_wait_event+0x100/0x100
[<ffffffff811c0460>] ? khugepaged_scan_mm_slot+0xcd0/0xcd0
[<ffffffff81091572>] kthread+0xd2/0xf0
[<ffffffff810914a0>] ? kthread_create_on_node+0x1c0/0x1c0
[<ffffffff8176e098>] ret_from_fork+0x58/0x90
[<ffffffff810914a0>] ? kthread_create_on_node+0x1c0/0x1c0
---[ end trace e29f378c56100a08 ]---
bnx2 0000:0b:00.0 eth0: <--- start FTQ dump --->
bnx2 0000:0b:00.0 eth0: RV2P_PFTQ_CTL 00010002
bnx2 0000:0b:00.0 eth0: RV2P_TFTQ_CTL 00020000
bnx2 0000:0b:00.0 eth0: RV2P_MFTQ_CTL 00004000
bnx2 0000:0b:00.0 eth0: TBDR_FTQ_CTL 00004002
bnx2 0000:0b:00.0 eth0: TDMA_FTQ_CTL 00010002
bnx2 0000:0b:00.0 eth0: TXP_FTQ_CTL 00010002
bnx2 0000:0b:00.0 eth0: TXP_FTQ_CTL 00010002
bnx2 0000:0b:00.0 eth0: TPAT_FTQ_CTL 00010000
bnx2 0000:0b:00.0 eth0: RXP_CFTQ_CTL 00008000
bnx2 0000:0b:00.0 eth0: RXP_FTQ_CTL 00100000
bnx2 0000:0b:00.0 eth0: COM_COMXQ_FTQ_CTL 00010000
bnx2 0000:0b:00.0 eth0: COM_COMTQ_FTQ_CTL 00020000
bnx2 0000:0b:00.0 eth0: COM_COMQ_FTQ_CTL 00010000
bnx2 0000:0b:00.0 eth0: CP_CPQ_FTQ_CTL 00004000
bnx2 0000:0b:00.0 eth0: CPU states:
bnx2 0000:0b:00.0 eth0: 045000 mode b84c state 80001000 evt_mask 500 pc 8001284 pc 8001284 instr 1440fffc
bnx2 0000:0b:00.0 eth0: 085000 mode b84c state 80001000 evt_mask 500 pc 8000a4c pc 80007b8 instr 10400016
bnx2 0000:0b:00.0 eth0: 0c5000 mode b84c state 80001000 evt_mask 500 pc 8004c14 pc 8004c1c instr 10e00088
bnx2 0000:0b:00.0 eth0: 105000 mode b8cc state 80000000 evt_mask 500 pc 8000a8c pc 8000a8c instr 8821
bnx2 0000:0b:00.0 eth0: 145000 mode b880 state 80000000 evt_mask 500 pc 8009c08 pc 800d924 instr 34420001
bnx2 0000:0b:00.0 eth0: 185000 mode b8cc state 80000000 evt_mask 500 pc 8000cb0 pc 8000928 instr 8f870048
bnx2 0000:0b:00.0 eth0: <--- end FTQ dump --->
bnx2 0000:0b:00.0 eth0: <--- start TBDC dump --->
bnx2 0000:0b:00.0 eth0: TBDC free cnt: 32
bnx2 0000:0b:00.0 eth0: LINE CID BIDX CMD VALIDS
bnx2 0000:0b:00.0 eth0: 00 001300 a188 00 [0]
bnx2 0000:0b:00.0 eth0: 01 001300 a188 00 [0]
bnx2 0000:0b:00.0 eth0: 02 001000 dcb0 00 [0]
bnx2 0000:0b:00.0 eth0: 03 001000 dac8 00 [0]
bnx2 0000:0b:00.0 eth0: 04 001000 dad0 00 [0]
bnx2 0000:0b:00.0 eth0: 05 001000 a3f0 00 [0]
bnx2 0000:0b:00.0 eth0: 06 001000 8668 00 [0]
bnx2 0000:0b:00.0 eth0: 07 001000 79d8 00 [0]
bnx2 0000:0b:00.0 eth0: 08 001300 4fb0 00 [0]
bnx2 0000:0b:00.0 eth0: 09 001000 4e98 00 [0]
bnx2 0000:0b:00.0 eth0: 0a 001180 2000 00 [0]
bnx2 0000:0b:00.0 eth0: 0b 001300 7b58 00 [0]
bnx2 0000:0b:00.0 eth0: 0c 001300 7b68 00 [0]
bnx2 0000:0b:00.0 eth0: 0d 001080 99a0 00 [0]
bnx2 0000:0b:00.0 eth0: 0e 001080 99a8 00 [0]
bnx2 0000:0b:00.0 eth0: 0f 001080 99b8 00 [0]
bnx2 0000:0b:00.0 eth0: 10 001080 99c0 00 [0]
bnx2 0000:0b:00.0 eth0: 11 001080 99c8 00 [0]
bnx2 0000:0b:00.0 eth0: 12 001200 b568 00 [0]
bnx2 0000:0b:00.0 eth0: 13 128f00 7ee8 b4 [0]
bnx2 0000:0b:00.0 eth0: 14 1dff80 c558 bf [0]
bnx2 0000:0b:00.0 eth0: 15 1bf680 ffd8 d7 [0]
bnx2 0000:0b:00.0 eth0: 16 1bff80 fff0 a6 [0]
bnx2 0000:0b:00.0 eth0: 17 16ef80 ae38 cf [0]
bnx2 0000:0b:00.0 eth0: 18 1ff380 33b8 61 [0]
bnx2 0000:0b:00.0 eth0: 19 1ffa80 7fb8 17 [0]
bnx2 0000:0b:00.0 eth0: 1a 1ffd80 e978 af [0]
bnx2 0000:0b:00.0 eth0: 1b 1fdf80 bb68 67 [0]
bnx2 0000:0b:00.0 eth0: 1c 053b80 3ff8 ef [0]
bnx2 0000:0b:00.0 eth0: 1d 175b00 fff8 fa [0]
bnx2 0000:0b:00.0 eth0: 1e 0bff00 d5f0 df [0]
bnx2 0000:0b:00.0 eth0: 1f 1ffe80 ffe0 ff [0]
bnx2 0000:0b:00.0 eth0: <--- end TBDC dump --->
bnx2 0000:0b:00.0 eth0: DEBUG: intr_sem[0] PCI_CMD[00100446]
bnx2 0000:0b:00.0 eth0: DEBUG: PCI_PM[19002008] PCI_MISC_CFG[92000088]
bnx2 0000:0b:00.0 eth0: DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000]
bnx2 0000:0b:00.0 eth0: DEBUG: RPM_MGMT_PKT_CTRL[40000088]
bnx2 0000:0b:00.0 eth0: DEBUG: HC_STATS_INTERRUPT_STATUS[01f5000a]
bnx2 0000:0b:00.0 eth0: DEBUG: PBA[00000000]
bnx2 0000:0b:00.0 eth0: <--- start MCP states dump --->
bnx2 0000:0b:00.0 eth0: DEBUG: MCP_STATE_P0[0003610e] MCP_STATE_P1[0003610e]
bnx2 0000:0b:00.0 eth0: DEBUG: MCP mode[0000b880] state[80008000] evt_mask[00000500]
bnx2 0000:0b:00.0 eth0: DEBUG: pc[0800d2f0] pc[0800b0b4] instr[0441010a]
bnx2 0000:0b:00.0 eth0: DEBUG: shmem states:
bnx2 0000:0b:00.0 eth0: DEBUG: drv_mb[01030003] fw_mb[00000003] link_status[0000006f]
drv_pulse_mb[00003622]
bnx2 0000:0b:00.0 eth0: DEBUG: dev_info_signature[44564903] reset_type[01005254]
condition[0003610e]
bnx2 0000:0b:00.0 eth0: DEBUG: 000001c0: 01005254 42530084 0003610e 00000000
bnx2 0000:0b:00.0 eth0: DEBUG: 000003cc: 44444444 44444444 44444444 00000a14
bnx2 0000:0b:00.0 eth0: DEBUG: 000003dc: 0004ffff 00000000 00000000 00000000
bnx2 0000:0b:00.0 eth0: DEBUG: 000003ec: 00000000 00000000 00000000 00000000
bnx2 0000:0b:00.0 eth0: DEBUG: 0x3fc[0000ffff]
bnx2 0000:0b:00.0 eth0: <--- end MCP states dump --->
bnx2 0000:0b:00.0 eth0: <--- start FTQ dump --->
bnx2 0000:0b:00.0 eth0: RV2P_PFTQ_CTL 00010002
bnx2 0000:0b:00.0 eth0: RV2P_TFTQ_CTL 00020000
bnx2 0000:0b:00.0 eth0: RV2P_MFTQ_CTL 00004000
bnx2 0000:0b:00.0 eth0: TBDR_FTQ_CTL 00004002
bnx2 0000:0b:00.0 eth0: TDMA_FTQ_CTL 00010002
bnx2 0000:0b:00.0 eth0: TXP_FTQ_CTL 00010002
bnx2 0000:0b:00.0 eth0: TXP_FTQ_CTL 00010002
bnx2 0000:0b:00.0 eth0: TPAT_FTQ_CTL 00010000
bnx2 0000:0b:00.0 eth0: RXP_CFTQ_CTL 00008000
bnx2 0000:0b:00.0 eth0: RXP_FTQ_CTL 00100000
bnx2 0000:0b:00.0 eth0: COM_COMXQ_FTQ_CTL 00010000
bnx2 0000:0b:00.0 eth0: COM_COMTQ_FTQ_CTL 00020000
bnx2 0000:0b:00.0 eth0: COM_COMQ_FTQ_CTL 00010000
bnx2 0000:0b:00.0 eth0: CP_CPQ_FTQ_CTL 00004000
bnx2 0000:0b:00.0 eth0: CPU states:
bnx2 0000:0b:00.0 eth0: 045000 mode b84c state 80001000 evt_mask 500 pc 8001284 pc 8001290 instr 38640001
bnx2 0000:0b:00.0 eth0: 085000 mode b84c state 80001000 evt_mask 500 pc 8000a5c pc 8000a4c instr 1440fffc
bnx2 0000:0b:00.0 eth0: 0c5000 mode b84c state 80001000 evt_mask 500 pc 8004c14 pc 8004c18 instr 32070001
bnx2 0000:0b:00.0 eth0: 105000 mode b8cc state 80000000 evt_mask 500 pc 8000a9c pc 8000b28 instr 8c420020
bnx2 0000:0b:00.0 eth0: 145000 mode b880 state 80000000 evt_mask 500 pc 8000b00 pc 80009d8 instr 30820800
bnx2 0000:0b:00.0 eth0: 185000 mode b8cc state 80004000 evt_mask 500 pc 8000c6c pc 8000920 instr 27bd0018
bnx2 0000:0b:00.0 eth0: <--- end FTQ dump --->
bnx2 0000:0b:00.0 eth0: <--- start TBDC dump --->
bnx2 0000:0b:00.0 eth0: TBDC free cnt: 32
bnx2 0000:0b:00.0 eth0: LINE CID BIDX CMD VALIDS
bnx2 0000:0b:00.0 eth0: 00 001300 a1d8 00 [0]
bnx2 0000:0b:00.0 eth0: 01 001300 a1d8 00 [0]
bnx2 0000:0b:00.0 eth0: 02 001000 dcb0 00 [0]
bnx2 0000:0b:00.0 eth0: 03 001000 dac8 00 [0]
bnx2 0000:0b:00.0 eth0: 04 001000 dad0 00 [0]
bnx2 0000:0b:00.0 eth0: 05 001000 a3f0 00 [0]
bnx2 0000:0b:00.0 eth0: 06 001000 8668 00 [0]
bnx2 0000:0b:00.0 eth0: 07 001000 79d8 00 [0]
bnx2 0000:0b:00.0 eth0: 08 001300 4fb0 00 [0]
bnx2 0000:0b:00.0 eth0: 09 001000 4e98 00 [0]
bnx2 0000:0b:00.0 eth0: 0a 001180 2000 00 [0]
bnx2 0000:0b:00.0 eth0: 0b 001300 7b58 00 [0]
bnx2 0000:0b:00.0 eth0: 0c 001300 7b68 00 [0]
bnx2 0000:0b:00.0 eth0: 0d 001080 99a0 00 [0]
bnx2 0000:0b:00.0 eth0: 0e 001080 99a8 00 [0]
bnx2 0000:0b:00.0 eth0: 0f 001080 99b8 00 [0]
bnx2 0000:0b:00.0 eth0: 10 001080 99c0 00 [0]
bnx2 0000:0b:00.0 eth0: 11 001080 99c8 00 [0]
bnx2 0000:0b:00.0 eth0: 12 001200 b568 00 [0]
bnx2 0000:0b:00.0 eth0: 13 128f00 7ee8 b4 [0]
bnx2 0000:0b:00.0 eth0: 14 1dff80 c558 bf [0]
bnx2 0000:0b:00.0 eth0: 15 1bf680 ffd8 d7 [0]
bnx2 0000:0b:00.0 eth0: 16 1bff80 fff0 a6 [0]
bnx2 0000:0b:00.0 eth0: 17 16ef80 ae38 cf [0]
bnx2 0000:0b:00.0 eth0: 18 1ff380 33b8 61 [0]
bnx2 0000:0b:00.0 eth0: 19 1ffa80 7fb8 17 [0]
bnx2 0000:0b:00.0 eth0: 1a 1ffd80 e978 af [0]
bnx2 0000:0b:00.0 eth0: 1b 1fdf80 bb68 67 [0]
bnx2 0000:0b:00.0 eth0: 1c 053b80 3ff8 ef [0]
bnx2 0000:0b:00.0 eth0: 1d 175b00 fff8 fa [0]
bnx2 0000:0b:00.0 eth0: 1e 0bff00 d5f0 df [0]
bnx2 0000:0b:00.0 eth0: 1f 1ffe80 ffe0 ff [0]
bnx2 0000:0b:00.0 eth0: <--- end TBDC dump --->
bnx2 0000:0b:00.0 eth0: DEBUG: intr_sem[0] PCI_CMD[00100446]
bnx2 0000:0b:00.0 eth0: DEBUG: PCI_PM[19002008] PCI_MISC_CFG[92000088]
bnx2 0000:0b:00.0 eth0: DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000]
bnx2 0000:0b:00.0 eth0: DEBUG: RPM_MGMT_PKT_CTRL[40000088]
bnx2 0000:0b:00.0 eth0: DEBUG: HC_STATS_INTERRUPT_STATUS[01f5000a]
bnx2 0000:0b:00.0 eth0: DEBUG: PBA[00000000]
bnx2 0000:0b:00.0 eth0: <--- start MCP states dump --->
bnx2 0000:0b:00.0 eth0: DEBUG: MCP_STATE_P0[0003610e] MCP_STATE_P1[0003610e]
bnx2 0000:0b:00.0 eth0: DEBUG: MCP mode[0000b880] state[80000000] evt_mask[00000500]
bnx2 0000:0b:00.0 eth0: DEBUG: pc[08008fac] pc[0800b05c] instr[8f420020]
bnx2 0000:0b:00.0 eth0: DEBUG: shmem states:
bnx2 0000:0b:00.0 eth0: DEBUG: drv_mb[01030003] fw_mb[00000003] link_status[0000006f]
drv_pulse_mb[00003628]
bnx2 0000:0b:00.0 eth0: DEBUG: dev_info_signature[44564903] reset_type[01005254]
condition[0003610e]
bnx2 0000:0b:00.0 eth0: DEBUG: 000001c0: 01005254 42530083 0003610e 00000000
bnx2 0000:0b:00.0 eth0: DEBUG: 000003cc: 44444444 44444444 44444444 00000a14
root@servername:/root# lspci | grep Net
0b:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
0b:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)