Discussion:
mlx4 weird error "Failed to map EQ context memory" after update
Greg V
2018-01-18 13:11:11 UTC
Hi. I've upgraded CURRENT from December 19
(https://github.com/freebsd/freebsd/commit/fd53ccf393f4f8ac1948e97eca108)
to today
(https://github.com/freebsd/freebsd/commit/391a83c86bb91ae3840cf37b7de478f42cc97e2a)
and my Mellanox ConnectX-2 network card stopped working:

mlx4_core0: <mlx4_core> mem 0xfe100000-0xfe1fffff,0xf0800000-0xf0ffffff irq 32 at device 0.0 on pci7
mlx4_core: Mellanox ConnectX core driver v3.4.1 (October 2017)
mlx4_core: Initializing mlx4_core
mlx4_core0: command 0xffa failed: fw status = 0x1
mlx4_core0: Failed to map EQ context memory, aborting
device_attach: mlx4_core0 attach returned 12


Loading the OLD mlx4.ko and mlx4en.ko on the NEW kernel actually does
work fine!

Reverting all mlx4 changes between then and now (no big changes, mostly
just the 1 << 31 thing from D13858) and rebuilding the mlx4 module with
CC=clang50 does not help.

What happened?!
Hans Petter Selasky
2018-01-19 09:54:27 UTC
Hi,

Can you do:

objdump -Dx /boot/kernel/mlx4.ko > mlx4.ko.txt
objdump -Dx /boot/kernel/mlx4en.ko > mlx4en.ko.txt

Then diff the resulting text files for the working and non-working .ko files.

Can you also make sure that /boot/modules does not contain anything matching *mlx4*?

--HPS
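
A minimal sketch of the /boot/modules check asked for above. The helper name find_stray_mlx4 is hypothetical and not from the thread; it lists any file whose name contains "mlx4" under a directory, so empty output means the directory holds no stray copy.

```shell
#!/bin/sh
# Hypothetical helper: list files under a module directory whose names
# contain "mlx4". The directory defaults to /boot/modules.
find_stray_mlx4() {
    dir="${1:-/boot/modules}"
    # Matches mlx4.ko, mlx4en.ko and any similarly named files;
    # errors (e.g. a missing directory) are silenced.
    find "$dir" -name '*mlx4*' 2>/dev/null
}

# Usage (prints nothing when the directory is clean):
#   find_stray_mlx4 /boot/modules
```

Running `kldstat -v` after loading the driver should also show which file each module was actually loaded from.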
Greg V
2018-01-19 23:17:19 UTC
Post by Hans Petter Selasky
Hi,
objdump -Dx /boot/kernel/mlx4.ko > mlx4.ko.txt
objdump -Dx /boot/kernel/mlx4en.ko > mlx4en.ko.txt
And diff the text result between working and non-working ko's.
That results in 180883 lines (9.2 megabytes) of diff for mlx4.ko. The
CC=clang50 one is only a bit better at 7.6 MB :(
Post by Hans Petter Selasky
Can you also make sure that /boot/modules does not contain anything *mlx4* ?
Yeah, it did not contain that.
Hans Petter Selasky
2018-01-20 09:18:57 UTC
Post by Greg V
Post by Hans Petter Selasky
Hi,
objdump -Dx /boot/kernel/mlx4.ko > mlx4.ko.txt
objdump -Dx /boot/kernel/mlx4en.ko > mlx4en.ko.txt
And diff the text result between working and non-working ko's.
That results in 180883 lines (9.2 megabytes) of diff for mlx4.ko. The
CC=clang50 one is only a bit better at 7.6 MB :(
Can you open this diff using "meld" and look for instructions which
have changed, not only their location?

--HPS
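
One way to shrink a multi-megabyte diff so that changed instructions stand out from mere relocation is to strip addresses before diffing. This is only a sketch and was not used in the thread. It assumes GNU-objdump-style output (address, opcode bytes and instruction separated by tabs) and uses `objdump -d` instead of `-Dx` to limit the output to code sections. The function name normalize_disasm and the old/ and new/ paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: drop the parts of "objdump -d" output that change
# whenever code merely moves (addresses, opcode bytes, branch offsets).
normalize_disasm() {
    awk -F'\t' '
        # Instruction lines: "  addr:<TAB>opcode bytes<TAB>mnemonic operands".
        # Keep only what follows the second tab.
        NF >= 3 { sub(/^[^\t]*\t[^\t]*\t/, ""); print; next }
        # Symbol headers: "0000000000001230 <name>:". Keep only "<name>:".
        /^[0-9a-f]+ <.*>:$/ { sub(/^[0-9a-f]+ /, ""); print }
    ' | sed -E 's/[0-9a-f]+ <([^>+]+)(\+0x[0-9a-f]+)?>/<\1>/g'
    # The sed step rewrites branch targets such as "4f0 <sym+0x20>" to "<sym>".
}

# Usage (old/ and new/ are placeholder directories for the two builds):
#   objdump -d old/mlx4.ko | normalize_disasm > old.txt
#   objdump -d new/mlx4.ko | normalize_disasm > new.txt
#   diff -u old.txt new.txt
```

Absolute addresses embedded in operands are left untouched, so some noise will remain. The normalized files can be opened in meld just like the raw ones.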
Greg V
2018-02-17 13:51:04 UTC
Upgraded CURRENT again today, the problem went away :)
Hans Petter Selasky
2018-02-17 14:19:28 UTC
Post by Greg V
Upgraded CURRENT again today, the problem went away :)
OK, nice to know.

--HPS
