Discussion:
panic: Assertion td->td_lock == TDQ_LOCKPTR(tdq) failed at
(too old to reply)
Ronald Klop
2018-05-31 16:03:48 UTC
Permalink
cross-post to -current to get more feedback

------- Forwarded message -------
From: "Ronald Klop" <ronald-***@klop.ws>
To: freebsd-***@freebsd.org
Cc:
Subject: panic: Assertion td->td_lock == TDQ_LOCKPTR(tdq) failed at
/usr/src/sys/kern/sched_ule.c:2137
Date: Thu, 31 May 2018 17:53:12 +0200

I had a crash on mu RPI-3B+ running freebsd 12/aarch64 from the 29 may
snapshot.
It was building world -j4 and portsnap extract.

I will keep it in the debugger for about an hour so if I need to provide
more information, please mail me.


db> show panic
panic: Assertion td->td_lock == TDQ_LOCKPTR(tdq) failed at
/usr/src/sys/kern/sched_ule.c:2137

db> bt
Tracing pid 20 tid 100063 td 0xfffffd00014fe000
db_trace_self() at db_stack_trace+0xf0
pc = 0xffff0000006680f4 lr = 0xffff0000000d8b18
sp = 0xffff000053972e10 fp = 0xffff000053972e40

db_stack_trace() at db_command+0x220
pc = 0xffff0000000d8b18 lr = 0xffff0000000d879c
sp = 0xffff000053972e50 fp = 0xffff000053972f30

db_command() at db_command_loop+0x60
pc = 0xffff0000000d879c lr = 0xffff0000000d8560
sp = 0xffff000053972f40 fp = 0xffff000053972f60

db_command_loop() at db_trap+0xf4
pc = 0xffff0000000d8560 lr = 0xffff0000000db678
sp = 0xffff000053972f70 fp = 0xffff000053973190

db_trap() at kdb_trap+0x1d8
pc = 0xffff0000000db678 lr = 0xffff0000003beca0
sp = 0xffff0000539731a0 fp = 0xffff000053973250

kdb_trap() at do_el1h_sync+0xf0
pc = 0xffff0000003beca0 lr = 0xffff000000683094
sp = 0xffff000053973260 fp = 0xffff000053973290

do_el1h_sync() at handle_el1h_sync+0x74
pc = 0xffff000000683094 lr = 0xffff00000066a074
sp = 0xffff0000539732a0 fp = 0xffff0000539733b0

handle_el1h_sync() at kdb_enter+0x34
pc = 0xffff00000066a074 lr = 0xffff0000003be34c
sp = 0xffff0000539733c0 fp = 0xffff000053973450

kdb_enter() at vpanic+0x1c4
pc = 0xffff0000003be34c lr = 0xffff00000037a3a4
sp = 0xffff000053973460 fp = 0xffff000053973510

vpanic() at kassert_panic+0x1bc
pc = 0xffff00000037a3a4 lr = 0xffff00000037a134
sp = 0xffff000053973520 fp = 0xffff0000539735d0

kassert_panic() at sched_switch+0x994
pc = 0xffff00000037a134 lr = 0xffff0000003a3d1c
sp = 0xffff0000539735e0 fp = 0xffff0000539736c0

sched_switch() at mi_switch+0x1a0
pc = 0xffff0000003a3d1c lr = 0xffff000000385044
sp = 0xffff0000539736d0 fp = 0xffff0000539736f0

mi_switch() at uma_reclaim_locked+0x1cc
pc = 0xffff000000385044 lr = 0xffff0000006169b4
sp = 0xffff000053973700 fp = 0xffff000053973750

uma_reclaim_locked() at uma_reclaim+0x34
pc = 0xffff0000006169b4 lr = 0xffff0000006167cc
sp = 0xffff000053973760 fp = 0xffff000053973770

uma_reclaim() at vm_pageout_worker+0x3e8
pc = 0xffff0000006167cc lr = 0xffff000000636d08
sp = 0xffff000053973780 fp = 0xffff000053973b10

vm_pageout_worker() at vm_pageout+0x140
pc = 0xffff000000636d08 lr = 0xffff000000635b58
sp = 0xffff000053973b20 fp = 0xffff000053973b50

vm_pageout() at fork_exit+0x7c
pc = 0xffff000000635b58 lr = 0xffff00000033bb14
sp = 0xffff000053973b60 fp = 0xffff000053973b90

fork_exit() at fork_trampoline+0x10
pc = 0xffff00000033bb14 lr = 0xffff000000682e14
sp = 0xffff000053973ba0 fp = 0x0000000000000000



Accidently I had this in a xterm: (da4s1b is the label/usbswap)

dT: 1.008s w: 1.000s
L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
18 1 0 0 0.0 1 32 430.5 42.7| mmcsd0
0 0 0 0 0.0 0 0 0.0 0.0| da0
0 0 0 0 0.0 0 0 0.0 0.0| mmcsd0s1
18 1 0 0 0.0 1 32 430.5 42.7| mmcsd0s2
0 0 0 0 0.0 0 0 0.0 0.0|
msdosfs/MSDOSBOOT
18 1 0 0 0.0 1 32 430.5 42.7| mmcsd0s2a
18 1 0 0 0.0 1 32 430.6 42.7| ufs/rootfs
0 0 0 0 0.0 0 0 0.0 0.0| da1
0 0 0 0 0.0 0 0 0.0 0.0| da2
0 0 0 0 0.0 0 0 0.0 0.0| da3
6 315 5 52 963.9 310 2158 11.6 92.6| da4
0 0 0 0 0.0 0 0 0.0 0.0| da2s1
0 0 0 0 0.0 0 0 0.0 0.0| da2s2
6 314 5 52 963.9 309 2158 11.6 92.6| da4s1
0 0 0 0 0.0 0 0 0.0 0.0| da2s2a
1 0 0 0 0.0 0 0 0.0 0.0| da4s1a
5 314 5 52 963.9 309 2158 11.6 92.6| da4s1b
0 0 0 0 0.0 0 0 0.0 0.0|
ufs/oldsdrootfs
1 0 0 0 0.0 0 0 0.0 0.0|
ufs/usbrootfs
5 314 5 52 963.9 309 2158 11.6 92.6|
label/usbswap





───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
last pid: 74521; load averages: 3.31, 3.34,
3.72 up 0+17:21:58
17:19:13
63 processes: 3 running, 60 sleeping
CPU 0: 3.9% user, 45.5% nice, 22.0% system, 2.4% interrupt, 26.3% idle
CPU 1: 1.2% user, 51.8% nice, 13.3% system, 0.4% interrupt, 33.3% idle
CPU 2: 2.0% user, 51.0% nice, 12.9% system, 0.0% interrupt, 34.1% idle
CPU 3: 1.2% user, 50.6% nice, 14.5% system, 0.8% interrupt, 32.9% idle
Mem: 522M Active, 132K Inact, 210M Laundry, 156M Wired, 60M Buf, 12M Free
Swap: 984M Total, 390M Used, 594M Free, 39% Inuse, 204K In, 3440K Out

PID USERNAME THR PRI NICE SIZE RES STATE C TIME CPU
COMMAND
73952 ronald 1 100 15 160M 97092K CPU3 3 0:39 82.81%
c++
73953 ronald 1 103 15 165M 106M CPU0 0 0:39 80.99%
c++
72549 ronald 1 52 15 517M 54008K swread 3 3:31 34.70%
c++
71551 ronald 1 52 15 448M 62524K swread 3 6:14 23.43%
c++
2806 ronald 1 21 0 30928K 2004K select 0 5:12 2.58%
tmux
74521 root 1 52 0 15092K 5692K biord 1 0:00 2.18%
bsdtar
94955 root 1 52 0 12660K 464K wait 1 5:03 0.71% sh
53784 root 1 20 0 14220K 2568K nanslp 3 0:24 0.48%
gstat
67541 ronald 1 20 0 13912K 2600K CPU1 1 0:06 0.30%
top
53277 ronald 1 20 0 20992K 1464K select 0 0:07 0.18%
sshd
94954 root 1 20 0 10820K 1800K pipewr 1 0:00 0.03%
cat
746 root 1 20 0 17916K 18000K select 1 0:14 0.02%
ntpd
5027 ronald 1 35 15 11048K 184K select 0 0:02 0.01%
make
62849 ronald 1 35 15 14772K 468K select 0 0:06 0.01%
make
2887 ronald 1 35 15 10348K 180K select 0 0:02 0.01%
make
_______________________________________________
freebsd-***@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-arm
To unsubscribe, send any mail to "freebsd-arm-***@freebsd.org"
Ronald Klop
2018-06-02 11:40:27 UTC
Permalink
This happened again. Now I have a crashdump.
Anyone interested?

Some more answers below.
I had a crash on my RPI-3B+ running freebsd 12/aarch64 from the 29 may
snapshot.
It was building world -j4 and portsnap extract.
I will keep it in the debugger for about an hour so if I need to provide
more information, please mail me.
db> show panic
panic: Assertion td->td_lock == TDQ_LOCKPTR(tdq) failed at
/usr/src/sys/kern/sched_ule.c:2137
This used to be a fairly common error on my RPI3, but I haven't seen
it in some time.
As a matter of curiosity, have you been able to run a -j4 buildworld
to completion on your RPI3? If so, please post your setup. For several
months now all my attempts at j4 buildworld have failed with "killed,
out of swap" even though top seems to report minimal swap usage. It
even happens with no -j value set at all, but usually restarting with
-DNO_CLEAN allowed progress to pick up where it left off.
Alas, as I write this, the kernel just reported
Jun 1 08:34:34 www kernel: pid 19399 (c++), uid 0, was killed: out of
swap space
top is showing
Swap: 4096M Total, 17M Used, 4079M Free
but since top is alive the value is stale. Nonetheless, I have seen top
killed while still showing minimal swap usage.
It seems as if the "out of swap" kills are being issued in error..
Thanks for reading,
bob prohaska
I just received my RPI3 this week. It is running GENERIC with debug
options still. Building a -NODEBUG kernel from src didn't succeed yet
because of the mentioned panic.
I'm now building with -j2 to be on the safe side.

After I have the -NODEBUG kernel it will be a bit faster and I will try
-j4 again.

Regards,
Ronald.
Ronald Klop
2018-06-06 18:55:39 UTC
Permalink
Post by Ronald Klop
This happened again. Now I have a crashdump.
Anyone interested?
Some more answers below.
I had a crash on my RPI-3B+ running freebsd 12/aarch64 from the 29 may
snapshot.
It was building world -j4 and portsnap extract.
I will keep it in the debugger for about an hour so if I need to provide
more information, please mail me.
db> show panic
panic: Assertion td->td_lock == TDQ_LOCKPTR(tdq) failed at
/usr/src/sys/kern/sched_ule.c:2137
This used to be a fairly common error on my RPI3, but I haven't seen
it in some time.
As a matter of curiosity, have you been able to run a -j4 buildworld
to completion on your RPI3? If so, please post your setup. For several
months now all my attempts at j4 buildworld have failed with "killed,
out of swap" even though top seems to report minimal swap usage. It
even happens with no -j value set at all, but usually restarting with
-DNO_CLEAN allowed progress to pick up where it left off.
How do you ever run a -j4 buildworld? My RPI3 starts building clang/llvm
with sometimes 500 MB+ per process so everything starts swapping like hell
and takes forever to run.

Regards,
Ronald.
Post by Ronald Klop
Alas, as I write this, the kernel just reported
Jun 1 08:34:34 www kernel: pid 19399 (c++), uid 0, was killed: out of
swap space
top is showing
Swap: 4096M Total, 17M Used, 4079M Free
but since top is alive the value is stale. Nonetheless, I have seen top
killed while still showing minimal swap usage.
It seems as if the "out of swap" kills are being issued in error..
Thanks for reading,
bob prohaska
I just received my RPI3 this week. It is running GENERIC with debug
options still. Building a -NODEBUG kernel from src didn't succeed yet
because of the mentioned panic.
I'm now building with -j2 to be on the safe side.
After I have the -NODEBUG kernel it will be a bit faster and I will try
-j4 again.
Regards,
Ronald.
bob prohaska
2018-06-06 23:58:01 UTC
Permalink
Post by Ronald Klop
How do you ever run a -j4 buildworld? My RPI3 starts building clang/llvm
with sometimes 500 MB+ per process so everything starts swapping like hell
and takes forever to run.
Lately, never 8-)

When I started playing with an RPI3, in late 2016, -j4 buildworlds
worked usably well. Early in 2018 problems appeared, including
Assertion td->td_lock == TDQ_LOCKPTR(tdq) failed, among others.

Things didn't really go to pot until somewhat later when the swap frenzy
issue reared its head and haven't improved much.

Sadly, when the swap frenzy workaround of using
sysctl vm.pageout_update_period=0 was suggested,
a -j4 buildworld then resorted to the old td_lock
issue, so it looks as if both bugs are alive and
kicking.

Just to complicate matters, I was in the habit of
using a USB flash drive as both an outboard file
system (/usr/, /var/ and /tmp/) and as a swap device.
A very common reaction was to blame the flash device
for the trouble, though so far as I can tell a Sandisk
Extreme USB flash drive isn't much slower, if any, than
a mechanical hard disk for random writes. The same USB
flash devices on a Pi2 running 11-Stable seems to be fine.

However, turning off the USB flash swap device does seem
to reduce the number of "indefinite wait buffer" messages
on the console (they're usually not fatal) so I think there
is still something amiss. Whether it's the flash, the USB
or the VM system is unclear to me.

For now the workarounds are to run buildworld with no explicit
-j value (presumably equivalent to -j1), to use only swap on
the microSD card and to use the -DNO_CLEAN option for most
buildworld sessions, doing an explict "make clean" or
"rm -rf /usr/obj/usr" when necessary. In a few cases it
seemed helpful to start with "make kernel-toolchain" then
follow with make -DNO_CLEAN buildworld" but I didn't keep
good enough records to be certain of the benefits.

Apologies for the length, HTH

bob prohaska
Mark Millard
2018-06-07 05:28:58 UTC
Permalink
bob prohaska fbsd at www.zefox.net wrote on
Post by bob prohaska
On Sat, 02 Jun 2018 13:40:27 +0200, Ronald Klop <ronald-lists at klop.ws
How do you ever run a -j4 buildworld? My RPI3 starts building clang/llvm
with sometimes 500 MB+ per process so everything starts swapping like hell
and takes forever to run.
Lately, never 8-)
Looks like there has been another stab at avoiding some
unnecessary Out Of Memory killing of processes:

Author: alc
Date: Thu Jun 7 02:54:11 2018
New Revision: 334752
URL: https://svnweb.freebsd.org/changeset/base/334752


Log:
. . . One visible
effect of this error was that processes were being killed by the
virtual memory system's OOM killer when in fact there was plentiful
free memory.


(I do not have access to an appropriate test context at
this time.)


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
bob prohaska
2018-06-08 00:13:45 UTC
Permalink
Post by Mark Millard
Looks like there has been another stab at avoiding some
Author: alc
Date: Thu Jun 7 02:54:11 2018
New Revision: 334752
URL: https://svnweb.freebsd.org/changeset/base/334752
. . . One visible
effect of this error was that processes were being killed by the
virtual memory system's OOM killer when in fact there was plentiful
free memory.
An RPI3 kernel at 334800 still reported
Jun 7 16:28:21 www kernel: pid 71329 (c++), uid 0, was killed: out of swap space

during a -j4 buildworld.

I wasn't watching top at the time, so I don't know how much swap was in
use. Total available was 4 GB, which certainly seems like it ought to be
enough. The swap was on both microSD and USB flash.

I've run make clean in /usr/src/lib/clang/libllvm and restarted
a -j4 buildworld with the -DNO_CLEAN option, and also set
sysctl vm.pageout_update_period=0 to see what would happen.

Within a few minutes buildworld stopped, the tail of the log file
contained


--- X86GenEVEX2VEXTables.inc ---
llvm-tblgen -gen-x86-EVEX2VEX-tables -I /usr/src/contrib/llvm/include -I /usr/src/contrib/llvm/lib/Target/X86 -d X86GenEVEX2VEXTables.inc.d -o X86GenEVEX2VEXTables.inc /usr/src/contrib/llvm/lib/Target/X86/X86.td
--- X86GenFastISel.inc ---
llvm-tblgen -gen-fast-isel -I /usr/src/contrib/llvm/include -I /usr/src/contrib/llvm/lib/Target/X86 -d X86GenFastISel.inc.d -o X86GenFastISel.inc /usr/src/contrib/llvm/lib/Target/X86/X86.td
--- X86GenDAGISel.inc ---
Killed
*** [X86GenDAGISel.inc] Error code 137

make[6]: stopped in /usr/src/lib/clang/libllvm
1 error

make[6]: stopped in /usr/src/lib/clang/libllvm
*** [all_subdir_lib/clang/libllvm] Error code 2

make[5]: stopped in /usr/src/lib/clang

I'll try again, this time with USB swap turned off.



Thanks for reading!

bob prohaska
bob prohaska
2018-06-08 01:55:45 UTC
Permalink
Post by bob prohaska
I'll try again, this time with USB swap turned off.
The circle closed, back to the original panic in the subject line.
Console, top and buildworld.log files are at
http://www.zefox.net/~fbsd/rpi3/crashes/20180607

Thanks for reading,

bob prohaska

Loading...