Discussion:
amd64 head -r329465 (non-debug build, but with symbols): "panic: spin
(too old to reply)
Mark Millard
2018-02-18 01:39:39 UTC
Permalink
This is for FreeBSD running under Hyper-V on a Windows 10 Pro machine.
The FreeBSD "disk" bindings are to SSDs, not the insides of NTFS files.
29 logical processors assigned to FreeBSD (on a 32-thread Ryzen
Threadripper 1950X). No other Hyper-V use.

This happened during:

# ~/sys_build_scripts.amd64-host/make_powerpc64vtsc_nodebug_clang_altbinutils-amd64-host.sh check-old DESTDIR=/usr/obj/DESTDIRs/clang-powerpc64-installworld_altbinutils
Script started, output file is /root/sys_typescripts/typescript_make_powerpc64vtsc_nodebug_clang_altbinutils-amd64-host-2018-02-17:15:56:20
Checking for old files
(Hand typed from a picture of a window's content
at slighly different times, expect typos:)

KDB: enter: panic
[thread pid 42170 tid 100752 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
db> call doadump
Dumping 4825 out of 110559 MB: ... (omitted) ...
Dump complete
= 0


(The "pid 42170" identification as the process reporting the
issue does not seem to appear in the core.txt.0 file.)


# ls -lTdt /var/crash/*
-rw-r--r-- 1 root wheel 100792 Feb 17 16:09:18 2018 /var/crash/core.txt.0
lrwxr-xr-x 1 root wheel 8 Feb 17 16:09:08 2018 /var/crash/vmcore.last -> vmcore.0
lrwxr-xr-x 1 root wheel 6 Feb 17 16:09:08 2018 /var/crash/info.last -> info.0
-rw------- 1 root wheel 5060296704 Feb 17 16:09:08 2018 /var/crash/vmcore.0
-rw------- 1 root wheel 392 Feb 17 16:08:59 2018 /var/crash/info.0
-rw-r--r-- 1 root wheel 2 Feb 17 16:08:59 2018 /var/crash/bounds
-rw-r--r-- 1 root wheel 5 Nov 22 04:34:36 2017 /var/crash/minfree

From /var/crash/core.txt.0 :

Unread portion of the kernel message buffer:
spin lock 0xffffffff81b2dcc0 (sched lock 5) held by 0xfffff8011d936560 (tid 100691) too long
panic: spin lock held too long
cpuid = 5
time = 1518911834
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00f10094d0
vpanic() at vpanic+0x18d/frame 0xfffffe00f1009530
panic() at panic+0x43/frame 0xfffffe00f1009590
_mtx_lock_indefinite_check() at _mtx_lock_indefinite_check+0x71/frame 0xfffffe00f10095a0
thread_lock_flags_() at thread_lock_flags_+0xdb/frame 0xfffffe00f1009610
statclock_cnt() at statclock_cnt+0xdc/frame 0xfffffe00f1009650
handleevents() at handleevents+0x113/frame 0xfffffe00f10096a0
timercb() at timercb+0xa9/frame 0xfffffe00f10096f0
lapic_handle_timer() at lapic_handle_timer+0xa7/frame 0xfffffe00f1009730
timerint_u() at timerint_u+0x96/frame 0xfffffe00f1009810
thread_lock_flags_() at thread_lock_flags_+0xc1/frame 0xfffffe00f1009880
fork1() at fork1+0x1b9f/frame 0xfffffe00f1009930
sys_vfork() at sys_vfork+0x4c/frame 0xfffffe00f1009980
amd64_syscall() at amd64_syscall+0xa48/frame 0xfffffe00f1009ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0x7fffffffcfc0
KDB: enter: panic

__curthread () at ./machine/pcpu.h:230
230 __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) #0 __curthread () at ./machine/pcpu.h:230
#1 doadump (textdump=-2122191464) at /usr/src/sys/kern/kern_shutdown.c:347
#2 0xffffffff8040b42c in db_fncall_generic (addr=<optimized out>,
rv=<optimized out>, nargs=<optimized out>, args=<optimized out>)
at /usr/src/sys/ddb/db_command.c:609
#3 db_fncall (dummy1=<optimized out>, dummy2=<optimized out>,
dummy3=<optimized out>, dummy4=<optimized out>)
at /usr/src/sys/ddb/db_command.c:657
#4 0xffffffff8040af79 in db_command (last_cmdp=<optimized out>,
cmd_table=<optimized out>, dopager=<optimized out>)
at /usr/src/sys/ddb/db_command.c:481
#5 0xffffffff8040acf4 in db_command_loop ()
at /usr/src/sys/ddb/db_command.c:534
#6 0xffffffff8040df9f in db_trap (type=<optimized out>, code=<optimized out>)
at /usr/src/sys/ddb/db_main.c:250
#7 0xffffffff80b370e3 in kdb_trap (type=3, code=-61456, tf=<optimized out>)
at /usr/src/sys/kern/subr_kdb.c:697
#8 0xffffffff80fa2c5c in trap (frame=0xfffffe00f1009400)
at /usr/src/sys/amd64/amd64/trap.c:547
#9 <signal handler called>
#10 kdb_enter (why=0xffffffff811f280b "panic", msg=<optimized out>)
at /usr/src/sys/kern/subr_kdb.c:479
#11 0xffffffff80aef17a in vpanic (fmt=<optimized out>, ap=0xfffffe00f1009570)
at /usr/src/sys/kern/kern_shutdown.c:801
#12 0xffffffff80aeefc3 in panic (fmt=0x0)
at /usr/src/sys/kern/kern_shutdown.c:739
#13 0xffffffff80acfa31 in _mtx_lock_indefinite_check (m=<optimized out>,
ldap=<optimized out>) at /usr/src/sys/kern/kern_mutex.c:1224
#14 0xffffffff80acfb9b in thread_lock_flags_ (td=0xfffff818719d1000,
opts=<optimized out>, file=<optimized out>, line=<optimized out>)
at /usr/src/sys/kern/kern_mutex.c:913
#15 0xffffffff80a89d6c in statclock_cnt (cnt=1, usermode=<optimized out>)
at /usr/src/sys/kern/kern_clock.c:768
#16 0xffffffff810d0003 in handleevents (now=43230207690178, fake=0)
at /usr/src/sys/kern/kern_clocksource.c:196
#17 0xffffffff810d0709 in timercb (et=0xffffffff81c528e8 <lapic_et>,
arg=<optimized out>) at /usr/src/sys/kern/kern_clocksource.c:353
#18 0xffffffff8110dad7 in lapic_handle_timer (frame=0xfffffe00f1009740)
at /usr/src/sys/x86/x86/local_apic.c:1305
#19 0xffffffff80f849d0 in timerint_u ()
at /usr/src/sys/amd64/amd64/apic_vector.S:132
#20 0xfffffe00f1009828 in ?? ()
#21 0x000000000000b6b1 in ?? ()
#22 0x0000000000002000 in ?? ()
#23 0x00000000ffffdfff in ?? ()
#24 0x00c11c08e43e7fd5 in ?? ()
#25 0x00000000000003e8 in ?? ()
#26 0x00000000fffff1eb in ?? ()
#27 0xfffffe00f1009828 in ?? ()
#28 0xfffffe00f1009810 in ?? ()
#29 0x00000000800e6d01 in ?? ()
#30 0x0000000000000064 in ?? ()
#31 0xfffff8011d936560 in ?? ()
#32 0xfffffe00f1009828 in ?? ()
#33 0xffffffff81771014 in mtx_delay ()
#34 0x0000000000000000 in ?? ()
(kgdb)

. . .
UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND
. . .
0 1110 1102 0 33 0 12024 3076 ttyin D+ - 0:00.00 [sh]
0 1120 1044 0 20 0 18572 7936 select Ds - 0:00.00 [sshd]
1001 1123 1120 0 20 0 18936 8044 select D - 0:00.00 [sshd]
1001 1124 1123 0 34 0 12120 3196 wait Ds - 0:00.00 [sh]
0 1134 1124 0 22 0 12060 3148 wait D - 0:00.00 [su]
0 1135 1134 0 20 0 12312 3244 wait D - 0:00.00 [sh]
0 42072 1135 0 25 0 11464 3060 wait D+ - 0:00.00 [sh]
0 42072 1135 0 25 0 11464 3060 wait D+ - 0:00.00 [sh]
0 42075 42072 0 20 0 10928 2480 select D+ - 0:00.00 [script]
0 42076 42075 0 52 0 10160 1396 wait Ds+ - 0:00.00 [make]
0 42108 42076 0 52 0 12236 3224 wait D+ - 0:00.00 [make]
0 42168 42108 0 52 0 11496 3068 wait D+ - 0:00.00 [sh]
0 42169 42168 0 52 0 12608 3568 pipewr D+ - 0:00.00 [make]
0 42170 42168 0 72 0 10956 2284 - R+ - 0:00.00 [xargs]
0 42171 42168 0 35 0 11500 3064 piperd D+ - 0:00.00 [sh]
0 46769 42170 0 72 0 10956 2284 - ?VL+ - 0:00.00 [xargs]
. . .




Context details:

# uname -apKU
FreeBSD FBSDFSSD 12.0-CURRENT FreeBSD 12.0-CURRENT r329465M amd64 amd64 1200058 1200058

# svnlite status /usr/src | sort
? /usr/src/sys/amd64/conf/GENERIC-DBG
? /usr/src/sys/amd64/conf/GENERIC-NODBG
? /usr/src/sys/arm/conf/GENERIC-DBG
? /usr/src/sys/arm/conf/GENERIC-NODBG
? /usr/src/sys/arm64/conf/GENERIC-DBG
? /usr/src/sys/arm64/conf/GENERIC-NODBG
? /usr/src/sys/powerpc/conf/GENERIC64vtsc-DBG
? /usr/src/sys/powerpc/conf/GENERIC64vtsc-NODBG
? /usr/src/sys/powerpc/conf/GENERICvtsc-DBG
? /usr/src/sys/powerpc/conf/GENERICvtsc-NODBG
M /usr/src/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
M /usr/src/contrib/llvm/tools/lld/ELF/Arch/PPC64.cpp
M /usr/src/crypto/openssl/crypto/armcap.c
M /usr/src/lib/libkvm/kvm_powerpc.c
M /usr/src/lib/libkvm/kvm_private.c
M /usr/src/stand/defs.mk
M /usr/src/stand/powerpc/boot1.chrp/Makefile
M /usr/src/stand/powerpc/kboot/Makefile
M /usr/src/sys/arm64/arm64/identcpu.c
M /usr/src/sys/conf/kmod.mk
M /usr/src/sys/conf/ldscript.powerpc
M /usr/src/sys/kern/subr_pcpu.c
M /usr/src/sys/powerpc/aim/mmu_oea64.c
M /usr/src/sys/powerpc/ofw/ofw_machdep.c
M /usr/src/sys/powerpc/powerpc/interrupt.c
M /usr/src/sys/powerpc/powerpc/mp_machdep.c
M /usr/src/sys/powerpc/powerpc/trap.c
M /usr/src/sys/vm/uma_core.c
M /usr/src/usr.bin/top/machine.c

The GENERIC* files include GENERIC and then set explicit
debug status choices of mine. Most of the rest is tied to
historical powerpc and powerpc64 investigations. I also
have top report the maximum swap-in-use figure that it
sees during its run.

# svnlite diff /usr/src/sys/vm/uma_core.c
Index: /usr/src/sys/vm/uma_core.c
===================================================================
--- /usr/src/sys/vm/uma_core.c (revision 329465)
+++ /usr/src/sys/vm/uma_core.c (working copy)
@@ -3428,7 +3428,7 @@
void
uma_reclaim_wakeup(void)
{
-
+printf("limit %lX, total %lX, needed %d\n", uma_kmem_limit, uma_kmem_total, uma_reclaim_needed);
if (atomic_fetchadd_int(&uma_reclaim_needed, 1) == 0)
wakeup(uma_reclaim);
}



Side note: It took the automatic fsck and 2 manual fsck
runs to get back to a clean status (before I could get to
multi-user).

===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)
Trond Endrestøl
2018-02-18 12:25:17 UTC
Permalink
Post by Mark Millard
This is for FreeBSD running under Hyper-V on a Windows 10 Pro machine.
The FreeBSD "disk" bindings are to SSDs, not the insides of NTFS files.
29 logical processors assigned to FreeBSD (on a 32-thread Ryzen
Threadripper 1950X). No other Hyper-V use.
# ~/sys_build_scripts.amd64-host/make_powerpc64vtsc_nodebug_clang_altbinutils-amd64-host.sh check-old DESTDIR=/usr/obj/DESTDIRs/clang-powerpc64-installworld_altbinutils
Script started, output file is /root/sys_typescripts/typescript_make_powerpc64vtsc_nodebug_clang_altbinutils-amd64-host-2018-02-17:15:56:20
Checking for old files
(Hand typed from a picture of a window's content
at slighly different times, expect typos:)
KDB: enter: panic
[thread pid 42170 tid 100752 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
db> call doadump
Dumping 4825 out of 110559 MB: ... (omitted) ...
Dump complete
= 0
(The "pid 42170" identification as the process reporting the
issue does not seem to appear in the core.txt.0 file.)
# ls -lTdt /var/crash/*
-rw-r--r-- 1 root wheel 100792 Feb 17 16:09:18 2018 /var/crash/core.txt.0
lrwxr-xr-x 1 root wheel 8 Feb 17 16:09:08 2018 /var/crash/vmcore.last -> vmcore.0
lrwxr-xr-x 1 root wheel 6 Feb 17 16:09:08 2018 /var/crash/info.last -> info.0
-rw------- 1 root wheel 5060296704 Feb 17 16:09:08 2018 /var/crash/vmcore.0
-rw------- 1 root wheel 392 Feb 17 16:08:59 2018 /var/crash/info.0
-rw-r--r-- 1 root wheel 2 Feb 17 16:08:59 2018 /var/crash/bounds
-rw-r--r-- 1 root wheel 5 Nov 22 04:34:36 2017 /var/crash/minfree
spin lock 0xffffffff81b2dcc0 (sched lock 5) held by 0xfffff8011d936560 (tid 100691) too long
panic: spin lock held too long
cpuid = 5
time = 1518911834
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00f10094d0
vpanic() at vpanic+0x18d/frame 0xfffffe00f1009530
panic() at panic+0x43/frame 0xfffffe00f1009590
_mtx_lock_indefinite_check() at _mtx_lock_indefinite_check+0x71/frame 0xfffffe00f10095a0
thread_lock_flags_() at thread_lock_flags_+0xdb/frame 0xfffffe00f1009610
statclock_cnt() at statclock_cnt+0xdc/frame 0xfffffe00f1009650
handleevents() at handleevents+0x113/frame 0xfffffe00f10096a0
timercb() at timercb+0xa9/frame 0xfffffe00f10096f0
lapic_handle_timer() at lapic_handle_timer+0xa7/frame 0xfffffe00f1009730
timerint_u() at timerint_u+0x96/frame 0xfffffe00f1009810
thread_lock_flags_() at thread_lock_flags_+0xc1/frame 0xfffffe00f1009880
fork1() at fork1+0x1b9f/frame 0xfffffe00f1009930
sys_vfork() at sys_vfork+0x4c/frame 0xfffffe00f1009980
amd64_syscall() at amd64_syscall+0xa48/frame 0xfffffe00f1009ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0x7fffffffcfc0
KDB: enter: panic
__curthread () at ./machine/pcpu.h:230
230 __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) #0 __curthread () at ./machine/pcpu.h:230
#1 doadump (textdump=-2122191464) at /usr/src/sys/kern/kern_shutdown.c:347
#2 0xffffffff8040b42c in db_fncall_generic (addr=<optimized out>,
rv=<optimized out>, nargs=<optimized out>, args=<optimized out>)
at /usr/src/sys/ddb/db_command.c:609
#3 db_fncall (dummy1=<optimized out>, dummy2=<optimized out>,
dummy3=<optimized out>, dummy4=<optimized out>)
at /usr/src/sys/ddb/db_command.c:657
#4 0xffffffff8040af79 in db_command (last_cmdp=<optimized out>,
cmd_table=<optimized out>, dopager=<optimized out>)
at /usr/src/sys/ddb/db_command.c:481
#5 0xffffffff8040acf4 in db_command_loop ()
at /usr/src/sys/ddb/db_command.c:534
#6 0xffffffff8040df9f in db_trap (type=<optimized out>, code=<optimized out>)
at /usr/src/sys/ddb/db_main.c:250
#7 0xffffffff80b370e3 in kdb_trap (type=3, code=-61456, tf=<optimized out>)
at /usr/src/sys/kern/subr_kdb.c:697
#8 0xffffffff80fa2c5c in trap (frame=0xfffffe00f1009400)
at /usr/src/sys/amd64/amd64/trap.c:547
#9 <signal handler called>
#10 kdb_enter (why=0xffffffff811f280b "panic", msg=<optimized out>)
at /usr/src/sys/kern/subr_kdb.c:479
#11 0xffffffff80aef17a in vpanic (fmt=<optimized out>, ap=0xfffffe00f1009570)
at /usr/src/sys/kern/kern_shutdown.c:801
#12 0xffffffff80aeefc3 in panic (fmt=0x0)
at /usr/src/sys/kern/kern_shutdown.c:739
#13 0xffffffff80acfa31 in _mtx_lock_indefinite_check (m=<optimized out>,
ldap=<optimized out>) at /usr/src/sys/kern/kern_mutex.c:1224
#14 0xffffffff80acfb9b in thread_lock_flags_ (td=0xfffff818719d1000,
opts=<optimized out>, file=<optimized out>, line=<optimized out>)
at /usr/src/sys/kern/kern_mutex.c:913
#15 0xffffffff80a89d6c in statclock_cnt (cnt=1, usermode=<optimized out>)
at /usr/src/sys/kern/kern_clock.c:768
#16 0xffffffff810d0003 in handleevents (now=43230207690178, fake=0)
at /usr/src/sys/kern/kern_clocksource.c:196
#17 0xffffffff810d0709 in timercb (et=0xffffffff81c528e8 <lapic_et>,
arg=<optimized out>) at /usr/src/sys/kern/kern_clocksource.c:353
#18 0xffffffff8110dad7 in lapic_handle_timer (frame=0xfffffe00f1009740)
at /usr/src/sys/x86/x86/local_apic.c:1305
#19 0xffffffff80f849d0 in timerint_u ()
at /usr/src/sys/amd64/amd64/apic_vector.S:132
#20 0xfffffe00f1009828 in ?? ()
#21 0x000000000000b6b1 in ?? ()
#22 0x0000000000002000 in ?? ()
#23 0x00000000ffffdfff in ?? ()
#24 0x00c11c08e43e7fd5 in ?? ()
#25 0x00000000000003e8 in ?? ()
#26 0x00000000fffff1eb in ?? ()
#27 0xfffffe00f1009828 in ?? ()
#28 0xfffffe00f1009810 in ?? ()
#29 0x00000000800e6d01 in ?? ()
#30 0x0000000000000064 in ?? ()
#31 0xfffff8011d936560 in ?? ()
#32 0xfffffe00f1009828 in ?? ()
#33 0xffffffff81771014 in mtx_delay ()
#34 0x0000000000000000 in ?? ()
(kgdb)
. . .
UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND
. . .
0 1110 1102 0 33 0 12024 3076 ttyin D+ - 0:00.00 [sh]
0 1120 1044 0 20 0 18572 7936 select Ds - 0:00.00 [sshd]
1001 1123 1120 0 20 0 18936 8044 select D - 0:00.00 [sshd]
1001 1124 1123 0 34 0 12120 3196 wait Ds - 0:00.00 [sh]
0 1134 1124 0 22 0 12060 3148 wait D - 0:00.00 [su]
0 1135 1134 0 20 0 12312 3244 wait D - 0:00.00 [sh]
0 42072 1135 0 25 0 11464 3060 wait D+ - 0:00.00 [sh]
0 42072 1135 0 25 0 11464 3060 wait D+ - 0:00.00 [sh]
0 42075 42072 0 20 0 10928 2480 select D+ - 0:00.00 [script]
0 42076 42075 0 52 0 10160 1396 wait Ds+ - 0:00.00 [make]
0 42108 42076 0 52 0 12236 3224 wait D+ - 0:00.00 [make]
0 42168 42108 0 52 0 11496 3068 wait D+ - 0:00.00 [sh]
0 42169 42168 0 52 0 12608 3568 pipewr D+ - 0:00.00 [make]
0 42170 42168 0 72 0 10956 2284 - R+ - 0:00.00 [xargs]
0 42171 42168 0 35 0 11500 3064 piperd D+ - 0:00.00 [sh]
0 46769 42170 0 72 0 10956 2284 - ?VL+ - 0:00.00 [xargs]
. . .
# uname -apKU
FreeBSD FBSDFSSD 12.0-CURRENT FreeBSD 12.0-CURRENT r329465M amd64 amd64 1200058 1200058
# svnlite status /usr/src | sort
? /usr/src/sys/amd64/conf/GENERIC-DBG
? /usr/src/sys/amd64/conf/GENERIC-NODBG
? /usr/src/sys/arm/conf/GENERIC-DBG
? /usr/src/sys/arm/conf/GENERIC-NODBG
? /usr/src/sys/arm64/conf/GENERIC-DBG
? /usr/src/sys/arm64/conf/GENERIC-NODBG
? /usr/src/sys/powerpc/conf/GENERIC64vtsc-DBG
? /usr/src/sys/powerpc/conf/GENERIC64vtsc-NODBG
? /usr/src/sys/powerpc/conf/GENERICvtsc-DBG
? /usr/src/sys/powerpc/conf/GENERICvtsc-NODBG
M /usr/src/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
M /usr/src/contrib/llvm/tools/lld/ELF/Arch/PPC64.cpp
M /usr/src/crypto/openssl/crypto/armcap.c
M /usr/src/lib/libkvm/kvm_powerpc.c
M /usr/src/lib/libkvm/kvm_private.c
M /usr/src/stand/defs.mk
M /usr/src/stand/powerpc/boot1.chrp/Makefile
M /usr/src/stand/powerpc/kboot/Makefile
M /usr/src/sys/arm64/arm64/identcpu.c
M /usr/src/sys/conf/kmod.mk
M /usr/src/sys/conf/ldscript.powerpc
M /usr/src/sys/kern/subr_pcpu.c
M /usr/src/sys/powerpc/aim/mmu_oea64.c
M /usr/src/sys/powerpc/ofw/ofw_machdep.c
M /usr/src/sys/powerpc/powerpc/interrupt.c
M /usr/src/sys/powerpc/powerpc/mp_machdep.c
M /usr/src/sys/powerpc/powerpc/trap.c
M /usr/src/sys/vm/uma_core.c
M /usr/src/usr.bin/top/machine.c
The GENERIC* files include GENERIC and then set explicit
debug status choices of mine. Most of the rest is tied to
historical powerpc and powerpc64 investigations. I also
have top report the maximum swap-in-use figure that it
sees during its run.
# svnlite diff /usr/src/sys/vm/uma_core.c
Index: /usr/src/sys/vm/uma_core.c
===================================================================
--- /usr/src/sys/vm/uma_core.c (revision 329465)
+++ /usr/src/sys/vm/uma_core.c (working copy)
@@ -3428,7 +3428,7 @@
void
uma_reclaim_wakeup(void)
{
-
+printf("limit %lX, total %lX, needed %d\n", uma_kmem_limit, uma_kmem_total, uma_reclaim_needed);
if (atomic_fetchadd_int(&uma_reclaim_needed, 1) == 0)
wakeup(uma_reclaim);
}
Side note: It took the automatic fsck and 2 manual fsck
runs to get back to a clean status (before I could get to
multi-user).
The same thing happened to me on a VirtualBox VM running base/head
r329515. This VM previously ran r329101 and I can easily revert to
this version where the problem doesn't exist.

See https://ximalas.info/~trond/base-head-r329515/ for core.txt.0 and
info.0. Crash dump of roughly 1 GiB is available on request.

I achieved an uptime of roughly 26 minutes before the panic set in. I
was in the process of upgrading print/texinfo using portmaster when
the crash occurred. xargs was the running process.

Unread portion of the kernel message buffer:
spin lock 0xffffffff80e76e80 (sched lock 0) held by 0xfffff800c58d4000 (tid 100746) too long
panic: spin lock held too long
cpuid = 0
time = 1518955069
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff804d281b = db_trace_self_wrapper+0x2b/frame 0xfffffe004b204610
vpanic() at 0xffffffff807106cd = vpanic+0x18d/frame 0xfffffe004b204670
panic() at 0xffffffff80710533 = panic+0x43/frame 0xfffffe004b2046d0
_mtx_lock_indefinite_check() at 0xffffffff806f0fa1 = _mtx_lock_indefinite_check+0x71/frame 0xfffffe004b2046e0
thread_lock_flags_() at 0xffffffff806f110b = thread_lock_flags_+0xdb/frame 0xfffffe004b204750
statclock_cnt() at 0xffffffff806ab0dc = statclock_cnt+0xdc/frame 0xfffffe004b204790
handleevents() at 0xffffffff80a72d73 = handleevents+0x113/frame 0xfffffe004b2047e0
timercb() at 0xffffffff80a73479 = timercb+0xa9/frame 0xfffffe004b204830
lapic_handle_timer() at 0xffffffff80ad6237 = lapic_handle_timer+0xa7/frame 0xfffffe004b204870
timerint_u() at 0xffffffff80a1f080 = timerint_u+0x96/frame 0xfffffe004b204950
thread_lock_flags_() at 0xffffffff806f10f1 = thread_lock_flags_+0xc1/frame 0xfffffe004b2049c0
fork1() at 0xffffffff806d124f = fork1+0x1b9f/frame 0xfffffe004b204a70
sys_vfork() at 0xffffffff806d17ec = sys_vfork+0x4c/frame 0xfffffe004b204ac0
amd64_syscall() at 0xffffffff80a3d6c8 = amd64_syscall+0xa48/frame 0xfffffe004b204bf0
fast_syscall_common() at 0xffffffff80a1e03d = fast_syscall_common+0x101/frame 0x7fffffffe000
Uptime: 26m5s
Dumping 1003 out of 6123 MB:..2%..12%..21%..31%..42%..52%..61%..71%..82%..91%

__curthread () at ./machine/pcpu.h:230
230 ./machine/pcpu.h: No such file or directory.
(kgdb) #0 __curthread () at ./machine/pcpu.h:230
#1 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:347
#2 0xffffffff80710200 in kern_reboot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:416
#3 0xffffffff8071072d in vpanic (fmt=<optimized out>, ap=0xfffffe004b2046b0)
at /usr/src/sys/kern/kern_shutdown.c:812
#4 0xffffffff80710533 in panic (fmt=<unavailable>)
at /usr/src/sys/kern/kern_shutdown.c:739
#5 0xffffffff806f0fa1 in _mtx_lock_indefinite_check (m=<optimized out>,
ldap=<optimized out>) at /usr/src/sys/kern/kern_mutex.c:1224
#6 0xffffffff806f110b in thread_lock_flags_ (td=0xfffff8006205d000,
opts=<optimized out>, file=<optimized out>, line=<optimized out>)
at /usr/src/sys/kern/kern_mutex.c:913
#7 0xffffffff806ab0dc in statclock_cnt (cnt=1, usermode=<optimized out>)
at /usr/src/sys/kern/kern_clock.c:768
#8 0xffffffff80a72d73 in handleevents (now=6721246917305, fake=0)
at /usr/src/sys/kern/kern_clocksource.c:196
#9 0xffffffff80a73479 in timercb (et=0xffffffff80f80ad8 <lapic_et>,
arg=<optimized out>) at /usr/src/sys/kern/kern_clocksource.c:353
#10 0xffffffff80ad6237 in lapic_handle_timer (frame=0xfffffe004b204880)
at /usr/src/sys/x86/x86/local_apic.c:1305
#11 0xffffffff80a1f080 in timerint_u ()
at /usr/src/sys/amd64/amd64/apic_vector.S:132
#12 0xfffffe004b204968 in ?? ()
#13 0x0000000000001b8d in ?? ()
#14 0x0000000000000800 in ?? ()
#15 0x00000000fffff7ff in ?? ()
#16 0x0000000080000034 in ?? ()
#17 0x00000000000001f4 in ?? ()
#18 0x00000000fffff8eb in ?? ()
#19 0xfffffe004b204968 in ?? ()
#20 0xfffffe004b204950 in ?? ()
#21 0x000000008000ba2f in ?? ()
#22 0x0000000000000032 in ?? ()
#23 0xfffff800c58d4000 in ?? ()
#24 0xfffffe004b204968 in ?? ()
#25 0xffffffff80c41014 in mtx_delay ()
#26 0x0000000000000000 in ?? ()

Looking at the commit logs, I suspect r329419, r329420, r329422,
and/or r320449 being possible candidates.

I'll try revert my source tree to r329418, and take it from there.
--
Trond.
Mark Millard
2018-02-18 17:35:16 UTC
Permalink
[Some more information added, from /usr/libexec/kgdb use.]
Post by Mark Millard
This is for FreeBSD running under Hyper-V on a Windows 10 Pro machine.
The FreeBSD "disk" bindings are to SSDs, not the insides of NTFS files.
29 logical processors assigned to FreeBSD (on a 32-thread Ryzen
Threadripper 1950X). No other Hyper-V use.
Trond's report seems to be for a "4 core" Intel i7 context (as seen
by FreeBSD in virtual box). So Ryzen seems to be non-essential for
reproduction.

Both of our reports are from some form of using FreeBSD in a virtual
machine (Hyper-V and VirtualBox). I do not know if that is a required
type of context or not.
Post by Mark Millard
# ~/sys_build_scripts.amd64-host/make_powerpc64vtsc_nodebug_clang_altbinutils-amd64-host.sh check-old DESTDIR=/usr/obj/DESTDIRs/clang-powerpc64-installworld_altbinutils
Script started, output file is /root/sys_typescripts/typescript_make_powerpc64vtsc_nodebug_clang_altbinutils-amd64-host-2018-02-17:15:56:20
Checking for old files
Deleting stale files in build tree...
cd /usr/src; MACHINE_ARCH=powerpc64 MACHINE=powerpc CPUTYPE= BUILD_TOOLS_META=.NOMETA CC="cc -target powerpc64-unknown-freebsd12.0 --sysroot=/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp -B/usr/local/powerpc64-unknown-freebsd12.0/bin/" CXX="c++ -target powerpc64-unknown-freebsd12.0 --sysroot=/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp -B/usr/local/powerpc64-unknown-freebsd12.0/bin/" CPP="cpp -target powerpc64-unknown-freebsd12.0 --sysroot=/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp -B/usr/local/powerpc64-unknown-freebsd12.0/bin/" AS="/usr/local/powerpc64-unknown-freebsd12.0/bin/as" AR="/usr/local/powerpc64-unknown-freebsd12.0/bin/ar" LD="/usr/local/powerpc64-unknown-freebsd12.0/bin/ld" LLVM_LINK="" NM=/usr/local/powerpc64-unknown-freebsd12.0/bin/nm OBJCOPY="/usr/local/powerpc64-unknown-freebsd12.0/bin/objcopy" RANLIB=/usr/local/powerpc64-unknown-
freebsd12.0/bin/ranlib STRINGS=/usr/local/bin/powerpc64-unknown-freebsd12.0-strings SIZE="/usr/local/powerpc64-unknown-freebsd12.0/bin/size" INSTALL="sh /usr/src/tools/install.sh" PATH=/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp/legacy/usr/sbin:/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp/legacy/usr/bin:/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp/legacy/bin:/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp/usr/sbin:/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp/usr/bin:/sbin:/bin:/usr/sbin:/usr/bin SYSROOT=/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp make -f Makefile.inc1 BWPHASE=worldtmp DESTDIR=/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp -DBATCH_DELETE_OLD_FILES delete-old d
elete-old-libs >/dev/null

load: 0.68 cmd: make 62180 [select] 25.15r 0.00u 0.00s 0% 1468k
make: Working in: /usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64
packet_write_wait: Connection to 192.168.1.165 port 22: Broken pipe


(I noticed the long pause and got the ^T in before the panic.)

Yet again it is xargs related fork activity that gets the problem (from core.txt.1 ):

561 Thread 100836 (PID=69982: xargs) fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:840
. . .
* 559 Thread 100811 (PID=62304: xargs) doadump (textdump=-2122191464) at pcpu.h:230

spin lock 0xffffffff81b3cf00 (sched lock 24) held by 0xfffff806aa6d5000 (tid 100836) too long
panic: spin lock held too long
cpuid = 24
time = 1518974055
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00f11304d0
vpanic() at vpanic+0x18d/frame 0xfffffe00f1130530
panic() at panic+0x43/frame 0xfffffe00f1130590
_mtx_lock_indefinite_check() at _mtx_lock_indefinite_check+0x71/frame 0xfffffe00f11305a0
thread_lock_flags_() at thread_lock_flags_+0xdb/frame 0xfffffe00f1130610
statclock_cnt() at statclock_cnt+0xdc/frame 0xfffffe00f1130650
handleevents() at handleevents+0x113/frame 0xfffffe00f11306a0
timercb() at timercb+0xa9/frame 0xfffffe00f11306f0
lapic_handle_timer() at lapic_handle_timer+0xa7/frame 0xfffffe00f1130730
timerint_u() at timerint_u+0x96/frame 0xfffffe00f1130810
thread_lock_flags_() at thread_lock_flags_+0xc1/frame 0xfffffe00f1130880
fork1() at fork1+0x1b9f/frame 0xfffffe00f1130930
sys_vfork() at sys_vfork+0x4c/frame 0xfffffe00f1130980
amd64_syscall() at amd64_syscall+0xa48/frame 0xfffffe00f1130ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0x7fffffffc5a0




===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)
Trond Endrestøl
2018-02-18 20:38:55 UTC
Permalink
Note: -r329448 was reverted in -r329461 : racy.
True. I got a crash when compiling r329451 while running r329449.
I've now booted the r329422 ZFS BE and I'm attempting to build
r329529.
--
Trond.
Mateusz Guzik
2018-02-18 21:33:27 UTC
Permalink
On Sun, Feb 18, 2018 at 9:38 PM, Trond Endrestøl <
Post by Trond Endrestøl
Note: -r329448 was reverted in -r329461 : racy.
True. I got a crash when compiling r329451 while running r329449.
I've now booted the r329422 ZFS BE and I'm attempting to build
r329529.
Looking around strongly suggests r329448 is the culprit. If you can verify
329447 works fine we are mostly done here.

Note the revision got reverted and different variant got in in r329531.

That said, if r329447 works then the issue should be already fixed and in
particular fresh head should work fine.
--
Mateusz Guzik <mjguzik gmail.com>
Trond Endrestøl
2018-02-18 22:24:51 UTC
Permalink
Post by Mateusz Guzik
On Sun, Feb 18, 2018 at 9:38 PM, Trond Endrestøl <
Post by Trond Endrestøl
Note: -r329448 was reverted in -r329461 : racy.
True. I got a crash when compiling r329451 while running r329449.
I've now booted the r329422 ZFS BE and I'm attempting to build
r329529.
Looking around strongly suggests r329448 is the culprit. If you can verify
329447 works fine we are mostly done here.
I noticed no errors in r329447. When r329529 is built and installed,
I'll try to incrementally build and install r329531.
Post by Mateusz Guzik
Note the revision got reverted and different variant got in in r329531.
That said, if r329447 works then the issue should be already fixed and in
particular fresh head should work fine.
--
Trond.
Mateusz Guzik
2018-02-18 22:25:13 UTC
Permalink
Post by Mateusz Guzik
Post by Mateusz Guzik
On Sun, Feb 18, 2018 at 9:38 PM, Trond Endrestøl <
Post by Trond Endrestøl
Note: -r329448 was reverted in -r329461 : racy.
True. I got a crash when compiling r329451 while running r329449.
I've now booted the r329422 ZFS BE and I'm attempting to build
r329529.
Looking around strongly suggests r329448 is the culprit. If you can
verify
Post by Mateusz Guzik
329447 works fine we are mostly done here.
Note the revision got reverted and different variant got in in r329531.
That said, if r329447 works then the issue should be already fixed and
in
Post by Mateusz Guzik
particular fresh head should work fine.
My initial problem was with -r329465, which is after -r329461 reverted
-r329488 . Trond reported in one note that he had problems with
-r329464 , also after -r329488 was reverted. Trond has also reported
-r329449 failed.
Dumb typos above: I meant -r329448 instead of -r329488 both times.
Ok, I think I see the bug:

exit1 does:
PROC_SLOCK(p);
p->p_state = PRS_ZOMBIE;
/* work continues */

pre-patch proc_to_reap does an equivalent of:
if (p->p_state == PRS_ZOMBIE) {
PROC_SLOCK(p);
PROC_SUNLOCK(p);
.... reap;
}

It is possible the exiting thread will be caught just after setting the
state to PRS_ZOMBIE.

With the slock/sunlock cycle we guarantee the reaping thread will
wait for it to finish.

Without the cycle we can end up reaping the still exiting thread.

I'll fix it soon(tm).
--
Mateusz Guzik <mjguzik gmail.com>
Loading...