Discussion:
head -r331499 amd64/threadripper panic in vm_page_free_prep during
Mark Millard
2018-03-25 17:41:38 UTC
FreeBSD panic'd while attempting to see if a "poudriere bulk -w -a"
would get the "unnecessary swapping" problem in my UFS-only context,
-r331499 (non-debug but with symbols), under Hyper-V. This is a
Ryzen Threadripper context, but I've no clue if that is important
to the problem. This was after 14 hours or so of building:

. . .
[14:22:05] [18] [00:01:16] Finished devel/p5-Test-HTML-Tidy | p5-Test-HTML-Tidy-1.00_1: Success
[14:22:08] [18] [00:00:00] Building devel/ocaml-camlp5 | ocaml-camlp5-6.16

So I've no clue if or how to repeat this.

Unfortunately dump was unsuccessful. So all I have is the
backtrace. Hand typed from a screen shot of the console
window:

cpuid = 18
time = 1521986594
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00f2e132a0
vpanic() at vpanic+0x18d/frame 0xfffffe00f2e13300
panic() at panic+0x43/frame 0xfffffe00f2e13360
vm_page_free_prep() at vm_page_free_prep+0x174/frame 0xfffffe00f2e13390
vm_page_free_toq() at vm_page_free_toq+0x11/frame 0xfffffe00f2e133b0
unlock_and_deallocate() at unlock_and_deallocate+0xbb/frame 0xfffffe00f2e133d0
vm_fault_hold() at vm_fault_hold+0x1d04/frame 0xfffffe00f2e13500
proc_rwmem() at proc_rwmem+0x8d/frame 0xfffffe00f2e13570
proc_readmem() at proc_readmem+0x46/frame 0xfffffe00f2e135d0
get_proc_vector() at get_proc_vector+0x16e/frame 0xfffffe00f2e13660
proc_getauxv() at proc_getauxv+0x26/frame 0xfffffe00f2e136a0
elf64_note_procstat_auxv() at elf64_note_procstat_auxv+0x1ee/frame 0xfffffe00f2e136f0
elf64_coredump() at elf64_coredump+0x57c7/frame 0xfffffe00f2e137c0
sigexit() at sigexit+0x76f/frame 0xfffffe00f2e139b0
postsig() at postsig+0x289/frame 0xfffffe00f2e13a70
ast() at ast+0x357/frame 0xfffffe00f2e13ab0
doreti_ast() at doreti_ast+0x1f/frame 0x706d6f6320432041
KDB: enter: panic
[ thread pid 61836 tid 101063 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why


The Hyper-V/Ryzen-Threadripper context was/is:

FreeBSD 12.0-CURRENT r331499M amd64
FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 6.0.0)
SRAT: Ignoring memory at addr 0x1b28200000
VT(vga): text 80x25
Hyper-V Version: 10.0.16299 [SP0]
Features=0x2e7f<VPRUNTIME,TMREFCNT,SYNIC,SYNTM,APIC,HYPERCALL,VPINDEX,REFTSC,IDLE,TMFREQ>
PM Features=0x0 [C2]
Features3=0xbed7b2<DEBUG,XMMHC,IDLE,NUMA,TMFREQ,SYNCMC,CRASH,NPIEP>
Timecounter "Hyper-V" frequency 10000000 Hz quality 2000
CPU: AMD Ryzen Threadripper 1950X 16-Core Processor (3393.73-MHz K8-class CPU)
Origin="AuthenticAMD" Id=0x800f11 Family=0x17 Model=0x1 Stepping=1
Features=0x1783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,HTT>
Features2=0xfed83203<SSE3,PCLMULQDQ,SSSE3,FMA,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV>
AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
AMD Features2=0x3f3<LAHF,CMP,CR8,ABM,SSE4A,MAS,Prefetch,OSVW>
Structured Extended Features=0x201c01a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,SHA>
XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
AMD Extended Feature Extensions ID EBX=0x4<XSaveErPtr>
Hypervisor: Origin = "Microsoft Hv"
real memory = 115964116992 (110592 MB)
avail memory = 112847249408 (107619 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <VRTUAL MICROSFT>
FreeBSD/SMP: Multiprocessor System Detected: 29 CPUs
FreeBSD/SMP: 1 package(s) x 29 core(s)

(I leave 3 hardware threads and some of the 128 GiBytes
of memory for Windows 10 Pro x64.)

FreeBSD and its swap are directly on NVMe SSDs, not in
NTFS file(s).


The M in -r331499M is for powerpc64/powerpc/arm64/armv7
related experiments, not amd64:

# svnlite status /usr/src/ | sort
? /usr/src/nohup.out
? /usr/src/sys/amd64/conf/GENERIC-DBG
? /usr/src/sys/amd64/conf/GENERIC-NODBG
? /usr/src/sys/arm/conf/GENERIC-DBG
? /usr/src/sys/arm/conf/GENERIC-NODBG
? /usr/src/sys/arm64/conf/GENERIC-DBG
? /usr/src/sys/arm64/conf/GENERIC-NODBG
? /usr/src/sys/dts/arm/a83t.dtsi
? /usr/src/sys/dts/arm/sinovoip-bpi-m3.dts
? /usr/src/sys/dts/arm/sun8i-a83t-sinovoip-bpi-m3.dts
? /usr/src/sys/dts/arm/sun8i-a83t.dtsi
? /usr/src/sys/powerpc/conf/GENERIC64vtsc-DBG
? /usr/src/sys/powerpc/conf/GENERIC64vtsc-NODBG
? /usr/src/sys/powerpc/conf/GENERICvtsc-DBG
? /usr/src/sys/powerpc/conf/GENERICvtsc-NODBG
M /usr/src/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
M /usr/src/contrib/llvm/tools/lld/ELF/Arch/PPC64.cpp
M /usr/src/crypto/openssl/crypto/armcap.c
M /usr/src/lib/libkvm/kvm_powerpc.c
M /usr/src/lib/libkvm/kvm_private.c
M /usr/src/stand/defs.mk
M /usr/src/stand/powerpc/boot1.chrp/Makefile
M /usr/src/stand/powerpc/kboot/Makefile
M /usr/src/sys/arm64/arm64/identcpu.c
M /usr/src/sys/conf/kmod.mk
M /usr/src/sys/conf/ldscript.powerpc
M /usr/src/sys/kern/subr_pcpu.c
M /usr/src/sys/modules/dtb/allwinner/Makefile
M /usr/src/sys/powerpc/aim/mmu_oea64.c
M /usr/src/sys/powerpc/ofw/ofw_machdep.c
M /usr/src/sys/powerpc/powerpc/interrupt.c
M /usr/src/sys/powerpc/powerpc/mp_machdep.c
M /usr/src/sys/powerpc/powerpc/trap.c
M /usr/src/usr.bin/top/machine.c

That last is because I've modified top to also report
for swap:

Maximum Observed Used

(abbreviated: MaxObsUsed) when it is positive. For
example:

Swap: 256G Total, 483M Used, 483M MaxObsUsed, 256G Free



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
Mark Johnston
2018-03-25 18:34:21 UTC
Post by Mark Millard
FreeBSD panic'd while attempting to see if a "poudriere bulk -w -a"
would get the "unnecessary swapping" problem in my UFS-only context,
-r331499 (non-debug but with symbols), under Hyper-V. This is a
Ryzen Threadripper context, but I've no clue if that is important
. . .
[14:22:05] [18] [00:01:16] Finished devel/p5-Test-HTML-Tidy | p5-Test-HTML-Tidy-1.00_1: Success
[14:22:08] [18] [00:00:00] Building devel/ocaml-camlp5 | ocaml-camlp5-6.16
So I've no clue if or how to repeat this.
Unfortunately dump was unsuccessful.
What happened?
Post by Mark Millard
So all I have is the
backtrace. Hand typed from a screen shot of the console
Do you know what the panic message was? There are multiple calls to
panic() in vm_page_free_prep().
Mark Millard
2018-03-25 19:32:09 UTC
Post by Mark Johnston
Post by Mark Millard
FreeBSD panic'd while attempting to see if a "poudriere bulk -w -a"
would get the "unnecessary swapping" problem in my UFS-only context,
-r331499 (non-debug but with symbols), under Hyper-V. This is a
Ryzen Threadripper context, but I've no clue if that is important
. . .
[14:22:05] [18] [00:01:16] Finished devel/p5-Test-HTML-Tidy | p5-Test-HTML-Tidy-1.00_1: Success
[14:22:08] [18] [00:00:00] Building devel/ocaml-camlp5 | ocaml-camlp5-6.16
So I've no clue if or how to repeat this.
Unfortunately dump was unsuccessful.
What happened?
It reported:

(da1:storvsc1:0:0:0) WRITE(10). CDB 2a 00 35 24 37 c7 00 00 0 00
(da1:storvsc1:0:0:0) CAM status Command timeout
(da1:storvsc1:0:0:0) Error 5, Retries exhausted
Aborting dump due to I/O error.

** DUMP FAILED (ERROR 5) **
= 0x5
Post by Mark Johnston
Post by Mark Millard
So all I have is the
backtrace. Hand typed from a screen shot of the console
Do you know what the panic message was? There are multiple calls to
panic() in vm_page_free_prep().
No. I listed what I could see. The console screen does not have many
lines or rows and I was sleeping when the panic happened.

I redid a buildworld buildkernel installkernel installworld sequence
since then and it looks like the detailed addresses changed (as seen
in objdump now vs. what was on the console). But the relative offset
in vm_page_free_prep seems to be a match, at least for the instruction
after the "callq panic".

Looking at the kernel code I see:

. . .
<vm_page_free_prep+0x10> mov 0xffffffff81843690,%rax
<vm_page_free_prep+0x18> mov $0xffffffff81d6d880,%rcx
<vm_page_free_prep+0x1f> sub %rcx,%rax
<vm_page_free_prep+0x22> addq $0x1,%gs:(%rax)
<vm_page_free_prep+0x27> mov 0x54(%rbx),%eax
<vm_page_free_prep+0x2f> and $0x1,%eax
<vm_page_free_prep+0x32> jne <vm_page_free_prep+0x15a>
. . .
(several paths reach +0x106)
<vm_page_free_prep+0x106> movw $0x0,0x64(%rbx)
<vm_page_free_prep+0x10c> cmpl $0x0,0x50(%rbx)
<vm_page_free_prep+0x110> jne <vm_page_free_prep+0x163>
. . .
<vm_page_free_prep+0x15a> mov $0xffffffff8116628b,%rdi
<vm_page_free_prep+0x161> jmp <vm_page_free_prep+0x16a>
<vm_page_free_prep+0x163> mov $0xffffffff8120ca97,%rdi
<vm_page_free_prep+0x16a> xor %eax,%eax
<vm_page_free_prep+0x16c> mov %rbx,%rsi
<vm_page_free_prep+0x16f> callq <panic>
<vm_page_free_prep+0x174> nopw %cs:0x0(%rax,%rax,1)

No KASSERTS present (a non-debug build). That leaves:

if (vm_page_sbusied(m))
panic("vm_page_free: freeing busy page %p", m);
and:

if (m->wire_count != 0)
panic("vm_page_free: freeing wired page %p", m);

I do not have anything that lets me differentiate which
occurred based on the above detail. Sorry.
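As a rough illustration of why the backtrace alone cannot tell the two cases apart: in the listing above both checks jump to code that loads a different format-string pointer into %rdi and then falls into the same "callq panic" at +0x16f, so the return address +0x174 is identical either way. The C sketch below models only that control flow. The names page_model, which_panic and MODEL_BIT_SHARED are hypothetical, and the mapping of offsets 0x54 and 0x50 to the busy and wired checks is inferred from the order of the checks in the listing, not taken from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two page fields that the listing reads.
 * The offsets in the comments are inferred from the disassembly. */
struct page_model {
    uint32_t wire_count; /* cmpl $0x0,0x50(%rbx) at +0x10c */
    uint32_t busy_lock;  /* mov 0x54(%rbx),%eax; and $0x1,%eax at +0x27 */
};

/* Assumed shared-busy bit, matching the "and $0x1" in the listing. */
#define MODEL_BIT_SHARED 0x1u

enum panic_path { PATH_NONE, PATH_BUSY, PATH_WIRED };

/* Report which of the two panic() calls would be taken, if any.
 * Both non-NONE results end at the same call site, which is why the
 * frame "vm_page_free_prep+0x174" does not identify the message. */
enum panic_path which_panic(const struct page_model *m)
{
    assert(m != NULL);
    if ((m->busy_lock & MODEL_BIT_SHARED) != 0)
        return PATH_BUSY;   /* jne +0x15a: "freeing busy page" */
    if (m->wire_count != 0)
        return PATH_WIRED;  /* jne +0x163: "freeing wired page" */
    return PATH_NONE;
}
```

In this model the only thing that separates the two outcomes at the call site is the value in %rdi, which the hand-typed console output does not include.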

Mark Johnston
2018-03-25 20:09:34 UTC
Post by Mark Millard
Post by Mark Johnston
Post by Mark Millard
FreeBSD panic'd while attempting to see if a "poudriere bulk -w -a"
would get the "unnecessary swapping" problem in my UFS-only context,
-r331499 (non-debug but with symbols), under Hyper-V. This is a
Ryzen Threadripper context, but I've no clue if that is important
. . .
[14:22:05] [18] [00:01:16] Finished devel/p5-Test-HTML-Tidy | p5-Test-HTML-Tidy-1.00_1: Success
[14:22:08] [18] [00:00:00] Building devel/ocaml-camlp5 | ocaml-camlp5-6.16
So I've no clue if or how to repeat this.
Unfortunately dump was unsuccessful.
What happened?
(da1:storvsc1:0:0:0) WRITE(10). CDB 2a 00 35 24 37 c7 00 00 0 00
(da1:storvsc1:0:0:0) CAM status Command timeout
(da1:storvsc1:0:0:0) Error 5, Retries exhausted
Aborting dump due to I/O error.
** DUMP FAILED (ERROR 5) **
= 0x5
Thanks. Do you happen to know if this occurs consistently under Hyper-V?
Post by Mark Millard
Post by Mark Johnston
Post by Mark Millard
So all I have is the
backtrace. Hand typed from a screen shot of the console
Do you know what the panic message was? There are multiple calls to
panic() in vm_page_free_prep().
No. I listed what I could see. The console screen does not have many
lines or rows and I was sleeping when the panic happened.
For future reference, you should be able to use "show panic" at the DDB
prompt to get the panic message.
Mark Millard
2018-03-25 20:48:14 UTC
Post by Mark Johnston
Post by Mark Millard
Post by Mark Johnston
Post by Mark Millard
FreeBSD panic'd while attempting to see if a "poudriere bulk -w -a"
would get the "unnecessary swapping" problem in my UFS-only context,
-r331499 (non-debug but with symbols), under Hyper-V. This is a
Ryzen Threadripper context, but I've no clue if that is important
. . .
[14:22:05] [18] [00:01:16] Finished devel/p5-Test-HTML-Tidy | p5-Test-HTML-Tidy-1.00_1: Success
[14:22:08] [18] [00:00:00] Building devel/ocaml-camlp5 | ocaml-camlp5-6.16
So I've no clue if or how to repeat this.
Unfortunately dump was unsuccessful.
What happened?
(da1:storvsc1:0:0:0) WRITE(10). CDB 2a 00 35 24 37 c7 00 00 0 00
(da1:storvsc1:0:0:0) CAM status Command timeout
(da1:storvsc1:0:0:0) Error 5, Retries exhausted
Aborting dump due to I/O error.
** DUMP FAILED (ERROR 5) **
= 0x5
Thanks. Do you happen to know if this occurs consistently under Hyper-V?
Taking "this" to mean both (A) the panic and (B) the failed attempt
to dump to the Optane SSD that holds the swap/page partition:

First ever occurrence of the activity, so nothing to compare
with.

The system sat at the db> prompt for a notable time while I
was sleeping. It kept its "cores" busy while I slept. (Hardware
threads being very active is visible from Windows 10 Pro x64's
Task Manager.)

It is rare that I try such a large bulk build. I do such mostly
just to test how well the Ryzen Threadripper context seems to be
doing or to otherwise test something about FreeBSD stability.
I do buildworld buildkernel for such testing as well. Sometimes
both poudriere ports-building and FreeBSD-building in parallel
for a time.

I have started "poudriere bulk -j<NAME> -w -a" again, letting
it continue from where it left off.
Post by Mark Johnston
Post by Mark Millard
Post by Mark Johnston
Post by Mark Millard
So all I have is the
backtrace. Hand typed from a screen shot of the console
Do you know what the panic message was? There are multiple calls to
panic() in vm_page_free_prep().
No. I listed what I could see. The console screen does not have many
lines or rows and I was sleeping when the panic happened.
For future reference, you should be able to use "show panic" at the DDB
prompt to get the panic message.
Dahhhh. Too obvious of a thing for me to think of checking for such on
my own. At least now I know. (It is not the first time that I could have
used that command.)

Mark Millard
2018-03-26 13:35:29 UTC
[Unfortunately, I'd not be able to get back to this
for many hours. I do not want to leave the machine
at the db> prompt that long. So this is all there
will be.]

It got a different crash last night, after a little over 12
hours of poudriere bulk -a activity, again while I was
sleeping. Hand typed:

kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 13; apic id = 0d
fault virtual address = 0x20
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80b70867
stack pointer = 0x28:0xfffffe00ebab8880
frame pointer = 0x28:0xfffffe00ebab8890
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 44 (dom0)
[ thread pid 44 tid 100277 ]
Stopped at turnstile_broadcast+0x47: movq 0x20(%rbx,%rax,1),%rcx

(So an offset from a null pointer, apparently.)
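The arithmetic behind that remark can be sketched as follows. An x86-64 memory operand written disp(base,index,scale) addresses base + index*scale + disp, so a fault address of 0x20 on "movq 0x20(%rbx,%rax,1),%rcx" is what results when %rbx + %rax is zero at the time of the access. The helper name effective_address is hypothetical and exists only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of an x86-64 operand of the form disp(base,index,scale).
 * Arithmetic wraps modulo 2^64, as it does in the hardware. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale, int64_t disp)
{
    /* Only these scale factors can be encoded in an x86-64 operand. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint64_t)disp;
}
```

With base and index both zero, scale 1 and displacement 0x20, this yields 0x20, matching the reported fault virtual address.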

bt shows:

Tracing pid 44 tid 100277 td 0xfffff8010f938560
turnstile_broadcast() at turnstile_broadcast+0x47/frame 0xfffffe00ebab8890
__mtx_unlock_sleep() at __mtx_unlock_sleep+0xb9/frame 0xfffffe00ebab88c0
vm_pageout_page_lock() at vm_pageout_page_lock+0x179/frame 0xfffffe00ebab8960
vm_pageout_worker() at vm_pageout_worker+0xd3a/frame 0xfffffe00ebab8a50
vm_pageout() at vm_pageout+0x133/frame 0xfffffe00ebab8a70
fork_exit() at fork_exit+0x83/frame 0xfffffe00ebab8ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00ebab8ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

Dump again failed, the same way but with some byte
value differences.

(da1:storvsc1:0:0:0) WRITE(10). CDB 2a 00 35 39 8c c7 00 00 08 00
(da1:storvsc1:0:0:0) CAM status Command timeout
(da1:storvsc1:0:0:0) Error 5, Retries exhausted
Aborting dump due to I/O error.

** DUMP FAILED (ERROR 5) **
Cannot dump: unknown error (error=5)

So this appears to be repeatable (for the Optane
swap/page partition?).
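
For anyone comparing the "byte value differences" between the two failed dumps, the CDB bytes can be decoded by hand or with a few lines of C. The sketch below assumes the standard SCSI WRITE(10) layout: opcode 0x2a in byte 0, a big-endian logical block address in bytes 2 to 5, and a big-endian transfer length in blocks in bytes 7 and 8. The names write10 and decode_write10 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded fields of a SCSI WRITE(10) command descriptor block. */
struct write10 {
    uint8_t  opcode;      /* byte 0: 0x2a for WRITE(10) */
    uint32_t lba;         /* bytes 2-5, big-endian */
    uint16_t xfer_blocks; /* bytes 7-8, big-endian, in logical blocks */
};

/* Decode a 10-byte CDB. Returns 0 on success, or -1 if the opcode is
 * not WRITE(10), in which case *out is left untouched. */
int decode_write10(const uint8_t cdb[10], struct write10 *out)
{
    assert(cdb != NULL && out != NULL);
    if (cdb[0] != 0x2a)
        return -1;
    out->opcode = cdb[0];
    out->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
               ((uint32_t)cdb[4] << 8) | (uint32_t)cdb[5];
    out->xfer_blocks = (uint16_t)(((uint16_t)cdb[7] << 8) | cdb[8]);
    return 0;
}
```

Fed the bytes 2a 00 35 39 8c c7 00 00 08 00 from the second failure, this gives a logical block address of 0x35398cc7 and a transfer length of 8 blocks.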

show reg:

cs 0x20
ds 0x3b ll+0x1a
es 0x3b ll+0x1a
fs 0x13
gs 0x1b
ss 0x28 ll+0x7
rax 0
rcx 0xfffff8010f938501
rdx 0xfffff8010f938501
rbx 0xfffffe00ebab8880
rsp 0xfffffe00ebab8800
rsi 0
rdi 0
r8 0
r9 0
r10 0
r11 0
r12 0
r13 0xfffff8010f938560
r14 0
r15 0xffffffff81d67998 vm_dom+0x18
rip 0xffffffff80b70867 turnstile_broadcast+0x47
rflags 0x10056
turnstile_broadcast+0x47: movq 0x20(%rbx,%rax,1),%rcx

Around where rbx points:

0xfffffe00ebab8872: ab eb 0 fe ff ff 28 0 0 0 0 0 0 0
0xfffffe00ebab8880: 0 0 0 0 0 0 0 0 80 79 d6 81 ff ff
0xfffffe00ebab888e: ff ff c0 88 ab eb 0 fe ff ff 9 20 af 80
0xfffffe00ebab889c: ff ff ff ff 0 7b 2 d8 f f8 ff ff 98 79

And it looks like we have that null pointer above.

And I'm afraid that is it: I need to be off doing other things.


Mark Millard
2018-04-06 02:05:55 UTC
3 rounds of bulk -a spanning over 126 hours total
and I've not had any more failures. Between rounds
I updated /usr/src/ and did buildworld/buildkernel/install
sequences so I'd not be far behind head.

I'm giving up on directly trying to replicate either of
the two types of failures that I'd reported.

At least I know to "show panic" now.


Mark Millard
2018-03-25 20:15:08 UTC
[Just an added note about where in the sequence panic
messages are sent to the console vs. could potentially
be sent to the console.]
Post by Mark Millard
Post by Mark Johnston
Post by Mark Millard
FreeBSD panic'd while attempting to see if a "poudriere bulk -w -a"
would get the "unnecessary swapping" problem in my UFS-only context,
-r331499 (non-debug but with symbols), under Hyper-V. This is a
Ryzen Threadripper context, but I've no clue if that is important
. . .
[14:22:05] [18] [00:01:16] Finished devel/p5-Test-HTML-Tidy | p5-Test-HTML-Tidy-1.00_1: Success
[14:22:08] [18] [00:00:00] Building devel/ocaml-camlp5 | ocaml-camlp5-6.16
So I've no clue if or how to repeat this.
Unfortunately dump was unsuccessful.
What happened?
(da1:storvsc1:0:0:0) WRITE(10). CDB 2a 00 35 24 37 c7 00 00 0 00
(da1:storvsc1:0:0:0) CAM status Command timeout
(da1:storvsc1:0:0:0) Error 5, Retries exhausted
Aborting dump due to I/O error.
** DUMP FAILED (ERROR 5) **
= 0x5
Post by Mark Johnston
Post by Mark Millard
So all I have is the
backtrace. Hand typed from a screen shot of the console
Do you know what the panic message was? There are multiple calls to
panic() in vm_page_free_prep().
No. I listed what I could see. The console screen does not have many
lines or rows and I was sleeping when the panic happened.
I sometimes wonder if panic should repeat the panic message at the
end of the backtrace in order to deal with keeping it visible in
row-restricted console contexts.