Discussion:
Fatal trap 12: page fault on Acer Chromebook 720 (peppy)
Michael Gmelin
2018-06-03 12:48:40 UTC
Hi,

After upgrading CURRENT to r333992 (from something at least a year
old, quite some changes in mp_machdep.c since), this machine crashes
on boot:

Copyright (c) 1992-2018 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.0-CURRENT #1 r333992: Tue May 22 00:31:04 CEST 2018
***@flimsy:/usr/obj/usr/src/amd64.amd64/sys/flimsy amd64
FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 6.0.0)
WARNING: WITNESS option enabled, expect reduced performance.
VT(vga): resolution 640x480
CPU: Intel(R) Celeron(R) 2955U @ 1.40GHz (1396.80-MHz K8-class CPU)
Origin="GenuineIntel" Id=0x40651 Family=0x6 Model=0x45 Stepping=1
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,
CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0x4ddaebbf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,SDBG,CX16,
xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,TSCDLT,XSAVE,OSXSAVE,RDRAND>
AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
AMD Features2=0x21<LAHF,ABM>
Structured Extended Features=0x2603<FSGSBASE,TSCADJ,ERMS,INVPCID,NFPUSG>
XSAVE Features=0x1<XSAVEOPT>
VT-x: (disabled in BIOS) PAT,HLT,MTF,PAUSE,EPT,UG,VPID
TSC: P-state invariant, performance statistics
real memory = 4301258752 (4102 MB)
avail memory = 1907572736 (1819 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <CORE COREBOOT>
kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xfffff80001000000
fault code = supervisor write data, protection violation
instruction pointer = 0x20:0xffffffff8102955f
stack pointer = 0x28:0xffffffff82a79be0
frame pointer = 0x28:0xffffffff82a79c10
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 0 ()
[ thread pid 0 tid 0 ]
Stopped at native_start_all_aps+0x08f: movq %rax,(%rsi)
db>

Any key press in the debugger will reboot the machine.

Booting with kern.smp.disabled=1 works.

Any ideas?

-m
--
Michael Gmelin
Konstantin Belousov
2018-06-03 13:21:10 UTC
Post by Michael Gmelin
ACPI APIC Table: <CORE COREBOOT>
What does this mean? Did you flash coreboot?
Post by Michael Gmelin
Stopped at native_start_all_aps+0x08f: movq %rax,(%rsi)
Look up the source line number for this address.
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-current
Michael Gmelin
2018-06-03 14:55:00 UTC
On Sun, 3 Jun 2018 16:21:10 +0300
Post by Konstantin Belousov
Post by Michael Gmelin
ACPI APIC Table: <CORE COREBOOT>
What does this mean? Did you flash coreboot?
This machine comes with it by default (my model was delivered with
SeaBIOS 20131018_145217-build121-m2). So I didn't flash anything
(didn't feel like bricking it).
Post by Konstantin Belousov
Post by Michael Gmelin
Stopped at native_start_all_aps+0x08f: movq %rax,(%rsi)
Look up the source line number for this address.
I guess that's sys/amd64/amd64/support.S line 854 (in rdmsr), called by
native_start_all_aps. Any additional hints how I can track it down?

Thanks,
Michael
--
Michael Gmelin
Konstantin Belousov
2018-06-03 15:04:23 UTC
Post by Michael Gmelin
I guess that's sys/amd64/amd64/support.S line 854 (in rdmsr), called by
native_start_all_aps. Any additional hints how I can track it down?
Why did you decide that this is rdmsr_safe()? First,
native_start_all_aps() does not call rdmsr; second, the ddb
report clearly indicates that the fault occurred while accessing the DMAP
in native_start_all_aps().

Just look up the source line by the address native_start_all_aps+0x08f.
Michael Gmelin
2018-06-03 19:50:20 UTC
On Sun, 3 Jun 2018 18:04:23 +0300
Post by Konstantin Belousov
Just look up the source line by the address
native_start_all_aps+0x08f.
Okay, according to kgdb this should be here:

https://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=333368&view=markup#l369

364
365	/* Create the initial 1GB replicated page tables */
366	for (i = 0; i < 512; i++) {
367		/* Each slot of the level 4 pages points to the same level 3 page */
368		pt4[i] = (u_int64_t)(uintptr_t)(mptramp_pagetables + PAGE_SIZE);
369		pt4[i] |= PG_V | PG_RW | PG_U;
370
371		/* Each slot of the level 3 pages points to the same level 2 page */
372		pt3[i] = (u_int64_t)(uintptr_t)(mptramp_pagetables + (2 * PAGE_SIZE));
373		pt3[i] |= PG_V | PG_RW | PG_U;
374
375		/* The level 2 page slots are mapped with 2MB pages for 1GB. */
376		pt2[i] = i * (2 * 1024 * 1024);
377		pt2[i] |= PG_V | PG_RW | PG_PS | PG_U;
378	}

-m

p.s. This machine uses quirks in biosmem.c, see

Type '?' for a list of command, 'help' for more detailed
help.
OK biosmem
bios_basemem: 0x9e400
bios_extmem: 0x3ff00000
memtop: 0x3c000000
high_heap_base: 0x3c000000
high_heap_size: 0x4000000
bios_quirks: 0x01 BQ_DISTRUST_820_EXTMEM
b_bios_probed: 0x0a B_BASEMEM_12 B_EXTMEM_E801
--
Michael Gmelin
Konstantin Belousov
2018-06-03 20:53:40 UTC
Post by Michael Gmelin
https://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=333368&view=markup#l369
You have a fault on write due to the read-only mapping of the portion
of the direct map that maps the kernel text. This is consistent with
the faulting address. It is not clear whether this is something new on
your machine, or whether the kernel text was silently corrupted before,
since the ro protection is somewhat recent.

It seems that mp_bootaddress() selected a bad place for the bootstrap
page tables. Moreover, we do not include the kernel text in the
physmem[] array, so it is not clear how this happened. This code was
also changed recently.

Can you add a print of the physmap[] array somewhere before the panic,
to see the kernel's idea of the available memory? It should already be
printed if you have a serial console and set the debug.late_console
tunable to 0.
Michael Gmelin
2018-06-03 22:46:32 UTC
On Sun, 3 Jun 2018 23:53:40 +0300
Post by Konstantin Belousov
Can you add a print of the physmap[] array somewhere before the
panic, to see the kernel's idea of the available memory? It
should already be printed if you have a serial console and set the
debug.late_console tunable to 0.
This is a sad little machine without any kind of serial console.

Physmap looks like this after calling getmemsize():

[0]: 0x10000
[1]: 0x30000
[2]: 0x40000
[3]: 0x9e000
[4]: 0x100000
[5]: 0xf00000
[6]: 0x1003000
[7]: 0x7bf7a000

Physical memory chunks logged in cpu_startup are:

0x0000000000010000 - 0x000000000002ffff, 131072 bytes (32 pages)
0x0000000000040000 - 0x000000000009dfff, 385024 bytes (94 pages)
0x0000000000100000 - 0x00000000001fffff, 1048576 bytes (256 pages)
0x0000000002c00000 - 0x0000000075467fff, 1921417216 bytes (469096 pages)
0x0000000100000000 - 0x00000001005e7fff, 6193152 bytes (1512 pages)

-m
--
Michael Gmelin
Konstantin Belousov
2018-06-04 11:06:55 UTC
Post by Michael Gmelin
This is a sad little machine without any kind of serial console.
[0]: 0x10000
[1]: 0x30000
[2]: 0x40000
[3]: 0x9e000
[4]: 0x100000
[5]: 0xf00000
[6]: 0x1003000
[7]: 0x7bf7a000
0x0000000000010000 - 0x000000000002ffff, 131072 bytes (32 pages)
0x0000000000040000 - 0x000000000009dfff, 385024 bytes (94 pages)
These two chunk reports are consistent with physmap[0-1, 2-3].
Post by Michael Gmelin
0x0000000000100000 - 0x00000000001fffff, 1048576 bytes (256 pages)
0x0000000002c00000 - 0x0000000075467fff, 1921417216 bytes (469096 pages)
0x0000000100000000 - 0x00000001005e7fff, 6193152 bytes (1512 pages)
But these three look completely unrelated to the rest of the physmap,
except perhaps physmap[4]. We allocate boot pages from the top of the
last physmap chunk, but I am certain that we do not consume so much
memory during boot that it would account for the gap between
physmap[7] and the last reported address.

Are you sure that there are no typos in the values above?
Michael Gmelin
2018-06-04 21:17:56 UTC
On Mon, 4 Jun 2018 14:06:55 +0300
Post by Konstantin Belousov
Why did you decided that this is rdmsr_safe() ? First,
native_start_all_aps() does not call rdmsr, second the ddb
report clearly indicates that the fault occured acessing DMAP
in native_start_all_aps().
Just look up the source line by the address
native_start_all_aps+0x08f.
https://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=333368&view=markup#l369
364
365 /* Create the initial 1GB replicated page tables */
366 for (i = 0; i < 512; i++) {
367 /* Each slot of the level 4 pages points to the
same level 3 page */ 368 pt4[i] =
(u_int64_t)(uintptr_t)(mptramp_pagetables + PAGE_SIZE); 369
pt4[i] |= PG_V | PG_RW | PG_U; 370
371 /* Each slot of the level 3 pages points to the
same level 2 page */ 372 pt3[i] =
(u_int64_t)(uintptr_t)(mptramp_pagetables + (2 * PAGE_SIZE));
373 pt3[i] |= PG_V | PG_RW | PG_U; 374
375 /* The level 2 page slots are mapped with 2MB
pages for 1GB. */ 376 pt2[i] = i * (2 * 1024 * 1024);
377 pt2[i] |= PG_V | PG_RW | PG_PS | PG_U;
378 }
-m
You have fault on write due to read-only mapping of the portion of
the direct map, which maps the kernel text. It is consistent with
the faulting address. It is not clear if it is something new on
your machine, or before the kernel text was silently corrupted,
since ro protection is somewhat recent.
It seems that mp_bootaddress() selected the bad place for the
bootstrap page tables. Even more, we do not include the kernel
text into the physmem[] array, so it is not clear how did it
happen. This code was also changed recently.
Can you add the print of the physmap[] array somewhere before the
panic, to see what is the kernel idea of the available memory ?
It should be already done if you have serial console and set
debug.late_console tunable to 0.
This is a sad little machine without any kind of serial console.
[0]: 0x10000
[1]: 0x30000
[2]: 0x40000
[3]: 0x9e000
[4]: 0x100000
[5]: 0xf00000
[6]: 0x1003000
[7]: 0x7bf7a000
0x0000000000010000 - 0x000000000002ffff, 141072 bytes (32 pages)
0x0000000000040000 - 0x000000000009dfff, 385024 bytes (94 pages)
These two chunks reports are consistent with the physmap[0-1, 2-3].
Post by Michael Gmelin
0x0000000000100000 - 0x00000000001fffff, 1048576 bytes (256 pages)
0x0000000002c00000 - 0x0000000075467fff, 1921417216 bytes (469096
pages) 0x0000000100000000 - 0x00000001005e7fff, 6193152 bytes (1512
pages)
But these three looks completely unrelated to the rest of the physmap,
perhaps except the physmap[4]. We allocate boot pages from the top
of the last physmap chunk, but I am certain that we do not consume
that much memory for boot to make physmap[7] from the last reported
address.
Are you sure that there are no typos in the values above ?
Double checked the numbers. I changed it a bit more,
so that debug output appears all on one page. Please see here for the
results:

https://gist.github.com/grembo/cebb9f7e2a98c37a51bee1e508f7d890

This is how I generated the output:

Index: sys/amd64/amd64/machdep.c
===================================================================
--- sys/amd64/amd64/machdep.c (revision 333992)
+++ sys/amd64/amd64/machdep.c (working copy)
@@ -1215,7 +1215,7 @@
  * XXX first should be vm_paddr_t.
  */
 static void
-getmemsize(caddr_t kmdp, u_int64_t first)
+getmemsize(caddr_t kmdp, u_int64_t first, int* physmap_idx_out, vm_paddr_t* physmap_out)
 {
 	int i, physmap_idx, pa_indx, da_indx;
 	vm_paddr_t pa, physmap[PHYSMAP_SIZE];
@@ -1482,6 +1482,10 @@

 	/* Map the message buffer. */
 	msgbufp = (struct msgbuf *)PHYS_TO_DMAP(phys_avail[pa_indx]);
+
+	*physmap_idx_out = physmap_idx;
+	for (int i=0; i<physmap_idx; ++i)
+		physmap_out[i] = physmap[i];
 }

 static caddr_t
@@ -1553,6 +1557,8 @@
 	char *env;
 	size_t kstack0_sz;
 	int late_console;
+	int physmap_idx;
+	vm_paddr_t physmap[PHYSMAP_SIZE];

 	TSRAW(&thread0, TS_ENTER, __func__, NULL);

@@ -1759,7 +1765,7 @@
 		amd64_kdb_init();
 	}

-	getmemsize(kmdp, physfree);
+	getmemsize(kmdp, physfree, &physmap_idx, &physmap[0]);
 	init_param2(physmem);

 	/* now running on new page tables, configured,and u/iom is accessible */
@@ -1767,6 +1773,22 @@
 	if (late_console)
 		cninit();

+	printf("Physmap index: %i\n", physmap_idx);
+	for (int i=0; i<physmap_idx; ++i)
+		printf("Physmap %i: 0x%016jx\n", i, (uintmax_t)physmap[i]);
+	printf("---------\n");
+
+	for (int i = 0; phys_avail[i + 1] != 0; i += 2) {
+		vm_paddr_t size;
+
+		size = phys_avail[i + 1] - phys_avail[i];
+		printf(
+		    "0x%016jx - 0x%016jx, %ju bytes (%ju pages)\n",
+		    (uintmax_t)phys_avail[i],
+		    (uintmax_t)phys_avail[i + 1] - 1,
+		    (uintmax_t)size, (uintmax_t)size / PAGE_SIZE);
+	}
+
 #ifdef DEV_ISA
 #ifdef DEV_ATPIC
 	elcr_probe();


-m
--
Michael Gmelin
Konstantin Belousov
2018-06-05 13:11:35 UTC
Post by Michael Gmelin
Double checked the numbers. I changed it a bit more,
so that debug output appears all on one page. Please see here for the
results:
https://gist.github.com/grembo/cebb9f7e2a98c37a51bee1e508f7d890
OK, I have a guess at what is going on. Does the result of the quirks
end up as the hw.physmem tunable passed to the kernel? It seems that
there is a physmap[] element pointing outside the DMAP-mapped region.

Perhaps print the DMAP limit too, to see whether I am on the right
track.
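To make the suspicion concrete: the direct map places physical address pa at the DMAP base plus pa, so a physmap[] chunk is only reachable through it if the chunk ends at or below the DMAP limit. The sketch below is user-space C, not kernel code. It assumes the usual amd64 DMAP base of 0xfffff80000000000; `SKETCH_DMAP_BASE` and the `sketch_*` helpers are hypothetical names, and any limit passed in is a made-up value, since the real one has not been posted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the usual amd64 direct map base. */
#define SKETCH_DMAP_BASE 0xfffff80000000000ULL

/* Translate a direct-map virtual address back to a physical address. */
static uint64_t
sketch_dmap_to_phys(uint64_t va)
{
	return (va - SKETCH_DMAP_BASE);
}

/* Translate a physical address to its direct-map virtual address. */
static uint64_t
sketch_phys_to_dmap(uint64_t pa)
{
	return (pa + SKETCH_DMAP_BASE);
}

/*
 * Return true if the physical range [start, end) is non-empty and lies
 * entirely below dmap_limit, i.e. is covered by the direct map.
 */
static bool
sketch_range_in_dmap(uint64_t start, uint64_t end, uint64_t dmap_limit)
{
	return (start < end && end <= dmap_limit);
}
```

Under that assumed base, the faulting address 0xfffff80001000000 from the trap message translates to physical address 0x1000000.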

Try the following change. It lacks the i386 bits.

diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c
index e5c69ed91fa..bd6bbf04006 100644
--- a/sys/amd64/amd64/machdep.c
+++ b/sys/amd64/amd64/machdep.c
@@ -1254,7 +1254,7 @@ getmemsize(caddr_t kmdp, u_int64_t first)
 	 * in real mode mode (e.g. SMP bare metal).
 	 */
 	if (init_ops.mp_bootaddress)
-		init_ops.mp_bootaddress(physmap, &physmap_idx);
+		init_ops.mp_bootaddress(physmap, &physmap_idx, first);

 	/*
 	 * Maxmem isn't the "maximum memory", it's one larger than the
diff --git a/sys/amd64/amd64/mp_machdep.c b/sys/amd64/amd64/mp_machdep.c
index 30146142087..292a6cefa91 100644
--- a/sys/amd64/amd64/mp_machdep.c
+++ b/sys/amd64/amd64/mp_machdep.c
@@ -103,7 +103,8 @@ static int start_ap(int apic_id);
  * Calculate usable address in base memory for AP trampoline code.
  */
 void
-mp_bootaddress(vm_paddr_t *physmap, unsigned int *physmap_idx)
+mp_bootaddress(vm_paddr_t *physmap, unsigned int *physmap_idx,
+    vm_paddr_t dmap_limit)
 {
 	unsigned int i;
 	bool allocated;
@@ -117,8 +118,9 @@ mp_bootaddress(vm_paddr_t *physmap, unsigned int *physmap_idx)
 		 * store the initial page tables. Note that it needs to be
 		 * aligned to a page boundary.
 		 */
-		if (physmap[i] >= GiB(4) ||
-		    (physmap[i + 1] - round_page(physmap[i])) < (PAGE_SIZE * 3))
+		if (physmap[i] >= GiB(4) || physmap[i + 1] -
+		    round_page(physmap[i]) < PAGE_SIZE * 3 ||
+		    physmap[i + 1] - PAGE_SIZE * 3 > dmap_limit)
 			continue;

 		allocated = true;
diff --git a/sys/amd64/include/smp.h b/sys/amd64/include/smp.h
index 2ecfe62cf9f..24f0580fe51 100644
--- a/sys/amd64/include/smp.h
+++ b/sys/amd64/include/smp.h
@@ -58,7 +58,7 @@ void invlpg_pcid_handler(void);
 void invlrng_invpcid_handler(void);
 void invlrng_pcid_handler(void);
 int native_start_all_aps(void);
-void mp_bootaddress(vm_paddr_t *, unsigned int *);
+void mp_bootaddress(vm_paddr_t *, unsigned int *, vm_paddr_t);

 #endif /* !LOCORE */
 #endif /* SMP */
diff --git a/sys/x86/include/init.h b/sys/x86/include/init.h
index 880cabaa949..58bbe0a5fd6 100644
--- a/sys/x86/include/init.h
+++ b/sys/x86/include/init.h
@@ -41,7 +41,7 @@ struct init_ops {
 	void (*early_clock_source_init)(void);
 	void (*early_delay)(int);
 	void (*parse_memmap)(caddr_t, vm_paddr_t *, int *);
-	void (*mp_bootaddress)(vm_paddr_t *, unsigned int *);
+	void (*mp_bootaddress)(vm_paddr_t *, unsigned int *, vm_paddr_t);
 	int (*start_all_aps)(void);
 	void (*msi_init)(void);
 };
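
The effect of the new skip condition can be tried outside the kernel. The sketch below is a user-space approximation, not the kernel function: `sketch_pick_pagetables` and the `SK_*` macros are hypothetical names, the top-down walk over (start, end) pairs is an assumption about the loop surrounding the changed hunk, and the `dmap_limit` values used for experiments are made up. Only the three-part condition is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SIZE	4096ULL			/* assumption: 4 KiB pages */
#define SK_GIB(x)	((uint64_t)(x) << 30)

/* Round an address up to the next page boundary. */
static uint64_t
sk_round_page(uint64_t a)
{
	return ((a + SK_PAGE_SIZE - 1) & ~(SK_PAGE_SIZE - 1));
}

/*
 * Walk the (start, end) pairs in physmap[] from the highest pair down
 * and return the address at which three pages would be carved from the
 * top of the first acceptable chunk, or 0 if every chunk is skipped.
 * last_idx is the index of the start of the highest pair.
 */
static uint64_t
sketch_pick_pagetables(const uint64_t *physmap, unsigned int last_idx,
    uint64_t dmap_limit)
{
	unsigned int i;

	/* i is unsigned: after index 0 it wraps and the test fails. */
	for (i = last_idx; i <= last_idx; i -= 2) {
		if (physmap[i] >= SK_GIB(4) || physmap[i + 1] -
		    sk_round_page(physmap[i]) < SK_PAGE_SIZE * 3 ||
		    physmap[i + 1] - SK_PAGE_SIZE * 3 > dmap_limit)
			continue;
		return (physmap[i + 1] - SK_PAGE_SIZE * 3);
	}
	return (0);
}

/* The eight physmap[] values as typed earlier in this thread. */
static const uint64_t sk_physmap[] = {
	0x10000, 0x30000, 0x40000, 0x9e000,
	0x100000, 0xf00000, 0x1003000, 0x7bf7a000,
};
```

With a made-up limit of 2 GiB, `sketch_pick_pagetables(sk_physmap, 6, 0x80000000)` takes the pages from the top chunk; lowering the limit to 1 GiB makes the scan skip that chunk and fall back to the one ending at 0xf00000.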
Michael Gmelin
2018-06-05 23:06:25 UTC
On Tue, 5 Jun 2018 16:11:35 +0300
Post by Konstantin Belousov
OK, I have a guess at what is going on. Does the result of the quirks
end up as the hw.physmem tunable passed to the kernel? It seems that
there is a physmap[] element pointing outside the DMAP-mapped region.
Perhaps print the DMAP limit too, to see whether I am on the right
track.
Try the following change. It lacks the i386 bits.
With the patch I could boot without problems, and the machine appears to
be stable (I ran some high-load and memory-intensive tests). By the way,
the machine only has 2 GB of RAM, even though 4 GB are reported on boot;
usable memory appears to be reported correctly.

Thanks,
Michael
--
Michael Gmelin