Discussion:
head -r325997: Fatal trap 12: page fault while in kernel mode (during
Mark Millard
2017-11-20 01:52:39 UTC
Attempting a dump failed. I'm afraid all I have for
information is the below. The kernel was a
non-debug kernel (with debug information).

The following is hand typed from a screen shot:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xffffff53f000e2b0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80f2b11e
stack pointer = 0x0:0xfffffe01aeb28970
frame pointer = 0x0:0xfffffe01aeb289f0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 20 (pagedaemon)
[ thread pid 20 tid 100089 ]
Stopped at pmap_ts_referenced+0x72e: movq (%rcx,%rdi,8),%rbx
db> bt
Tracing pid 20 tid 100089 td 0xfffff80003eb3560
pmap_ts_referenced() at pmap_ts_referenced+0x72e/frame 0xfffffe01aeb289f0
vm_pageout() at vm_pageout+0xdeb/frame 0xfffffe01aeb28ab0
fork_exit() at fork_exit+0x82/frame 0xfffffe01aeb28ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe01aeb28ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db>

The prior (cross) buildworld buildkernel had completed fine.

Until yesterday, I'd been running -r325700 or before and had not
seen such an issue ever before. I'd been using the virtualbox
version for a while before this as well.

===
Mark Millard
markmi at dsl-only.net
Mark Millard
2017-11-20 09:15:29 UTC
[Adding some analysis of where the 2 failures were in
source code terms.]
[I got another of these. By the way: amd64 context.
Again: buildworld was running.]
Post by Mark Millard
Attempting a dump failed. I'm afraid all I have for
information is the below. The kernel was a
non-debug kernel (with debug information).
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xffffff53f000e2b0
New one: 0x806b49010
Post by Mark Millard
fault code = supervisor read data, page not present
New one: supervisor write data, page not present
Post by Mark Millard
instruction pointer = 0x20:0xffffffff80f2b11e
New one: 0x20:0xffffffff80f2b21b
Post by Mark Millard
stack pointer = 0x0:0xfffffe01aeb28970
New one: 0x28:0xfffffe01aeb28970
Post by Mark Millard
frame pointer = 0x0:0xfffffe01aeb289f0
New one: 0x28:0xfffffe01aeb289f0
Post by Mark Millard
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 20 (pagedaemon)
[ thread pid 20 tid 100089 ]
Stopped at pmap_ts_referenced+0x72e: movq (%rcx,%rdi,8),%rbx
New one: pmap_ts_referenced+0x82b: movq %rcx,0x10(%rax)
Post by Mark Millard
db> bt
Tracing pid 20 tid 100089 td 0xfffff80003eb3560
New one: td 0xfffff80003df6000
Post by Mark Millard
pmap_ts_referenced() at pmap_ts_referenced+0x72e/frame 0xfffffe01aeb289f0
New one: pmap_ts_referenced() at pmap_ts_referenced+0x82b/frame 0xfffffe01aeb289f0
Post by Mark Millard
vm_pageout() at vm_pageout+0xdeb/frame 0xfffffe01aeb28ab0
Correction to original: frame 0xfffffe01aeb28a70
(new is the same)
Post by Mark Millard
fork_exit() at fork_exit+0x82/frame 0xfffffe01aeb28ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe01aeb28ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db>
The prior (cross) buildworld buildkernel had completed fine.
Until yesterday, I'd been running -r325700 or before and had not
seen such an issue ever before. I'd been using the virtualbox
version for a while before this as well.
Taking the case of:

Stopped at pmap_ts_referenced+0x72e: movq (%rcx,%rdi,8),%rbx:

ffffffff80f2b0fc <pmap_ts_referenced+0x70c> mov %rax,%rsi
ffffffff80f2b0ff <pmap_ts_referenced+0x70f> shr $0x1b,%rsi
ffffffff80f2b103 <pmap_ts_referenced+0x713> and $0xff8,%esi
ffffffff80f2b109 <pmap_ts_referenced+0x719> mov (%rcx,%rsi,1),%rcx
ffffffff80f2b10d <pmap_ts_referenced+0x71d> and %r10,%rcx
ffffffff80f2b110 <pmap_ts_referenced+0x720> or %r9,%rcx
ffffffff80f2b113 <pmap_ts_referenced+0x723> mov %eax,%edi
ffffffff80f2b115 <pmap_ts_referenced+0x725> shr $0x15,%edi
ffffffff80f2b118 <pmap_ts_referenced+0x728> and $0x1ff,%edi
ffffffff80f2b11e <pmap_ts_referenced+0x72e> mov (%rcx,%rdi,8),%rbx <<<<<<=======
ffffffff80f2b122 <pmap_ts_referenced+0x732> and %r10,%rbx
ffffffff80f2b125 <pmap_ts_referenced+0x735> or %r9,%rbx
ffffffff80f2b128 <pmap_ts_referenced+0x738> shr $0x9,%rax
ffffffff80f2b12c <pmap_ts_referenced+0x73c> and $0xff8,%eax
ffffffff80f2b131 <pmap_ts_referenced+0x741> lea (%rbx,%rax,1),%rsi
ffffffff80f2b135 <pmap_ts_referenced+0x745> mov (%rbx,%rax,1),%rbx
ffffffff80f2b139 <pmap_ts_referenced+0x749> mov %rbx,%rax
ffffffff80f2b13c <pmap_ts_referenced+0x74c> and %rdx,%rax
ffffffff80f2b13f <pmap_ts_referenced+0x74f> cmp %rdx,%rax
ffffffff80f2b142 <pmap_ts_referenced+0x752> jne ffffffff80f2b14f <pmap_ts_referenced+0x75f>

Which, if I understand right, is in the
"small_mappings:" code:

    PG_A = pmap_accessed_bit(pmap);
    PG_M = pmap_modified_bit(pmap);
    PG_RW = pmap_rw_bit(pmap);
    pde = pmap_pde(pmap, pv->pv_va);
    KASSERT((*pde & PG_PS) == 0,
        ("pmap_ts_referenced: found a 2mpage in page %p's pv list",
        m));
    pte = pmap_pde_to_pte(pde, pv->pv_va);
    if ((*pte & (PG_M | PG_RW)) == (PG_M | PG_RW))
        vm_page_dirty(m);
    if ((*pte & PG_A) != 0) {


with the failure being during *pde in:

/* Return a pointer to the PT slot that corresponds to a VA */
static __inline pt_entry_t *
pmap_pde_to_pte(pd_entry_t *pde, vm_offset_t va)
{
    pt_entry_t *pte;

    pte = (pt_entry_t *)PHYS_TO_DMAP(*pde & PG_FRAME);
    return (&pte[pmap_pte_index(va)]);
}
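
For reference, here is a minimal sketch of the index arithmetic that the shift/mask pairs in the disassembly appear to perform. It assumes standard amd64 4-level paging (9 index bits per level, 8-byte entries) and assumes %rax held pv->pv_va at that point, which the shift counts suggest but which I have not confirmed. The sketch_* names are made up for illustration and are not the kernel's own.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the entry within each 512-entry table (hypothetical helper names). */
static inline uint64_t sketch_pdpe_index(uint64_t va) { return (va >> 30) & 0x1ff; }
static inline uint64_t sketch_pde_index(uint64_t va)  { return (va >> 21) & 0x1ff; }
static inline uint64_t sketch_pte_index(uint64_t va)  { return (va >> 12) & 0x1ff; }

/*
 * Byte-offset forms: shifting by 3 fewer bits and masking with 0xff8
 * yields index * 8 directly.  This matches the shapes
 * "shr $0x1b ; and $0xff8" and "shr $0x9 ; and $0xff8" in the listing.
 */
static inline uint64_t sketch_pdpe_byte_offset(uint64_t va) { return (va >> 27) & 0xff8; }
static inline uint64_t sketch_pte_byte_offset(uint64_t va)  { return (va >> 9) & 0xff8; }

/* Sanity check that both forms agree for a given address. */
static inline void sketch_check_offsets(uint64_t va)
{
    assert(sketch_pdpe_byte_offset(va) == sketch_pdpe_index(va) * 8);
    assert(sketch_pte_byte_offset(va) == sketch_pte_index(va) * 8);
}
```

Read this way, the "shr $0x15 ; and $0x1ff" pair followed by the scaled load (%rcx,%rdi,8) would be the page-directory lookup, with %rcx being the directory's direct-map address computed from the preceding level's entry.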



Taking the case of:

New one: pmap_ts_referenced+0x82b: movq %rcx,0x10(%rax)

ffffffff80f2b1fb <pmap_ts_referenced+0x80b> lock cmpxchg %rcx,(%rdx)
ffffffff80f2b200 <pmap_ts_referenced+0x810> sete %cl
ffffffff80f2b203 <pmap_ts_referenced+0x813> test %cl,%cl
ffffffff80f2b205 <pmap_ts_referenced+0x815> je ffffffff80f2b27d <pmap_ts_referenced+0x88d>
ffffffff80f2b207 <pmap_ts_referenced+0x817> test %r12,%r12
ffffffff80f2b20a <pmap_ts_referenced+0x81a> je ffffffff80f2b255 <pmap_ts_referenced+0x865>
ffffffff80f2b20c <pmap_ts_referenced+0x81c> mov 0x8(%r12),%rax
ffffffff80f2b211 <pmap_ts_referenced+0x821> test %rax,%rax
ffffffff80f2b214 <pmap_ts_referenced+0x824> je ffffffff80f2b255 <pmap_ts_referenced+0x865>
ffffffff80f2b216 <pmap_ts_referenced+0x826> mov 0x10(%r12),%rcx
ffffffff80f2b21b <pmap_ts_referenced+0x82b> mov %rcx,0x10(%rax) <<<<<<<<<=========
ffffffff80f2b21f <pmap_ts_referenced+0x82f> mov 0x8(%r12),%rax
ffffffff80f2b224 <pmap_ts_referenced+0x834> mov 0x10(%r12),%rcx
ffffffff80f2b229 <pmap_ts_referenced+0x839> mov %rax,(%rcx)

Which, if I understand right, appears to be during
the TAILQ_REMOVE of:

    PMAP_UNLOCK(pmap);
    /* Rotate the PV list if it has more than one entry. */
    if (pv != NULL && TAILQ_NEXT(pv, pv_next) != NULL) {
        TAILQ_REMOVE(&m->md.pv_list, pv, pv_next);
        . . .

#define TAILQ_REMOVE(head, elm, field) do { \
    QMD_SAVELINK(oldnext, (elm)->field.tqe_next); \
    QMD_SAVELINK(oldprev, (elm)->field.tqe_prev); \
    QMD_TAILQ_CHECK_NEXT(elm, field); \
    QMD_TAILQ_CHECK_PREV(elm, field); \
    if ((TAILQ_NEXT((elm), field)) != NULL) \
        TAILQ_NEXT((elm), field)->field.tqe_prev = \
            (elm)->field.tqe_prev; \
    else { \
        (head)->tqh_last = (elm)->field.tqe_prev; \
        QMD_TRACE_HEAD(head); \
    } \
    *(elm)->field.tqe_prev = TAILQ_NEXT((elm), field); \
    TRASHIT(*oldnext); \
    TRASHIT(*oldprev); \
    QMD_TRACE_ELEM(&(elm)->field); \
} while (0)

where the kernel was a non-debug kernel
(with debug symbols), so the QMD_* checks and
TRASHIT lines above should expand to nothing.
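
To make the mapping between the macro and the 0x8/0x10 displacements explicit, here is a rough sketch. It assumes a pv entry laid out as pv_va followed by the TAILQ_ENTRY pair, which is consistent with the offsets in the disassembly on a 64-bit target; the struct and function names are stand-ins of mine, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the pv entry: an address, then the TAILQ_ENTRY link pair. */
struct sketch_pv_entry {
    uint64_t pv_va;                          /* offset 0x00 on LP64 */
    struct {
        struct sketch_pv_entry *tqe_next;    /* offset 0x08 on LP64 */
        struct sketch_pv_entry **tqe_prev;   /* offset 0x10 on LP64 */
    } pv_next;
};

/*
 * The non-tail branch of TAILQ_REMOVE with the debug macros compiled out.
 * The caller guarantees elm has a successor, as the pmap code does.
 */
static void sketch_unlink_non_tail(struct sketch_pv_entry *elm)
{
    assert(elm->pv_next.tqe_next != NULL);
    /* Corresponds to: mov %rcx,0x10(%rax), the store that faulted. */
    elm->pv_next.tqe_next->pv_next.tqe_prev = elm->pv_next.tqe_prev;
    /* Corresponds to: mov %rax,(%rcx). */
    *elm->pv_next.tqe_prev = elm->pv_next.tqe_next;
}

/* Recover the tqe_next value that the faulting store was going through. */
static uint64_t sketch_next_from_fault(uint64_t fault_va)
{
    return fault_va - offsetof(struct sketch_pv_entry, pv_next.tqe_prev);
}
```

Under those assumptions, a fault address of 0x806b49010 would mean the entry's tqe_next held 0x806b49000, which lies in the lower (user) half of the amd64 address space rather than being a kernel pointer. That would point at the pv entry contents already being bad by the time of the TAILQ_REMOVE, but this is inference from the layout, not something I have verified.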

===
Mark Millard
markmi at dsl-only.net
