On Tue, 20 Feb 2018 12:39:53 +0100
Post by Gary Jennejohn
On Mon, 19 Feb 2018 14:18:15 -0800
Post by Chris H
kernel: failed: cg 5, cgp: 0xd11ecd0d != bp: 0x63d3ff1d
and was wondering if it's anything to be concerned with, or whether
fsck(8) is fixing them.
FreeBSD dns0 12.0-CURRENT FreeBSD 12.0-CURRENT #0: Wed Dec 13 06:07:59 PST
which hadn't yet been hooked up to the UPS.
I performed an fsck in single user mode upon power-up, which ended with the
mount points being marked CLEAN. I was asked if I wanted to use the JOURNAL.
I answered Y.
newfs -U -j
Thank you for all your time, and consideration.
fsck fixes these errors only when the user does NOT use the journal.
You should re-do the fsck.
When these mysterious errors first occurred on several boxes running CURRENT
(that was in December 2017, if I'm right), I also witnessed mysterious and
frequent crashes on several SSD-driven machines where the error described
above occurred.
While the error somehow vanished in the meantime as CURRENT progressed, the
crashes continued. On two boxes, I dumped and restored the OS on the system's
SSD after reformatting the SSD from scratch (UFS2, soft updates + journaling).
On those boxes the mysterious crashes have not recurred since!
One box is left so far: my workstation. This box continues to crash, and it
started crashing again today while compiling world/kernel.
The fun part is: even after a clean shutdown and reboot, with no filesystem
inconsistencies that I can detect and none reported on the console or in the
messages/logs, the box crashes spontaneously. Today I could trigger the reboot
by starting "make -j4 buildworld buildkernel": after showing the initial
compiler/build framework statements, the box went to Nirvana. A well-known
phenomenon by now.
I have now checked the consistency of the filesystem; here is the result for
the /usr/obj tree, which is a dedicated GPT partition
(label: /dev/gpt/usr.obj):
[...]
***@box1:~ # fsck -fy /dev/gpt/usr.obj
** /dev/gpt/usr.obj
** Last Mounted on /usr/obj
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
UNALLOCATED I=515 OWNER=root MODE=0
SIZE=0 MTIME=Feb 22 07:25 2018
NAME=/usr/src/amd64.amd64/sys/BOX1/config.c.new
UNEXPECTED SOFT UPDATE INCONSISTENCY
REMOVE? yes
DIRECTORY CORRUPTED I=169691 OWNER=root MODE=40775
SIZE=1536 MTIME=Feb 22 05:16 2018
DIR=/usr/src/amd64.amd64/sys/BOX1/modules/usr/src/sys/modules/nfsd
UNEXPECTED SOFT UPDATE INCONSISTENCY
SALVAGE? yes
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? yes
SUMMARY INFORMATION BAD
SALVAGE? yes
BLK(S) MISSING IN BIT MAPS
SALVAGE? yes
126922 files, 848197 used, 1178482 free (89210 frags, 136159 blocks, 4.4%
fragmentation)
***** FILE SYSTEM MARKED DIRTY *****
***** FILE SYSTEM WAS MODIFIED *****
***** PLEASE RERUN FSCK *****
[...]
When doing an installworld, I pre-emptively run "fsck -yf" twice in single user
mode before mounting the partitions. In most cases the filesystems are
reported clean, but sometimes, especially for those under high I/O (/usr/src
and mostly /usr/obj on this build machine), there are reports of corruption.
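
For anyone who wants to automate the repeated check, here is a minimal sketch
of the idea, not a tested procedure. It assumes the filesystem is unmounted
(single user mode) and that the markers "FILE SYSTEM WAS MODIFIED" and
"PLEASE RERUN FSCK" appear exactly as in the transcript above. The names
FSCK_CMD, MAX_PASSES, fsck_wants_rerun and check_until_clean are made up for
this example.

```shell
#!/bin/sh
# Sketch: force-check a filesystem repeatedly until fsck stops modifying it.
# FSCK_CMD and MAX_PASSES are hypothetical knobs introduced for illustration.
FSCK_CMD="${FSCK_CMD:-fsck -fy}"
MAX_PASSES="${MAX_PASSES:-3}"

# Succeeds if the fsck output on stdin says the filesystem was changed
# or asks for another run (the markers seen in the transcript).
fsck_wants_rerun() {
    grep -q -e 'FILE SYSTEM WAS MODIFIED' -e 'PLEASE RERUN FSCK'
}

# check_until_clean DEVICE
# Returns 0 once a pass reports no modification, 1 if fsck still
# modified the filesystem on the last of MAX_PASSES passes.
check_until_clean() {
    dev="$1"
    pass=1
    while [ "$pass" -le "$MAX_PASSES" ]; do
        echo "pass $pass: $FSCK_CMD $dev"
        # FSCK_CMD is left unquoted on purpose so "fsck -fy" splits into words.
        out=$($FSCK_CMD "$dev" 2>&1)
        if ! printf '%s\n' "$out" | fsck_wants_rerun; then
            return 0
        fi
        pass=$((pass + 1))
    done
    return 1
}
```

Usage would be something like "check_until_clean /dev/gpt/usr.obj" after
sourcing the file. A non-zero return only means fsck was still making changes
on the last pass; it says nothing about why.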
As I reported, the very same behaviour occurred on three boxes simultaneously,
and I got rid of it by completely reformatting the SSDs (I have never had such
issues with HDD-based boxes so far!).
I hope I can rebuild the remaining box this weekend, and I can report, if
desired, whether it returns to a healthy state like the others or whether my
observation was a simple coincidence of issues ...
Thanks for the patience,
Oliver