Discussion:
panic: invalid bcd 194
Matthias Apitz
2017-12-30 20:03:17 UTC
Hello,

I've got an older Acer C720 with r314251, which had not been booted for some
time and now panics on boot, even in single-user mode, saying:

...
Dec 30 19:54:26 c720-r314251 kernel: ada0: Command Queueing enabled
Dec 30 19:54:26 c720-r314251 kernel: ada0: 244198MB (500118192 512 byte sectors)
Dec 30 19:54:26 c720-r314251 kernel: WARNING: WITNESS option enabled, expect reduced performance.
Dec 30 19:54:26 c720-r314251 kernel: Trying to mount root from ufs:/dev/ada0p2 [rw,noatime]...
panic: invalid bcd 194
...

The message comes from

$ find * -type f -exec fgrep "invalid bcd" {} /dev/null \;
sys/sys/libkern.h: ("invalid bcd %d", bcd));

$ vim sys/sys/libkern.h
...
#define LIBKERN_LEN_BCD2BIN 154
#define LIBKERN_LEN_BIN2BCD 100
#define LIBKERN_LEN_HEX2ASCII 36

static inline u_char
bcd2bin(int bcd)
{

        KASSERT(bcd >= 0 && bcd < LIBKERN_LEN_BCD2BIN,
            ("invalid bcd %d", bcd));
        return (bcd2bin_data[bcd]);
}

Any idea what could have damaged the system, and what to do or check before
reinstalling?

Thanks

matthias
--
Matthias Apitz, ✉ ***@unixarea.de, ⌂ http://www.unixarea.de/ 📱 +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
Matthias Apitz
2017-12-30 21:07:11 UTC
Post by Matthias Apitz
Hello,
I've got an older Acer C720 with r314251, which was not booted for some time,
...
Dec 30 19:54:26 c720-r314251 kernel: ada0: Command Queueing enabled
Dec 30 19:54:26 c720-r314251 kernel: ada0: 244198MB (500118192 512 byte sectors)
Dec 30 19:54:26 c720-r314251 kernel: WARNING: WITNESS option enabled, expect reduced performance.
Dec 30 19:54:26 c720-r314251 kernel: Trying to mount root from ufs:/dev/ada0p2 [rw,noatime]...
panic: invalid bcd 194
...
The message comes from
$ find * -type f -exec fgrep "invalid bcd" {} /dev/null \;
sys/sys/libkern.h: ("invalid bcd %d", bcd));
$ vim sys/sys/libkern.h
...
#define LIBKERN_LEN_BCD2BIN 154
#define LIBKERN_LEN_BIN2BCD 100
#define LIBKERN_LEN_HEX2ASCII 36
static inline u_char
bcd2bin(int bcd)
{
KASSERT(bcd >= 0 && bcd < LIBKERN_LEN_BCD2BIN,
("invalid bcd %d", bcd));
return (bcd2bin_data[bcd]);
}
Any idea what could be damaged the system and what to do or check before
re-setup?
Show the backtrace.
Thanks, here we have it as photo: http://www.unixarea.de/download_238222137_147226.jpg

matthias
--
Matthias Apitz, ✉ ***@unixarea.de, ⌂ http://www.unixarea.de/ 📱 +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
Konstantin Belousov
2017-12-30 21:11:54 UTC
Post by Matthias Apitz
Post by Matthias Apitz
Hello,
I've got an older Acer C720 with r314251, which was not booted for some time,
...
Dec 30 19:54:26 c720-r314251 kernel: ada0: Command Queueing enabled
Dec 30 19:54:26 c720-r314251 kernel: ada0: 244198MB (500118192 512 byte sectors)
Dec 30 19:54:26 c720-r314251 kernel: WARNING: WITNESS option enabled, expect reduced performance.
Dec 30 19:54:26 c720-r314251 kernel: Trying to mount root from ufs:/dev/ada0p2 [rw,noatime]...
panic: invalid bcd 194
...
The message comes from
$ find * -type f -exec fgrep "invalid bcd" {} /dev/null \;
sys/sys/libkern.h: ("invalid bcd %d", bcd));
$ vim sys/sys/libkern.h
...
#define LIBKERN_LEN_BCD2BIN 154
#define LIBKERN_LEN_BIN2BCD 100
#define LIBKERN_LEN_HEX2ASCII 36
static inline u_char
bcd2bin(int bcd)
{
KASSERT(bcd >= 0 && bcd < LIBKERN_LEN_BCD2BIN,
("invalid bcd %d", bcd));
return (bcd2bin_data[bcd]);
}
Any idea what could be damaged the system and what to do or check before
re-setup?
Show the backtrace.
Thanks, here we have it as photo: http://www.unixarea.de/download_238222137_147226.jpg
For an immediate relief, enter the BIOS setup and set up the date. Try to
change it even if the BIOS date looks fine.

artc(4) should do more validation of the date read from CMOS, but this is
a known issue.
Matthias Apitz
2017-12-30 21:48:19 UTC
Post by Konstantin Belousov
Post by Matthias Apitz
Post by Matthias Apitz
static inline u_char
bcd2bin(int bcd)
{
KASSERT(bcd >= 0 && bcd < LIBKERN_LEN_BCD2BIN,
("invalid bcd %d", bcd));
return (bcd2bin_data[bcd]);
}
Any idea what could be damaged the system and what to do or check before
re-setup?
Show the backtrace.
Thanks, here we have it as photo: http://www.unixarea.de/download_238222137_147226.jpg
For an immediate relief, enter the BIOS setup and set up the date. Try to
change it even if the BIOS date looks fine.
artc(4) should do more validation of the date read from CMOS, but this is
a known issue.
The problem with this hardware (Acer C720 Chromebook) is that there is no
BIOS setup, only some kind of SeaBIOS without any setup menu. Btw: an older
CURRENT (r285885) booted from a USB key works fine.

Any hints?

matthias
--
Matthias Apitz, ✉ ***@unixarea.de, ⌂ http://www.unixarea.de/ 📱 +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
Ian Lepore
2017-12-31 17:19:50 UTC
Post by Matthias Apitz
Post by Matthias Apitz
static inline u_char
bcd2bin(int bcd)
{
        KASSERT(bcd >= 0 && bcd < LIBKERN_LEN_BCD2BIN,
            ("invalid bcd %d", bcd));
        return (bcd2bin_data[bcd]);
}
For an immediate relief, enter the BIOS setup and set up the date.  Try to
change it even if the BIOS date looks fine.
artc(4) should do more validation of the date read from CMOS, but this is
a known issue.
The problem with this hardware (Acer C720 Chromebook) is, there is no
BIOS setup, only somekind of SeaBIOS w/o any setup. Btw: An older
CURRENT from an USB key r285885 boots fine.
I have got a hint that the problem already showed up in March this year:
http://freebsd.1045724.x6.nabble.com/panic-invalid-bcd-xxx-td6170480.html
Post by Matthias Apitz
http://dpaste.com/1K2W05E
If someone can test it, I'll gladly commit it.  The real-time clock will
likely be wrong, but it won't panic with INVARIANTS.
but the link has expired. Does someone still have this patch? I checked SVN
for the file sys/sys/libkern.h; there is no relevant change since March
I will leave the C720 powered on overnight, sitting in the boot menu;
maybe this will fix the RTC battery issue.
Last time I worked on RTC stuff, cleaning this up got put on my "to-do
some day" list.  I think maybe that day has arrived.

-- Ian
Kurt Jaeger
2018-01-01 08:57:23 UTC
Hi!
For the moment we solved the issue by booting an older r28nnnn
memstick, writing a correct date into the RTC with ntpdate, and rebooting
without powering off. It seems that the RTC survives even a short
power cycle.
The CMOS battery is soldered on the motherboard of the Acer C720, i.e.
no chance to be replaced.
The issue must be fixed in FreeBSD, i.e. it should boot even with a
broken RTC. Should I file a PR for this?
Yes, please file a PR.
--
***@opsec.eu +49 171 3101372 2 years to go !
Matthias Apitz
2018-01-01 09:12:10 UTC
Post by Kurt Jaeger
Hi!
For the moment we solved the issue by booting some older r28nnnn
memstick, writing a correct date with ntpdate into the RTC and rebooted
without poweroff. It seems that the RTC survives even some short
powercyle.
The CMOS battery is soldered on the motherboard of the Acer C720, i.e.
no chance to be replaced.
The issue must be fixed in FreeBSD, i.e. it should boot even with a
broken RTC. Should I file a PR for this?
Yes, please file a PR.
done.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224813
--
Matthias Apitz, ✉ ***@unixarea.de, ⌂ http://www.unixarea.de/ 📱 +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
Ian Lepore
2018-01-01 16:33:23 UTC
Post by Matthias Apitz
Post by Kurt Jaeger
Hi!
For the moment we solved the issue by booting some older r28nnnn
memstick, writing a correct date with ntpdate into the RTC and rebooted
without poweroff. It seems that the RTC survives even some short
powercyle.
The CMOS battery is soldered on the motherboard of the Acer C720, i.e.
no chance to be replaced.
The issue must be fixed in FreeBSD, i.e. it should boot even with a
broken RTC. Should I file a PR for this?
Yes, please file a PR.
done.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224813
FYI, I'm working on this, but I discovered yesterday afternoon that
Eric van Gyzen already added code in r314936 to the atrtc driver to
validate the data from the hardware before calling bcd2bin() .  The
code looks correct to me, so why is this error still happening?

I suspected a clang codegen bug, and the generated code does look a bit
suspicious to me (things like ANDing with 0x0e where the C code uses
0x0f), but my x86 asm skills are 25 years out of date.  It's also very
hard asm code to follow, because inlined functions that call other
inlined functions are involved.

I'm on the path of adding some new common routines that all RTC drivers
can use to validate the BCD coming from the hardware without panicking.
 But if I switch the atrtc code to use the new routines, that may
amount to sweeping a clang bug under the rug.

-- Ian
Matthias Apitz
2018-01-02 13:00:20 UTC
Okay, I've created a pair of patches for this.  The first adds some
common support routines usable by all RTC drivers with BCD hardware.
 The second one converts the atrtc driver to use those routines.  The
common code was tested using an i2c RTC chip, but I don't have an x86
testbed, so the atrtc patch is currently untested (it compiles).
The patches are available in a pair of phabricator reviews, plus I'll
attach them to this mail.  If the list scrubs the attachments, you can
download the patches from the phab urls below, just hit the Actions
button and look for Download Raw Diff.
https://reviews.freebsd.org/D13730
https://reviews.freebsd.org/D13731
Ian, I've applied your patches by hand to the kernel source r314251 and
additionally added two printf()s to atrtc.c to trace the entry into
atrtc_gettime() and atrtc_set(). I've compiled the kernel with

# make buildworld -DNO_CLEAN

and copied over the new kernel with

# cp /usr/obj/usr/src/sys/GENERIC/kernel /boot/kernel/kernel

The kernel boots fine as:

FreeBSD c720-r314251 12.0-CURRENT FreeBSD 12.0-CURRENT #2 r314251M: Tue Jan 2 12:53:31 CET 2018 ***@c720-r314251:/usr/obj/usr/src/sys/GENERIC amd64

but while the date is read with the correct day and time, the year is 1970:

# date
Tue Jan 2 1970, 13:45:24 CET

One can set the date/time with the date(1) command or ntpdate.

The debug messages are (the first line is from boot, the others from the
date or ntpdate):

# grep DEBUG /var/log/messages
Jan 2 13:43:02 c720-r314251 kernel: BCD DEBUG: atrtc_gettime() entry
Jan 2 13:43:02 c720-r314251 kernel: BCD DEBUG: atrtc_gettime() entry
Jan 2 13:44:00 c720-r314251 kernel: BCD DEBUG: atrtc_set() entry
--
Matthias Apitz, ✉ ***@unixarea.de, ⌂ http://www.unixarea.de/ 📱 +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
Matthias Apitz
2018-01-02 14:37:37 UTC
Post by Matthias Apitz
# cp /usr/obj/usr/src/sys/GENERIC/kernel /boot/kernel/kernel
# date
Tue Jan 2 1970, 13:45:24 CET
One can set the date/time with the date(1) command or ntpdate.
The debug messages are (the first line is from boot, the others from the
# grep DEBUG /var/log/messages
Jan 2 13:43:02 c720-r314251 kernel: BCD DEBUG: atrtc_gettime() entry
Jan 2 13:43:02 c720-r314251 kernel: BCD DEBUG: atrtc_gettime() entry
Jan 2 13:44:00 c720-r314251 kernel: BCD DEBUG: atrtc_set() entry
I've added one more printf to see what comes back as the year from the RTC.
The code is attached below; bct.year comes out as 24 (decimal), which is 0x18.
I.e. it seems that the year from 2018 is stored in hex as 0x18, right?


Jan 2 15:08:02 c720-r314251 kernel: BCD DEBUG: atrtc_gettime() entry without USE_RTC_CENTURY
Jan 2 15:08:02 c720-r314251 kernel: BCD DEBUG: atrtc_gettime() bct.year=24
Jan 2 15:13:00 c720-r314251 kernel: BCD DEBUG: atrtc_set() entry without USE_RTC_CENTURY
Jan 2 15:14:34 c720-r314251 kernel: BCD DEBUG: atrtc_set() entry without USE_RTC_CENTURY

static int
atrtc_gettime(device_t dev, struct timespec *ts)
{
	struct bcd_clocktime bct;
#ifdef USE_RTC_CENTURY
	printf("BCD DEBUG: atrtc_gettime() entry with USE_RTC_CENTURY\n");
#else
	printf("BCD DEBUG: atrtc_gettime() entry without USE_RTC_CENTURY\n");
#endif

	/* Look if we have a RTC present and the time is valid */
	if (!(rtcin(RTC_STATUSD) & RTCSD_PWR)) {
		device_printf(dev, "WARNING: Battery failure indication\n");
		return (EINVAL);
	}

	/*
	 * wait for time update to complete
	 * If RTCSA_TUP is zero, we have at least 244us before next update.
	 * This is fast enough on most hardware, but a refinement would be
	 * to make sure that no more than 240us pass after we start reading,
	 * and try again if so.
	 */
	while (rtcin(RTC_STATUSA) & RTCSA_TUP)
		continue;
	critical_enter();
	bct.sec = rtcin(RTC_SEC);
	bct.min = rtcin(RTC_MIN);
	bct.hour = rtcin(RTC_HRS);
	bct.day = rtcin(RTC_DAY);
	bct.mon = rtcin(RTC_MONTH);
	bct.year = rtcin(RTC_YEAR);
#ifdef USE_RTC_CENTURY
	bct.year |= rtcin(RTC_CENTURY) << 8;
#endif
	critical_exit();
	/* dow is unused in timespec conversion and we have no nsec info. */
	bct.dow = 0;
	bct.nsec = 0;
	printf("BCD DEBUG: atrtc_gettime() bct.year=%d\n", bct.year);
	return (clock_bcd_to_ts(&bct, ts));
}
--
Matthias Apitz, ✉ ***@unixarea.de, ⌂ http://www.unixarea.de/ 📱 +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
Kurt Jaeger
2018-01-02 15:26:31 UTC
Hi!
Post by Matthias Apitz
I've added one more printf to see what is coming as year from BCD. The code
is attached below and bcd.year comes out as 24 (decimal) which is 0x18. I.e. it
seems that the year from 2018 is stored in hex as 0x18, or?
It's actually BCD code:

https://en.wikipedia.org/wiki/Binary-coded_decimal
--
***@opsec.eu +49 171 3101372 2 years to go !
Matthias Apitz
2018-01-02 16:41:41 UTC
Post by Kurt Jaeger
Hi!
Post by Matthias Apitz
I've added one more printf to see what is coming as year from
BCD. The code
is attached below and bcd.year comes out as 24 (decimal) which is 0x18. I.e. it
seems that the year from 2018 is stored in hex as 0x18, or?
https://en.wikipedia.org/wiki/Binary-coded_decimal
So something must be wrong in the conversion to binary, because the date
ends up in the year 1970.
--
Sent from my Ubuntu phone
http://www.unixarea.de/
Rodney W. Grimes
2018-01-02 05:49:57 UTC
Post by Ian Lepore
I will let the C720 over night under power while sitting in the boot menu,
maybe this will fix the RTC battery issue.
Last time I worked on RTC stuff, cleaning this up got put on my "to-do
some day" list.  I think maybe that day has arrived.
-- Ian
For the moment we solved the issue by booting some older r28nnnn
memstick, writing a correct date with ntpdate into the RTC and rebooted
without poweroff. It seems that the RTC survives even some short
powercyle.
The CMOS battery is soldered on the motherboard of the Acer C720, i.e.
no chance to be replaced.
The issue must be fixed in FreeBSD, i.e. it should boot even with a
broken RTC. Should I file a PR for this?
I'm happy to test any patch for this.
matthias
Okay, I've created a pair of patches for this.  The first adds some
common support routines usable by all RTC drivers with BCD hardware.
The second one converts the atrtc driver to use those routines.  The
common code was tested using an i2c RTC chip, but I don't have an x86
testbed, so the atrtc patch is currently untested (it compiles).
Would the rtc.c emulation in bhyve work for testing?
usr.sbin/bhyve/rtc.c
The patches are available in a pair of phabricator reviews, plus I'll
attach them to this mail.  If the list scrubs the attachments, you can
download the patches from the phab urls below, just hit the Actions
button and look for Download Raw Diff.
https://reviews.freebsd.org/D13730
https://reviews.freebsd.org/D13731
-- Ian
--
Rod Grimes ***@freebsd.org