Post by Toomas Soome
Post by Stefan Esser
Post by Stefan Esser
Hmmm, the code references point into the boot loader code - I had
expected that there is a problem in the kernel, not the boot loader.
Post by Kyle Evans
[1]
https://svnweb.freebsd.org/base/head/stand/libsa/sbrk.c?view=markup#l56
Post by Stefan Esser
Seems that setheap() has either not been called or has been called with base=0.
Right, which is odd...
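
To make the failure mode concrete, here is a simplified user-space model of a setheap()/sbrk() pair. It is a sketch, not the libsa source: the model_ names, the 16-byte alignment, and returning NULL where the loader would panic are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified model only; names and alignment are assumptions. */
#define MODEL_ALIGN_MASK ((uintptr_t)0xf)

static char   *model_heapbase;  /* stays NULL until model_setheap() runs */
static size_t  model_maxheap;   /* usable bytes between base and top */
static size_t  model_heapsize;  /* bytes handed out so far */

/* Record the heap region; base is rounded up to the assumed alignment. */
static void
model_setheap(void *base, void *top)
{
	uintptr_t aligned;

	aligned = ((uintptr_t)base + MODEL_ALIGN_MASK) & ~MODEL_ALIGN_MASK;
	assert((uintptr_t)top >= aligned);
	model_heapbase = (char *)aligned;
	model_maxheap = (size_t)((uintptr_t)top - aligned);
	model_heapsize = 0;
}

/*
 * Hand out incr zeroed bytes. Returns NULL where a loader would panic
 * (heap never set up, or set up with base 0) and (char *)-1 when the
 * heap is exhausted.
 */
static char *
model_sbrk(size_t incr)
{
	char *ret;

	if (model_heapbase == NULL)
		return (NULL);
	if (incr > model_maxheap - model_heapsize)
		return ((char *)-1);
	ret = model_heapbase + model_heapsize;
	memset(ret, 0, incr);
	model_heapsize += incr;
	return (ret);
}
```

In this model the "no heap" branch is reached in exactly two ways: model_setheap() never ran, or it ran with a base of 0. A later overwrite of model_heapbase would look the same from inside model_sbrk().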
Post by Stefan Esser
Post by Kyle Evans
[2]
https://svnweb.freebsd.org/base/head/stand/i386/zfsboot/zfsboot.c?view=markup#l688
Post by Stefan Esser
I had thought that the ZFS boot code is initialized before the
menu is displayed?
Right, all of this should be done looooong before we get to the
interpreter. Can you break into the loader prompt and try the `heap`
command?
Totally weird. I'd add a printf to the setheap() function to display its args
and see if you get this panic before/after that printf...
I'm currently using a Forth-enabled boot loader again, since this is a
"production" machine (my home server, which also receives and keeps all
my work email, for example).
I'll build a clean world with the Lua loader and test it in the next few
days. Tests will include the "heap" loader command and I'll add the
printf (though, if sbrk() has really not been called, I guess that will
not go too well ...).
Is it possible that the setheap function is called a second time, just
before jumping into the kernel? (In that case adding the printf might
crash the loader in the first setheap call ...)
Since the loader menu (and escaping from the menu) works, there must be
a valid heap at that time.
Indeed. And assuming the message really is from the loader, it means there
must be memory corruption. If so, you can check which variables are located
close to the heap-related ones… Also, since the menu works, the problem has
to be related to the actual loading. Since loading itself has worked so far,
it should be related to the Lua-specific bits that prepare the calls to the
load functions.
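
The corruption theory can be illustrated with a small sketch. Everything in it is hypothetical (struct loader_state, its fields, and buggy_clear_name() are invented for the example and are not loader code); it only shows how a write that is too long for one object wipes a heap pointer that happens to sit next to it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: a buffer directly followed by heap bookkeeping. */
struct loader_state {
	char  name[8];    /* some unrelated buffer */
	void *heapbase;   /* heap pointer that happens to live next to it */
};

/*
 * Buggy clear: wipes the whole struct instead of only name[].
 * The write stays inside *st, so the demo itself is well defined.
 */
static void
buggy_clear_name(struct loader_state *st)
{
	assert(st != NULL);
	memset(st, 0, sizeof(*st));   /* should be sizeof(st->name) */
}

/*
 * Returns 1 if the heap pointer survived, 0 if it was wiped.
 * Assumes all-zero bytes read back as NULL, as on common platforms.
 */
static int
heap_still_set(const struct loader_state *st)
{
	return (st->heapbase != NULL);
}
```

After buggy_clear_name() runs, a check like "heap base is 0" fires even though the heap was set up correctly earlier. That would match a menu that works followed by a panic during loading.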
Ok, some more data points:
1) A printf in setheap reported plausible values during start-up of zfsboot.
The menu appeared and wiped away the values so fast that I could not take
a photo or write them down.
2) I have rebuilt world and kernel based on r331763. Booting resulted in the
same panic as reported before. There was no debug output from the patched
setheap call before the panic (which indicates that it was not called a
second time).
3) In order to get my system to boot, I interrupted loading of zfsloader and
forced loading of the previous version (from a world build with Forth in
the loader). Booting succeeded with the latest kernel ...
It looks as if sbrk() was called in zfsloader before setheap() has been used
to initialize the heap parameters, if Lua is enabled instead of Forth. See
stand/i386/loader/main.c:124 for the location of the setheap call in the
loader.
This is obviously hard to debug, though, since printf cannot be called at that
point. A pure write(2) should be possible without heap, but since the console
has not been initialized at the point of the setheap invocation, there is no
working output device, AFAIK.
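
One way around the missing console would be to record the arguments in static storage and print them later, once output works. The following is only a sketch under assumptions: early_record() and early_dump() are hypothetical helpers, stdio's fprintf stands in for the loader's printf, and it presumes static data is usable (BSS already zeroed) at the point of the setheap call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define EARLY_LOG_SLOTS 4

struct early_rec {
	void *base;
	void *top;
};

static struct early_rec early_log[EARLY_LOG_SLOTS];
static unsigned early_count;   /* total calls seen, may exceed the slots */

/* Call from the early path: needs no heap and no console. */
static void
early_record(void *base, void *top)
{
	if (early_count < EARLY_LOG_SLOTS) {
		early_log[early_count].base = base;
		early_log[early_count].top = top;
	}
	early_count++;
}

/* Call once output works; returns the total number of recorded calls. */
static unsigned
early_dump(FILE *out)
{
	unsigned i, n;

	assert(out != NULL);
	n = early_count < EARLY_LOG_SLOTS ? early_count : EARLY_LOG_SLOTS;
	for (i = 0; i < n; i++)
		fprintf(out, "setheap call %u: base=%p top=%p\n", i,
		    early_log[i].base, early_log[i].top);
	return (early_count);
}
```

Dumping from a point that is known to run late (for example just before the panic message) would also show whether setheap was called twice, and the values would not be wiped away by the menu.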
I do not see how any sbrk() call could occur before setheap is called. And
there does not appear to be any other setheap function (or macro) in the
tree that could override the one defined in stand/libsa/sbrk.c ...
I have no idea how to proceed from here ...
But now I'm sure it is a problem in zfsloader (or loader in general?).
Hmmm: How is the panic message printed by sbrk() without an initialized heap?
The definition of panic in stand/libsa/panic.c relies on a working printf!
I should be able to use printf in the same way as panic does, but I did
not succeed when I tried to use it early in zfsloader ...
Regards, STefan