Discussion:
[Request for review] Profiling the FreeBSD kernel boot
(too old to reply)
Colin Percival
2017-12-22 09:44:46 UTC
Permalink
Hi everyone,

For the past few months I've been working on code for profiling the FreeBSD
"kernel boot", i.e., everything between when kernel code starts running and
when we first enter userland as init(8). This is not trivial since it's
impossible to use tools like dtrace to monitor things prior to when said
tools are running. The goal of this exercise is to help me track down the
places where we're wasting time during the boot, and then to fix them.

The approach I've taken is to add some macros -- most notably TSENTER() and
TSEXIT() -- which by default compile to nothing, but if the TSLOG kernel
option is enabled they compile to code which logs the cycle count (e.g., on
x86 the value from the RDTSC instruction) along with some other data (in the
case of TSENTER and TSEXIT, the fact that we're entering/exiting a function).
This can then be dumped via a sysctl (debug.tslog) and processed in userland
to convert function entries/exits into stacks and to visualize the time spent
in the boot process.

Two examples:

A flame chart of my laptop booting HEAD:
http://www.daemonology.net/timestamping/tslog-laptop.svg

A flame chart of an EC2 c5.4xlarge instance booting 11.1-RELEASE:
http://www.daemonology.net/timestamping/tslog-c5.4xlarge.svg

The patches (10 of them, to be applied in order), userland scripts, and very
brief usage instructions are at:
http://www.daemonology.net/timestamping/tslog.tgz

I hope to commit the patches in the next week, since I'm planning on writing
a paper to submit to AsiaBSDCon (which has a deadline of December 31st); so
if anyone has interest/time to look at this in the near future (I mean, it's
not like anyone is going to be busy this weekend, right?) I'd love to have
some feedback before it goes into the tree.
--
Colin Percival
Security Officer Emeritus, FreeBSD | The power to serve
Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid
Hans Petter Selasky
2017-12-22 13:48:26 UTC
Permalink
Post by Colin Percival
track down the
places where we're wasting time during the boot, and then to fix them.
Hi,

The USB stack will try to enumerate all USB controllers simultaneously.

DELAY() is frequently a problem having to wait for chips to reset during
enumeration as you probably know. Getting timers enumerated early on
allows to sleep instead of spin, also saving CPU usage during boot.

I'm not sure if switching on too many PCI devices at the same time can
cause any power issues. At least USB HUBs have an own delay per port it
needs to wait until power is turned on.

--HPS
blubee blubeeme
2017-12-22 14:05:11 UTC
Permalink
Post by Colin Percival
Hi everyone,
For the past few months I've been working on code for profiling the FreeBSD
"kernel boot", i.e., everything between when kernel code starts running and
when we first enter userland as init(8). This is not trivial since it's
impossible to use tools like dtrace to monitor things prior to when said
tools are running. The goal of this exercise is to help me track down the
places where we're wasting time during the boot, and then to fix them.
The approach I've taken is to add some macros -- most notably TSENTER() and
TSEXIT() -- which by default compile to nothing, but if the TSLOG kernel
option is enabled they compile to code which logs the cycle count (e.g., on
x86 the value from the RDTSC instruction) along with some other data (in the
case of TSENTER and TSEXIT, the fact that we're entering/exiting a function).
This can then be dumped via a sysctl (debug.tslog) and processed in userland
to convert function entries/exits into stacks and to visualize the time spent
in the boot process.
http://www.daemonology.net/timestamping/tslog-laptop.svg
http://www.daemonology.net/timestamping/tslog-c5.4xlarge.svg
The patches (10 of them, to be applied in order), userland scripts, and very
http://www.daemonology.net/timestamping/tslog.tgz
I hope to commit the patches in the next week, since I'm planning on writing
a paper to submit to AsiaBSDCon (which has a deadline of December 31st); so
if anyone has interest/time to look at this in the near future (I mean, it's
not like anyone is going to be busy this weekend, right?) I'd love to have
some feedback before it goes into the tree.
--
Colin Percival
Security Officer Emeritus, FreeBSD | The power to serve
Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-current
I've gotta say nice work with the tracing.
Hopefully your talk gets accepted, i'd love to hear it.
Mark Johnston
2017-12-22 17:08:18 UTC
Permalink
Post by Colin Percival
Hi everyone,
For the past few months I've been working on code for profiling the FreeBSD
"kernel boot", i.e., everything between when kernel code starts running and
when we first enter userland as init(8). This is not trivial since it's
impossible to use tools like dtrace to monitor things prior to when said
tools are running.
In the case of DTrace, this isn't quite true. We support so-called
boot-time DTrace on x86. The caveat is that we can only start tracing
after the SI_SUB_DTRACE_ANON sysinit has been executed. That sysinit
can't come earlier than SI_SUB_SMP, since it needs to be able to measure
TSC skew between CPUs in order to initialize DTrace's high-resolution
timer.

I don't think boot-time DTrace is quite what you want for this exercise,
but it does come in handy sometimes.

In case it's of interest: to use boot-time DTrace, invoke dtrace(1) as
you normally would and add -A. Rather than starting to trace, dtrace(1)
will save a representation of the D script to a file which gets read by
the loader during the next boot. The results of the trace can be fetched
with "dtrace -a". For instance, to print the amount of time elapsed in
microseconds during each vprintf() call, along with a stack:

# dtrace -A -n 'fbt::vprintf:entry {self->ts = timestamp;}
fbt::vprintf:return /self->ts/
{
printf("%d", (timestamp - self->ts) / 1000);
self->ts = 0;
stack();
}'
...
<reboot>
# dtrace -a -e
CPU ID FUNCTION:NAME
0 17161 vprintf:return 419
kernel`printf+0x43
kernel`randomdev_modevent+0x26
kernel`module_register_init+0xc0
kernel`mi_startup+0x9c
kernel`0xffffffff8028656c
...
Colin Percival
2017-12-23 03:38:57 UTC
Permalink
Post by Mark Johnston
Post by Colin Percival
For the past few months I've been working on code for profiling the FreeBSD
"kernel boot", i.e., everything between when kernel code starts running and
when we first enter userland as init(8). This is not trivial since it's
impossible to use tools like dtrace to monitor things prior to when said
tools are running.
In the case of DTrace, this isn't quite true. We support so-called
boot-time DTrace on x86. The caveat is that we can only start tracing
after the SI_SUB_DTRACE_ANON sysinit has been executed. That sysinit
can't come earlier than SI_SUB_SMP, since it needs to be able to measure
TSC skew between CPUs in order to initialize DTrace's high-resolution
timer.
Right. Also, even aside from details like measuring the TSC skew between
CPUs, DTrace needs things like traps, memory allocation, and mutexes, none
of which exist when we enter hammer_time (or any of the other MD startup
code).

What I meant is that it's impossible to use DTrace to monitor things which
happened prior to when the DTrace *kernel bits* are initialized.
Post by Mark Johnston
I don't think boot-time DTrace is quite what you want for this exercise,
but it does come in handy sometimes.
Absolutely. And for a long time I considered trying to splice together
a basic profiling mechanism for pre-DTrace-initialization with using DTrace
from when it's ready onwards... but I decided that it would be easier at
least to start with to simply use a single mechanism throughout.
Post by Mark Johnston
In case it's of interest: to use boot-time DTrace, invoke dtrace(1) as
you normally would and add -A. Rather than starting to trace, dtrace(1)
will save a representation of the D script to a file which gets read by
the loader during the next boot. The results of the trace can be fetched
with "dtrace -a". For instance, to print the amount of time elapsed in
microseconds during each vprintf() call, along with a stack: [...]
Thanks for the example! I think it's very likely that I'll make use of
boot-time DTrace for tracking down some of the performance warts I've found
-- the ones which happen after DTrace is initialized, that is.
--
Colin Percival
Security Officer Emeritus, FreeBSD | The power to serve
Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid
Loading...