host, bhyve vm and ntpd

Post by Boris Samorodov
Hi All,
---
bhyve-host% uname -a
FreeBSD sm.bsnet 12.0-CURRENT FreeBSD 12.0-CURRENT #3 r322868: Fri Aug
25 05:25:26 MSK 2017
---
---
bhyve-vm: uname -a
FreeBSD builder.bsnet 12.0-CURRENT FreeBSD 12.0-CURRENT #58 r324782: Fri
Oct 20 05:12:17 MSK 2017
---
The only difference at kernel configs is a colored console. :-)
And here I get some weird (is it?) result at the VM (I expect ntpd to be
---
bhyve-vm% for t in `jot 10`; do ntpq -pn; sleep 64; done
     remote           refid      st t when poll reach   delay   offset
jitter
==============================================================================
XX.XX.XX.1      XX.XX.XX.245     4 u    9   64    3    0.605   -1.202
316.407
XX.XX.XX.1      XX.XX.XX.245     4 u    7   64    7    0.605   -1.202
358.395
*XX.XX.XX.1      XX.XX.XX.245     4 u    5   64   17    0.615  -328.42
181.405
*XX.XX.XX.1      XX.XX.XX.245     4 u    3   64   37    0.615  -328.42
214.868
*XX.XX.XX.1      XX.XX.XX.245     4 u   67   64   37    0.615  -328.42
214.868
*XX.XX.XX.1      XX.XX.XX.245     4 u   63   64   77    0.615  -328.42
268.618
*XX.XX.XX.1      XX.XX.XX.245     4 u   60   64  177    0.615  -328.42
333.175
XX.XX.XX.1      .STEP.          16 u 1910   64    0    0.000    0.000
0.000
XX.XX.XX.1      XX.XX.XX.245     4 u   27   64    1    0.703  -262.63
0.004
XX.XX.XX.1      XX.XX.XX.245     4 u   31   64    1    0.649  -331.43
68.800
---
---
bhyve-host% for t in `jot 10`; do ntpq -pn; sleep 64; done
     remote           refid      st t when poll reach   delay   offset
jitter
==============================================================================
*XX.XX.XX.1      XX.XX.XX.245     4 u    1   64    1    0.401    0.176
0.106
*XX.XX.XX.1      XX.XX.XX.245     4 u    6   64    3    0.401    0.176
0.459
*XX.XX.XX.1      XX.XX.XX.245     4 u    3   64    7    0.401    0.176
0.940
*XX.XX.XX.1      XX.XX.XX.245     4 u   67   64    7    0.401    0.176
0.940
*XX.XX.XX.1      XX.XX.XX.245     4 u   64   64   17    0.401    0.176
1.566
*XX.XX.XX.1      XX.XX.XX.245     4 u   60   64   37    0.448    1.275
1.739
*XX.XX.XX.1      XX.XX.XX.245     4 u   55   64   77    0.448    1.275
2.365
*XX.XX.XX.1      XX.XX.XX.245     4 u   53   64  177    0.448    1.275
3.110
*XX.XX.XX.1      XX.XX.XX.245     4 u   50   64  377    0.448    1.275
3.929
*XX.XX.XX.1      XX.XX.XX.245     4 u   45   64  377    0.443    8.750
4.722
---
The network is organized via bridge -- host igb and vm tap interfaces
are members of one bridge.
Are those results expected? Does it smell like a bug? Should I dig
furter?

So it is repeatedly stepping the clock in the VM? (Set
kern.timecounter.stepwarnings=1 to log steps). That is usually a sign
that the chosen timecounter is running at a different frequency than it
claimed to be when it registered itself -- the host may not be
emulating the timer hardware properly in the guest. What is the output
of sysctl kern.timecounter in the vm?

Also, what is the output of ntptime(8) in the vm?

-- Ian

Boris Samorodov

2017-10-20 15:31:46 UTC

So it is repeatedly stepping the clock in the VM? (Set
kern.timecounter.stepwarnings=1 to log steps).

No kernel/ntpd messages for 20 minutes after setting this sysctl.

Post by Ian Lepore
That is usually a sign
that the chosen timecounter is running at a different frequency than it
claimed to be when it registered itself -- the host may not be
emulating the timer hardware properly in the guest. What is the output
of sysctl kern.timecounter in the vm?

---
bhyve-vm% sysctl kern.timecounter

kern.timecounter.tsc_shift: 1
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.smp_tsc: 0
kern.timecounter.invariant_tsc: 1
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: ACPI-fast(900) HPET(950) i8254(0) TSC-low(-100)
dummy(-1000000)
kern.timecounter.hardware: HPET
kern.timecounter.alloweddeviation: 5
kern.timecounter.stepwarnings: 1
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.counter: 4161213491
kern.timecounter.tc.ACPI-fast.mask: 4294967295
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.HPET.frequency: 10000000
kern.timecounter.tc.HPET.counter: 3518036865
kern.timecounter.tc.HPET.mask: 4294967295
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 47597
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.TSC-low.quality: -100
kern.timecounter.tc.TSC-low.frequency: 1199886114
kern.timecounter.tc.TSC-low.counter: 1274338278
kern.timecounter.tc.TSC-low.mask: 4294967295
---

Post by Ian Lepore
Also, what is the output of ntptime(8) in the vm?

---
bhyve-vm% ntptime

ntp_gettime() returns code 0 (OK)
time dd94930f.20ea2900 Fri, Oct 20 2017 18:21:51.128, (.128573699),
maximum error 1309110 us, estimated error 3 us, TAI offset 37
ntp_adjtime() returns code 0 (OK)
modes 0x0 (),
offset 0.000 us, frequency 0.000 ppm, interval 1 s,
maximum error 1309110 us, estimated error 3 us,
status 0x2001 (PLL,NANO),
time constant 6, precision 0.001 us, tolerance 496 ppm,
---

Ian, thank you for your help!

--
WBR, Boris Samorodov (bsam)
FreeBSD Committer, http://www.FreeBSD.org The Power To Serve

Boris Samorodov

2017-10-20 15:36:24 UTC

So it is repeatedly stepping the clock in the VM? (Set
kern.timecounter.stepwarnings=1 to log steps).

No kernel/ntpd messages for 20 minutes after setting this sysctl.

---
bhyve-vm% sysctl kern.timecounter
kern.timecounter.tsc_shift: 1
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.smp_tsc: 0
kern.timecounter.invariant_tsc: 1
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: ACPI-fast(900) HPET(950) i8254(0) TSC-low(-100)
dummy(-1000000)
kern.timecounter.hardware: HPET
kern.timecounter.alloweddeviation: 5
kern.timecounter.stepwarnings: 1
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.counter: 4161213491
kern.timecounter.tc.ACPI-fast.mask: 4294967295
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.HPET.frequency: 10000000
kern.timecounter.tc.HPET.counter: 3518036865
kern.timecounter.tc.HPET.mask: 4294967295
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 47597
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.TSC-low.quality: -100
kern.timecounter.tc.TSC-low.frequency: 1199886114
kern.timecounter.tc.TSC-low.counter: 1274338278
kern.timecounter.tc.TSC-low.mask: 4294967295
---

BTW, here is the host kern.timecounter:
---
kern.timecounter.tsc_shift: 1
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.smp_tsc: 1
kern.timecounter.invariant_tsc: 1
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: ACPI-fast(900) HPET(950) i8254(0) TSC-low(1000)
dummy(-1000000)
kern.timecounter.hardware: TSC-low
kern.timecounter.alloweddeviation: 5
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.counter: 9047194
kern.timecounter.tc.ACPI-fast.mask: 16777215
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.HPET.frequency: 14318180
kern.timecounter.tc.HPET.counter: 2232552795
kern.timecounter.tc.HPET.mask: 4294967295
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 43410
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.TSC-low.quality: 1000
kern.timecounter.tc.TSC-low.frequency: 1200067168
kern.timecounter.tc.TSC-low.counter: 2463146362
kern.timecounter.tc.TSC-low.mask: 4294967295
---

Post by Ian Lepore
Also, what is the output of ntptime(8) in the vm?

---
bhyve-vm% ntptime
ntp_gettime() returns code 0 (OK)
time dd94930f.20ea2900 Fri, Oct 20 2017 18:21:51.128, (.128573699),
maximum error 1309110 us, estimated error 3 us, TAI offset 37
ntp_adjtime() returns code 0 (OK)
modes 0x0 (),
offset 0.000 us, frequency 0.000 ppm, interval 1 s,
maximum error 1309110 us, estimated error 3 us,
status 0x2001 (PLL,NANO),
time constant 6, precision 0.001 us, tolerance 496 ppm,
---
Ian, thank you for your help!

--
WBR, Boris Samorodov (bsam)
FreeBSD Committer, http://www.FreeBSD.org The Power To Serve

Boris Samorodov

2017-10-20 17:20:06 UTC

So it is repeatedly stepping the clock in the VM? (Set
kern.timecounter.stepwarnings=1 to log steps).

No kernel/ntpd messages for 20 minutes after setting this sysctl.

---
bhyve-vm% sysctl kern.timecounter
kern.timecounter.tsc_shift: 1
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.smp_tsc: 0
kern.timecounter.invariant_tsc: 1
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: ACPI-fast(900) HPET(950) i8254(0) TSC-low(-100)
dummy(-1000000)
kern.timecounter.hardware: HPET
kern.timecounter.alloweddeviation: 5
kern.timecounter.stepwarnings: 1
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.counter: 4161213491
kern.timecounter.tc.ACPI-fast.mask: 4294967295
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.HPET.frequency: 10000000
kern.timecounter.tc.HPET.counter: 3518036865
kern.timecounter.tc.HPET.mask: 4294967295
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 47597
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.TSC-low.quality: -100
kern.timecounter.tc.TSC-low.frequency: 1199886114
kern.timecounter.tc.TSC-low.counter: 1274338278
kern.timecounter.tc.TSC-low.mask: 4294967295
---

---
kern.timecounter.tsc_shift: 1
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.smp_tsc: 1
kern.timecounter.invariant_tsc: 1
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: ACPI-fast(900) HPET(950) i8254(0) TSC-low(1000)
dummy(-1000000)
kern.timecounter.hardware: TSC-low
kern.timecounter.alloweddeviation: 5
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.counter: 9047194
kern.timecounter.tc.ACPI-fast.mask: 16777215
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.HPET.frequency: 14318180
kern.timecounter.tc.HPET.counter: 2232552795
kern.timecounter.tc.HPET.mask: 4294967295
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 43410
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.TSC-low.quality: 1000
kern.timecounter.tc.TSC-low.frequency: 1200067168
kern.timecounter.tc.TSC-low.counter: 2463146362
kern.timecounter.tc.TSC-low.mask: 4294967295
---

Post by Ian Lepore
Also, what is the output of ntptime(8) in the vm?

---
bhyve-vm% ntptime
ntp_gettime() returns code 0 (OK)
time dd94930f.20ea2900 Fri, Oct 20 2017 18:21:51.128, (.128573699),
maximum error 1309110 us, estimated error 3 us, TAI offset 37
ntp_adjtime() returns code 0 (OK)
modes 0x0 (),
offset 0.000 us, frequency 0.000 ppm, interval 1 s,
maximum error 1309110 us, estimated error 3 us,
status 0x2001 (PLL,NANO),
time constant 6, precision 0.001 us, tolerance 496 ppm,
---
Ian, thank you for your help!

It seems odd to me that the frequency of the host HPET is 14.3mhz and
that of the guest is 10.0mhz, but maybe that's a normal condition for
bhyve. I did find some google hits[1] for bhyve guest timekeeping
trouble with the HPET timer which was solved by switching to a
different timecounter. Timecounter choices can't be controlled from
loader.conf, so I guess a sysctl.conf entry of
kern.timecounter.hardware="ACPI-fast" is the only way to fix that. You
can also just do that command interactively first and see if it stops
the time steps and ntp settles down.

The process seems to become more monotonic. But steps nevertheless:
---
*XX.1 XX.245 4 u 20 64 1 0.717 -12.771 4.193
*XX.1 XX.245 4 u 28 64 3 0.751 -41.970 32.342
*XX.1 XX.245 4 u 23 64 7 0.748 -59.505 46.624
*XX.1 XX.245 4 u 18 64 17 0.699 -75.164 56.482
*XX.1 XX.245 4 u 14 64 37 0.669 -90.112 63.767
*XX.1 XX.245 4 u 11 64 77 0.605 -10.567 60.914
*XX.1 XX.245 4 u 7 64 177 0.591 -169.54 116.762
*XX.1 XX.245 4 u 3 64 377 0.591 -169.54 102.107
*XX.1 XX.245 4 u 68 64 377 0.591 -169.54 102.107
*XX.1 XX.245 4 u 63 64 377 0.591 -169.54 88.424
*XX.1 XX.245 4 u 58 64 377 0.591 -169.54 92.949
*XX.1 XX.245 4 u 55 64 377 0.591 -169.54 111.512
*XX.1 XX.245 4 u 50 64 377 0.591 -169.54 140.827
*XX.1 XX.245 4 u 45 64 377 0.591 -169.54 177.360
*XX.1 XX.245 4 u 43 64 377 0.591 -169.54 219.057
XX.1 .STEP. 16 u 588 64 0 0.000 0.000 0.000
---

This would be a workaround, not a fix per se. If the time steps go
away, then something in bhyve's emulation of HPET (maybe only on some
hardware?) must be buggy.
[1] https://lists.freebsd.org/pipermail/freebsd-virtualization/2015-April/003492.html

Also thanks for the link. Unfortunately the problem seems to persist.

--
WBR, Boris Samorodov (bsam)
FreeBSD Committer, http://www.FreeBSD.org The Power To Serve

Ian Lepore

2017-10-20 18:04:27 UTC

So it is repeatedly stepping the clock in the VM? (Set
kern.timecounter.stepwarnings=1 to log steps).

No kernel/ntpd messages for 20 minutes after setting this sysctl.

---
bhyve-vm% sysctl kern.timecounter
kern.timecounter.tsc_shift: 1
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.smp_tsc: 0
kern.timecounter.invariant_tsc: 1
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: ACPI-fast(900) HPET(950) i8254(0) TSC-low(-100)
dummy(-1000000)
kern.timecounter.hardware: HPET
kern.timecounter.alloweddeviation: 5
kern.timecounter.stepwarnings: 1
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.counter: 4161213491
kern.timecounter.tc.ACPI-fast.mask: 4294967295
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.HPET.frequency: 10000000
kern.timecounter.tc.HPET.counter: 3518036865
kern.timecounter.tc.HPET.mask: 4294967295
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 47597
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.TSC-low.quality: -100
kern.timecounter.tc.TSC-low.frequency: 1199886114
kern.timecounter.tc.TSC-low.counter: 1274338278
kern.timecounter.tc.TSC-low.mask: 4294967295
---

Post by Ian Lepore
Also, what is the output of ntptime(8) in the vm?

---
bhyve-vm% ntptime
ntp_gettime() returns code 0 (OK)
time dd94930f.20ea2900 Fri, Oct 20 2017 18:21:51.128, (.128573699),
maximum error 1309110 us, estimated error 3 us, TAI offset 37
ntp_adjtime() returns code 0 (OK)
modes 0x0 (),
offset 0.000 us, frequency 0.000 ppm, interval 1 s,
maximum error 1309110 us, estimated error 3 us,
status 0x2001 (PLL,NANO),
time constant 6, precision 0.001 us, tolerance 496 ppm,
---
Ian, thank you for your help!

---
*XX.1      XX.245     4 u   20   64    1    0.717  -12.771   4.193
*XX.1      XX.245     4 u   28   64    3    0.751  -41.970  32.342
*XX.1      XX.245     4 u   23   64    7    0.748  -59.505  46.624
*XX.1      XX.245     4 u   18   64   17    0.699  -75.164  56.482
*XX.1      XX.245     4 u   14   64   37    0.669  -90.112  63.767
*XX.1      XX.245     4 u   11   64   77    0.605  -10.567  60.914
*XX.1      XX.245     4 u    7   64  177    0.591  -169.54 116.762
*XX.1      XX.245     4 u    3   64  377    0.591  -169.54 102.107
*XX.1      XX.245     4 u   68   64  377    0.591  -169.54 102.107
*XX.1      XX.245     4 u   63   64  377    0.591  -169.54  88.424
*XX.1      XX.245     4 u   58   64  377    0.591  -169.54  92.949
*XX.1      XX.245     4 u   55   64  377    0.591  -169.54 111.512
*XX.1      XX.245     4 u   50   64  377    0.591  -169.54 140.827
*XX.1      XX.245     4 u   45   64  377    0.591  -169.54 177.360
*XX.1      XX.245     4 u   43   64  377    0.591  -169.54 219.057
XX.1      .STEP.    16 u  588   64    0    0.000    0.000   0.000
---

Also thanks for the link. Unfortunately the problem seems to persist.

Hmm, that almost looks normal... it noticed the clock was drifting
fast, so it went into a frequency-training cycle (at the point where
the offset stopped changing at -169.54) and then once it had figured
out the frequency it did a step to get realigned, and (presumably)
adjusted the frequency of the kernel clock. The output of ntptime
before and after the step would show that... before the step the
frequency would show as zero (which was the case in your ntptime output
earlier), and after the step it would be non-zero. No more steps would
occur after the first one, if that's what is happening here.

-- Ian

Boris Samorodov

2017-10-20 18:15:44 UTC

So it is repeatedly stepping the clock in the VM? (Set
kern.timecounter.stepwarnings=1 to log steps).

No kernel/ntpd messages for 20 minutes after setting this sysctl.

---
bhyve-vm% sysctl kern.timecounter
kern.timecounter.tsc_shift: 1
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.smp_tsc: 0
kern.timecounter.invariant_tsc: 1
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: ACPI-fast(900) HPET(950) i8254(0) TSC-low(-100)
dummy(-1000000)
kern.timecounter.hardware: HPET
kern.timecounter.alloweddeviation: 5
kern.timecounter.stepwarnings: 1
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.counter: 4161213491
kern.timecounter.tc.ACPI-fast.mask: 4294967295
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.HPET.frequency: 10000000
kern.timecounter.tc.HPET.counter: 3518036865
kern.timecounter.tc.HPET.mask: 4294967295
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 47597
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.TSC-low.quality: -100
kern.timecounter.tc.TSC-low.frequency: 1199886114
kern.timecounter.tc.TSC-low.counter: 1274338278
kern.timecounter.tc.TSC-low.mask: 4294967295
---

Post by Ian Lepore
Also, what is the output of ntptime(8) in the vm?

---
bhyve-vm% ntptime
ntp_gettime() returns code 0 (OK)
time dd94930f.20ea2900 Fri, Oct 20 2017 18:21:51.128, (.128573699),
maximum error 1309110 us, estimated error 3 us, TAI offset 37
ntp_adjtime() returns code 0 (OK)
modes 0x0 (),
offset 0.000 us, frequency 0.000 ppm, interval 1 s,
maximum error 1309110 us, estimated error 3 us,
status 0x2001 (PLL,NANO),
time constant 6, precision 0.001 us, tolerance 496 ppm,
---
Ian, thank you for your help!

Also thanks for the link. Unfortunately the problem seems to persist.

Those were following steps:
---
XX.1 XX.245 4 u 15 64 1 0.597 -51.802 7.547
XX.1 XX.245 4 u 17 64 1 0.597 -51.802 29.526
XX.1 XX.245 4 u 19 64 1 0.597 -51.802 52.760
XX.1 XX.245 4 u 19 64 3 0.597 -51.802 77.354
XX.1 XX.245 4 u 16 64 7 0.597 -51.802 103.182
XX.1 XX.245 4 u 12 64 17 0.597 -51.802 138.976
*XX.1 XX.245 4 u 9 64 37 0.700 -200.07 109.275
XX.1 .STEP. 16 u 1100 64 0 0.000 0.000 0.002
XX.1 XX.245 4 u 63 64 1 0.645 -3.620 0.002
XX.1 XX.245 4 u 34 64 1 0.645 -3.620 66.907
XX.1 XX.245 4 u 36 64 1 0.643 -115.30 85.082---

No stability. :-) But I see your point. I'll leave it till morning
with hope it may stabilize.

--
WBR, Boris Samorodov (bsam)
FreeBSD Committer, http://www.FreeBSD.org The Power To Serve

Boris Samorodov

2017-10-22 08:28:20 UTC

Beyond that, I'm not sure what else to try. It might be necessary to
get some bhyve developers involved (I know almost nothing about it).

NTPD behaves more normally on uniprocessor VMs.
A FreeBSD bhyve-guest running on a freebsd host will select a
different timecounter depending on whether it is a multiprocessor or a
uniprocessor. My uniprocessor bhyve-vm selected TSC-low as the best
timecounter in a uniprocessor. NTP functions there as expected.
kern.timecounter.choice: TSC-low(1000) ACPI-fast(900) HPET(950) i8254(0) dummy(-1000000)
kern.timecounter.hardware: TSC-low
The very same VM, when given two total CPUs, selected HPET (if I
recall) and the timekeeping with NTPD was unreliable, with many
step-resets to the clock.

Yep, the same here. I've switched to TSC-low at Bhyve guest and there
is no stepping per 24 hours.

--
WBR, bsam

Boris Samorodov

2017-10-22 08:31:22 UTC

Beyond that, I'm not sure what else to try. It might be necessary to
get some bhyve developers involved (I know almost nothing about it).

Hmm, I just had glance at the code in sys/amd64/vmm/io/vhpet.c and it
looks right. I wonder if this is just a simple roundoff error in
converting between 10.0MHz and SBT units? If so, that could be wished
away easily by using a power-of-2 frequency for the virtual HPET. I
wonder if the attached patch is all that's needed?

I've tried the patch (at bhyve guest) and nothing has changed. Should
the patched system be tested at bhyve guest or bhyve host?

--
WBR, bsam

Boris Samorodov

2017-10-22 16:16:41 UTC

Beyond that, I'm not sure what else to try. It might be necessary to
get some bhyve developers involved (I know almost nothing about it).

I've tried the patch (at bhyve guest) and nothing has changed. Should
the patched system be tested at bhyve guest or bhyve host?

Oh, I'm sorry, I should have mentioned that's for the host side.

NP, that's OK.

However, the host is busy now, and I'll have an opportunity to test host
only tomorrow evening.

Ian, thank you for your help!

--
WBR, bsam

Boris Samorodov

2017-10-24 21:43:56 UTC

Hi Ian, All!

Beyond that, I'm not sure what else to try. It might be necessary to
get some bhyve developers involved (I know almost nothing about it).

I suppose the answer is "yes", the patch helped. Here are two samples
(host for bhyve VM without your patch and after patching):
---
https://poudriere.passap.ru/misc/ntpd.jot.log-HPET.frequency.10000000.txt
https://poudriere.passap.ru/misc/ntpd.jot.log-HPET.frequency.16777216.txt
---

The command was:
% for t in `jot 1000`; do ntpq -pn; sleep 64; done

The patch made the system to stabilize the process.
Ian, thank you!

--
WBR, bsam

Boris Samorodov

2017-10-20 15:41:30 UTC

Post by Ian Lepore
So it is repeatedly stepping the clock in the VM? (Set
kern.timecounter.stepwarnings=1 to log steps).

No kernel/ntpd messages for 20 minutes after setting this sysctl.

Sorry for multiply answers. The kernel message has just arrived:
---
Oct 20 18:31:25 builder kernel: Time stepped from 1508513486.200998799
to 1508513485.317299999 (1508513485.316858000)
---

So, you are right, the time is stepped.

--
WBR, Boris Samorodov (bsam)
FreeBSD Committer, http://www.FreeBSD.org The Power To Serve

Boris Samorodov

2017-10-22 15:38:44 UTC

[ Charset UTF-8 unsupported, converting... ]

Beyond that, I'm not sure what else to try. ?It might be necessary to
get some bhyve developers involved (I know almost nothing about it).

NTPD behaves more normally on uniprocessor VMs.
A FreeBSD bhyve-guest running on a freebsd host will select a
different timecounter depending on whether it is a multiprocessor or a
uniprocessor.??My uniprocessor bhyve-vm selected TSC-low as the best
timecounter in a uniprocessor.??NTP functions there as expected.
kern.timecounter.choice: TSC-low(1000) ACPI-fast(900) HPET(950) i8254(0) dummy(-1000000)
kern.timecounter.hardware: TSC-low
The very same VM, when given two total CPUs, selected HPET (if I
recall) and the timekeeping with NTPD was unreliable, with many
step-resets to the clock.

Hmm, I just had glance at the code in?sys/amd64/vmm/io/vhpet.c and it
looks right. ?I wonder if this is just a simple roundoff error in
converting between 10.0MHz and SBT units? ?If so, that could be wished
away easily by using a power-of-2 frequency for the virtual HPET. ?I
wonder if the attached patch is all that's needed?

I've tried the patch (at bhyve guest) and nothing has changed. Should
the patched system be tested at bhyve guest or bhyve host?

I believe the suggested patch would have to be made to the bhyve
host

OK, I'd do it tomorrow and report back.

. Also on the host and guest what are the values of
sysctl kern.timecounter.tc.HPET
sysctl kern.timecounter.tc.i8254

Here they are:
---
bhyve-host% sysctl kern.timecounter.tc.HPET
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.HPET.frequency: 14318180
kern.timecounter.tc.HPET.counter: 2138094157
kern.timecounter.tc.HPET.mask: 4294967295

bhyve-host% sysctl kern.timecounter.tc.i8254
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 54883
kern.timecounter.tc.i8254.mask: 65535
---
bhyve-guest% sysctl kern.timecounter.tc.HPET
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.HPET.frequency: 10000000
kern.timecounter.tc.HPET.counter: 969429421
kern.timecounter.tc.HPET.mask: 4294967295

bhyve-guest% sysctl kern.timecounter.tc.i8254
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 39893
kern.timecounter.tc.i8254.mask: 65535
---

Getting good ntpd behavior in a VM guest of any kind is sometimes a
non trivial thing to do.

As a side note, I have a CentOS-7 bhyve VM at the same host.
And it was enough to run chronyd with default config. Which stepped
twice and is stable (no messages) for several days, current log:
---
Oct 19 16:01:03 c.vpn systemd[1]: Starting NTP client/server...
Oct 19 16:01:03 c.vpn chronyd[27043]: chronyd version 3.1 starting
(+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SECHASH +SIGND
+ASYNCDNS +IPV6 +DEBUG)
Oct 19 16:01:03 c.vpn chronyd[27043]: Frequency 0.000 +/- 1000000.000
ppm read from /var/lib/chrony/drift
Oct 19 16:01:03 c.vpn systemd[1]: Started NTP client/server.
Oct 19 16:01:07 c.vpn chronyd[27043]: Selected source XX.XX.XX.1
Oct 19 16:01:07 c.vpn chronyd[27043]: System clock wrong by -44.392782
seconds, adjustment started
Oct 19 16:00:23 c.vpn chronyd[27043]: System clock was stepped by
-44.392782 seconds
Oct 19 16:00:34 c.vpn chronyd[27043]: System clock was stepped by
0.000001 seconds
---

--
WBR, bsam

Rodney W. Grimes

2017-10-22 15:22:16 UTC

[ Charset UTF-8 unsupported, converting... ]