Discussion:
i386 hangs during halt "vnodes remaining... 0 time out"
(too old to reply)
Rick Macklem
2018-04-21 21:09:09 UTC
Permalink
With a recent head/current kernel (doesn't happen when running a Dec. 2017 one),
when I do a halt, it gets as far as:

vnodes remaining... 0 time out

and that's it (the time out appears several seconds after the first "0").
With a Dec. 2017 kernel there would be several "0"s printed.
It appears that it is stuck in the first iteration of the sched_sync() loop after
it is no longer in SYNCER_RUNNING state.

Any ideas? rick
Tijl Coosemans
2018-04-21 21:49:34 UTC
Permalink
On Sat, 21 Apr 2018 21:09:09 +0000 Rick Macklem <***@uoguelph.ca> wrote:
> With a recent head/current kernel (doesn't happen when running a Dec.
> 2017 one), when I do a halt, it gets as far as:
>
> vnodes remaining... 0 time out
>
> and that's it (the time out appears several seconds after the first "0").
> With a Dec. 2017 kernel there would be several "0"s printed.
> It appears that it is stuck in the first iteration of the sched_sync()
> loop after it is no longer in SYNCER_RUNNING state.
>
> Any ideas? rick

See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227404
I have a patch (attached) but haven't been able to test it yet.
Konstantin Belousov
2018-04-22 12:05:21 UTC
Permalink
On Sat, Apr 21, 2018 at 11:49:34PM +0200, Tijl Coosemans wrote:
> On Sat, 21 Apr 2018 21:09:09 +0000 Rick Macklem <***@uoguelph.ca> wrote:
> > With a recent head/current kernel (doesn't happen when running a Dec.
> > 2017 one), when I do a halt, it gets as far as:
> >
> > vnodes remaining... 0 time out
> >
> > and that's it (the time out appears several seconds after the first "0").
> > With a Dec. 2017 kernel there would be several "0"s printed.
> > It appears that it is stuck in the first iteration of the sched_sync()
> > loop after it is no longer in SYNCER_RUNNING state.
> >
> > Any ideas? rick
>
> See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227404
> I have a patch (attached) but haven't been able to test it yet.

> Index: sys/kern/vfs_bio.c
> ===================================================================
> --- sys/kern/vfs_bio.c (revision 332165)
> +++ sys/kern/vfs_bio.c (working copy)
> @@ -791,9 +791,12 @@ bufspace_daemon(void *arg)
> {
> struct bufdomain *bd;
>
> + EVENTHANDLER_REGISTER(shutdown_pre_sync, kthread_shutdown, curthread,
> + SHUTDOWN_PRI_LAST);
> +
> bd = arg;
> for (;;) {
> - kproc_suspend_check(curproc);
> + kthread_suspend_check();
>
> /*
> * Free buffers from the clean queue until we meet our
> @@ -3357,7 +3360,7 @@ buf_daemon()
> /*
> * This process needs to be suspended prior to shutdown sync.
> */
> - EVENTHANDLER_REGISTER(shutdown_pre_sync, kproc_shutdown, bufdaemonproc,
> + EVENTHANDLER_REGISTER(shutdown_pre_sync, kthread_shutdown, curthread,
> SHUTDOWN_PRI_LAST);
>
> /*
> @@ -3381,7 +3384,7 @@ buf_daemon()
> bd_request = 0;
> mtx_unlock(&bdlock);
>
> - kproc_suspend_check(bufdaemonproc);
> + kthread_suspend_check();
>
> /*
> * Save speedupreq for this pass and reset to capture new
This looks fine.
Rick Macklem
2018-04-22 13:10:27 UTC
Permalink
Konstantin Belousov wrote:
>On Sat, Apr 21, 2018 at 11:49:34PM +0200, Tijl Coosemans wrote:
>> On Sat, 21 Apr 2018 21:09:09 +0000 Rick Macklem <***@uoguelph.ca> wrote:
>> > With a recent head/current kernel (doesn't happen when running a Dec.
>> > 2017 one), when I do a halt, it gets as far as:
>> >
>> > vnodes remaining... 0 time out
>> >
>> > and that's it (the time out appears several seconds after the first "0").
>> > With a Dec. 2017 kernel there would be several "0"s printed.
>> > It appears that it is stuck in the first iteration of the sched_sync()
>> > loop after it is no longer in SYNCER_RUNNING state.
>> >
>> > Any ideas? rick
>>
>> See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227404
>> I have a patch (attached) but haven't been able to test it yet.
>
>> Index: sys/kern/vfs_bio.c
>> >===================================================================
>> --- sys/kern/vfs_bio.c (revision 332165)
>> +++ sys/kern/vfs_bio.c (working copy)
>> @@ -791,9 +791,12 @@ bufspace_daemon(void *arg)
>> {
>> struct bufdomain *bd;
>>
>> + EVENTHANDLER_REGISTER(shutdown_pre_sync, kthread_shutdown, curthread,
>> + SHUTDOWN_PRI_LAST);
>> +
>> bd = arg;
>> for (;;) {
>> - kproc_suspend_check(curproc);
>> + kthread_suspend_check();
>>
>> /*
>> * Free buffers from the clean queue until we meet our
>> @@ -3357,7 +3360,7 @@ buf_daemon()
>> /*
>> * This process needs to be suspended prior to shutdown sync.
>> */
>> - EVENTHANDLER_REGISTER(shutdown_pre_sync, kproc_shutdown, bufdaemonproc,
>> + EVENTHANDLER_REGISTER(shutdown_pre_sync, kthread_shutdown, curthread,
>> SHUTDOWN_PRI_LAST);
>>
>> /*
>> @@ -3381,7 +3384,7 @@ buf_daemon()
>> bd_request = 0;
>> mtx_unlock(&bdlock);
>>
>> - kproc_suspend_check(bufdaemonproc);
>> + kthread_suspend_check();
>>
>> /*
>> * Save speedupreq for this pass and reset to capture new
>This looks fine.
For some reason, this thread became two threads, so I'll reply to this one as well.

The patch seems to work fine for me.

Thanks, rick
Tijl Coosemans
2018-04-22 13:15:00 UTC
Permalink
On Sun, 22 Apr 2018 15:05:21 +0300 Konstantin Belousov <***@gmail.com> wrote:
> On Sat, Apr 21, 2018 at 11:49:34PM +0200, Tijl Coosemans wrote:
>> On Sat, 21 Apr 2018 21:09:09 +0000 Rick Macklem <***@uoguelph.ca> wrote:
>>> With a recent head/current kernel (doesn't happen when running a Dec.
>>> 2017 one), when I do a halt, it gets as far as:
>>>
>>> vnodes remaining... 0 time out
>>>
>>> and that's it (the time out appears several seconds after the first "0").
>>> With a Dec. 2017 kernel there would be several "0"s printed.
>>> It appears that it is stuck in the first iteration of the sched_sync()
>>> loop after it is no longer in SYNCER_RUNNING state.
>>>
>>> Any ideas? rick
>>
>> See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227404
>> I have a patch (attached) but haven't been able to test it yet.
>>
>> Index: sys/kern/vfs_bio.c
>> ===================================================================
>> --- sys/kern/vfs_bio.c (revision 332165)
>> +++ sys/kern/vfs_bio.c (working copy)
>> @@ -791,9 +791,12 @@ bufspace_daemon(void *arg)
>> {
>> struct bufdomain *bd;
>>
>> + EVENTHANDLER_REGISTER(shutdown_pre_sync, kthread_shutdown, curthread,
>> + SHUTDOWN_PRI_LAST);
>> +
>> bd = arg;
>> for (;;) {
>> - kproc_suspend_check(curproc);
>> + kthread_suspend_check();
>>
>> /*
>> * Free buffers from the clean queue until we meet our
>> @@ -3357,7 +3360,7 @@ buf_daemon()
>> /*
>> * This process needs to be suspended prior to shutdown sync.
>> */
>> - EVENTHANDLER_REGISTER(shutdown_pre_sync, kproc_shutdown, bufdaemonproc,
>> + EVENTHANDLER_REGISTER(shutdown_pre_sync, kthread_shutdown, curthread,
>> SHUTDOWN_PRI_LAST);
>>
>> /*
>> @@ -3381,7 +3384,7 @@ buf_daemon()
>> bd_request = 0;
>> mtx_unlock(&bdlock);
>>
>> - kproc_suspend_check(bufdaemonproc);
>> + kthread_suspend_check();
>>
>> /*
>> * Save speedupreq for this pass and reset to capture new
> This looks fine.

Thanks for the review. There's just one concern I have. With this patch
the bufspace_daemon threads appear to shutdown after the buf_daemon and
after the syncer because the event handlers are registered later. Are
there any dependencies between these processes that require the bufspace
threads to be stopped earlier?
Konstantin Belousov
2018-04-22 13:28:55 UTC
Permalink
On Sun, Apr 22, 2018 at 03:15:00PM +0200, Tijl Coosemans wrote:
> On Sun, 22 Apr 2018 15:05:21 +0300 Konstantin Belousov <***@gmail.com> wrote:
> > On Sat, Apr 21, 2018 at 11:49:34PM +0200, Tijl Coosemans wrote:
> >> On Sat, 21 Apr 2018 21:09:09 +0000 Rick Macklem <***@uoguelph.ca> wrote:
> >>> With a recent head/current kernel (doesn't happen when running a Dec.
> >>> 2017 one), when I do a halt, it gets as far as:
> >>>
> >>> vnodes remaining... 0 time out
> >>>
> >>> and that's it (the time out appears several seconds after the first "0").
> >>> With a Dec. 2017 kernel there would be several "0"s printed.
> >>> It appears that it is stuck in the first iteration of the sched_sync()
> >>> loop after it is no longer in SYNCER_RUNNING state.
> >>>
> >>> Any ideas? rick
> >>
> >> See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227404
> >> I have a patch (attached) but haven't been able to test it yet.
> >>
> >> Index: sys/kern/vfs_bio.c
> >> ===================================================================
> >> --- sys/kern/vfs_bio.c (revision 332165)
> >> +++ sys/kern/vfs_bio.c (working copy)
> >> @@ -791,9 +791,12 @@ bufspace_daemon(void *arg)
> >> {
> >> struct bufdomain *bd;
> >>
> >> + EVENTHANDLER_REGISTER(shutdown_pre_sync, kthread_shutdown, curthread,
> >> + SHUTDOWN_PRI_LAST);
> >> +
> >> bd = arg;
> >> for (;;) {
> >> - kproc_suspend_check(curproc);
> >> + kthread_suspend_check();
> >>
> >> /*
> >> * Free buffers from the clean queue until we meet our
> >> @@ -3357,7 +3360,7 @@ buf_daemon()
> >> /*
> >> * This process needs to be suspended prior to shutdown sync.
> >> */
> >> - EVENTHANDLER_REGISTER(shutdown_pre_sync, kproc_shutdown, bufdaemonproc,
> >> + EVENTHANDLER_REGISTER(shutdown_pre_sync, kthread_shutdown, curthread,
> >> SHUTDOWN_PRI_LAST);
> >>
> >> /*
> >> @@ -3381,7 +3384,7 @@ buf_daemon()
> >> bd_request = 0;
> >> mtx_unlock(&bdlock);
> >>
> >> - kproc_suspend_check(bufdaemonproc);
> >> + kthread_suspend_check();
> >>
> >> /*
> >> * Save speedupreq for this pass and reset to capture new
> > This looks fine.
>
> Thanks for the review. There's just one concern I have. With this patch
> the bufspace_daemon threads appear to shutdown after the buf_daemon and
> after the syncer because the event handlers are registered later. Are
> there any dependencies between these processes that require the bufspace
> threads to be stopped earlier?

I think for correctness bufdaemon must stop after the syncer, since syncer
operation can cause a situation where bufdaemon help is needed to proceed.
Other than this, the stop order is irrelevant, because after syncer finished,
there should be no any further filesystem activity.
Tijl Coosemans
2018-04-22 14:55:13 UTC
Permalink
On Sun, 22 Apr 2018 16:28:55 +0300 Konstantin Belousov <***@gmail.com> wrote:
> On Sun, Apr 22, 2018 at 03:15:00PM +0200, Tijl Coosemans wrote:
>> Thanks for the review. There's just one concern I have. With this patch
>> the bufspace_daemon threads appear to shutdown after the buf_daemon and
>> after the syncer because the event handlers are registered later. Are
>> there any dependencies between these processes that require the bufspace
>> threads to be stopped earlier?
>
> I think for correctness bufdaemon must stop after the syncer, since syncer
> operation can cause a situation where bufdaemon help is needed to proceed.
> Other than this, the stop order is irrelevant, because after syncer
> finished, there should be no any further filesystem activity.

A quick way to do that would be to use SHUTDOWN_PRI_LAST + 100 for the
event handlers in the patch, like shutdown_conf in kern_shutdown.c
already does. Is that acceptable here as well?
Konstantin Belousov
2018-04-22 15:03:57 UTC
Permalink
On Sun, Apr 22, 2018 at 04:55:13PM +0200, Tijl Coosemans wrote:
> On Sun, 22 Apr 2018 16:28:55 +0300 Konstantin Belousov <***@gmail.com> wrote:
> > On Sun, Apr 22, 2018 at 03:15:00PM +0200, Tijl Coosemans wrote:
> >> Thanks for the review. There's just one concern I have. With this patch
> >> the bufspace_daemon threads appear to shutdown after the buf_daemon and
> >> after the syncer because the event handlers are registered later. Are
> >> there any dependencies between these processes that require the bufspace
> >> threads to be stopped earlier?
> >
> > I think for correctness bufdaemon must stop after the syncer, since syncer
> > operation can cause a situation where bufdaemon help is needed to proceed.
> > Other than this, the stop order is irrelevant, because after syncer
> > finished, there should be no any further filesystem activity.
>
> A quick way to do that would be to use SHUTDOWN_PRI_LAST + 100 for the
> event handlers in the patch, like shutdown_conf in kern_shutdown.c
> already does. Is that acceptable here as well?

I do not see why not.
Claude Buisson
2018-04-21 22:09:46 UTC
Permalink
On 04/21/2018 23:09, Rick Macklem wrote:
> With a recent head/current kernel (doesn't happen when running a Dec. 2017 one),
> when I do a halt, it gets as far as:
>
> vnodes remaining... 0 time out
>
> and that's it (the time out appears several seconds after the first "0").
> With a Dec. 2017 kernel there would be several "0"s printed.
> It appears that it is stuck in the first iteration of the sched_sync() loop after
> it is no longer in SYNCER_RUNNING state.
>
> Any ideas? rick
> _______________________________________________
> freebsd-***@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-***@freebsd.org"
>

There is a PR which seems related: 227404

I was hit by this problem when upgrading an i386 UP system from r329142
to r332518. The culprit appears to be r329612.

I reverted 4 files:

sys/kern/vfs_bio.c -> r329187
sys/kern/vfs_subr.c -> r328643
sys/sys/buf.h -> r329078
sys/sys/bufobj.h -> r326256

rebuild the system, and the hang disappeared.

I have now 2 i386 UP and 1 amd64 MP systems running with this
modification, and 1 RPI2 in the pipe line.

Hope it helps

Claude Buisson
Cy Schubert
2018-04-21 22:27:51 UTC
Permalink
In message <***@kalimero.tijl.coosemans.org>, Tijl
Cooseman
s writes:
> --MP_/TDsO+CDIra7UXGs=vVO3NTB
> Content-Type: text/plain; charset=US-ASCII
> Content-Transfer-Encoding: 7bit
> Content-Disposition: inline
>
> On Sat, 21 Apr 2018 21:09:09 +0000 Rick Macklem <***@uoguelph.ca> wrote:
> > With a recent head/current kernel (doesn't happen when running a Dec.
> > 2017 one), when I do a halt, it gets as far as:
> >
> > vnodes remaining... 0 time out
> >
> > and that's it (the time out appears several seconds after the first "0").
> > With a Dec. 2017 kernel there would be several "0"s printed.
> > It appears that it is stuck in the first iteration of the sched_sync()
> > loop after it is no longer in SYNCER_RUNNING state.
> >
> > Any ideas? rick
>
> See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227404
> I have a patch (attached) but haven't been able to test it yet.

I've noticed this as well on my old Penium-M laptop (updated about
twice a year). I'll try your patch this weekend or early next week.

~cy

>
> --MP_/TDsO+CDIra7UXGs=vVO3NTB
> Content-Type: text/x-patch
> Content-Transfer-Encoding: 7bit
> Content-Disposition: attachment; filename=bufdaemon.patch
>
> Index: sys/kern/vfs_bio.c
> ===================================================================
> --- sys/kern/vfs_bio.c (revision 332165)
> +++ sys/kern/vfs_bio.c (working copy)
> @@ -791,9 +791,12 @@ bufspace_daemon(void *arg)
> {
> struct bufdomain *bd;
>
> + EVENTHANDLER_REGISTER(shutdown_pre_sync, kthread_shutdown, curthread,
> + SHUTDOWN_PRI_LAST);
> +
> bd = arg;
> for (;;) {
> - kproc_suspend_check(curproc);
> + kthread_suspend_check();
>
> /*
> * Free buffers from the clean queue until we meet our
> @@ -3357,7 +3360,7 @@ buf_daemon()
> /*
> * This process needs to be suspended prior to shutdown sync.
> */
> - EVENTHANDLER_REGISTER(shutdown_pre_sync, kproc_shutdown, bufdaemonproc,
> + EVENTHANDLER_REGISTER(shutdown_pre_sync, kthread_shutdown, curthread,
> SHUTDOWN_PRI_LAST);
>
> /*
> @@ -3381,7 +3384,7 @@ buf_daemon()
> bd_request = 0;
> mtx_unlock(&bdlock);
>
> - kproc_suspend_check(bufdaemonproc);
> + kthread_suspend_check();
>
> /*
> * Save speedupreq for this pass and reset to capture new
>
> --MP_/TDsO+CDIra7UXGs=vVO3NTB
> Content-Type: text/plain; charset="us-ascii"
> MIME-Version: 1.0
> Content-Transfer-Encoding: 7bit
> Content-Disposition: inline
>
> _______________________________________________
> freebsd-***@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-***@freebsd.org"
>
> --MP_/TDsO+CDIra7UXGs=vVO3NTB--
>

--
Cheers,
Cy Schubert <***@cschubert.com>
FreeBSD UNIX: <***@FreeBSD.org> Web: http://www.FreeBSD.org

The need of the many outweighs the greed of the few.
Rick Macklem
2018-04-22 03:00:35 UTC
Permalink
Cy Schubert wrote:
>In message <***@slippy.cwsent.com>, Cy Schubert
writes:
>> In message <***@kalimero.tijl.coosemans.org>, Tijl
>> Cooseman
>> s writes:
>> > --MP_/TDsO+CDIra7UXGs=vVO3NTB
>> > Content-Type: text/plain; charset=US-ASCII
>> > Content-Transfer-Encoding: 7bit
>> > Content-Disposition: inline
>> >
>> > On Sat, 21 Apr 2018 21:09:09 +0000 Rick Macklem <***@uoguelph.ca> wrot
>> e:
>> > > With a recent head/current kernel (doesn't happen when running a Dec.
>> > > 2017 one), when I do a halt, it gets as far as:
>> > >
>> > > vnodes remaining... 0 time out
>> > >
>> > > and that's it (the time out appears several seconds after the first "0").
>> > > With a Dec. 2017 kernel there would be several "0"s printed.
>> > > It appears that it is stuck in the first iteration of the sched_sync()
>> > > loop after it is no longer in SYNCER_RUNNING state.
>> > >
>> > > Any ideas? rick
>> >
>> > See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227404
>> > I have a patch (attached) but haven't been able to test it yet.
>>
>> I've noticed this as well on my old Penium-M laptop (updated about
>> twice a year). I'll try your patch this weekend or early next week.
>
>Works perfectly.
Patch seems to work for my i386 as well.

Thanks, rick
Cy Schubert
2018-04-22 03:21:15 UTC
Permalink
In message <***@YQBPR0101MB1042.CAN
PRD01.P
ROD.OUTLOOK.COM>, Rick Macklem writes:
> Cy Schubert wrote:
> >In message <***@slippy.cwsent.com>, Cy Schubert
> writes:
> >> In message <***@kalimero.tijl.coosemans.org>, Tijl
> >> Cooseman
> >> s writes:
> >> > --MP_/TDsO+CDIra7UXGs=vVO3NTB
> >> > Content-Type: text/plain; charset=US-ASCII
> >> > Content-Transfer-Encoding: 7bit
> >> > Content-Disposition: inline
> >> >
> >> > On Sat, 21 Apr 2018 21:09:09 +0000 Rick Macklem <***@uoguelph.ca> w
> rot
> >> e:
> >> > > With a recent head/current kernel (doesn't happen when running a Dec.
> >> > > 2017 one), when I do a halt, it gets as far as:
> >> > >
> >> > > vnodes remaining... 0 time out
> >> > >
> >> > > and that's it (the time out appears several seconds after the first "0
> ").
> >> > > With a Dec. 2017 kernel there would be several "0"s printed.
> >> > > It appears that it is stuck in the first iteration of the sched_sync()
> >> > > loop after it is no longer in SYNCER_RUNNING state.
> >> > >
> >> > > Any ideas? rick
> >> >
> >> > See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227404
> >> > I have a patch (attached) but haven't been able to test it yet.
> >>
> >> I've noticed this as well on my old Penium-M laptop (updated about
> >> twice a year). I'll try your patch this weekend or early next week.
> >
> >Works perfectly.
> Patch seems to work for my i386 as well.
>
> Thanks, rick

Yes, thank you.


--
Cheers,
Cy Schubert <***@cschubert.com>
FreeBSD UNIX: <***@FreeBSD.org> Web: http://www.FreeBSD.org

The need of the many outweighs the greed of the few.
Cy Schubert
2018-04-22 02:55:09 UTC
Permalink
In message <***@slippy.cwsent.com>, Cy Schubert
writes:
> In message <***@kalimero.tijl.coosemans.org>, Tijl
> Cooseman
> s writes:
> > --MP_/TDsO+CDIra7UXGs=vVO3NTB
> > Content-Type: text/plain; charset=US-ASCII
> > Content-Transfer-Encoding: 7bit
> > Content-Disposition: inline
> >
> > On Sat, 21 Apr 2018 21:09:09 +0000 Rick Macklem <***@uoguelph.ca> wrot
> e:
> > > With a recent head/current kernel (doesn't happen when running a Dec.
> > > 2017 one), when I do a halt, it gets as far as:
> > >
> > > vnodes remaining... 0 time out
> > >
> > > and that's it (the time out appears several seconds after the first "0").
> > > With a Dec. 2017 kernel there would be several "0"s printed.
> > > It appears that it is stuck in the first iteration of the sched_sync()
> > > loop after it is no longer in SYNCER_RUNNING state.
> > >
> > > Any ideas? rick
> >
> > See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227404
> > I have a patch (attached) but haven't been able to test it yet.
>
> I've noticed this as well on my old Penium-M laptop (updated about
> twice a year). I'll try your patch this weekend or early next week.

Works perfectly.


--
Cheers,
Cy Schubert <***@cschubert.com>
FreeBSD UNIX: <***@FreeBSD.org> Web: http://www.FreeBSD.org

The need of the many outweighs the greed of the few.
Loading...