Discussion:
swapping is completely broken in -CURRENT r334649?
Lev Serebryakov
2018-06-05 15:55:52 UTC
I have 16G of free swap (out of 16G configured), but programs are
killed due to "out of swap space"

--
// Lev Serebryakov
Gary Jennejohn
2018-06-05 16:17:16 UTC
On Tue, 5 Jun 2018 18:55:52 +0300
Post by Lev Serebryakov
I have 16G of free swap (out of 16G configured), but programs are
killed due to "out of swap space"
I complained about this also and alc@ gave me this hint:
sysctl vm.pageout_update_period=0

I don't know whether it will help, but you can give it a try.
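In case it is useful, a minimal way to apply and persist it (just a
sketch, using the standard sysctl(8)/sysctl.conf mechanism):

# apply the suggested value at runtime
sysctl vm.pageout_update_period=0
# keep it across reboots
echo 'vm.pageout_update_period=0' >> /etc/sysctl.conf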
--
Gary Jennejohn
Lev Serebryakov
2018-06-05 21:09:43 UTC
Post by Gary Jennejohn
Post by Lev Serebryakov
I have 16G of free swap (out of 16G configured), but programs are
killed due to "out of swap space"
sysctl vm.pageout_update_period=0
I don't know whether it will help, but you can give it a try.
Looks like it helps a little. A very resource-hungry operation completed,
but ~10 minutes later, when the compilation had finished and the swap was
clear again, the system started to kill processes. WTF?!
--
// Lev Serebryakov
Lev Serebryakov
2018-06-05 21:22:08 UTC
Post by Gary Jennejohn
sysctl vm.pageout_update_period=0
Really, the situation is worse than stated in the subject, because
processes are being killed AFTER the memory pressure, when there is
already a lot of free memory!

It looks like a very serious bug.
--
// Lev Serebryakov
Mark Johnston
2018-06-05 21:48:08 UTC
Post by Lev Serebryakov
Post by Gary Jennejohn
sysctl vm.pageout_update_period=0
Really, the situation is worse than stated in the subject, because
processes are being killed AFTER the memory pressure, when there is
already a lot of free memory!
It looks like a very serious bug.
The issue was identified earlier this week and is being worked on. It's
a regression from r329882 which appears only on certain hardware. You
can probably work around it by setting vm.pageout_oom_seq to a large
value (try 1000 for instance), though this will make the "true" OOM
killer take longer to kick in. The problem is unrelated to the
pageout_update_period.
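For example (the exact value is not critical; as said above, it mainly
makes the real OOM killer take longer to kick in):

# check the current value
sysctl vm.pageout_oom_seq
# raise it as a temporary workaround
sysctl vm.pageout_oom_seq=1000
# optionally keep it across reboots until the fix lands
echo 'vm.pageout_oom_seq=1000' >> /etc/sysctl.conf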
Kevin Lo
2018-06-15 05:10:25 UTC
Post by Mark Johnston
Post by Lev Serebryakov
Post by Gary Jennejohn
sysctl vm.pageout_update_period=0
Really, the situation is worse than stated in the subject, because
processes are being killed AFTER the memory pressure, when there is
already a lot of free memory!
It looks like a very serious bug.
The issue was identified earlier this week and is being worked on. It's
a regression from r329882 which appears only on certain hardware. You
can probably work around it by setting vm.pageout_oom_seq to a large
value (try 1000 for instance), though this will make the "true" OOM
killer take longer to kick in. The problem is unrelated to the
pageout_update_period.
I have a large swap space and I've encountered this issue as well

pid 90707 (getty), uid 0, was killed: out of swap space
pid 90709 (getty), uid 0, was killed: out of swap space
pid 90709 (getty), uid 0, was killed: out of swap space
...

Setting vm.pageout_oom_seq to 1000 doesn't help. If you have a patch I'll be
happy to test it, thanks.
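For completeness, the state at kill time can be double-checked with
something like:

sysctl vm.pageout_oom_seq
swapinfo -h

to confirm the tunable actually took effect and that the swap really is
unused.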

Kevin
Mark Johnston
2018-06-15 08:40:22 UTC
Post by Kevin Lo
Post by Mark Johnston
Post by Lev Serebryakov
Post by Gary Jennejohn
sysctl vm.pageout_update_period=0
Really, the situation is worse than stated in the subject, because
processes are being killed AFTER the memory pressure, when there is
already a lot of free memory!
It looks like a very serious bug.
The issue was identified earlier this week and is being worked on. It's
a regression from r329882 which appears only on certain hardware. You
can probably work around it by setting vm.pageout_oom_seq to a large
value (try 1000 for instance), though this will make the "true" OOM
killer take longer to kick in. The problem is unrelated to the
pageout_update_period.
I have a large swap space and I've encountered this issue as well
pid 90707 (getty), uid 0, was killed: out of swap space
pid 90709 (getty), uid 0, was killed: out of swap space
pid 90709 (getty), uid 0, was killed: out of swap space
...
Setting vm.pageout_oom_seq to 1000 doesn't help. If you have a patch I'll be
happy to test it, thanks.
The change was committed as r334752. Are you seeing unexpected OOM
kills on or after that revision?
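(On a -CURRENT kernel built from an svn checkout, the running revision
should show up in the version string; e.g.

uname -v

prints something like "FreeBSD 12.0-CURRENT #0 rNNNNNN: ...", and the
rNNNNNN part is the number to compare against r334752.)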
Kurt Jaeger
2018-06-15 08:48:08 UTC
Hi!
Post by Mark Johnston
The change was committed as r334752. Are you seeing unexpected OOM
kills on or after that revision?
When I tried to run a qemu-based poudriere run yesterday on an r334918
box, it killed a few processes outside of that run and the run did not
succeed.

I'm unsure whether it was because of that problem or a problem with qemu.
--
***@opsec.eu +49 171 3101372 2 years to go !
Mark Johnston
2018-06-15 09:03:58 UTC
Post by Kurt Jaeger
Hi!
Post by Mark Johnston
The change was committed as r334752. Are you seeing unexpected OOM
kills on or after that revision?
When I tried to run a qemu-based poudriere run yesterday on an r334918
box, it killed a few processes outside of that run and the run did not
succeed.
I'm unsure whether it was because of that problem or a problem with qemu.
How much memory and swap does the guest have? Were you consistently
able to complete a run before?

If it's happening during a poudriere run, it may well have been a true
OOM situation. The patch below prints a few stats to the dmesg before
the kill. The output of that together with "sysctl vm" output should be
enough to determine what's happening.

diff --git a/sys/vm/vm_pageout.c b/sys/vm/vm_pageout.c
index 264c98203c51..9c7ebcf451ec 100644
--- a/sys/vm/vm_pageout.c
+++ b/sys/vm/vm_pageout.c
@@ -1670,6 +1670,8 @@ vm_pageout_mightbe_oom(struct vm_domain *vmd, int page_shortage,
* start OOM. Initiate the selection and signaling of the
* victim.
*/
+ printf("v_free_count: %u, v_inactive_count: %u\n",
+ vmd->vmd_free_count, vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt);
vm_pageout_oom(VM_OOM_MEM);

/*
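Once a kill happens, something like this should be enough to collect
the relevant bits (just a sketch; the file names are arbitrary):

# kernel messages around the kill, including the printf above
dmesg | tail -n 100 > oom-dmesg.txt
# the whole vm sysctl tree
sysctl vm > oom-sysctl-vm.txt
# swap usage for reference
swapinfo -h >> oom-sysctl-vm.txt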
Kurt Jaeger
2018-06-15 09:07:34 UTC
Hi!
Post by Mark Johnston
Post by Kurt Jaeger
Post by Mark Johnston
The change was committed as r334752. Are you seeing unexpected OOM
kills on or after that revision?
When I tried to run a qemu-based poudriere run yesterday on an r334918
box, it killed a few processes outside of that run and the run did not
succeed.
I'm unsure whether it was because of that problem or a problem with qemu.
How much memory and swap does the guest have?
It's started by poudriere; I don't really know.
Post by Mark Johnston
Were you consistently able to complete a run before?
Two years ago, on a much older version of FreeBSD, yes.

I just started it again, and after a while the qemu-ppc64-static
process was at approx. 23 GB of memory and increasing, without much
progress.
Post by Mark Johnston
If it's happening during a poudriere run, it may well have been a true
OOM situation. The patch below prints a few stats to the dmesg before
the kill. The output of that together with "sysctl vm" output should be
enough to determine what's happening.
diff --git a/sys/vm/vm_pageout.c b/sys/vm/vm_pageout.c
index 264c98203c51..9c7ebcf451ec 100644
--- a/sys/vm/vm_pageout.c
+++ b/sys/vm/vm_pageout.c
@@ -1670,6 +1670,8 @@ vm_pageout_mightbe_oom(struct vm_domain *vmd, int page_shortage,
* start OOM. Initiate the selection and signaling of the
* victim.
*/
+ printf("v_free_count: %u, v_inactive_count: %u\n",
+ vmd->vmd_free_count, vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt);
vm_pageout_oom(VM_OOM_MEM);
/*
I'll have a look at this.
--
***@opsec.eu +49 171 3101372 2 years to go !
Mark Johnston
2018-06-15 09:09:54 UTC
Post by Kurt Jaeger
Hi!
Post by Mark Johnston
Post by Kurt Jaeger
Post by Mark Johnston
The change was committed as r334752. Are you seeing unexpected OOM
kills on or after that revision?
When I tried to run a qemu-based poudriere run yesterday on an r334918
box, it killed a few processes outside of that run and the run did not
succeed.
I'm unsure whether it was because of that problem or a problem with qemu.
How much memory and swap does the guest have?
It's started by poudriere; I don't really know.
Post by Mark Johnston
Were you consistently able to complete a run before?
Two years ago, on a much older version of FreeBSD, yes.
I just started it again, and after a while the qemu-ppc64-static
process was at approx. 23 GB of memory and increasing, without much
progress.
I suspect it is a different issue, then.
Post by Kurt Jaeger
Post by Mark Johnston
If it's happening during a poudriere run, it may well have been a true
OOM situation. The patch below prints a few stats to the dmesg before
the kill. The output of that together with "sysctl vm" output should be
enough to determine what's happening.
diff --git a/sys/vm/vm_pageout.c b/sys/vm/vm_pageout.c
index 264c98203c51..9c7ebcf451ec 100644
--- a/sys/vm/vm_pageout.c
+++ b/sys/vm/vm_pageout.c
@@ -1670,6 +1670,8 @@ vm_pageout_mightbe_oom(struct vm_domain *vmd, int page_shortage,
* start OOM. Initiate the selection and signaling of the
* victim.
*/
+ printf("v_free_count: %u, v_inactive_count: %u\n",
+ vmd->vmd_free_count, vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt);
vm_pageout_oom(VM_OOM_MEM);
/*
I'll have a look at this.
Mikaël Urankar
2018-06-15 09:14:40 UTC
Post by Kurt Jaeger
Hi!
Post by Mark Johnston
Post by Kurt Jaeger
Post by Mark Johnston
The change was committed as r334752. Are you seeing unexpected OOM
kills on or after that revision?
When I tried to run a qemu-based poudriere run yesterday on an r334918
box, it killed a few processes outside of that run and the run did not
succeed.
I'm unsure whether it was because of that problem or a problem with qemu.
How much memory and swap does the guest have?
It's started by poudriere; I don't really know.
Post by Mark Johnston
Were you consistently able to complete a run before?
Two years ago, on a much older version of FreeBSD, yes.
I just started it again, and after a while the qemu-ppc64-static
process was at approx. 23 GB of memory and increasing, without much
progress.
The last time I tried (2 weeks ago), qemu-ppc64-static was broken; I'm
not sure the situation has evolved since then.
Kurt Jaeger
2018-06-15 09:36:00 UTC
Hi!
Post by Mikaël Urankar
Post by Kurt Jaeger
I just started it again, and after a while the qemu-ppc64-static
process was at approx. 23 GB of memory and increasing, without much
progress.
The last time I tried (2 weeks ago), qemu-ppc64-static was broken; I'm
not sure the situation has evolved since then.
Ok, thanks! Then it's not the same problem.
--
***@opsec.eu +49 171 3101372 2 years to go !
Mark Linimon
2018-06-16 02:31:48 UTC
Post by Mikaël Urankar
The last time I tried (2 weeks ago), qemu-ppc64-static was broken; I'm
not sure the situation has evolved since then.
I've been told by more than one person that it works, but the two or
three times I've tried it, it just hung.

I have real hardware so in general it doesn't make a difference to me,
but I'd like to know one way or the other.

mcl

Kevin Lo
2018-06-15 13:46:21 UTC
Post by Mark Johnston
Post by Kevin Lo
Post by Mark Johnston
Post by Lev Serebryakov
Post by Gary Jennejohn
sysctl vm.pageout_update_period=0
Really, the situation is worse than stated in the subject, because
processes are being killed AFTER the memory pressure, when there is
already a lot of free memory!
It looks like a very serious bug.
The issue was identified earlier this week and is being worked on. It's
a regression from r329882 which appears only on certain hardware. You
can probably work around it by setting vm.pageout_oom_seq to a large
value (try 1000 for instance), though this will make the "true" OOM
killer take longer to kick in. The problem is unrelated to the
pageout_update_period.
I have a large swap space and I've encountered this issue as well
pid 90707 (getty), uid 0, was killed: out of swap space
pid 90709 (getty), uid 0, was killed: out of swap space
pid 90709 (getty), uid 0, was killed: out of swap space
...
Setting vm.pageout_oom_seq to 1000 doesn't help. If you have a patch I'll be
happy to test it, thanks.
The change was committed as r334752. Are you seeing unexpected OOM
kills on or after that revision?
The box is running -CURRENT r334983. I'll investigate further, thanks.