Discussion:
swapping is completely broken in -CURRENT r334649?
Lev Serebryakov
2018-06-05 15:55:52 UTC
I have 16G of free swap (out of 16G configured), but programs are
killed due to "out of swap space"

--
// Lev Serebryakov
Gary Jennejohn
2018-06-05 16:17:16 UTC
On Tue, 5 Jun 2018 18:55:52 +0300
Post by Lev Serebryakov
I have 16G of free swap (out of 16G configured), but programs are
killed due to "out of swap space"
I complained about this also and alc@ gave me this hint:
sysctl vm.pageout_update_period=0

I don't know whether it will help, but you can give it a try.
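In case it is useful, a minimal way to apply and persist it (just a
sketch, using the standard sysctl(8)/sysctl.conf mechanism):

# apply the suggested value at runtime
sysctl vm.pageout_update_period=0
# keep it across reboots
echo 'vm.pageout_update_period=0' >> /etc/sysctl.conf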
--
Gary Jennejohn
Lev Serebryakov
2018-06-05 21:09:43 UTC
Post by Gary Jennejohn
Post by Lev Serebryakov
I have 16G of free swap (out of 16G configured), but programs are
killed due to "out of swap space"
sysctl vm.pageout_update_period=0
I don't know whether it will help, but you can give it a try.
Looks like it helps a little. A very resource-hungry operation completed,
but ~10 minutes later, when the compilation had finished and the swap was
clear again, the system started to kill processes. WTF?!
--
// Lev Serebryakov
Lev Serebryakov
2018-06-05 21:22:08 UTC
Post by Gary Jennejohn
sysctl vm.pageout_update_period=0
Really, the situation is worse than stated in the subject, because
processes are being killed AFTER the memory pressure, when there is
already a lot of free memory!

It looks like a very serious bug.
--
// Lev Serebryakov
Mark Johnston
2018-06-05 21:48:08 UTC
Post by Lev Serebryakov
Post by Gary Jennejohn
sysctl vm.pageout_update_period=0
Really, the situation is worse than stated in the subject, because
processes are being killed AFTER the memory pressure, when there is
already a lot of free memory!
It looks like a very serious bug.
The issue was identified earlier this week and is being worked on. It's
a regression from r329882 which appears only on certain hardware. You
can probably work around it by setting vm.pageout_oom_seq to a large
value (try 1000 for instance), though this will make the "true" OOM
killer take longer to kick in. The problem is unrelated to the
pageout_update_period.
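For example (the exact value is not critical; as said above, it mainly
makes the real OOM killer take longer to kick in):

# check the current value
sysctl vm.pageout_oom_seq
# raise it as a temporary workaround
sysctl vm.pageout_oom_seq=1000
# optionally keep it across reboots until the fix lands
echo 'vm.pageout_oom_seq=1000' >> /etc/sysctl.conf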
Kevin Lo
2018-06-15 05:10:25 UTC
Post by Mark Johnston
Post by Lev Serebryakov
Post by Gary Jennejohn
sysctl vm.pageout_update_period=0
Really, the situation is worse than stated in the subject, because
processes are being killed AFTER the memory pressure, when there is
already a lot of free memory!
It looks like a very serious bug.
The issue was identified earlier this week and is being worked on. It's
a regression from r329882 which appears only on certain hardware. You
can probably work around it by setting vm.pageout_oom_seq to a large
value (try 1000 for instance), though this will make the "true" OOM
killer take longer to kick in. The problem is unrelated to the
pageout_update_period.
I have a large swap space and I've encountered this issue as well

pid 90707 (getty), uid 0, was killed: out of swap space
pid 90709 (getty), uid 0, was killed: out of swap space
pid 90709 (getty), uid 0, was killed: out of swap space
...

Setting vm.pageout_oom_seq to 1000 doesn't help. If you have a patch I'll be
happy to test it, thanks.
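For completeness, the state at kill time can be double-checked with
something like:

sysctl vm.pageout_oom_seq
swapinfo -h

to confirm the tunable actually took effect and that the swap really is
unused.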

Kevin
Mark Johnston
2018-06-15 08:40:22 UTC
Post by Kevin Lo
Post by Mark Johnston
Post by Lev Serebryakov
Post by Gary Jennejohn
sysctl vm.pageout_update_period=0
Really, the situation is worse than stated in the subject, because
processes are being killed AFTER the memory pressure, when there is
already a lot of free memory!
It looks like a very serious bug.
The issue was identified earlier this week and is being worked on. It's
a regression from r329882 which appears only on certain hardware. You
can probably work around it by setting vm.pageout_oom_seq to a large
value (try 1000 for instance), though this will make the "true" OOM
killer take longer to kick in. The problem is unrelated to the
pageout_update_period.
I have a large swap space and I've encountered this issue as well
pid 90707 (getty), uid 0, was killed: out of swap space
pid 90709 (getty), uid 0, was killed: out of swap space
pid 90709 (getty), uid 0, was killed: out of swap space
...
Setting vm.pageout_oom_seq to 1000 doesn't help. If you have a patch I'll be
happy to test it, thanks.
The change was committed as r334752. Are you seeing unexpected OOM
kills on or after that revision?
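(On a -CURRENT kernel built from an svn checkout, the running revision
should show up in the version string; e.g.

uname -v

prints something like "FreeBSD 12.0-CURRENT #0 rNNNNNN: ...", and the
rNNNNNN part is the number to compare against r334752.)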
Kurt Jaeger
2018-06-15 08:48:08 UTC
Hi!
Post by Mark Johnston
The change was committed as r334752. Are you seeing unexpected OOM
kills on or after that revision?
When I tried to run a qemu-based poudriere run yesterday on an r334918
box, it killed a few processes outside of that run and the run did not
succeed.

I'm unsure whether it was because of that problem or a problem with qemu.
--
***@opsec.eu +49 171 3101372 2 years to go !
Mark Johnston
2018-06-15 09:03:58 UTC
Post by Kurt Jaeger
Hi!
Post by Mark Johnston
The change was committed as r334752. Are you seeing unexpected OOM
kills on or after that revision?
When I tried to run a qemu-based poudriere run yesterday on an r334918
box, it killed a few processes outside of that run and the run did not
succeed.
I'm unsure whether it was because of that problem or a problem with qemu.
How much memory and swap does the guest have? Were you consistently
able to complete a run before?

If it's happening during a poudriere run, it may well have been a true
OOM situation. The patch below prints a few stats to the dmesg before
the kill. The output of that together with "sysctl vm" output should be
enough to determine what's happening.

diff --git a/sys/vm/vm_pageout.c b/sys/vm/vm_pageout.c
index 264c98203c51..9c7ebcf451ec 100644
--- a/sys/vm/vm_pageout.c
+++ b/sys/vm/vm_pageout.c
@@ -1670,6 +1670,8 @@ vm_pageout_mightbe_oom(struct vm_domain *vmd, int page_shortage,
* start OOM. Initiate the selection and signaling of the
* victim.
*/
+ printf("v_free_count: %u, v_inactive_count: %u\n",
+ vmd->vmd_free_count, vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt);
vm_pageout_oom(VM_OOM_MEM);

/*
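Once a kill happens, something like this should be enough to collect
the relevant bits (just a sketch; the file names are arbitrary):

# kernel messages around the kill, including the printf above
dmesg | tail -n 100 > oom-dmesg.txt
# the whole vm sysctl tree
sysctl vm > oom-sysctl-vm.txt
# swap usage for reference
swapinfo -h >> oom-sysctl-vm.txt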
Kurt Jaeger
2018-06-15 09:07:34 UTC
Hi!
Post by Mark Johnston
Post by Kurt Jaeger
Post by Mark Johnston
The change was committed as r334752. Are you seeing unexpected OOM
kills on or after that revision?
When I tried to run a qemu-based poudriere run yesterday on an r334918
box, it killed a few processes outside of that run and the run did not
succeed.
I'm unsure whether it was because of that problem or a problem with qemu.
How much memory and swap does the guest have?
It's started by poudriere; I don't really know.
Post by Mark Johnston
Were you consistently able to complete a run before?
Two years ago, on a much older version of FreeBSD, yes.

I just started it again, and after a while the qemu-ppc64-static
process was at approx. 23 GB of memory and increasing, without much
progress.
Post by Mark Johnston
If it's happening during a poudriere run, it may well have been a true
OOM situation. The patch below prints a few stats to the dmesg before
the kill. The output of that together with "sysctl vm" output should be
enough to determine what's happening.
diff --git a/sys/vm/vm_pageout.c b/sys/vm/vm_pageout.c
index 264c98203c51..9c7ebcf451ec 100644
--- a/sys/vm/vm_pageout.c
+++ b/sys/vm/vm_pageout.c
@@ -1670,6 +1670,8 @@ vm_pageout_mightbe_oom(struct vm_domain *vmd, int page_shortage,
* start OOM. Initiate the selection and signaling of the
* victim.
*/
+ printf("v_free_count: %u, v_inactive_count: %u\n",
+ vmd->vmd_free_count, vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt);
vm_pageout_oom(VM_OOM_MEM);
/*
I'll have a look at this.
--
***@opsec.eu +49 171 3101372 2 years to go !
Mark Johnston
2018-06-15 09:09:54 UTC
Post by Kurt Jaeger
Hi!
Post by Mark Johnston
Post by Kurt Jaeger
Post by Mark Johnston
The change was committed as r334752. Are you seeing unexpected OOM
kills on or after that revision?
When I tried to run a qemu-based poudriere run yesterday on an r334918
box, it killed a few processes outside of that run and the run did not
succeed.
I'm unsure whether it was because of that problem or a problem with qemu.
How much memory and swap does the guest have?
It's started by poudriere; I don't really know.
Post by Mark Johnston
Were you consistently able to complete a run before?
Two years ago, on a much older version of FreeBSD, yes.
I just started it again, and after a while the qemu-ppc64-static
process was at approx. 23 GB of memory and increasing, without much
progress.
I suspect it is a different issue, then.
Post by Kurt Jaeger
Post by Mark Johnston
If it's happening during a poudriere run, it may well have been a true
OOM situation. The patch below prints a few stats to the dmesg before
the kill. The output of that together with "sysctl vm" output should be
enough to determine what's happening.
diff --git a/sys/vm/vm_pageout.c b/sys/vm/vm_pageout.c
index 264c98203c51..9c7ebcf451ec 100644
--- a/sys/vm/vm_pageout.c
+++ b/sys/vm/vm_pageout.c
@@ -1670,6 +1670,8 @@ vm_pageout_mightbe_oom(struct vm_domain *vmd, int page_shortage,
* start OOM. Initiate the selection and signaling of the
* victim.
*/
+ printf("v_free_count: %u, v_inactive_count: %u\n",
+ vmd->vmd_free_count, vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt);
vm_pageout_oom(VM_OOM_MEM);
/*
I'll have a look at this.
Mikaël Urankar
2018-06-15 09:14:40 UTC
Post by Kurt Jaeger
Hi!
Post by Mark Johnston
Post by Kurt Jaeger
Post by Mark Johnston
The change was committed as r334752. Are you seeing unexpected OOM
kills on or after that revision?
When I tried to run a qemu-based poudriere run yesterday on an r334918
box, it killed a few processes outside of that run and the run did not
succeed.
I'm unsure whether it was because of that problem or a problem with qemu.
How much memory and swap does the guest have?
It's started by poudriere; I don't really know.
Post by Mark Johnston
Were you consistently able to complete a run before?
Two years ago, on a much older version of FreeBSD, yes.
I just started it again, and after a while the qemu-ppc64-static
process was at approx. 23 GB of memory and increasing, without much
progress.
The last time I tried (2 weeks ago), qemu-ppc64-static was broken; I'm
not sure the situation has evolved since then.
Kurt Jaeger
2018-06-15 09:36:00 UTC
Hi!
Post by Mikaël Urankar
Post by Kurt Jaeger
I just started it again, and after a while the qemu-ppc64-static
process was at approx. 23 GB of memory and increasing, without much
progress.
The last time I tried (2 weeks ago), qemu-ppc64-static was broken; I'm
not sure the situation has evolved since then.
Ok, thanks! Then it's not the same problem.
--
***@opsec.eu +49 171 3101372 2 years to go !
Mark Linimon
2018-06-16 02:31:48 UTC
Post by Mikaël Urankar
The last time I tried (2 weeks ago), qemu-ppc64-static was broken; I'm
not sure the situation has evolved since then.
I've been told by more than one person that it works, but the two or
three times I've tried it, it just hung.

I have real hardware so in general it doesn't make a difference to me,
but I'd like to know one way or the other.

mcl

Kevin Lo
2018-06-15 13:46:21 UTC
Post by Mark Johnston
Post by Kevin Lo
Post by Mark Johnston
Post by Lev Serebryakov
Post by Gary Jennejohn
sysctl vm.pageout_update_period=0
Really, the situation is worse than stated in the subject, because
processes are being killed AFTER the memory pressure, when there is
already a lot of free memory!
It looks like a very serious bug.
The issue was identified earlier this week and is being worked on. It's
a regression from r329882 which appears only on certain hardware. You
can probably work around it by setting vm.pageout_oom_seq to a large
value (try 1000 for instance), though this will make the "true" OOM
killer take longer to kick in. The problem is unrelated to the
pageout_update_period.
I have a large swap space and I've encountered this issue as well
pid 90707 (getty), uid 0, was killed: out of swap space
pid 90709 (getty), uid 0, was killed: out of swap space
pid 90709 (getty), uid 0, was killed: out of swap space
...
Setting vm.pageout_oom_seq to 1000 doesn't help. If you have a patch I'll be
happy to test it, thanks.
The change was committed as r334752. Are you seeing unexpected OOM
kills on or after that revision?
The box is running -CURRENT r334983. I'll investigate further, thanks.