Deadlocks / hangs in ZFS

I may be seeing similar issues. Have you tried leaving top -SHa running
and seeing what threads are using CPU when it hangs? I did and saw pid
17 [zfskern{txg_thread_enter}] using lots of CPU but no disk activity
happening. Do you see similar?

Steve

Post by Alexander Leidinger
Hi,
does anyone else experience deadlocks / hangs in ZFS?
What I see is that if I do a lot in parallel (e.g. updating ports in
several jails) on a 2 socket / 4 cores -> 16 threads system, then the
system may get into a state where I can log in, but any exit (e.g. from
top) or logout from a shell blocks somewhere. Sometimes it helps to CTRL-C
all updates to get the system back into good shape, but most of the
time it doesn't.
On another system at the same rev (333966) with far fewer CPUs (and AMD
instead of Intel), I don't see such behavior.
Bye,
Alexander.

Slawa Olhovchenkov

2018-05-22 12:29:24 UTC

Post by Steve Wills
I may be seeing similar issues. Have you tried leaving top -SHa running
and seeing what threads are using CPU when it hangs? I did and saw pid
17 [zfskern{txg_thread_enter}] using lots of CPU but no disk activity
happening. Do you see similar?

Can you try https://reviews.freebsd.org/D7538 and report?

_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-current

Alexander Leidinger

2018-05-22 14:16:32 UTC

I will try and report back.

Post by Slawa Olhovchenkov
Can you try https://reviews.freebsd.org/D7538 and report?

The patch says it is against -STABLE, but we're talking about -current here.
It has been a while since I last tried Karl's patch, and I
stopped because it didn't apply to -current anymore at some point.
Will what is provided right now in the patch work on -current?

As a data point, the system I described at the start of the thread
has 64 GB RAM and the ARC is not limited via sysctl.

Bye,
Alexander.

--
http://www.Leidinger.net ***@Leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org ***@FreeBSD.org : PGP 0x8F31830F9F2772BF

Slawa Olhovchenkov

2018-05-22 14:40:55 UTC

Post by Alexander Leidinger
The patch says it is against -STABLE, but we're talking about -current here.

ZFS doesn't change this.

Post by Alexander Leidinger
It has been a while since I tried Karl's patch the last time, and I
stopped because it didn't apply to -current anymore at some point.
Will what is provided right now in the patch work on -current?

I think yes, after applying s/vm_cnt.v_free_count/vm_free_count()/g.
I don't know how to have two distinct patches (for stable and current) in one review.
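
For anyone who prefers not to edit the diff by hand, the substitution above can be sketched in Python. This is only an illustration: the sample patch line is invented, and whether the rename alone is enough for the patch to build on -current is not verified here.

```python
import re

def port_to_current(patch_text: str) -> str:
    """Apply s/vm_cnt.v_free_count/vm_free_count()/g to patch text."""
    # The dot is escaped so that only the literal member access matches.
    return re.sub(r"vm_cnt\.v_free_count", "vm_free_count()", patch_text)

# Hypothetical line, invented for illustration (not taken from D7538).
sample = "+\tif (vm_cnt.v_free_count < needed)\n"
print(port_to_current(sample), end="")
```

Unlike the sed expression as written, this escapes the dot, so a string such as vm_cntXv_free_count is left alone.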

Post by Alexander Leidinger
As a data point, the system I talk about in the start of the thread
has 64 GB RAM and the ARC is not limited via sysctl.

Currently the vanilla ARC is poorly limited via sysctl, and even more so after ABD.
It may be interesting to test with

./sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c:boolean_t zfs_abd_scatter_enabled = B_FALSE;

(no sysctl exists to change this)

Kirill Ponomarev

2018-05-27 19:41:59 UTC

Post by Slawa Olhovchenkov
I think yes, after applying s/vm_cnt.v_free_count/vm_free_count()/g.
I don't know how to have two distinct patches (for stable and current) in one review.

I'm experiencing these issues sporadically as well. Would you mind
publishing this patch for fresh current?

Slawa Olhovchenkov

2018-05-27 22:06:12 UTC

Post by Kirill Ponomarev
I'm experiencing these issues sporadically as well. Would you mind
publishing this patch for fresh current?

A week ago I adapted the patch and published it for fresh current and
stable. Does it need adapting again?

Alexander Leidinger

2018-05-28 07:02:01 UTC

Post by Slawa Olhovchenkov
A week ago I adapted the patch and published it for fresh current and
stable. Does it need adapting again?

I applied the patch in the review yesterday to rev 333966, it applied
OK (with some fuzz). I will try to reproduce my issue with the patch.

Some thoughts I had after looking a little bit at the output of top:
half of the RAM of my machine is in use, the other half is listed as
free. Swap gets used while there is plenty of free RAM. I have NUMA in
my kernel (it's a 2 socket Xeon system). I don't see any NUMA-specific
code in the diff (and I don't expect any there), but could it be
that some NUMA-related behavior comes into play here too? Does it make
sense to try without NUMA in the kernel?

Bye,
Alexander.


Slawa Olhovchenkov

2018-05-28 08:10:46 UTC

Post by Alexander Leidinger
I applied the patch in the review yesterday to rev 333966, it applied
OK (with some fuzz). I will try to reproduce my issue with the patch.
Some thoughts I had after looking a little bit at the output of top:
half of the RAM of my machine is in use, the other half is listed as
free. Swap gets used while there is plenty of free RAM. I have NUMA in
my kernel (it's a 2 socket Xeon system). I don't see any NUMA-specific
code in the diff (and I don't expect any there), but could it be
that some NUMA-related behavior comes into play here too? Does it make
sense to try without NUMA in the kernel?

Good question. NUMA in FreeBSD is too new; nobody knows it well yet.
On Linux, such an effect exists: exhausting all memory in one NUMA domain
can cause a memory deficit (swap, allocation failures, etc.) while
plenty of memory is still free in another NUMA domain.

Yes, try without NUMA; this may be interesting for the NUMA developers.
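
A toy model of the effect described above, assuming a strict per-domain allocation policy with no fallback. That policy is an assumption made for illustration; it is not a description of the FreeBSD or Linux allocator, and the page counts are made up.

```python
def allocate(free_pages, domain, count):
    """Take `count` pages from one domain only; no fallback to other domains."""
    if free_pages[domain] < count:
        return False  # this domain is short, regardless of the others
    free_pages[domain] -= count
    return True

# Hypothetical free page counts for two NUMA domains.
free_pages = [100, 8_000_000]

ok = allocate(free_pages, domain=0, count=500)
print("allocation succeeded:", ok)
print("total free pages:", sum(free_pages))
```

The request against domain 0 fails although the machine as a whole has plenty of free pages, which is the kind of situation where swap activity and a large "Free" figure can coexist.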

Alexander Leidinger

2018-06-03 19:14:50 UTC

I applied the patch in the review yesterday to rev 333966, it
applied OK (with some fuzz). I will try to reproduce my issue with
the patch.

The behavior changed (or the system was long enough in this state
without me noticing it). I have a panic now:
panic: deadlkres: possible deadlock detected for 0xfffff803766db580,
blocked for 1803003 ticks
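
To put the tick count from the panic message into perspective, here is a small conversion. It assumes kern.hz is 1000, which the thread does not state; check `sysctl kern.hz` on the affected machine before relying on the result.

```python
ASSUMED_HZ = 1000  # assumption: ticks per second; not stated in the thread

def ticks_to_seconds(ticks: int, hz: int = ASSUMED_HZ) -> float:
    """Convert a kernel tick count to seconds for a given tick rate."""
    return ticks / hz

blocked = ticks_to_seconds(1803003)
print(f"{blocked:.1f} s = {blocked / 60:.1f} min")
```

Under that assumption the thread had been blocked for roughly half an hour when deadlkres fired.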

I only have the textdump. Is anyone up to debugging this? If yes, I will
switch to normal dumps; just tell me what I should check for.

db:0:kdb.enter.panic> run lockinfo
db:1:lockinfo> show locks
No such command; use "help" to list available commands
db:1:lockinfo> show alllocks
No such command; use "help" to list available commands
db:1:lockinfo> show lockedvnods
Locked vnodes
db:0:kdb.enter.panic> show pcpu
cpuid = 6
dynamic pcpu = 0xfffffe008f03e840
curthread = 0xfffff80370c82000: pid 0 tid 100218 "deadlkres"
curpcb = 0xfffffe0116472cc0
fpcurthread = none
idlethread = 0xfffff803700b9580: tid 100008 "idle: cpu6"
curpmap = 0xffffffff80d28448
tssp = 0xffffffff80d96d90
commontssp = 0xffffffff80d96d90
rsp0 = 0xfffffe0116472cc0
gs32p = 0xffffffff80d9d9c8
ldt = 0xffffffff80d9da08
tss = 0xffffffff80d9d9f8
db:0:kdb.enter.panic> bt
Tracing pid 0 tid 100218 td 0xfffff80370c82000
kdb_enter() at kdb_enter+0x3b/frame 0xfffffe0116472aa0
vpanic() at vpanic+0x1c0/frame 0xfffffe0116472b00
panic() at panic+0x43/frame 0xfffffe0116472b60
deadlkres() at deadlkres+0x3a6/frame 0xfffffe0116472bb0
fork_exit() at fork_exit+0x84/frame 0xfffffe0116472bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0116472bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

Bye,
Alexander.


Slawa Olhovchenkov

2018-06-03 19:28:14 UTC

Post by Alexander Leidinger
The behavior changed (or the system was long enough in this state
without me noticing it). I have a panic now:
panic: deadlkres: possible deadlock detected for 0xfffff803766db580,
blocked for 1803003 ticks

Hmm, maybe first determine the locked function:

addr2line -ie /boot/kernel/kernel 0xfffff803766db580

or

kgdb
x/10i 0xfffff803766db580


Alexander Leidinger

2018-06-04 20:31:08 UTC

Post by Slawa Olhovchenkov
Hmm, maybe first determine the locked function:
addr2line -ie /boot/kernel/kernel 0xfffff803766db580
or
kgdb
x/10i 0xfffff803766db580

Neither produces any sensible output:
(kgdb) x/10i 0xfffff803766db580
0xfffff803766db580: subb $0x80,-0x78(%rsi)
0xfffff803766db584: (bad)
0xfffff803766db585: (bad)
0xfffff803766db586: (bad)
0xfffff803766db587: incl -0x7f7792(%rax)
0xfffff803766db58d: (bad)
0xfffff803766db58e: (bad)
0xfffff803766db58f: incl -0x7f7792(%rax)
0xfffff803766db595: (bad)
0xfffff803766db596: (bad)

Seems I need to provoke a real kernel dump instead of a textdump for this.
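
One possible reason for the garbage disassembly, offered as a guess rather than a diagnosis: the address in the panic message may be a data pointer (for example the blocked thread's structure) rather than a code address. The sketch below only compares it with addresses already shown in this thread. The thread pointers from `show pcpu` start with 0xfffff803, while the kernel globals there start with 0xffffffff80. The "same upper 32 bits" test is an ad hoc heuristic invented for this note, not a kernel rule.

```python
# Addresses copied from the ddb output in this thread.
THREAD_POINTERS = {
    "curthread": 0xFFFFF80370C82000,
    "idlethread": 0xFFFFF803700B9580,
}
KERNEL_GLOBALS = {
    "curpmap": 0xFFFFFFFF80D28448,
    "tssp": 0xFFFFFFFF80D96D90,
}
PANIC_ADDR = 0xFFFFF803766DB580

def upper32(addr: int) -> int:
    """Return the upper 32 bits of a 64-bit address."""
    return addr >> 32

def resembles(addr: int, samples: dict) -> bool:
    """Ad hoc heuristic: same upper 32 bits as every sample address."""
    return all(upper32(addr) == upper32(s) for s in samples.values())

print("looks like the thread pointers:", resembles(PANIC_ADDR, THREAD_POINTERS))
print("looks like the kernel globals: ", resembles(PANIC_ADDR, KERNEL_GLOBALS))
```

If the guess holds, inspecting the address as a thread in ddb (for example with `show thread`) may tell more than disassembling it; treat this as something to verify, not as a conclusion.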

Bye,
Alexander.


Alexander Leidinger

2018-05-26 19:54:10 UTC

Post by Steve Wills
I may be seeing similar issues. Have you tried leaving top -SHa
running and seeing what threads are using CPU when it hangs? I did
and saw pid 17 [zfskern{txg_thread_enter}] using lots of CPU but no
disk activity happening. Do you see similar?

For me it is a different ZFS process/kthread: l2arc_feed_thread.
Please note that there is still 31 GB free, so it doesn't look like
resource exhaustion. What I consider strange is the swap usage. I
watched the system and it started to use swap while there were >30 GB
listed as free (in/out rates visible from time to time, and plenty of
RAM free... ???).

last pid: 93392; load averages: 0.16, 0.44, 1.03 up 1+15:36:34 22:35:45
1509 processes: 17 running, 1392 sleeping, 3 zombie, 97 waiting
CPU: 0.1% user, 0.0% nice, 0.0% system, 0.0% interrupt, 99.9% idle
Mem: 597M Active, 1849M Inact, 6736K Laundry, 25G Wired, 31G Free
ARC: 20G Total, 9028M MFU, 6646M MRU, 2162M Anon, 337M Header, 1935M Other
     14G Compressed, 21G Uncompressed, 1.53:1 Ratio
Swap: 4096M Total, 1640M Used, 2455M Free, 40% Inuse

  PID JID USERNAME PRI NICE   SIZE    RES STATE   C  TIME    WCPU COMMAND
   10   0 root     155 ki31     0K   256K CPU1    1 35.4H 100.00% [idle{idle: cpu1}]
   10   0 root     155 ki31     0K   256K CPU11  11 35.2H 100.00% [idle{idle: cpu11}]
   10   0 root     155 ki31     0K   256K CPU3    3 35.2H 100.00% [idle{idle: cpu3}]
   10   0 root     155 ki31     0K   256K CPU15  15 35.1H 100.00% [idle{idle: cpu15}]
   10   0 root     155 ki31     0K   256K RUN     9 35.1H 100.00% [idle{idle: cpu9}]
   10   0 root     155 ki31     0K   256K CPU5    5 35.0H 100.00% [idle{idle: cpu5}]
   10   0 root     155 ki31     0K   256K CPU14  14 35.0H 100.00% [idle{idle: cpu14}]
   10   0 root     155 ki31     0K   256K CPU0    0 35.8H  99.12% [idle{idle: cpu0}]
   10   0 root     155 ki31     0K   256K CPU6    6 35.3H  98.79% [idle{idle: cpu6}]
   10   0 root     155 ki31     0K   256K CPU8    8 35.1H  98.31% [idle{idle: cpu8}]
   10   0 root     155 ki31     0K   256K CPU12  12 35.0H  97.24% [idle{idle: cpu12}]
   10   0 root     155 ki31     0K   256K CPU4    4 35.4H  96.71% [idle{idle: cpu4}]
   10   0 root     155 ki31     0K   256K CPU10  10 35.0H  92.37% [idle{idle: cpu10}]
   10   0 root     155 ki31     0K   256K CPU7    7 35.2H  92.20% [idle{idle: cpu7}]
   10   0 root     155 ki31     0K   256K CPU13  13 35.1H  91.90% [idle{idle: cpu13}]
   10   0 root     155 ki31     0K   256K CPU2    2 35.4H  90.97% [idle{idle: cpu2}]
   11   0 root     -60    -     0K   816K WAIT    0 15:08   0.82% [intr{swi4: clock (0)}]
   31   0 root     -16    -     0K    80K pwait   0 44:54   0.60% [pagedaemon{dom0}]
45453   0 root      20    0 16932K  7056K CPU9    9  4:12   0.24% top -SHaj
   24   0 root      -8    -     0K   256K l2arc_  0  4:12   0.21% [zfskern{l2arc_feed_thread}]
 2375   0 root      20    0 16872K  6868K select 11  3:52   0.20% top -SHua
 7007  12 235       20    0 18017M   881M uwait  12  0:00   0.19% [java{ESH-thingHandler-35}]
   32   0 root     -16    -     0K    16K psleep 15  5:03   0.11% [vmdaemon]
41037   0 netchild  27    0 18036K  9136K select  4  2:20   0.09% tmux: server (/tmp/tmux-1001/default) (t
   36   0 root     -16    -     0K    16K -       6  2:02   0.09% [racctd]
 7007  12 235       20    0 18017M   881M uwait   9  1:24   0.07% [java{java}]
 4746   0 root      20    0 13020K  3792K nanslp  8  0:52   0.05% zpool iostat space 1
    0   0 root     -76    -     0K 10304K -       4  0:16   0.05% [kernel{if_io_tqg_4}]
 5550   8 933       20    0  2448M   607M uwait   8  0:41   0.03% [java{java}]
 5550   8 933       20    0  2448M   607M uwait  13  0:03   0.03% [java{Timer-1}]
 7007  12 235       20    0 18017M   881M uwait   0  0:39   0.02% [java{java}]
 5655   8 560       20    0 21524K  4840K select  6  0:21   0.02% /usr/local/sbin/hald{hald}
   30   0 root     -16    -     0K    16K -       4  0:25   0.01% [rand_harvestq]
 1259   0 root      20    0 18780K 18860K select 14  0:19   0.01% /usr/sbin/ntpd -c /etc/ntp.conf -p /var/
    0   0 root     -76    -     0K 10304K -      12  0:19   0.01% [kernel{if_config_tqg_0}]
   31   0 root     -16    -     0K    80K psleep  0  0:38   0.01% [pagedaemon{dom1}]
    0   0 root     -76    -     0K 10304K -       5  0:04   0.01% [kernel{if_io_tqg_5}]
 7007  12 235       20    0 18017M   881M uwait   1  0:16   0.01% [java{Karaf Lock Monitor }]
12622   2 88        20    0  1963M   247M uwait   7  0:13   0.01% [mysqld{mysqld}]
27043   0 netchild  20    0 18964K  9124K select  6  0:01   0.01% sshd: ***@pts/0 (sshd)
 7007  12 235       20    0 18017M   881M uwait   8  0:10   0.01% [java{openHAB-job-schedul}]
 7007  12 235       20    0 18017M   881M uwait   6  0:10   0.01% [java{openHAB-job-schedul}]
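
As a quick cross-check of the top header above, this sketch sums the Mem: line and recomputes the swap percentage. The parser is a simplification written for this note; it assumes top's K/M/G suffixes are powers of 1024.

```python
import re

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}

def parse_sizes(line: str) -> dict:
    """Map each label to bytes, e.g. '25G Wired' -> {'Wired': 25 * 1024**3}."""
    return {name: int(num) * UNITS[unit]
            for num, unit, name in re.findall(r"(\d+)([KMG]) (\w+)", line)}

mem = parse_sizes("Mem: 597M Active, 1849M Inact, 6736K Laundry, 25G Wired, 31G Free")
swap = parse_sizes("Swap: 4096M Total, 1640M Used, 2455M Free, 40% Inuse")

total = sum(mem.values())
print(f"accounted RAM: {total / 1024**3:.1f} GiB")
print(f"free share:    {mem['Free'] / total:.0%}")
print(f"swap in use:   {swap['Used'] / swap['Total']:.0%}")
```

The recomputed swap figure agrees with the 40% that top reports, and more than half of the accounted RAM is listed as free while swap is in use.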


Luciano Mannucci

2018-05-22 12:38:36 UTC

On Tue, 22 May 2018 10:17:49 +0200

Post by Alexander Leidinger
does someone else experience deadlocks / hangs in ZFS?

I did experience ZFS hangs under heavy load on relatively big iron (using
rsync, in my case). This was cured by reducing the amount of RAM
available to the ZFS caching mechanism. The parameters
vfs.zfs.vdev.cache.size and vfs.zfs.arc_max in /boot/loader.conf may be
your friends. On a 16G machine that no longer shows the symptoms I have set:

kern.maxusers="4096"
vfs.zfs.vdev.cache.size="5G"
vfs.zfs.arc_min="1228800000"
vfs.zfs.arc_max="9830400000"
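
For readers comparing these values against their own machine, a small conversion of the byte counts above. The RAM size comes from the "16G machine" remark; treating it as exactly 16 * 2^30 bytes is an assumption.

```python
GIB = 1024**3
RAM_BYTES = 16 * GIB  # assumption: "16G" taken as exactly 16 GiB

settings = {
    "vfs.zfs.arc_min": 1228800000,
    "vfs.zfs.arc_max": 9830400000,
}

for name, value in settings.items():
    print(f"{name}: {value / GIB:.2f} GiB ({value / RAM_BYTES:.0%} of RAM)")
```

With these numbers the ARC is capped at a bit more than half of the assumed RAM, which leaves the rest for everything else on the box.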

hope it helps,

Luciano.

--
/"\ /Via A. Salaino, 7 - 20144 Milano (Italy)
\ / ASCII RIBBON CAMPAIGN / PHONE : +39 2 485781 FAX: +39 2 48578250
X AGAINST HTML MAIL / E-MAIL: ***@sublink.sublink.ORG
/ \ AND POSTINGS / WWW: http://www.lesassaie.IT/

Andrea Venturoli

2018-05-22 14:04:59 UTC