Discussion:
New NUMA support coming to CURRENT
Jeff Roberson
2018-01-09 19:46:54 UTC
Hello folks,

I am working on merging improved NUMA support with policy implemented by
cpuset(2) over the next week. This work has been supported by Dell/EMC's
Isilon product division and Netflix. You can see some discussion of these
changes here:

https://reviews.freebsd.org/D13403
https://reviews.freebsd.org/D13289
https://reviews.freebsd.org/D13545

The work has been done in user/jeff/numa if you want to look at the svn
history or experiment with the branch. It has been tested by Peter Holm
on i386 and amd64, and verified to work on arm at various points.

We are working towards compatibility with libnuma and Linux's mbind.
These commits will bring improved support for NUMA into the kernel.
There are new domain-specific allocation functions available to the
kernel for UMA, malloc, kmem_, and vm_page*. busdmamem consumers will
automatically be placed in the correct domain, bringing automatic
performance improvements to some devices.
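
As a rough sketch of what a kernel consumer might look like, here is a
hypothetical use of the domain-aware malloc. The exact names and
signatures here (malloc_domainset(), DOMAINSET_PREF()) are assumptions
on my part, not a committed interface:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/malloc.h>
#include <sys/domainset.h>

MALLOC_DEFINE(M_MYDRV, "mydrv", "example driver allocations");

/*
 * Allocate a buffer preferentially from the given memory domain,
 * falling back to other domains if that one is exhausted.
 */
static void *
mydrv_alloc_local(size_t size, int domain)
{
        return (malloc_domainset(size, M_MYDRV, DOMAINSET_PREF(domain),
            M_WAITOK | M_ZERO));
}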

cpuset will be able to constrain processes, groups of processes, jails,
etc. to subsets of the system memory domains, just as it can with sets
of cpus. It can set a default policy for any of the above. Threads can
use cpusets to set a policy that specifies a subset of their visible
domains.
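
A minimal userland sketch of applying such a policy through the new
syscall interface, assuming the cpuset_setdomain(2) entry point and the
DOMAINSET_* macros and policy constants (treat the exact names as
assumptions):

#include <sys/param.h>
#include <sys/cpuset.h>
#include <sys/domainset.h>
#include <err.h>

int
main(void)
{
        domainset_t mask;

        /* Bind the current process's memory to domain 0, first-touch. */
        DOMAINSET_ZERO(&mask);
        DOMAINSET_SET(0, &mask);
        if (cpuset_setdomain(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
            sizeof(mask), &mask, DOMAINSET_POLICY_FIRSTTOUCH) != 0)
                err(1, "cpuset_setdomain");
        return (0);
}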

Available policies are first-touch ('local' in Linux terms), round-robin
(similar to Linux's interleave), and preferred. For now, the default is
round-robin. You can achieve a fixed-domain policy by using round-robin
with a bitmask of a single domain. As the scheduler and VM become more
sophisticated we may switch the default to first-touch, as Linux does.
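
For example, using the cpuset policy syntax shown later in this thread,
a program could be pinned to a single domain like so (the domain number
and program name are only illustrative):

cpuset -n round-robin:1 ./myprogram

Because the mask contains exactly one domain, round-robin degenerates
into a fixed-domain policy.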

Currently these features are enabled with VM_NUMA_ALLOC and MAXMEMDOM.
It will eventually be NUMA/MAXMEMDOM to match SMP/MAXCPU. The current
NUMA syscalls and VM_NUMA_ALLOC code were 'experimental' and will be
deprecated. numactl will continue to be supported, although cpuset
should be preferred going forward as it supports the full feature set
of the new API.
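
For reference, enabling the current bits in a kernel config file looks
something like the following (the MAXMEMDOM value is illustrative; it
must be at least the number of memory domains in the machine):

options         VM_NUMA_ALLOC
options         MAXMEMDOM=8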

Thank you for your patience as I deal with the inevitable fallout of
such sweeping changes. If you do have bugs, please file them in
Bugzilla, or reach out to me directly. I don't always have time to
catch up on all of my mailing list mail, and regrettably things slip
through the cracks when they are not addressed directly to me.

Thanks,
Jeff
Jeff Roberson
2018-01-14 03:39:49 UTC
Hello,

This work has been committed. It is governed by a new 'NUMA' config
option, and 'DEVICE_NUMA' and 'VM_NUMA_ALLOC' have both been retired.
This option is fairly lightweight and I will likely enable it in
GENERIC before the 12.0 release.
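
With the new option, a kernel config needs something like the following
(again, the MAXMEMDOM value is illustrative):

options         NUMA
options         MAXMEMDOM=8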

I have heard reports that switching from a default policy of first-touch
to round-robin has caused some performance regression. You can change the
default policy at runtime by doing the following:

cpuset -s 1 -n first-touch:all

This is the default set that all others inherit from. You can query the
current default with:
cpuset -g -s 1
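
If you only want to override the policy for a single program rather
than the global default, the same policy syntax should also work per
command, along the lines of (program name is a placeholder):

cpuset -n round-robin:all ./myprogram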

I will be investigating the regression and tweaking the default policy
based on performance feedback from multiple workloads. This may take some
time.

numactl is still functional but deprecated. Man pages will be updated
soonish.

Thank you for your patience as I work on refining this somewhat involved
feature.

Thanks,
Jeff
Ultima
2018-01-18 02:48:23 UTC
Hello Jeff,

A few days ago I upgraded my system firmware, upgraded base to r327991,
and changed the snoop mode to cluster-on-die. With all of these changes
(I am also running llvm 6.0) it is hard to say exactly what makes the
server feel like a completely new system, but I am betting it is the
NUMA optimizations. It is so responsive! Thanks for the amazing work!

Best regards,
Richard Gallamore