Non-uniform memory access (NUMA) is a computer memory design
used in multiprocessing, where the memory access time depends on the memory
location relative to the processor. Under NUMA, a processor can access its own
local memory faster than non-local memory (memory local to another processor or
memory shared between processors) [1].
When configuring a SQL Server environment, it is very
important to understand the underlying NUMA architecture. SQL Server has its
own scheduling system, the User Mode Scheduler (UMS), which is NUMA aware.
Taking this into account, we have to be aware of two rules:
- Parallel queries do not exceed the boundaries of a NUMA node. Ensure that the 'max degree of parallelism' (MAXDOP) setting is set to at most the number of physical CPUs per NUMA node, or to a number that divides it evenly. For example, with 8 CPUs per NUMA node, the valid MAXDOP values would be 8, 4, 2 or 1. The idea is to avoid remote memory access across NUMA nodes.
- Distribute the CPUs evenly across the NUMA nodes.
Because of core licensing, the CPU affinity
mask or the virtual machine configuration, the number of CPUs used per NUMA node
can differ. This must be adjusted in order to prevent scheduling and
performance issues.
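The divisor rule above can be sketched as a small helper. This is an illustrative Python snippet (not part of SQL Server or the original article) that lists the MAXDOP candidates for a given number of physical CPUs per NUMA node:

```python
# Illustrative sketch: valid MAXDOP values are those that divide the
# number of physical CPUs per NUMA node evenly, largest first.
def valid_maxdop(cpus_per_numa_node: int) -> list[int]:
    """Return candidate MAXDOP settings for one NUMA node."""
    return [d for d in range(cpus_per_numa_node, 0, -1)
            if cpus_per_numa_node % d == 0]

print(valid_maxdop(8))  # [8, 4, 2, 1]
```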
Query to get the number of CPUs per NUMA node
used by SQL Server:
SELECT parent_node_id 'NUMANode', COUNT(*) 'ActiveCPUs'
FROM sys.dm_os_schedulers
WHERE scheduler_id < 1048576 AND is_online = 1
GROUP BY parent_node_id
Query to get the number of CPUs per NUMA node:
SELECT parent_node_id 'NUMANode', COUNT(*) 'Number of CPUs'
FROM sys.dm_os_schedulers
WHERE scheduler_id < 1048576
GROUP BY parent_node_id
In this article, I am going more in depth into the VM
configuration in order to successfully align it to the host NUMA configuration.
To get information about the NUMA configuration of the host, you
can use the Sysinternals Coreinfo tool (https://technet.microsoft.com/en-us/sysinternals/cc835722.aspx).
The most important sections for understanding the architecture:
- Logical to Physical Processor Map: shows whether hyper-threading is enabled and how the logical processors are mapped to the physical ones
- Logical Processor to Socket Map
- Logical Processor to NUMA Node Map
We are taking as an example a host which has the following configuration:
- 4 NUMA nodes
- 8 CPUs per NUMA node
- 80 GB RAM per NUMA node
Scenario 1
If we create a VM with 8 cores and 80 GB of RAM, the resources
in green will be used by the VM:
Scenario 2
If we modify the VM and assign 81 GB of RAM instead of 80 GB
(anything between 80 and 160 GB), the NUMA topology seen by the VM will be completely different.
The resources used are going to look as follows:
Note that not all of the memory is actually going to be used:
out of the 160 GB, only 81 GB will be used. Also take into account that
the memory is distributed over the NUMA nodes.
The setup is still OK, but it would be better to use the previous
one. Parallel queries should use at most 4 cores (according to the MAXDOP
rule above, MAXDOP should be set to a maximum of 4) instead of 8.
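Scenarios 1 and 2 can be modeled with a short sketch, assuming the hypervisor creates the minimum number of vNUMA nodes whose memory fits into the host's 80 GB NUMA nodes and then spreads the vCPUs evenly across them (an illustrative model, not any specific hypervisor's exact algorithm):

```python
import math

# Illustrative model: minimum vNUMA nodes needed to fit the VM's RAM
# into host NUMA nodes of a given size, with vCPUs spread evenly.
def vnuma_layout(vm_cpus: int, vm_ram_gb: int, host_node_ram_gb: int = 80):
    nodes = max(1, math.ceil(vm_ram_gb / host_node_ram_gb))
    cpus_per_node = vm_cpus / nodes
    return nodes, cpus_per_node

print(vnuma_layout(8, 80))  # (1, 8.0) -> one vNUMA node, MAXDOP up to 8
print(vnuma_layout(8, 81))  # (2, 4.0) -> two vNUMA nodes, MAXDOP up to 4
```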
Scenario 3
We are now discussing a VM with 16 CPUs and 160 GB of RAM.
An interesting situation occurs after increasing the RAM over 160
GB while staying at or below 240 GB: you cannot evenly distribute 16 CPUs across
3 NUMA nodes.
We end up having 5 or 6 CPUs per NUMA node. In this setup,
neither of the two rules is satisfied. What MAXDOP value should you use? (Probably
1, i.e. disabling parallelism.)
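The uneven split in Scenario 3 is easy to verify with a short sketch (illustrative Python, assuming the hypervisor hands out the remainder one CPU at a time):

```python
# Illustrative sketch: distribute vCPUs across vNUMA nodes as evenly
# as possible; a non-zero remainder means some nodes get one CPU more.
def distribute(cpus: int, nodes: int) -> list[int]:
    base, extra = divmod(cpus, nodes)
    return [base + 1] * extra + [base] * (nodes - extra)

print(distribute(16, 3))  # [6, 5, 5] -> unavoidably uneven
print(distribute(16, 2))  # [8, 8]   -> even split
```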
From my experience, multiple performance issues can occur (some
of them hard to trace back to this root cause). Some examples:
- Uneven distribution of workload across schedulers (CPUs)
- Deadlocks
- Parallel queries run longer
A more particular scenario I faced (I cannot generalize this)
was that DBCC CHECKDB constantly drove all CPUs (limited to the MAXDOP value)
to 100%, despite the fact that there was no other workload on the VM. A more in-depth
investigation revealed that 40% of the CPU usage was spent in spinlocks
while allocating memory (this spinlock could not be found in the undocumented dynamic
management view sys.dm_os_spinlock_stats). After evenly distributing the CPUs
across the NUMA nodes, the CPU was no longer at 100%; it only had spikes,
not exceeding 70% on each core.
Takeaway: when running VMs, ensure that their NUMA configuration is aligned with that of the host.
[1] Source: Wikipedia