Saturday, April 7, 2018

NUMA Server Soft Partitioning Using CGROUP and PROCESSOR_GROUP_NAME in Red Hat 7

It is now common to see servers with terabytes of memory and hundreds of CPU cores, so this topic is worth revisiting.
Imagine a situation: a client has one NUMA server for the CONF, SAT and DEV environments.
Why not use instance caging?
1. Instance caging (cpu_count with a Resource Manager plan) limits only CPU.
2. Memory can be allocated from any NUMA node. This can cause numa_miss and increase the latency between memory and CPU.
3. There is no restriction on the set of CPU cores a database can use, so its processes can move between cores, which can increase context switching on the CPU cores.
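Before partitioning, it is worth checking whether remote allocations are actually happening. The kernel exposes per-node counters (numa_hit, numa_miss and others) in /sys/devices/system/node/nodeN/numastat, and these are the counters the numastat tool summarizes. The minimal sketch below prints numa_miss for each node. The function name and the optional sysfs-root argument are hypothetical additions of mine; the root argument exists only so the function can be tried against a scratch directory.

```bash
#!/usr/bin/env bash
# numa_miss_per_node [SYSFS_ROOT]   (hypothetical helper, default root /sys)
# Print "nodeN <numa_miss>" for every NUMA node.
# numa_miss on node N counts pages that were allocated on node N although
# the allocating process preferred a different node.
numa_miss_per_node() {
  local sysfs=${1:-/sys} f node
  for f in "$sysfs"/devices/system/node/node*/numastat; do
    [ -r "$f" ] || continue          # no NUMA info (or glob did not match)
    node=${f%/numastat}              # strip the file name
    node=${node##*/node}             # keep only the node number
    awk -v n="$node" '$1 == "numa_miss" { print "node" n, $2 }' "$f"
  done
}
```

Running it (or simply numastat) a few times while the databases are under load shows whether numa_miss keeps growing, i.e. whether memory is being served from a node other than the preferred one.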
Here is the memory distribution of a NUMA server that hosts up to 100 Oracle databases.

numa_miss can be avoided using PROCESSOR_GROUP_NAME.
How do you use PROCESSOR_GROUP_NAME? It is based on the concept of a cgroup (Linux) or a resource pool (Solaris).
A cgroup, as defined by the Linux kernel documentation, associates a set of tasks with a set of parameters for one or more subsystems.
Cgroups basically allow a large server to be soft partitioned. Apart from CPU and memory allocation there is other useful stuff you can do with them, and they are easy to use.
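With the cgroup v1 cpuset controller used on Red Hat 7, a cgroup is just a directory under /sys/fs/cgroup/cpuset: the parameters are files in it (cpuset.cpus, cpuset.mems) and the set of tasks is listed in cgroup.procs. The minimal sketch below shows that mechanism by hand. The function names are hypothetical, and the cgroup mount point is passed in so the logic can be dry-run against an ordinary directory; on a real system it needs root.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the cgroup v1 cpuset interface.

# make_cpuset ROOT NAME CPUS MEMS
# Create the cgroup directory and bind it to a CPU list and a memory-node list.
make_cpuset() {
  local dir="$1/$2"
  mkdir -p "$dir"
  # Both files must be non-empty before any task can be attached.
  echo "$4" > "$dir/cpuset.mems"
  echo "$3" > "$dir/cpuset.cpus"
}

# attach_process ROOT NAME PID
# Move all threads of a process into the cgroup. From then on it runs only
# on the cgroup's CPUs, and new memory allocations come only from its nodes.
attach_process() {
  echo "$3" > "$1/$2/cgroup.procs"
}
```

For example, as root, `make_cpuset /sys/fs/cgroup/cpuset demo 1 0` followed by `attach_process /sys/fs/cgroup/cpuset demo $$` confines the current shell, and `taskset -pc $$` should then report CPU 1 as its affinity list.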
Here is a simple config to show this concept.
vi /usr/lib/systemd/system/ora_sat.service
[Unit]
Description=test PROCESSOR_GROUP_NAME
After=syslog.target network.target auditd.service
[Service]
ExecStartPre=/usr/bin/mkdir -p /sys/fs/cgroup/cpuset/ora_sat
ExecStartPre=/bin/bash -c '/bin/chown -R oracle:dba /sys/fs/cgroup/cpuset/ora_sat '
ExecStartPre=/bin/bash -c '/usr/bin/echo "0" > /sys/fs/cgroup/cpuset/ora_sat/cpuset.mems'
ExecStart=/bin/bash -c '/usr/bin/echo "1" > /sys/fs/cgroup/cpuset/ora_sat/cpuset.cpus'
Restart=on-failure
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
--
#systemctl start ora_sat.service
#systemctl enable ora_sat.service
To create ora_dev, ora_conf or units for other environments, copy the file and find and replace "ora_sat" with the new name.
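The find-and-replace step can be scripted. The sketch below is a hypothetical helper (not part of systemd) that prints a copy of the ora_sat unit with every ora_sat renamed, and also swaps the CPU list and memory node, since each environment should get its own non-overlapping cpuset. It assumes the unit looks like the one above, with the values inside echo "..." on the cpuset.cpus and cpuset.mems lines.

```bash
#!/usr/bin/env bash
# clone_unit TEMPLATE NEW_NAME CPUS MEMS   (hypothetical helper)
# Print TEMPLATE with "ora_sat" renamed and the cpuset values replaced.
# NEW_NAME, CPUS and MEMS must not contain "/", "&" or "\" (sed specials).
clone_unit() {
  sed -e "s/ora_sat/$2/g" \
      -e "/cpuset\.cpus/s/echo \"[^\"]*\"/echo \"$3\"/" \
      -e "/cpuset\.mems/s/echo \"[^\"]*\"/echo \"$4\"/" \
      "$1"
}

# Example (CPU list and node are placeholders; take real ones from numactl -H):
#   clone_unit /usr/lib/systemd/system/ora_sat.service ora_dev 10-19,50-59 1 \
#     > /usr/lib/systemd/system/ora_dev.service
#   systemctl daemon-reload
#   systemctl start ora_dev.service && systemctl enable ora_dev.service
```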
Since this is just a PoC, I am allocating one core. I recommend allocating all CPU cores of one NUMA node and avoiding allocations that span cores on different nodes. numactl -H gives a good overview of which CPUs and how much memory belong to each node.

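To allocate all CPUs on node 1, write the node's CPU list to cpuset.cpus and the node number to cpuset.mems:
ExecStartPre=/bin/bash -c '/usr/bin/echo "1" > /sys/fs/cgroup/cpuset/ora_sat/cpuset.mems'
ExecStart=/bin/bash -c '/usr/bin/echo "0-9,40-49" > /sys/fs/cgroup/cpuset/ora_sat/cpuset.cpus'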
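Instead of copying the CPU list from numactl -H by hand, it can be read from sysfs: /sys/devices/system/node/nodeN/cpulist holds the CPUs of node N in the same list format that cpuset.cpus accepts. The sketch below binds a cpuset to exactly one node, CPUs and memory together. The function name is hypothetical, and the sysfs root and cgroup directory are passed in so it can be tried on scratch directories.

```bash
#!/usr/bin/env bash
# bind_cpuset_to_node SYSFS_ROOT CGROUP_DIR NODE   (hypothetical helper)
# Give the cpuset all CPUs of NUMA node NODE and only that node's memory.
# Real use, as root:
#   bind_cpuset_to_node /sys /sys/fs/cgroup/cpuset/ora_sat 1
bind_cpuset_to_node() {
  local cpulist_file="$1/devices/system/node/node$3/cpulist"
  if [ ! -r "$cpulist_file" ]; then
    echo "no such NUMA node: $3" >&2
    return 1
  fi
  local cpus
  cpus=$(cat "$cpulist_file")
  echo "$3" > "$2/cpuset.mems"       # memory only from this node
  echo "$cpus" > "$2/cpuset.cpus"    # every CPU of this node
}
```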

Enabling it at the Oracle database level (scope=spfile, so it takes effect after the instance is restarted):
SQL> alter system set processor_group_name='ora_sat' scope=spfile;
SQL> show parameter processor_group_name
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
processor_group_name                 string      ora_sat
SQL>
SQL> show parameter cpu_count
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cpu_count                            integer     1
SQL>
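In this test cpu_count came up as 1, matching the single CPU in the ora_sat cpuset. To predict the value for a larger cpuset, the CPU list only needs to be counted. A minimal sketch, assuming the plain comma-separated list and range format shown above (0-9,40-49 is 20 CPUs, for example); the function name is hypothetical.

```bash
#!/usr/bin/env bash
# count_cpus LIST   (hypothetical helper)
# Number of CPUs in a cpuset list such as "0-9,40-49".
count_cpus() {
  local IFS=, part total=0
  for part in $1; do                 # split on commas
    if [[ $part == *-* ]]; then
      # A range "lo-hi" contributes hi - lo + 1 CPUs.
      total=$(( total + ${part#*-} - ${part%-*} + 1 ))
    else
      total=$(( total + 1 ))         # a single CPU
    fi
  done
  echo "$total"
}

# Example: count_cpus "$(cat /sys/fs/cgroup/cpuset/ora_sat/cpuset.cpus)"
```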
Checking the status of the service:
-bash-4.2# systemctl status ora_sat.service
● ora_sat.service - test PROCESSOR_GROUP_NAME
   Loaded: loaded (/usr/lib/systemd/system/ora_sat.service; disabled; vendor preset: disabled)
   Active: active (exited) since Sun 2018-04-01 17:24:37 EDT; 1h 35min ago
  Process: 14792 ExecStart=/bin/bash -c /usr/bin/echo "1" > /sys/fs/cgroup/cpuset/ora_sat/cpuset.cpus (code=exited, status=0/SUCCESS)
  Process: 14788 ExecStartPre=/bin/bash -c /usr/bin/echo "0" > /sys/fs/cgroup/cpuset/ora_sat/cpuset.mems (code=exited, status=0/SUCCESS)
  Process: 14785 ExecStartPre=/bin/bash -c /bin/chown -R oracle:dba /sys/fs/cgroup/cpuset/ora_sat  (code=exited, status=0/SUCCESS)
  Process: 14782 ExecStartPre=/usr/bin/mkdir -p /sys/fs/cgroup/cpuset/ora_sat (code=exited, status=0/SUCCESS)
 Main PID: 14792 (code=exited, status=0/SUCCESS)
   Memory: 0B
   CGroup: /system.slice/ora_sat.service

Apr 01 17:24:36 lab-server.teits.net systemd[1]: Starting test PROCESSOR_GROUP_NAME...
Apr 01 17:24:37 lab-server.teits.net systemd[1]: Started test PROCESSOR_GROUP_NAME.
-bash-4.2#
Ensure persistence after reboot:
-bash-4.2# systemctl enable ora_sat.service
Created symlink from /etc/systemd/system/multi-user.target.wants/ora_sat.service to /usr/lib/systemd/system/ora_sat.service.
-bash-4.2#
Caution: setting cpu_count beyond the CPUs allowed by the cgroup fails:
SQL> alter system set cpu_count=2;
alter system set cpu_count=2
*
ERROR at line 1:
ORA-02097: parameter cannot be modified because specified value is invalid
SQL>
Note: cpuset.mems must be set; otherwise startup fails with the error below:
SQL> startup
ORA-56729: Failed to bind the database instance to processor group ora_sat;
 Additional Information: cpuset.mems is not set
 Location: chkcpuset:3
ORA-01078: failure in processing system parameters
SQL> exit
Disconnected
lab-server[orcl](/usr/share/doc/kernel-doc-3.10.0)$
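Since ORA-56729 only shows up at startup, a quick pre-start check can catch an incomplete cgroup earlier, for example in the script that starts the instance. This is a hypothetical helper, not an Oracle tool; it only checks that the cgroup directory exists and that both cpuset.cpus and cpuset.mems have a value.

```bash
#!/usr/bin/env bash
# check_processor_group DIR   (hypothetical helper)
# Return 0 if DIR looks like a usable cpuset for PROCESSOR_GROUP_NAME,
# otherwise print what is missing and return 1.
check_processor_group() {
  local dir=$1 f rc=0
  if [ ! -d "$dir" ]; then
    echo "$dir: cgroup does not exist" >&2
    return 1
  fi
  for f in cpuset.cpus cpuset.mems; do
    # An empty value means it was never written.
    if [ -z "$(cat "$dir/$f" 2>/dev/null)" ]; then
      echo "$dir: $f is not set" >&2
      rc=1
    fi
  done
  return $rc
}

# Example: check_processor_group /sys/fs/cgroup/cpuset/ora_sat || exit 1
```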

Dynamically allocating more resources (this is for testing purposes; consider the implications before doing it elsewhere):
[oracle@lab-server ~]$ cat /sys/fs/cgroup/cpuset/ora_sat/cpuset.cpus
1
[oracle@lab-server ~]$ echo "0,1" > /sys/fs/cgroup/cpuset/ora_sat/cpuset.cpus
[oracle@lab-server ~]$ cat /sys/fs/cgroup/cpuset/ora_sat/cpuset.cpus
0-1
[oracle@lab-server ~]$
...
SQL> alter system set cpu_count=2;
System altered.
SQL>
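The two steps above can be combined: grow the cgroup first, then raise cpu_count, since cpu_count cannot exceed the CPUs in the cgroup. In this minimal sketch a hypothetical function writes the new CPU list, reports what the kernel accepted, and prints the matching ALTER SYSTEM statement so it can be piped into SQL*Plus. It assumes the caller passes the CPU count that matches the new list, and it does not cover shrinking.

```bash
#!/usr/bin/env bash
# grow_ora_cpuset CGROUP_DIR CPU_LIST CPU_COUNT   (hypothetical helper)
# 1) widen the cpuset, 2) print the SQL that raises cpu_count to match.
grow_ora_cpuset() {
  local dir=$1 cpus=$2 ncpu=$3
  echo "$cpus" > "$dir/cpuset.cpus" || return 1
  # The kernel normalizes the list (e.g. "0,1" reads back as "0-1").
  echo "cpuset.cpus is now $(cat "$dir/cpuset.cpus")" >&2
  printf 'alter system set cpu_count=%s;\n' "$ncpu"
}

# Example, as a user that may write the cgroup (oracle, after the chown):
#   grow_ora_cpuset /sys/fs/cgroup/cpuset/ora_sat 0,1 2 | sqlplus -s / as sysdba
```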
The number of CPUs can also be decreased or increased from the OS through the cgroup while the database is running. I generated a workload on the database using SLOB and, during the run, changed the number of CPUs from 1 (50%) to 2 (100%). Note how the idle time drops to zero:
07:16:18 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:16:19 PM  all   27.27    0.00    9.09    0.00    0.00    1.52    0.00    0.00    0.00   62.12
07:16:19 PM    0    1.14    0.00    4.55    0.00    0.00    1.14    0.00    0.00    0.00   93.18
07:16:19 PM    1   82.93    0.00   17.07    0.00    0.00    0.00    0.00    0.00    0.00    0.00

07:16:19 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:16:20 PM  all   47.27    0.00   11.82    0.00    0.00    1.82    0.00    0.00    0.00   39.09
07:16:20 PM    0   22.39    0.00   11.94    0.00    0.00    1.49    0.00    0.00    0.00   64.18
07:16:20 PM    1   86.05    0.00   11.63    0.00    0.00    2.33    0.00    0.00    0.00    0.00
======>>>>>>>>>>>> echo "0,1" > /sys/fs/cgroup/cpuset/ora_sat/cpuset.cpus
07:16:20 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:16:21 PM  all   92.13    0.00    7.87    0.00    0.00    0.00    0.00    0.00    0.00    0.00
07:16:21 PM    0   91.11    0.00    8.89    0.00    0.00    0.00    0.00    0.00    0.00    0.00
07:16:21 PM    1   93.33    0.00    6.67    0.00    0.00    0.00    0.00    0.00    0.00    0.00

07:16:21 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:16:22 PM  all   92.86    0.00    5.95    0.00    0.00    1.19    0.00    0.00    0.00    0.00
07:16:22 PM    0   92.86    0.00    4.76    0.00    0.00    2.38    0.00    0.00    0.00    0.00
07:16:22 PM    1   90.48    0.00    7.14    0.00    0.00    2.38    0.00    0.00    0.00    0.00
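The samples above are one-second mpstat (sysstat) reports for CPUs 0 and 1; a command such as mpstat -P 0,1 1 produces this layout, though the exact command used here is an assumption. To follow just the per-CPU idle column over a longer capture, a small awk filter is enough. This is a hypothetical helper that assumes mpstat's default columns, with %idle last, and handles both 12-hour timestamps like the ones above and 24-hour ones.

```bash
#!/usr/bin/env bash
# idle_per_cpu   (hypothetical helper)
# Read mpstat output on stdin and print "<cpu> <%idle>" for each sample.
# Skips the banner, header rows, the "all" aggregate and "Average:" lines.
idle_per_cpu() {
  awk '
    $1 == "Average:" { next }
    {
      # 12-hour locales add an AM/PM field, shifting the CPU column.
      c = ($2 == "AM" || $2 == "PM") ? 3 : 2
      if ($c ~ /^[0-9]+$/) print $c, $NF
    }
  '
}

# Example:
#   mpstat -P 0,1 1 60 > mpstat.log
#   idle_per_cpu < mpstat.log
```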

Conclusion:
Control groups have less overhead and are easier to create than other isolation methods such as virtual machines.
Cgroups can also do more than resource limitation, for example prioritization, accounting and control.