Sunday, November 5, 2017

LOAD_BALANCE OFF Still Checks the Other ADDRESS in the List: a DB Connection Consideration


This note highlights the performance penalty that can come with using a database connection string that has multiple ADDRESS entries, contradicting what the Oracle documentation says, at least at "surface level".
This is worth considering when setting up application connections in a DR/Data Guard configuration.

Even when the first address is available, i.e. the DB is active, you notice a stark difference in connection time. In this test, there was more than a 6x difference
between using one ADDRESS and two ADDRESSes.

1. Two TNS entries are created for the test (TNS client version 11.2.0.1.0).
The first TNS entry uses one ADDRESS;
the second has two ADDRESS entries.

IBA11DB=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=first-dr-scan)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=IBA11DB)))
Second TNS entry, with two ADDRESSes:
IBA11DB_W=
 (DESCRIPTION_LIST=
  (LOAD_BALANCE=off)(FAILOVER=on)
  (DESCRIPTION=
   (CONNECT_TIMEOUT=10)(RETRY_COUNT=1)
   (ADDRESS_LIST=(LOAD_BALANCE=on)(ADDRESS=(PROTOCOL=TCP)(HOST=first-dr-scan.labcenter.priv)(PORT=1521)))
   (CONNECT_DATA=(SERVICE_NAME=IBA11DB)))
  (DESCRIPTION=
   (CONNECT_TIMEOUT=10)(RETRY_COUNT=1)
   (ADDRESS_LIST=(LOAD_BALANCE=on)(ADDRESS=(PROTOCOL=TCP)(HOST=secon-dr-scan.labcenter.priv)(PORT=1521)))
   (CONNECT_DATA=(SERVICE_NAME=IBA11DB))))
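
A quick sanity check that the multi-DESCRIPTION entry parses and both hosts resolve (standard Oracle client utility; it does not create a database session):

tnsping IBA11DB_W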



2. exit3.sql contains only "exit"; this simulates a logon followed immediately by a logout. This approach is used for simplicity.
strace -fc -o test_IBA11DB_W.strace sqlplus system/xxxxxxxxx@IBA11DB_W  @exit3.sql
strace -fc -o test_IBA11DB.strace sqlplus system/xxxxxxxxx@IBA11DB  @exit3.sql
See the results below.
Comparing the syscall time of connection (+ disconnection): the single-ADDRESS connection spent 175 microseconds in system calls, while the two-ADDRESS connection spent 1176 microseconds, almost 7 times more than IBA11DB.
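
A single run is noisy, so here is a minimal bash sketch (assuming sqlplus is on the PATH and the same aliases) to repeat the logon/logoff cycle and compare wall-clock time per alias:

echo "exit" > exit3.sql
for tns in IBA11DB IBA11DB_W; do
  echo "== $tns =="
  # time 20 consecutive logon/logoff cycles against this alias
  time for i in $(seq 1 20); do
    sqlplus -S system/xxxxxxxxx@"$tns" @exit3.sql > /dev/null
  done
done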


3. Checking the detail of the strace, there are calls involving the second ADDRESS:
sendto() transmitted a message naming secon-dr-scan (the second ADDRESS); note sin_port=htons(53) in the trace, i.e. a DNS lookup of the second host
connect() initiated a connection to the second IP address
recvfrom() received the replies from the socket

$cat test__verbose_IBA11DB_W.strace

21857 sendto(9, "\303\246\1\0\0\1\0\0\0\0\0\0\Rsecon-dr-scan\titsmt"..., 46, MSG_NOSIGNAL, NULL, 0) = 46
21857 poll([{fd=9, events=POLLIN}], 1, 5000) = 1 ([{fd=9, revents=POLLIN}])
21857 ioctl(9, FIONREAD, [185])         = 0
21857 recvfrom(9, "\303\246\205\200\0\1\0\3\0\2\0\2\Rsecon-dr-scan\titsmt"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.2.27.1")}, [16]) = 185
21857 poll([{fd=9, events=POLLOUT}], 1, 4999) = 1 ([{fd=9, revents=POLLOUT}])
21857 sendto(9, "R\317\1\0\0\1\0\0\0\0\0\0\Rsecon-dr-scan\titsmt"..., 46, MSG_NOSIGNAL, NULL, 0) = 46
21857 poll([{fd=9, events=POLLIN}], 1, 4999) = 1 ([{fd=9, revents=POLLIN}])
21857 ioctl(9, FIONREAD, [104])         = 0
21857 recvfrom(9, "R\317\205\200\0\1\0\0\0\1\0\0\Rsecon-dr-scan\titsmt"..., 65536, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.2.27.1")}, [16]) = 104
….


Even though the first ADDRESS in the list is the same one as in IBA11DB and LOAD_BALANCE=off, according to the Oracle documentation the client should check the first address before trying to connect to the next ADDRESS.

But as shown in the strace, Oracle issues system calls such as connect(), probing (at minimum, resolving) the second ADDRESS even when the first is available.
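
To separate name resolution from listener probes in the trace, a simple grep sketch over the verbose strace output:

# DNS traffic to the resolver (UDP port 53) naming the second host
grep -n 'htons(53)' test__verbose_IBA11DB_W.strace | grep secon-dr-scan
# connect() calls, to see which remote addresses were actually touched
grep -n 'connect(' test__verbose_IBA11DB_W.strace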



/TNS_TEST>>cat test_IBA11DB.strace
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 73.71    0.000129           1       162         2 read
 14.29    0.000025           6         4           recvfrom
 12.00    0.000021           0       106        19 open
  0.00    0.000000           0        32           write
  0.00    0.000000           0        97           close
  0.00    0.000000           0         7         3 stat
  0.00    0.000000           0        69           fstat
  0.00    0.000000           0        10           poll
  0.00    0.000000           0        42           lseek
  0.00    0.000000           0        99           mmap
  0.00    0.000000           0        25           mprotect
  0.00    0.000000           0        54           munmap
  0.00    0.000000           0        15           brk
  0.00    0.000000           0        42           rt_sigaction
  0.00    0.000000           0         5           rt_sigprocmask
  0.00    0.000000           0         5           ioctl
  0.00    0.000000           0        17        10 access
  0.00    0.000000           0        13         1 socket
  0.00    0.000000           0        18         6 connect
  0.00    0.000000           0         6           sendto
  0.00    0.000000           0         4           recvmsg
  0.00    0.000000           0         2           bind
  0.00    0.000000           0        10           getsockname
  0.00    0.000000           0         6           setsockopt
  0.00    0.000000           0         6           getsockopt
  0.00    0.000000           0         1           execve
  0.00    0.000000           0        14           uname
  0.00    0.000000           0        19           fcntl
  0.00    0.000000           0         2           getdents
  0.00    0.000000           0         6           getcwd
  0.00    0.000000           0         1           readlink
  0.00    0.000000           0         5           getrlimit
  0.00    0.000000           0         4           times
  0.00    0.000000           0        10           getuid
  0.00    0.000000           0         1           getppid
  0.00    0.000000           0         2           statfs
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         2           setrlimit
  0.00    0.000000           0         5         1 futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.000175                   931        42 total


TNS_TEST>>cat test_IBA11DB_W.strace
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 87.84    0.001033           6       165         2 read
  3.23    0.000038           0       110        19 open
  3.15    0.000037           0       103           mmap
  1.62    0.000019           1        34           write
  1.53    0.000018           3         7         3 stat
  1.36    0.000016           1        20           times
  1.28    0.000015           0        42           rt_sigaction
  0.00    0.000000           0       101           close
  0.00    0.000000           0        73           fstat
  0.00    0.000000           0        15           poll
  0.00    0.000000           0        42           lseek
  0.00    0.000000           0        25           mprotect
  0.00    0.000000           0        58           munmap
  0.00    0.000000           0        14           brk
  0.00    0.000000           0         5           rt_sigprocmask
  0.00    0.000000           0         5           ioctl
  0.00    0.000000           0        17        10 access
  0.00    0.000000           0        13         1 socket
  0.00    0.000000           0        18         6 connect
  0.00    0.000000           0         6           sendto
  0.00    0.000000           0         4           recvfrom
  0.00    0.000000           0         4           recvmsg
  0.00    0.000000           0         2           bind
  0.00    0.000000           0        11           getsockname
  0.00    0.000000           0         2           setsockopt
  0.00    0.000000           0         8         4 getsockopt
  0.00    0.000000           0         1           execve
  0.00    0.000000           0        22           uname
  0.00    0.000000           0        15           fcntl
  0.00    0.000000           0         2           getdents
  0.00    0.000000           0         6           getcwd
  0.00    0.000000           0         1           readlink
  0.00    0.000000           0         5           getrlimit
  0.00    0.000000           0        14           getuid
  0.00    0.000000           0         1           getppid
  0.00    0.000000           0         2           statfs
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         2           setrlimit
  0.00    0.000000           0         5         1 futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.001176                   983        46 total

Summary: if an application already has database connection issues (for example, frequent logon/logoff), using two ADDRESSes will make it even worse. Of course, a better design such as connection pooling can be deployed to alleviate this overhead. The degree of difference depends on the network latency between the ADDRESSes and the client.

Friday, November 3, 2017

Fix for Materialized View Slow Refresh or Hangs (1)



I did performance tuning on a database where the application developers were, literally, solving all performance problems with materialized views!?
Redo log switches were occurring 300 times within an hour because of a COMPLETE refresh on a 5-minute interval.
FAST refresh was not used because it hangs. The following highlights how the slow/hanging MV refresh was resolved.

Lock the MV log statistics immediately after creation, and alter system set "_mv_refresh_use_stats"=FALSE;
see Doc ID 420040.1.
This is what the explain plan of the materialized view refresh looked like before the above recommendation was applied:
--------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | | | | 4948M(100)| | | |
| 1 | LOAD TABLE CONVENTIONAL | MVIEW_EVOLUTION_SUMMARY_DATA | | | | | | | |
| 2 | FILTER | | | | | | | | |
| 3 | NESTED LOOPS OUTER | | 1352K| 167M| | 78593 (1)| 00:00:04 | | |
| 4 | NESTED LOOPS OUTER | | 1352K| 136M| | 78591 (1)| 00:00:04 | | |
| 5 | NESTED LOOPS OUTER | | 1352K| 107M| | 78590 (1)| 00:00:04 | | |
| 6 | HASH JOIN RIGHT OUTER | | 1352K| 77M| | 78589 (1)| 00:00:04 | | |
| 7 | VIEW | | 12840 | 426K| | 70990 (1)| 00:00:03 | | |
| 8 | HASH JOIN SEMI | | 12840 | 2156K| 293M| 70982 (1)| 00:00:03 | | |
| 9 | TABLE ACCESS FULL | APP_INCREA_EVENT | 6679K| 216M| | 52261 (1)| 00:00:03 | | |
| 10 | TABLE ACCESS FULL | MLOG$_APP_INCREA_EVENT | 20545 | 2768K| | 4028 (1)| 00:00:01 | | |
| 11 | TABLE ACCESS FULL | APP_TRANSACTION | 1352K| 33M| | 7595 (1)| 00:00:01 | | |
| 12 | TABLE ACCESS BY GLOBAL INDEX ROWID| APP_INCOMES_QUALIFICATION | 1 | 23 | | 1 (0)| 00:00:01 | ROWID | ROWID |
| 13 | INDEX UNIQUE SCAN | APP_INCOMES_QUALIFICATION_PK | 1 | | | 0 (0)| | | |
| 14 | TABLE ACCESS BY GLOBAL INDEX ROWID | APP_INCOMES_QUALIFICATION | 1 | 23 | | 1 (0)| 00:00:01 | ROWID | ROWID |
| 15 | INDEX UNIQUE SCAN | APP_INCOMES_QUALIFICATION_PK | 1 | | | 0 (0)| | | |
| 16 | TABLE ACCESS BY GLOBAL INDEX ROWID | APP_INCOMES_CUSTOMER | 1 | 24 | | 2 (0)| 00:00:01 | ROWID | ROWID |
| 17 | INDEX UNIQUE SCAN | APP_INCOMES_CUSTOMER_PK | 1 | | | 1 (0)| 00:00:01 | | |
| 18 | FILTER | | | | | | | | |
| 19 | MAT_VIEW ACCESS FULL | MVIEW_EVOLUTION_SUMMARY_DATA | 1 | 10 | | 3663 (1)| 00:00:01 | | |
| 20 | FILTER | | | | | | | | |
| 21 | NESTED LOOPS | | 1 | 24 | | 3 (0)| 00:00:01 | | |
| 22 | TABLE ACCESS BY USER ROWID | APP_TRANSACTION | 1 | 18 | | 1 (0)| 00:00:01 | | |
| 23 | INDEX RANGE SCAN | INCREA_EVENT_2_TRANSACTION_IDX | 1 | 6 | | 2 (0)| 00:00:01 | | |
--------------------------------------------------------------------------------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)

  1. If an MV log exists, drop and recreate it.
     drop materialized view log on crs_change_event;
     CREATE MATERIALIZED VIEW LOG ON USERS_APP.APP_INCREA_EVENT TABLESPACE "APP_TBS";

  2. Gather statistics on the MV logs and lock those statistics.
     exec DBMS_STATS.GATHER_TABLE_STATS('USERS_APP','MLOG$_APP_INCREA_EVENT',force=>true);
     exec DBMS_STATS.LOCK_TABLE_STATS('USERS_APP','MLOG$_APP_INCREA_EVENT');

  3. Alter system:
     alter system set "_mv_refresh_use_stats"=FALSE;

  4. Create the MV:
     CREATE MATERIALIZED VIEW "USERS_APP"."MVIEW_EVOLUTION_SUMMARY_DATA".....
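
Before changing the underscore parameter system-wide, it can usually be tested at session level first (an assumption; verify the scope on your version against Doc ID 420040.1):

     alter session set "_mv_refresh_use_stats"=FALSE;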

After the recommendation, you can see a significant improvement in the Cost column of the two plans (from 4948M down to 100K), and the refresh now uses hash joins instead of nested loops.

----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | | | | 100K(100)| | | |
| 1 | LOAD TABLE CONVENTIONAL | MVIEW_EVOLUTION_SUMMARY_DATA | | | | | | | |
| 2 | HASH JOIN RIGHT OUTER | | 6532K| 809M| | 100K (1)| 00:00:04 | | |
| 3 | PARTITION LIST ALL | | 44179 | 992K| | 248 (1)| 00:00:01 | 1 | 3 |
| 4 | TABLE ACCESS FULL | APP_INCOMES_QUALIFICATION | 44179 | 992K| | 248 (1)| 00:00:01 | 1 | 3 |
| 5 | HASH JOIN RIGHT OUTER | | 6532K| 666M| | 100K (1)| 00:00:04 | | |
| 6 | PARTITION LIST ALL | | 44179 | 992K| | 248 (1)| 00:00:01 | 1 | 3 |
| 7 | TABLE ACCESS FULL | APP_INCOMES_QUALIFICATION | 44179 | 992K| | 248 (1)| 00:00:01 | 1 | 3 |
| 8 | HASH JOIN RIGHT OUTER| | 6532K| 523M| 220M| 100K (1)| 00:00:04 | | |
| 9 | PARTITION LIST ALL | | 6426K| 147M| | 33553 (1)| 00:00:02 | 1 | 3 |
| 10 | TABLE ACCESS FULL | APP_INCOMES_CUSTOMER | 6426K| 147M| | 33553 (1)| 00:00:02 | 1 | 3 |
| 11 | HASH JOIN OUTER | | 6532K| 373M| 49M| 33239 (1)| 00:00:02 | | |
| 12 | TABLE ACCESS FULL | APP_TRANSACTION | 1352K| 33M| | 2065 (1)| 00:00:01 | | |
| 13 | TABLE ACCESS FULL | APP_INCREA_EVENT | 6679K| 216M| | 14184 (1)| 00:00:01 | | |
----------------------------------------------------------------------------------------------------------------------------------


5. Query dba_mviews to check the staleness.

   select a.LAST_REFRESH_DATE,a.staleness,a.after_fast_refresh,a.MVIEW_NAME from dba_mviews a where owner='USERS_APP';

   LAST_REFRESH_DATE   STALENESS AFTER_FAST_REFRESH  MVIEW_NAME
   ------------------- --------- ------------------- ----------------------------
   2017-08-26 13:37:37 FRESH     FRESH               MVIEW_EVOLUTION_SUMMARY_DATA
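
To confirm the fix end to end, the FAST refresh can also be driven manually with the standard DBMS_MVIEW API; it should now complete quickly instead of hanging:

   exec DBMS_MVIEW.REFRESH('USERS_APP.MVIEW_EVOLUTION_SUMMARY_DATA', method => 'F');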


Version: 12.1


Tuesday, October 24, 2017

MANUALLY CREATE HISTOGRAM WITHOUT WORKLOAD OR COLUMN USAGE INFORMATION IN SYS.COL_USAGE$



For skewed data, a histogram provides better statistics based on the data distribution, helping the optimizer "to generate accurate cardinality estimates for filter and join predicates that involve these columns."
See the Oracle documentation.

Sometimes you cannot wait for a SQL workload to populate the filter and join predicate information in COL_USAGE$. After identifying the need to create a histogram (probably after you created extended statistics / a column group), the method below shows how to create the histogram without any workload having been executed.



--creating test objects

SH: IBA> create table test_histogram as select * from dba_objects;

Table created.


--simulating previous statistics gathering, before any histogram
SH: IBA> exec dbms_stats.gather_table_stats(null,'TEST_HISTOGRAM');

PL/SQL procedure successfully completed.
--Query column histogram information.
SH: IBA> SELECT COLUMN_NAME, NUM_DISTINCT, NUM_NULLS, HISTOGRAM FROM USER_TAB_COL_STATISTICS WHERE TABLE_NAME='TEST_HISTOGRAM';

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS HISTOGRAM
------------------------------ ------------ ---------- ---------------
OWNER                                    30          0 NONE
OBJECT_NAME                           46108          0 NONE
SUBOBJECT_NAME                          162      74732 NONE
OBJECT_ID                             75273          0 NONE
DATA_OBJECT_ID                         9540      65663 NONE
OBJECT_TYPE                              45          0 NONE
CREATED                                 872          0 NONE
LAST_DDL_TIME                           955          0 NONE
TIMESTAMP                              1007          0 NONE
STATUS                                    2          0 NONE
TEMPORARY                                 2          0 NONE
GENERATED                                 2          0 NONE
SECONDARY                                 2          0 NONE
NAMESPACE                                21          0 NONE
EDITION_NAME                              0      75273 NONE

15 rows selected.

SH: IBA>



--Above shows no histogram was created.
--Setting the dbms_stats preference

SH: IBA> exec dbms_stats.set_table_prefs('SH','TEST_HISTOGRAM','METHOD_OPT','FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 254 OBJECT_TYPE');

PL/SQL procedure successfully completed.
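
--To confirm the preference was stored, it can be read back with the standard DBMS_STATS.GET_PREFS function:
SH: IBA> select dbms_stats.get_prefs('METHOD_OPT','SH','TEST_HISTOGRAM') from dual;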
--gather stats
SH: IBA> exec dbms_stats.gather_table_stats(null,'TEST_HISTOGRAM');

PL/SQL procedure successfully completed.

SH: IBA> SELECT COLUMN_NAME, NUM_DISTINCT, NUM_NULLS, HISTOGRAM FROM USER_TAB_COL_STATISTICS WHERE TABLE_NAME='TEST_HISTOGRAM';

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS HISTOGRAM
------------------------------ ------------ ---------- ---------------
OWNER                                    30          0 NONE
OBJECT_NAME                           46108          0 NONE
SUBOBJECT_NAME                          162      74732 NONE
OBJECT_ID                             75273          0 NONE
DATA_OBJECT_ID                         9540      65663 NONE
OBJECT_TYPE                              45          0 FREQUENCY
CREATED                                 872          0 NONE
LAST_DDL_TIME                           955          0 NONE
TIMESTAMP                              1007          0 NONE
STATUS                                    2          0 NONE
TEMPORARY                                 2          0 NONE
GENERATED                                 2          0 NONE
SECONDARY                                 2          0 NONE
NAMESPACE                                21          0 NONE
EDITION_NAME                              0      75273 NONE

15 rows selected.

SH: IBA>

--This histogram will now always be created whenever dbms_stats is executed with these preferences.

--Below shows that, after a workload, the database registers predicate usage and creates histograms for additional columns because of "FOR ALL COLUMNS SIZE AUTO".

--run some workload
SH: IBA> select OWNER,SUBOBJECT_NAME from TEST_HISTOGRAM where OBJECT_NAME='TEST_HISTOGRAM' and created between sysdate and sysdate -1;

no rows selected
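
At this point the registered predicate usage can be inspected directly. A sketch, assuming SYS access (monitoring info is flushed to SYS.COL_USAGE$ periodically; DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO forces the flush):

SYS> exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
SYS> select o.name table_name, c.name column_name, u.equality_preds, u.range_preds
     from sys.col_usage$ u, sys.obj$ o, sys.col$ c
     where o.obj# = u.obj# and c.obj# = u.obj# and c.intcol# = u.intcol#
     and o.name = 'TEST_HISTOGRAM';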
--gather stats; this creates a new histogram based on COL_USAGE
SH: IBA> exec dbms_stats.gather_table_stats(null,'TEST_HISTOGRAM');

PL/SQL procedure successfully completed.

SH: IBA> SELECT COLUMN_NAME, NUM_DISTINCT, NUM_NULLS, HISTOGRAM FROM USER_TAB_COL_STATISTICS WHERE TABLE_NAME='TEST_HISTOGRAM';

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS HISTOGRAM
------------------------------ ------------ ---------- ---------------
OWNER                                    30          0 NONE
OBJECT_NAME                           46108          0 NONE
SUBOBJECT_NAME                          162      74732 NONE
OBJECT_ID                             75273          0 NONE
DATA_OBJECT_ID                         9540      65663 NONE
OBJECT_TYPE                              45          0 FREQUENCY
CREATED                                 872          0 HEIGHT BALANCED
LAST_DDL_TIME                           955          0 NONE
TIMESTAMP                              1007          0 NONE
STATUS                                    2          0 NONE
TEMPORARY                                 2          0 NONE
GENERATED                                 2          0 NONE
SECONDARY                                 2          0 NONE
NAMESPACE                                21          0 NONE
EDITION_NAME                              0      75273 NONE

15 rows selected.

Hope you save some time ;)


Saturday, September 23, 2017

Oracle IO, Storage and CPU Limits: Knowing Your Environment Limits.


Aiming to understand environment limits, potential bottlenecks, performance characteristics, and capacity, I did some tests related to Oracle I/O, storage I/O, and CPU limits. Enjoy and leave comments.
In this PART 1, I try to find the limit of a single core driving I/O using the dd utility. This gives an idea of per-core I/O limits.
Why is this important? Such knowledge of the environment can be used in deployment planning. RDBMS-related test results will be shared in PART 2, using SLOB to generate the I/O workload.
Why not just trust the OEM datasheet? Because each environment differs in components and configuration, a bottleneck can exist anywhere. btt (from the blktrace suite) is a useful tool to track where the bottleneck sits between the OS and the storage stack.
Knowledge of these "limit values" improves the accuracy of decision making, especially when tuning real database workloads, designing applications, and planning capacity.

TEST1:
What is the limit of a single Linux core driving I/O? I used the simple dd utility to generate the workload; this should give the upper limit of one CPU core's I/O processing power.
From the result, I can safely assume that a single core on this system will not drive I/O at more than ~300 MB/s (at least for a dataset greater than 4 GB).
I used taskset to pin the workload to a particular CPU core, but I did not go nuclear on isolating IRQs or other users; for simplicity I assumed their interference would be minimal. You can isolate CPUs with tuna, or with the boot parameters isolcpus (remove cores from the kernel scheduler) and nohz_full (disable the tick on a CPU core), as sketched below.
To bypass the OS buffer cache, I used iflag=direct, ensuring direct I/O for the LUN read.
Another caution: other processes might be accessing the LUN path. I am accepting this noise in the posted results as negligible. This is an existing LUN used by ASM, so I can only do read I/O with dd.
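
For stricter isolation (not done in this test), the cores can be fenced off on the kernel command line, kernel support permitting; a sketch with a hypothetical core list:

# appended to the boot loader configuration (e.g. the GRUB kernel command line)
isolcpus=77,78 nohz_full=77,78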


Result of TEST1, using a single core to drive I/O:
[grid@ServerE-54640 ~]$ taskset -c 77 dd if=/dev/dm-71 of=/dev/null iflag=direct bs=1M count=4000
4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 14.3439 s, 292 MB/s


The avg-cpu section is not shown here since we are running the test on particular core(s).
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
dm-71              0.00     0.00    1.00    4.00     0.02     0.04    20.80     0.00    0.60    1.00    0.50   0.60   0.30
dm-71              0.00     0.00  571.00    3.00   140.17     0.03   500.24     0.71    1.23    1.23    0.67   0.34  19.50
dm-71              0.00     0.00 1829.00    7.00   456.08     0.09   508.85     3.08    1.68    1.69    0.29   0.53  97.30
dm-71              0.00     0.00 1781.00    7.00   445.02     0.04   509.77     3.45    1.93    1.93    0.43   0.55  98.50
dm-71              0.00     0.00 2201.00    4.00   547.91     0.03   508.92     3.12    1.41    1.42    0.50   0.44  97.90
dm-71              0.00     0.00  821.00    7.00   203.08     0.08   502.51     2.39    2.86    2.88    0.71   1.20  99.20
dm-71              0.00     0.00  829.00    5.00   206.78     0.03   507.86     2.35    2.78    2.80    0.60   1.19  99.60
dm-71              0.00     0.00 1205.00    4.00   298.67     0.03   505.99     3.55    2.97    2.98    0.75   0.81  98.20
dm-71              0.00     0.00 1049.00    9.00   261.08     0.08   505.52     3.86    3.67    3.69    0.78   0.93  98.80
dm-71              0.00     0.00  835.00    4.00   208.52     0.05   509.10     2.50    2.94    2.95    0.50   1.18  99.00
dm-71              0.00     0.00  890.00    5.00   219.69     0.04   502.79     2.86    3.22    3.23    0.40   1.10  98.70
dm-71              0.00     0.00  971.00    6.00   242.05     0.07   507.53     3.80    3.80    3.83    0.33   1.01  98.60
dm-71              0.00     0.00  806.00    4.00   200.80     0.05   507.81     2.98    3.80    3.81    0.50   1.23  99.50
dm-71              0.00     0.00  732.00    4.00   179.48     0.03   499.53     2.43    3.30    3.31    1.00   1.34  98.70
dm-71              0.00     0.00  733.00    8.00   182.08     0.08   503.46     2.16    2.91    2.94    0.38   1.33  98.50
dm-71              0.00     0.00  698.00    7.00   174.03     0.05   505.70     3.26    4.63    4.67    0.57   1.41  99.40
dm-71              0.00     0.00  153.00    3.00    36.14     0.03   474.88     0.43    2.79    2.84    0.00   0.92  14.40
dm-71              0.00     0.00    6.00    6.00     0.09     0.08    29.50     0.01    0.42    0.33    0.50   0.42   0.50


Do not trust "svctm  %util" values in this case, the information is not reliable. %util of 99 in this case does not show that the storage is saturated because iostat do not handle parallelism well.
LUN attached to this server are RAID5 with SSD disks and storage cache, so they scale well compare to single disks.
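
For reference, the extended device statistics above can be collected with an iostat invocation along these lines (an assumption about the exact flags used; -x extended statistics, -m megabytes per second, one-second samples):

iostat -xm 1 dm-71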


TEST2:
Here is simple proof that the storage is not saturated: the next result shows iostat output while two processes simultaneously executed dd, pinned to cores 77 and 78.
As you can see below, the read rMB/s almost doubled, yet we still see 99% in the %util column.
The r_await values are very similar to the single-core run, and r_await is the more accurate metric: iostat derives it from /proc/diskstats as delta(time spent reading) / delta(reads completed) between two samples.
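
A minimal sketch of that computation straight from /proc/diskstats (field 4 = reads completed, field 7 = milliseconds spent reading; device name from this post):

# take two samples one second apart, then r_await = delta(ms reading) / delta(reads completed)
awk '$3=="dm-71" {print $4, $7}' /proc/diskstats
sleep 1
awk '$3=="dm-71" {print $4, $7}' /proc/diskstats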


dd output when two cores drive I/O in parallel:
4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 14.396 s, 291 MB/s
4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 14.3969 s, 291 MB/s


Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
dm-71              0.00     2.00    5.00    8.00     0.08     1.18   198.69     0.01    0.62    0.20    0.88   0.54   0.70
dm-71              0.00     0.00 2500.00    3.00   623.12     0.03   509.88     4.63    1.85    1.85    0.67   0.27  67.60
dm-71              0.00     1.00 2855.00    7.00   713.05     0.36   510.50     7.45    2.51    2.51    0.71   0.35  99.80
dm-71              0.00     1.00 3321.00   14.00   829.08     0.93   509.70     7.34    2.28    2.29    1.21   0.30  99.60
dm-71              0.00     4.00 3719.00   10.00   926.70     2.15   510.13     6.63    1.77    1.77    1.70   0.27  99.80
dm-71              0.00     0.00 1105.00    8.00   275.55     0.11   507.23     4.55    3.99    4.01    0.88   0.90 100.00
dm-71              0.00     0.00 1674.26    3.96   417.87     0.05   510.00     5.08    3.08    3.09    0.25   0.59  98.91
dm-71              0.00     0.00 2111.00    4.00   525.17     0.03   508.57     6.57    3.08    3.09    1.00   0.47  99.90
dm-71              0.00     0.00 2598.00    5.00   649.03     0.06   510.70     7.59    2.95    2.95    0.40   0.38  99.90
dm-71              0.00     0.00 1709.00    4.00   426.08     0.05   509.46     5.55    3.20    3.21    0.50   0.58  99.80
dm-71              0.00     0.00 1602.00    3.00   398.16     0.03   508.09     5.90    3.72    3.72    1.00   0.62  99.80
dm-71              0.00     0.00 1963.00    6.00   490.05     0.06   509.78     7.50    3.81    3.82    0.50   0.51  99.90
dm-71              0.00     0.00 1885.00    8.00   470.08     0.07   508.64     6.93    3.64    3.66    0.38   0.53  99.60
dm-71              0.00     0.00 2135.00    3.00   530.94     0.03   508.62     5.47    2.58    2.58    0.33   0.47  99.80
dm-71              0.00     0.00 1614.00    7.00   403.27     0.08   509.59     5.07    3.12    3.14    0.71   0.62 100.10
dm-71              0.00     0.00 1280.00    5.00   319.06     0.05   508.59     5.52    4.31    4.32    0.20   0.56  71.90
dm-71              0.00     0.00   11.00    3.00     0.17     0.03    29.86     0.01    0.36    0.36    0.33   0.36   0.50


OTHER important result from TEST1:
What is core 77's utilization when reading I/O at 292 MB/s? Checking the mpstat output, the core is 98-100% non-idle while the copy runs, almost all of it in %iowait; the later samples show the core back at idle once the run completes.


01:30:11 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
...
01:26:12 PM   77    1.02    0.00    2.04   96.94    0.00    0.00    0.00    0.00    0.00
01:26:13 PM   77    1.02    0.00    2.04   96.94    0.00    0.00    0.00    0.00    0.00
01:26:14 PM   77    1.02    0.00    0.00    3.06    0.00    0.00    0.00    0.00   95.92
01:26:15 PM   77    1.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00   98.00
01:26:16 PM   77    1.01    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.99
01:26:17 PM   77    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:26:18 PM   77    1.01    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.99
01:26:19 PM   77    1.01    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.99
01:26:20 PM   77    1.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00   98.00
01:26:21 PM   77    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
01:26:22 PM   77    1.02    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.98
01:26:23 PM   77    1.01    0.00    1.01    0.00    0.00    0.00    0.00    0.00   97.98
01:26:24 PM   77    1.02    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.98
01:26:25 PM   77    1.01    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.99
01:26:26 PM   77    1.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00   98.00
01:26:27 PM   77    1.02    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.98
01:26:28 PM   77    2.04    0.00    1.02    0.00    0.00    0.00    0.00    0.00   96.94
01:26:29 PM   77    1.01    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.99
01:26:30 PM   77    1.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00   98.00
01:26:31 PM   77    1.01    0.00    1.01    0.00    0.00    0.00    0.00    0.00   97.98
01:26:32 PM   77    1.02    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.98
01:26:33 PM   77    0.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00   99.00
01:26:34 PM   77    1.01    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.99
01:26:35 PM   77    1.01    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.99
01:26:36 PM   77    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00


TEST 3:
How many cores can drive the I/O to its limit with acceptable latency and efficiency?
Here are the test results. I stopped at four cores because at that level r_await was reaching 10 ms; see the graph below.
Note that at 3 cores the dd output shows 295 MB/s per core, while at 4 cores it drops sharply to 255 MB/s per core; this reflects a significant increase in queue time.


Trade-off between latency and throughput:
It is interesting that aggregate throughput is still higher with 4 cores (1020 MB/s, 135 MB/s higher than with 3 cores), but latency starts to reach 10 ms. I/O should be measured by throughput together with latency.
If your SLA says 30 ms is acceptable, then you should be able to increase the level of parallelism further. Another versatile utility for measuring I/O latency is fio, the flexible I/O tester (see: https://github.com/axboe/fio).
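
A minimal fio sketch approximating the 4-core sequential-read test (read-only, direct I/O against the same LUN; adjust the device name and job count to your environment):

fio --name=seqread --filename=/dev/dm-71 --readonly --direct=1 --rw=read --bs=1M --numjobs=4 --size=4g --group_reporting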


Results:
1. using 1 core
$ taskset -c 77 dd if=/dev/dm-71 of=/dev/null iflag=direct bs=1M count=4000
4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 14.3439 s, 292 MB/s


2. using 2 cores:
$ 4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 14.396 s, 291 MB/s
4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 14.3969 s, 291 MB/s


3. using 3 cores:
$ 4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 14.2124 s, 295 MB/s
4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 14.2147 s, 295 MB/s
4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 14.2163 s, 295 MB/s


4. using 4 cores:
$ 4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 16.425 s, 255 MB/s
4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 16.4266 s, 255 MB/s
4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 16.428 s, 255 MB/s
4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 16.429 s, 255 MB/s


See the graph below, plotting rMB/s and r_await:



Part II will discuss the results of an Oracle workload generated with SLOB.