The basic idea of an ASM failure group is to make a disk group resilient to the failure of a storage hardware component (controller, pool, module, disk, etc.), or even of a complete storage node, by taking advantage of redundancy at the storage level.
For example, if a storage array has two controllers, two failure groups can be created for normal redundancy. When one controller fails, there is no downtime at the ASM/DB instance level.
If you have two storage nodes (typical for an extended/cross-data-centre RAC), each of the two failure groups can hold the disks of one storage node/site.
There are other advantages, such as disk site affinity.
The purpose of this note is to describe an issue encountered during a storage migration from storage vendor A to storage vendor B. It is related to bug 19471572.
What happened:
In order to migrate to the new storage, a simple procedure was used:
1. Two failure groups already exist.
2. Add one or two new failure groups from storage B.
3. Drop the disks of old storage A.
According to the Oracle documentation, "A normal redundancy disk group must contain at least two failure groups." In this case, however, ASM refuses to drop a failure group when the number of failure groups is 3.
Below is the test case:
=================================================CHECK DISK STATUS===============================
DISKGROUP_NAME DISK_NAME       FAILGROUP    MOUNT_S MODE_ST STATE
-------------- --------------- ------------ ------- ------- --------
DATAMIRROR     DATAMIRROR_0000 NEW_STORAGE2 CACHED  ONLINE  NORMAL
DATAMIRROR     DATAMIRROR_0003 OLD_STORAGE  CACHED  ONLINE  NORMAL
DATAMIRROR     DATAMIRROR_0002 NEW_STORAGE1 CACHED  ONLINE  NORMAL

SQL> alter diskgroup DATAMIRROR drop disks in failgroup OLD_STORAGE;
alter diskgroup DATAMIRROR drop disks in failgroup OLD_STORAGE
*
ERROR at line 1:
ORA-15067: command or option incompatible with diskgroup redundancy
The "force" option can be used, but this puts the disk group in an inconsistent state.
SQL> alter diskgroup DATAMIRROR drop disks in failgroup OLD_STORAGE force;

Diskgroup altered.

SQL> @op
=================================================CHECK OPERATION STATUS===============================
INST_ID GROUP_NUMBER OPERA PASS      STAT POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE CON_ID
------- ------------ ----- --------- ---- ----- ------ ----- -------- -------- ----------- ---------- ------
      1            2 REBAL COMPACT   WAIT     6                                                            0
      1            2 REBAL REBALANCE WAIT     6                                                            0
      1            2 REBAL REBUILD   WAIT     6                                                            0
      1            2 REBAL RESYNC    WAIT     6                                                            0
      2            2 REBAL COMPACT   WAIT     6      6     0        0        0           0                 0
      2            2 REBAL REBALANCE WAIT     6      6     0        0        0           0                 0
      2            2 REBAL REBUILD   RUN      6      6  1854    48001   168039           0                 0
      2            2 REBAL RESYNC    DONE     6      6     0        0        0           0                 0

8 rows selected.
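The "op" script used above is a local script and is not reproduced in this note. A minimal sketch of a query that would return the same columns, assuming it simply reads GV$ASM_OPERATION on a 12c or later release (the PASS and CON_ID columns suggest this):

```sql
-- Hypothetical stand-in for the author's "op" script; the real script is not shown.
-- One row per instance and rebalance pass while an operation is active.
select inst_id, group_number, operation, pass, state, power,
       actual, sofar, est_work, est_rate, est_minutes, error_code, con_id
from   gv$asm_operation
order  by inst_id, pass;
```

When no rebalance is active, the query normally returns no rows.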
After the rebalance, the status of the dropped disk is reported as MISSING/FORCING:
=================================================CHECK DISK STATUS===============================
DISKGROUP_NAME DISK_NAME                FAILGROUP    MOUNT_S MODE_ST STATE
-------------- ------------------------ ------------ ------- ------- --------
DATAMIRROR     _DROPPED_0003_DATAMIRROR OLD_STORAGE  MISSING OFFLINE FORCING
DATAMIRROR     DATAMIRROR_0000          NEW_STORAGE2 CACHED  ONLINE  NORMAL
DATAMIRROR     DATAMIRROR_0002          NEW_STORAGE1 CACHED  ONLINE  NORMAL

ASMCMD> lsdg
State    Type    Rebal  Sector  Logical_Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512             512   4096  1048576    262144    91352           131072          -19860              1             N  DATAMIRROR/
MOUNTED  NORMAL  N         512             512   4096  1048576    102400    85444                0           42722              0             N  xx/
MOUNTED  EXTERN  N         512             512   4096  1048576     51200    51088                0           51088              0             N  xxxx/
MOUNTED  NORMAL  N         512             512   4096  1048576    104448   103815                0           50901              0             Y  xxxxx/
MOUNTED  NORMAL  N         512             512   4096  1048576     40960    39854            20480            9687              0             N  xxxxxxx/
ASMCMD>
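The CHECK DISK STATUS listing also comes from a local script that is not shown. A minimal sketch of an equivalent query, assuming the standard V$ASM_DISK and V$ASM_DISKGROUP views and aliases chosen to match the column headings in the output:

```sql
-- Hypothetical reconstruction of the disk status report; the original script is not shown.
select dg.name        as diskgroup_name,
       d.name         as disk_name,
       d.failgroup,
       d.mount_status as mount_s,
       d.mode_status  as mode_st,
       d.state
from   v$asm_disk d
join   v$asm_diskgroup dg on dg.group_number = d.group_number
where  dg.name = 'DATAMIRROR'
order  by d.failgroup, d.name;
```

A disk left behind by a forced drop would show up here with a _DROPPED_ name and a MISSING/OFFLINE/FORCING status, as in the listing above.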
Workaround:
The idea, to prevent the above situation, is to keep a maximum of two failure groups at any point in time, using combined add/drop alter diskgroup statements such as:
alter diskgroup DATAMIRROR add failgroup new_pool1 disk '/dev/mapper/newdisk_01_p1' drop disks in failgroup P1 ;
alter diskgroup DATAMIRROR add failgroup new_pool2 disk '/dev/mapper/newdisk01_01_p2' drop disks in failgroup P2 ;
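As a hedged variant of the first swap, the optional rebalance clause can be appended so that the command returns only once the data has been moved, which makes it easier to confirm the result before starting the second swap. The power value 6 is an assumption borrowed from the POWER column in the test case, and the follow-up query is only a sketch:

```sql
-- Swap the first failure group and block until the rebalance has finished.
-- "power 6" is an assumed value; choose one that suits your system.
alter diskgroup DATAMIRROR
  add failgroup new_pool1 disk '/dev/mapper/newdisk_01_p1'
  drop disks in failgroup P1
  rebalance power 6 wait;

-- Confirm which failure groups remain and that their disks are ONLINE/NORMAL.
select failgroup, name, mode_status, state
from   v$asm_disk
where  group_number = (select group_number
                       from   v$asm_diskgroup
                       where  name = 'DATAMIRROR');
```

Without the wait keyword the statement returns immediately and the rebalance continues in the background.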
MISC:
1. If you are already in this situation, you can create a quorum failure group disk, but consider whether your infrastructure supports this properly, i.e. whether you really have a third independent redundancy component at the storage level, like the first and second failure groups. In principle this should work, but it has not been tested.
2. You can estimate the work using the dynamic performance view V$ASM_ESTIMATE, but I am disappointed that the view does not show a time-based value, only the number of allocation units.
SQL> explain work set statement_id='storage_mig_task1' for alter diskgroup DATAMIRROR add failgroup new_pool2 disk '/dev/mapper/newdisk01_01_p2' drop disks in failgroup P2;

Explained.

SQL> select est_work,GROUP_NUMBER from v$asm_estimate where statement_id='storage_mig_task1';

  EST_WORK
----------
       169
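For the quorum failure group mentioned in item 1, the statement would look roughly like the sketch below. The failure group name QUORUM_FG and the device path are hypothetical, and, as noted in that item, the approach has not been tested for this situation:

```sql
-- Untested sketch: add a quorum failure group on a third, independent storage component.
-- QUORUM_FG and '/dev/mapper/quorumdisk_01' are placeholder names, not from the test case.
alter diskgroup DATAMIRROR
  add quorum failgroup QUORUM_FG disk '/dev/mapper/quorumdisk_01';
```

A quorum failure group does not hold user data, so it is only worthwhile when the disk really sits on separate hardware from the other two failure groups.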