LAB-time: upgrading Grid Infrastructure (GI) from 12.1 to 18c
EDIT: This post is meant for playing around and this approach is unsupported. The supported way of upgrading is described here.
First of all, I'd like to start by telling you that this blog post is intended only for testing, playing and learning. It is not intended for production use, and currently 18c is only available for Cloud and engineered systems environments. For on-premises availability, check Mike Dietrich's ( @MikeDietrichDE ) blog here: https://mikedietrichde.com/2018/03/20/when-will-oracle-database-18c-be-available-on-prem/ . There is a sentence of his that I like and will quote:
Anyway, for all the details and announcements, please see always the single-source-of-truth MOS Note: 742060.1 – Release Schedule of Current Database Releases.
That being said, this blog post describes the path I decided to take to upgrade my 12.1 Grid Infrastructure (GI) to 18c.
My home lab runs a 4-node Flex Cluster with 2 server pools and one policy-managed database. That database I will keep on 12.1. I know I should upgrade it to 12.2 as well, but hey … I can't do it all at once.
The main problem we will face is that the Exadata features are currently enforced. The same happens when you try to start an 18c database on-premises. The error you will get is:
SQL> startup nomount;
ORA-12754: Feature 'startup' is disabled due to missing capability 'Runtime Environment'.
SQL>
You will see that, apart from this, the upgrade goes pretty smoothly. I'd like to thank Mahmoud Hatem (@Hatem__Mahmoud) for doing his research on why this happens. You can read his discoveries here.
Installation
Software staging
As of 12.2, setting up the GI has become a lot easier than it used to be in earlier days: just unzip the software and run a setup script. How hard can it be?
Create the new directories
[root@labvmr01n01 ~]# mkdir -p /u01/app/18.0.0/grid
[root@labvmr01n01 ~]# chown -R grid:oinstall /u01/app/18.0.0
[root@labvmr01n01 ~]#
And this has to be done on all the nodes.
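A minimal sketch of how this could be scripted from the first node, assuming root SSH access to the other nodes (the node names are those of my lab, adjust to your own):

# run as root from node 1; assumes root SSH equivalence to the remaining nodes
for node in labvmr01n02 labvmr01n03 labvmr01n04; do
  ssh root@${node} "mkdir -p /u01/app/18.0.0/grid && chown -R grid:oinstall /u01/app/18.0.0"
done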
Unzipping the software has to be done as the owner of the Grid Infrastructure, on the first node only:
[grid@labvmr01n01 grid]$ unzip -qd /u01/app/18.0.0/grid /ora18csoft/V974952-01.zip
[grid@labvmr01n01 grid]$
Prechecks
As with each installation, prerequisites should be met. That’s what we need our good friend cluvfy for.
Call me paranoid, but I usually first want the peace of mind that my currently running cluster is OK.
[root@labvmr01n01 ~]# /u01/app/12.1.0/grid/bin/crsctl query crs releaseversion
Oracle High Availability Services release version on the local node is [12.1.0.2.0]
[root@labvmr01n01 ~]#

[root@labvmr01n01 ~]# /u01/app/12.1.0/grid/bin/crsctl query crs softwareversion
Oracle Clusterware version on node [labvmr01n01] is [12.1.0.2.0]
[root@labvmr01n01 ~]#

[root@labvmr01n01 ~]# /u01/app/12.1.0/grid/bin/crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [12.1.0.2.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [0].
[root@labvmr01n01 ~]#

[root@labvmr01n01 ~]# /u01/app/12.1.0/grid/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[root@labvmr01n01 ~]#
And then I usually gather some evidence.
[root@labvmr01n01 ~]# /u01/app/12.1.0/grid/bin/crsctl stat res -t > /root/crsstatus.txt
[root@labvmr01n01 ~]#
That way, I can always refer back to "what was the output again".
Cluvfy
As the Grid Infrastructure owner, run cluvfy in pre-install mode.
/u01/app/18.0.0/grid/runcluvfy.sh stage -pre crsinst -upgrade -rolling \
  -src_crshome /u01/app/12.1.0/grid -dest_crshome /u01/app/18.0.0/grid \
  -dest_version 18.0.0.0.0 -fixup -verbose
This will check our environment for potential issues that would hold you back from upgrading. You see, I will be brave and attempt a rolling upgrade 🙂 For the rest, it is similar to the cluvfy command you're used to.
When I ran it, something caught my attention:
Verifying Oracle patch:21255373 ...
  Node Name     Applied                   Required                  Comment
  ------------  ------------------------  ------------------------  ----------
  labvmr01n01   missing                   21255373                  failed
  labvmr01n04   missing                   21255373                  failed
  labvmr01n03   missing                   21255373                  failed
  labvmr01n02   missing                   21255373                  failed
Verifying Oracle patch:21255373 ...FAILED (PRVG-1261)
Verifying This test checks that the source home "/u01/app/12.1.0/grid" is suitable for upgrading to version "18.0.0.0.0". ...PASSED
and at the bottom
Pre-check for cluster services setup was unsuccessful.
Checks did not pass for the following nodes:
        labvmr01n04,labvmr01n03,labvmr01n02,labvmr01n01

Failures were encountered during execution of CVU verification request "stage -pre crsinst".

Verifying Swap Size ...FAILED
labvmr01n04: PRVF-7573 : Sufficient swap size is not available on node "labvmr01n04"
             [Required = 7.6739GB (8046680.0KB) ; Found = 1023.9961MB (1048572.0KB)]
labvmr01n03: PRVF-7573 : Sufficient swap size is not available on node "labvmr01n03"
             [Required = 7.6739GB (8046680.0KB) ; Found = 1023.9961MB (1048572.0KB)]
labvmr01n02: PRVF-7573 : Sufficient swap size is not available on node "labvmr01n02"
             [Required = 7.6739GB (8046680.0KB) ; Found = 1023.9961MB (1048572.0KB)]
labvmr01n01: PRVF-7573 : Sufficient swap size is not available on node "labvmr01n01"
             [Required = 7.6739GB (8046680.0KB) ; Found = 1023.9961MB (1048572.0KB)]

Verifying Soft Limit: maximum stack size ...FAILED
labvmr01n04: PRVG-0449 : Proper soft limit for maximum stack size was not found on node "labvmr01n04" [Expected >= "10240" ; Found = "8192"].
labvmr01n03: PRVG-0449 : Proper soft limit for maximum stack size was not found on node "labvmr01n03" [Expected >= "10240" ; Found = "8192"].
labvmr01n02: PRVG-0449 : Proper soft limit for maximum stack size was not found on node "labvmr01n02" [Expected >= "10240" ; Found = "8192"].
labvmr01n01: PRVG-0449 : Proper soft limit for maximum stack size was not found on node "labvmr01n01" [Expected >= "10240" ; Found = "8192"].

Verifying Oracle patch:21255373 ...FAILED
labvmr01n04: PRVG-1261 : Required Oracle patch "21255373" in home "/u01/app/12.1.0/grid" is not found on node "labvmr01n04".
labvmr01n03: PRVG-1261 : Required Oracle patch "21255373" in home "/u01/app/12.1.0/grid" is not found on node "labvmr01n03".
labvmr01n02: PRVG-1261 : Required Oracle patch "21255373" in home "/u01/app/12.1.0/grid" is not found on node "labvmr01n02".
labvmr01n01: PRVG-1261 : Required Oracle patch "21255373" in home "/u01/app/12.1.0/grid" is not found on node "labvmr01n01".

CVU operation performed:      stage -pre crsinst
Date:                         Mar 12, 2018 4:40:56 PM
CVU home:                     /u01/app/18.0.0/grid/
User:                         grid

******************************************************************************************
Following is the list of fixable prerequisites selected to fix in this session
******************************************************************************************

--------------                  ---------------   ------------   ------------
Check failed.                   Failed on nodes   Reboot         Re-Login
                                                  required?      required?
--------------                  ---------------   ------------   ------------
Soft Limit: maximum stack       labvmr01n04,      no             yes
size                            labvmr01n03,
                                labvmr01n02,
                                labvmr01n01

Execute "/tmp/CVU_18.0.0.0.0_grid/runfixup.sh" as root user on nodes "labvmr01n01,labvmr01n02,labvmr01n03,labvmr01n04" to perform the fix up operations manually

Press ENTER key to continue after execution of "/tmp/CVU_18.0.0.0.0_grid/runfixup.sh" has completed on nodes "labvmr01n01,labvmr01n02,labvmr01n03,labvmr01n04"
I do ignore the swap warning. I know about it, and you should not ignore it for production, but for a sandbox playground … you get the picture.
I HAVE to run the fixup scripts
[root@labvmr01n01 ~]# /tmp/CVU_18.0.0.0.0_grid/runfixup.sh
All Fix-up operations were completed successfully.
[root@labvmr01n01 ~]#
Afterwards, I have to patch my 12.1 cluster. The only patch a base-install 12.1 cluster needs is patch 21255373. It's a fully rolling patch that is applied using opatchauto, and I did not have any issues in my environment, so I won't cover that in detail here.
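For reference, a minimal sketch of what applying such a patch with opatchauto typically looks like, run as root on one node at a time; the staging path /ora18csoft/21255373 is just an assumption for illustration, so always follow the patch README for the exact steps:

[root@labvmr01n01 ~]# export PATH=$PATH:/u01/app/12.1.0/grid/OPatch
[root@labvmr01n01 ~]# opatchauto apply /ora18csoft/21255373 -oh /u01/app/12.1.0/grid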
After patching the system with the required patch (rolling, of course), I reran the precheck:
Pre-check for cluster services setup was unsuccessful.
Checks did not pass for the following nodes:
        labvmr01n04,labvmr01n03,labvmr01n02,labvmr01n01

Failures were encountered during execution of CVU verification request "stage -pre crsinst".

Verifying Swap Size ...FAILED
labvmr01n04: PRVF-7573 : Sufficient swap size is not available on node "labvmr01n04"
             [Required = 7.6739GB (8046680.0KB) ; Found = 1023.9961MB (1048572.0KB)]
labvmr01n03: PRVF-7573 : Sufficient swap size is not available on node "labvmr01n03"
             [Required = 7.6739GB (8046680.0KB) ; Found = 1023.9961MB (1048572.0KB)]
labvmr01n02: PRVF-7573 : Sufficient swap size is not available on node "labvmr01n02"
             [Required = 7.6739GB (8046680.0KB) ; Found = 1023.9961MB (1048572.0KB)]
labvmr01n01: PRVF-7573 : Sufficient swap size is not available on node "labvmr01n01"
             [Required = 7.6739GB (8046680.0KB) ; Found = 1023.9961MB (1048572.0KB)]

CVU operation performed:      stage -pre crsinst
Date:                         Mar 12, 2018 9:20:31 PM
CVU home:                     /u01/app/18.0.0/grid/
User:                         grid
[grid@labvmr01n01 grid]$
So for this cluster, we’re good to go (I would not continue in prod, but now I’m repeating myself).
Setup
As stated before, installing a cluster is now as simple as running a configuration script. At least, kind of. Instead of runInstaller we now use gridSetup.sh.
[grid@labvmr01n01 grid]$ ./gridSetup.sh -responseFile /home/grid/grid18c.rsp
Launching Oracle Grid Infrastructure Setup Wizard...
And the GUI pops up.
We go for the upgrade.
Make sure all your nodes are listed and test the SSH equivalence. It should be working already, but better safe than sorry.
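If you want to double-check the user equivalence from the command line beforehand, a quick sketch with cluvfy (node names are those of my lab):

[grid@labvmr01n01 grid]$ /u01/app/18.0.0/grid/runcluvfy.sh comp admprv -n labvmr01n01,labvmr01n02,labvmr01n03,labvmr01n04 -o user_equiv -verbose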
As it is just for testing and playing, I won't register it in Cloud Control now.
I started it with a response file and filled in my Oracle home there; that's why I can't change it here, I think. I like the idea of keeping the Oracle base and Oracle home separated, but whether that is OK or not is another discussion.
I usually run the root scripts as part of the installer. I know you can do it manually, but the only script it runs, and it will ask you before starting, is rootupgrade.sh. So we know what it is doing, and if it fails there is the retry button, because since 12c rootupgrade.sh is restartable. So there is no harm in doing it this way.
I like this idea! If you have a large number of nodes, you can separate them into batches. This also saved my a** a little, as between the batches you get a pop-up asking if it is OK to continue with the next batch. I used this time in between to correct the missing underscore parameter in the ASM spfile, to make sure that at least two ASM instances were always available during the installation. Yes, this is definitely not something I would do in production, but here it is just to get it running. Also, we know that the on-premises release is planned for July, so no more fiddling around by then, but for now it does help.
The very well known moment of truth.
It's my lab … /u01 and swap are pretty small, so this is safe to ignore. The installer will complain with a dialog box that you chose to ignore this message, and you can confirm that you're sure about it.
This is something I would definitely recommend. Always save your response files! You never know what you will need them for; re-running your configuration assistants, for example 😉
And there the fun starts! After a while it pauses, and do not click anything yet!
During my installation I had node1 and node2 in one batch and node3 and node4 in the other batch. What happened during rootupgrade.sh was that, indeed, the ASM instance did not come up properly due to the error:
ORA-12754: Feature 'startup' is disabled due to missing capability 'Runtime Environment'.
This wasn't too much of a problem, as my database was still able to connect to ASM through the other surviving ASM instances. The moment I saw that I hit this error, I started the instance using a pfile containing the spfile entry and the underscore parameter. When all was done, I recreated the spfile containing the _exadata_feature_on parameter. The proxy instances picked up their pfile in $GRID_HOME/dbs and started up without any issue.
If you have only 2 nodes, it can be an option to put each node in a separate batch. It seems a bit overkill at first, but it gives you a pause to make sure you always have an ASM instance available to connect to, so the assistants don't fail. When both your ASM instances and their proxies are back online, click "Execute now" and the installer continues.
Then it's time for the configuration assistants.
If for some reason or another you lose your session and want to rerun the config assistants, you can rerun them using gridSetup.sh with the -executeConfigTools flag.
[grid@labvmr01n01 ~]$ /u01/app/18.0.0/grid/gridSetup.sh -responseFile /home/grid/grid18c_ok.rsp -executeConfigTools -all
Launching Oracle Grid Infrastructure Setup Wizard...
This actually went pretty smoothly once I found out how to get around the GIMR issue; check ISSUE 4 further down in this blog post. Afterwards … all was done and I had a running 18c cluster.
The next steps were to
- enable the GHCHKPT volume and its ACFS file system,
- enable and start RHP (Rapid Home Provisioning).
In my case they were not enabled by default. You can choose: either you do it in the brand new fancy asmca and click around (in the settings box you can enter the root password, which makes life a little easier), or you use the command line (a sketch follows below). It's up to you.
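For the command-line route, a minimal sketch of what enabling and starting the GHCHKPT volume and its ACFS file system could look like; the /dev/asm/ghchkpt-61 device name is an assumption for illustration, look up your own with srvctl config filesystem:

[grid@labvmr01n01 ~]$ srvctl enable volume -volume GHCHKPT -diskgroup DATA
[grid@labvmr01n01 ~]$ srvctl start volume -volume GHCHKPT -diskgroup DATA
[grid@labvmr01n01 ~]$ srvctl enable filesystem -device /dev/asm/ghchkpt-61
[grid@labvmr01n01 ~]$ srvctl start filesystem -device /dev/asm/ghchkpt-61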
For RHP, you must do it using the CLI as the grid user:
[grid@labvmr01n01 ~]$ /u01/app/18.0.0/grid/bin/srvctl start rhpserver
[grid@labvmr01n01 ~]$
After doing all that this was the result:
[grid@labvmr01n01 ~]$ /u01/app/18.0.0/grid/bin/crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ACFS.ACFS.advm
               ONLINE  ONLINE       labvmr01n01              STABLE
               ONLINE  ONLINE       labvmr01n02              STABLE
               ONLINE  ONLINE       labvmr01n03              STABLE
               ONLINE  ONLINE       labvmr01n04              STABLE
ora.ACFS.dg
               ONLINE  ONLINE       labvmr01n01              STABLE
               ONLINE  ONLINE       labvmr01n02              STABLE
               ONLINE  ONLINE       labvmr01n03              STABLE
               ONLINE  ONLINE       labvmr01n04              STABLE
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       labvmr01n01              STABLE
               ONLINE  ONLINE       labvmr01n02              STABLE
               ONLINE  ONLINE       labvmr01n03              STABLE
               ONLINE  ONLINE       labvmr01n04              STABLE
ora.DATA.GHCHKPT.advm
               ONLINE  ONLINE       labvmr01n01              STABLE
               ONLINE  ONLINE       labvmr01n02              STABLE
               ONLINE  ONLINE       labvmr01n03              STABLE
               ONLINE  ONLINE       labvmr01n04              STABLE
ora.DATA.dg
               ONLINE  ONLINE       labvmr01n01              STABLE
               ONLINE  ONLINE       labvmr01n02              STABLE
               ONLINE  ONLINE       labvmr01n03              STABLE
               ONLINE  ONLINE       labvmr01n04              STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       labvmr01n01              STABLE
               ONLINE  ONLINE       labvmr01n02              STABLE
               ONLINE  ONLINE       labvmr01n03              STABLE
               ONLINE  ONLINE       labvmr01n04              STABLE
ora.RECO.dg
               ONLINE  ONLINE       labvmr01n01              STABLE
               ONLINE  ONLINE       labvmr01n02              STABLE
               ONLINE  ONLINE       labvmr01n03              STABLE
               ONLINE  ONLINE       labvmr01n04              STABLE
ora.acfs.acfs.acfs
               ONLINE  ONLINE       labvmr01n01              mounted on /acfs,STABLE
               ONLINE  ONLINE       labvmr01n02              mounted on /acfs,STABLE
               ONLINE  ONLINE       labvmr01n03              mounted on /acfs,STABLE
               ONLINE  ONLINE       labvmr01n04              mounted on /acfs,STABLE
ora.chad
               ONLINE  ONLINE       labvmr01n01              STABLE
               ONLINE  ONLINE       labvmr01n02              STABLE
               ONLINE  ONLINE       labvmr01n03              STABLE
               ONLINE  ONLINE       labvmr01n04              STABLE
ora.data.ghchkpt.acfs
               ONLINE  ONLINE       labvmr01n01              mounted on /mnt/oracle/rhpimages/chkbase,STABLE
               ONLINE  ONLINE       labvmr01n02              mounted on /mnt/oracle/rhpimages/chkbase,STABLE
               ONLINE  ONLINE       labvmr01n03              mounted on /mnt/oracle/rhpimages/chkbase,STABLE
               ONLINE  ONLINE       labvmr01n04              mounted on /mnt/oracle/rhpimages/chkbase,STABLE
ora.helper
               ONLINE  ONLINE       labvmr01n01              STABLE
               ONLINE  ONLINE       labvmr01n02              STABLE
               ONLINE  ONLINE       labvmr01n03              STABLE
               ONLINE  ONLINE       labvmr01n04              STABLE
ora.net1.network
               ONLINE  ONLINE       labvmr01n01              STABLE
               ONLINE  ONLINE       labvmr01n02              STABLE
               ONLINE  ONLINE       labvmr01n03              STABLE
               ONLINE  ONLINE       labvmr01n04              STABLE
ora.ons
               ONLINE  ONLINE       labvmr01n01              STABLE
               ONLINE  ONLINE       labvmr01n02              STABLE
               ONLINE  ONLINE       labvmr01n03              STABLE
               ONLINE  ONLINE       labvmr01n04              STABLE
ora.proxy_advm
               ONLINE  ONLINE       labvmr01n01              STABLE
               ONLINE  ONLINE       labvmr01n02              STABLE
               ONLINE  ONLINE       labvmr01n03              STABLE
               ONLINE  ONLINE       labvmr01n04              STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       labvmr01n04              STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       labvmr01n02              STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       labvmr01n03              STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       labvmr01n04              169.254.28.180 192.168.123.4,STABLE
ora.asm
      1        ONLINE  ONLINE       labvmr01n01              Started,STABLE
      2        ONLINE  ONLINE       labvmr01n03              Started,STABLE
      3        ONLINE  ONLINE       labvmr01n04              Started,STABLE
      4        ONLINE  ONLINE       labvmr01n02              Started,STABLE
ora.cdbrac.db
      1        ONLINE  ONLINE       labvmr01n01              Open,HOME=/u01/app/oracle/product/12.1.0/dbhome_1,STABLE
      2        ONLINE  ONLINE       labvmr01n02              Open,HOME=/u01/app/oracle/product/12.1.0/dbhome_1,STABLE
ora.cdbrac.raclab_pdb_ha_test.svc
      1        ONLINE  ONLINE       labvmr01n02              STABLE
      2        ONLINE  ONLINE       labvmr01n01              STABLE
ora.cvu
      1        ONLINE  ONLINE       labvmr01n04              STABLE
ora.gns
      1        ONLINE  ONLINE       labvmr01n02              STABLE
ora.gns.vip
      1        ONLINE  ONLINE       labvmr01n02              STABLE
ora.labvmr01n01.vip
      1        ONLINE  ONLINE       labvmr01n01              STABLE
ora.labvmr01n02.vip
      1        ONLINE  ONLINE       labvmr01n02              STABLE
ora.labvmr01n03.vip
      1        ONLINE  ONLINE       labvmr01n03              STABLE
ora.labvmr01n04.vip
      1        ONLINE  ONLINE       labvmr01n04              STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       labvmr01n04              Open,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       labvmr01n04              STABLE
ora.rhpserver
      1        ONLINE  ONLINE       labvmr01n01              STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       labvmr01n04              STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       labvmr01n02              STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       labvmr01n03              STABLE
--------------------------------------------------------------------------------
[grid@labvmr01n01 ~]$
One happy DBA 🙂
The summary of the issues and their workaround comes next.
Issues and their workarounds
ISSUE 1
During one of the upgrade attempts, my installation kept complaining that I wasn't on the first node. Afterwards I found out it had to do with DNS: I had installed my old cluster using short names and wanted the new nodes to have their fully qualified domain names. In the logs the installer then sees that the names don't match exactly, and it tells you that you're not on the first node. The log file you're looking for is cluutil2.log.
[main] [ 2018-03-17 11:51:15.821 CET ] [Utils.getLocalHost:479] Hostname retrieved: labvmr01n01.labo.internal.stepi.net, returned: labvmr01n01
[main] [ 2018-03-17 11:51:15.822 CET ] [ClusterwareCkpt.parseArgs:280] args = -ckpt,-global,-oraclebase,/u01/app/grid,-chkckpt,-name,ROOTCRS_FIRSTNODE,-status
So, for our own peace of mind, do the installation on the master node. It's easy to find out which node the master node is:
[root@labvmr01n01 ~]# grep 'master node' /u01/app/grid/diag/crs/labvmr01n01/crs/trace/ocssd.trc | tail -1
2018-03-17 09:42:26.014931 : CSSD:2289198848: clssgmCMReconfig: reconfiguration successful, incarnation 416745742 with 4 nodes, local node number 1, master node labvmr01n01, number 1
[root@labvmr01n01 ~]#
So you see that in my particular case labvmr01n01 is my master node and I will thus perform the upgrade from the master node.
ISSUE 2
ASM instances
Maybe you're just like me, sometimes too stubborn to check some things upfront. OK, I admit, this was on my first attempt, but I would highly recommend making sure your ASM proxy instances are running; I needed them to make the upgrade succeed. Also, make a note of where your spfile is located in ASM:
[grid@labvmr01n01 ~]$ asmcmd spget
+DATA/labvmr01-clu/ASMPARAMETERFILE/registry.253.943891935
[grid@labvmr01n01 ~]$
WARNING: this is an example! In the rest of my journey, the spfile might differ. What I also did is create a copy on the filesystem, "just in case".
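A minimal sketch of taking such a filesystem copy with asmcmd; the target path /home/grid/asmspfile_backup.ora is just an assumption for illustration:

[grid@labvmr01n01 ~]$ asmcmd spcopy +DATA/labvmr01-clu/ASMPARAMETERFILE/registry.253.943891935 /home/grid/asmspfile_backup.ora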
If you find yourself in the same trouble as me, you will end up with an ASM instance that refuses to start and teases you with:
ORA-12754: Feature 'startup' is disabled due to missing capability 'Runtime Environment'.
We can get around this. If you're stuck and you don't have a copy of your spfile, find the parameters in the ASM alert log and reconstruct it yourself with some creativity. In the ASM alert log you will see something like this:
...
NOTE: remote asm mode is remote (mode 0x202; from cluster type)
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options.
ORACLE_HOME = /u01/app/12.1.0/grid
System name:    Linux
Node name:      labvmr01n01.labo.internal.stepi.net
Release:        3.8.13-16.2.1.el6uek.x86_64
Version:        #1 SMP Thu Nov 7 17:01:44 PST 2013
Machine:        x86_64
Using parameter settings in server-side spfile +DATA/labvmr01-clu/ASMPARAMETERFILE/registry.253.943891935
System parameters with non-default values:
  large_pool_size          = 12M
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskstring           = "/dev/asm*"
  asm_diskgroups           = "RECO"
  asm_diskgroups           = "ACFS"
  asm_power_limit          = 1
NOTE: remote asm mode is remote (mode 0x202; from cluster type)
...
This is the moment where the underscore parameter comes into play for the first time. Construct a pfile containing the spfile entry and the underscore parameter; then you can include the underscore parameter in the spfile and you're good to go again (but only until the proxy instances pop up).
[grid@labvmr01n01 ~]$ cd /u01/app/18.0.0/grid/dbs/
[grid@labvmr01n01 dbs]$ ls
hc_+APX1.dat  hc_+ASM1.dat  init.ora
[grid@labvmr01n01 dbs]$ vi init+ASM1.ora
[grid@labvmr01n01 dbs]$ cat /u01/app/18.0.0/grid/dbs/init+ASM1.ora
*.spfile="+DATA/labvmr01-clu/ASMPARAMETERFILE/registry.253.943891935"
*._exadata_feature_on=true
[grid@labvmr01n01 dbs]$
Then start the ASM instance and get it online (shut it down again afterwards, because at this stage your upgrade assistant may be hanging, and then you can just retry the operation):
[grid@labvmr01n01 dbs]$ export ORACLE_HOME=/u01/app/18.0.0/grid
[grid@labvmr01n01 dbs]$ export ORACLE_SID=+ASM1
[grid@labvmr01n01 dbs]$ export PATH=$ORACLE_HOME:$ORACLE_HOME/bin:$PATH
[grid@labvmr01n01 dbs]$ sqlplus / as sysasm

SQL*Plus: Release 18.0.0.0.0 Production on Tue Mar 13 11:48:21 2018
Version 18.1.0.0.0

Copyright (c) 1982, 2017, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount pfile=/u01/app/18.0.0/grid/dbs/init+ASM1.ora;
ASM instance started

Total System Global Area 1136934472 bytes
Fixed Size                  8666696 bytes
Variable Size            1103101952 bytes
ASM Cache                  25165824 bytes
SQL> alter diskgroup data mount;

Diskgroup altered.

SQL> alter system set "_exadata_feature_on"=true scope=spfile sid='*';

System altered.

SQL> shutdown immediate;
ASM diskgroups dismounted
ASM instance shutdown
SQL>
Oh, something nice to know: don't try to be smarter than Oracle and set it upfront; 12.1 doesn't recognise the parameter and will not start due to invalid parameters. At this point it's in the spfile for the 18c version, so all good now.
proxy instances
The proxy instances are a little different. The easiest workaround I found to get them started, and to stay consistent during the process, is to give them a pfile in /u01/app/18.0.0/grid/dbs. If you do this upfront, you only need to add them on node 1, as during gridSetup.sh the home is copied over.
[root@labvmr01n01 ~]# ls -l /u01/app/18.0.0/grid/dbs/init*APX*
-rw-r--r-- 1 grid oinstall 52 Mar 13 12:52 /u01/app/18.0.0/grid/dbs/init+APX1.ora
-rw-r--r-- 1 grid oinstall 52 Mar 13 12:52 /u01/app/18.0.0/grid/dbs/init+APX2.ora
-rw-r--r-- 1 grid oinstall 52 Mar 13 12:52 /u01/app/18.0.0/grid/dbs/init+APX3.ora
-rw-r--r-- 1 grid oinstall 52 Mar 13 12:52 /u01/app/18.0.0/grid/dbs/init+APX4.ora
[root@labvmr01n01 ~]#
So in the end, I have these 4 files on all 4 nodes, just in case some instance is not on its usual node, which can happen in a Flex Cluster.
The content is the same for every file:
[root@labvmr01n01 ~]# cat /u01/app/18.0.0/grid/dbs/init+APX1.ora
*._exadata_feature_on=true
*.instance_type=asmproxy
[root@labvmr01n01 ~]#
ISSUE 3
This one is completely my fault: during a rebuild I ran the 12.1 installer manually instead of using my scripts, and I ended up with different OS groups. It is normal that these MUST match between the old and the new home. So what I did was copy my 12.1 response file to an 18c one and then start gridSetup.sh with the -responseFile option. That way you can convince the installer to use the right values.
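As an illustration, these are the kind of group-related entries I mean in the grid response file; the group names asmdba, asmoper and asmadmin are assumptions, use whatever groups your 12.1 home was installed with:

oracle.install.asm.OSDBA=asmdba
oracle.install.asm.OSOPER=asmoper
oracle.install.asm.OSASM=asmadmin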
ISSUE 4
The GIMR (Grid Infrastructure Management Repository). This puzzled me the first time I tried to upgrade. I admit, it was a bit late already, but it looked like the pfile kept coming back or being regenerated. After some digging and reading of scripts, it turned out to be pretty simple: the assistant for the GIMR first starts it up using its own pfile, which it has backed up in the old $GRID_HOME/dbs, and then tries to drop it in order to recreate it.
In my, and I repeat: my particular, case, I had already screwed up my GIMR earlier, and as I have limited resources I had already deleted it. I know that is not healthy, and I would strongly advise against doing so, especially for production or real-use clusters. That's why the /u01/app/18.0.0/grid/crs/sbs/dropdb.sbs script failed. If you follow it carefully, you can remove the GIMR manually and edit the script so it returns 0; then the installer accepts the retry. If you decide to do this, make sure you know what you're doing and understand what is happening, because if you leave behind some things Oracle doesn't expect, the rest might fail as well, and we don't want that. For creating the GIMR, the following command is used internally:
/u01/app/18.0.0/grid/bin/dbca -silent -createDatabase -createAsContainerDatabase true \
  -templateName MGMTSeed_Database.dbc -sid -MGMTDB -gdbName _mgmtdb -storageType ASM \
  -diskGroupName DATA -datafileJarLocation /u01/app/18.0.0/grid/assistants/dbca/templates \
  -characterset AL32UTF8 -autoGeneratePasswords -skipUserTemplateCheck -oui_internal
And that explains why I thought the pfile was being rebuilt each time. The workaround is very easy: edit the template and add the underscore parameter.
[grid@labvmr01n01 pfile]$ grep -A5 -B5 exadata /u01/app/18.0.0/grid/assistants/dbca/templates/MGMTSeed_Database.dbc
   </CommonAttributes>
   <Variables/>
   <CustomScripts Execute="false"/>
   <InitParamAttributes>
      <InitParams>
         <initParam name="_exadata_feature_on" value="TRUE"/>
         <initParam name="cluster_database" value="FALSE"/>
         <initParam name="db_name" value=""/>
         <initParam name="db_domain" value=""/>
         <initParam name="dispatchers" value="(PROTOCOL=TCP) (SERVICE={SID}XDB)"/>
         <initParam name="audit_file_dest" value="{ORACLE_BASE}/admin/{DB_UNIQUE_NAME}/adump"/>
[grid@labvmr01n01 pfile]$
So folks, this was all. It was a lengthy post this time, but …
As always, questions, remarks? find me on twitter @vanpupi