While preparing my VCF 5.1 lab setup to upgrade to VCF 5.2. I noticed the SDDC UI kept loading and UI wasn't displaying the information.
Service(s) may fail to start with "Could not acquire change log lock." if the service or SDDC Manager is abruptly restarted during service initialization
So it was time to look at the logs and identify what was causing this issue.
Logged into the SDDC via SSH and checked the status of the services all the services were up and running fine.
So I thought this could be due to a thread/CPU lock so first, restarted all the VCF services successfully using the below command. But this didn't help still the UI kept loading.
/opt/vmware/vcf/operationsmanager/scripts/cli/sddcmanager_restart_services.sh
I started, by looking into the operationsmanager.log and noticed the below error:
2024-08-30T01:00:10.050+0000 ERROR [vcf_om,b5dcc0f4377f67c9,2136] [c.v.v.r.c.LogicalInventoryClient,pool-5-thread-1] Failed to get the inventory details for a particular domain 00f8056a-57a8-4ac6-b977-5e74ed5e252c with error message {}
com.vmware.cloud.foundation.rest.commonsvcs.runtime.ApiException: java.net.SocketTimeoutException: timeout
at com.vmware.cloud.foundation.rest.commonsvcs.runtime.ApiClient.execute(ApiClient.java:845)
at com.vmware.cloud.foundation.rest.commonsvcs.service.InventoryServiceApi.getDomainInventoryWithHttpInfo(InventoryServiceApi.java:4586)
at com.vmware.cloud.foundation.rest.commonsvcs.service.InventoryServiceApi.getDomainInventory(InventoryServiceApi.java:4572)
2024-08-30T01:20:25.186+0000 ERROR [vcf_om,0000000000000000,0000] [c.v.v.p.s.PasswordLookupService,om-exec-2] Error occurred in lookup tasks
2024-08-30T01:20:25.187+0000 DEBUG [vcf_om,0000000000000000,0000] [c.v.v.p.v.u.CredentialsValidationTaskExecutor,om-exec-2] Exception occurred while validating credentials: Failed to look up passwords due to an internal problem. Try again later.
com.vmware.vcf.passwordmanager.exception.PasswordManagerException: Failed to look up passwords due to an internal problem. Try again later.
at com.vmware.vcf.passwordmanager.service.PasswordLookupService.getPasswordsForCredentialIdList(PasswordLookupService.java:472)
at com.vmware.vcf.passwordmanager.validation.utils.CredentialsValidationTaskExecutor$1.call(CredentialsValidationTaskExecutor.java:114)
at com.vmware.vcf.passwordmanager.validation.utils.CredentialsValidationTaskExecutor$1.call(CredentialsValidationTaskExecutor.java:84)
2024-08-30T01:22:31.439+0000 DEBUG [vcf_om,90bc5f42bdc82603,5435] [c.v.v.p.s.PasswordExpirationService,pool-5-thread-1] status for validation id cee9c703-d98f-4ae9-a1ca-26eb4b5acbba is IN_PROGRESS, continue
2024-08-30T01:22:31.439+0000 DEBUG [vcf_om,90bc5f42bdc82603,5435] [c.v.v.p.s.PasswordExpirationService,pool-5-thread-1] Wait for 10 seconds and then recheck status for remaining validation ids, size 1
2024-08-30T01:22:41.439+0000 DEBUG [vcf_om,90bc5f42bdc82603,5435] [c.v.v.p.s.PasswordExpirationService,pool-5-thread-1] starting monitoring loop(226)
2024-08-30T01:22:41.441+0000 DEBUG [vcf_om,90bc5f42bdc82603,5435] [c.v.v.p.s.PasswordExpirationService,pool-5-thread-1] Checking status for validation id cee9c703-d98f-4ae9-a1ca-26eb4b5acbba
2024-08-30T01:22:41.447+0000 DEBUG [vcf_om,90bc5f42bdc82603,5435] [c.v.v.p.s.PasswordValidationService,pool-5-thread-1] Validation operation for id: cee9c703-d98f-4ae9-a1ca-26eb4b5acbba is in progress, retrieving response from credential validation operations
2024-08-30T01:22:41.452+0000 DEBUG [vcf_om,90bc5f42bdc82603,5435] [c.v.v.p.s.PasswordExpirationService,pool-5-thread-1] status for validation id cee9c703-d98f-4ae9-a1ca-26eb4b5acbba is IN_PROGRESS, continue
2024-08-30T01:22:41.452+0000 DEBUG [vcf_om,90bc5f42bdc82603,5435] [c.v.v.p.s.PasswordExpirationService,pool-5-thread-1] Wait for 10 seconds and then recheck status for remaining validation ids, size 1
2024-08-30T01:22:44.382+0000 ERROR [vcf_om,6c2fa735440133fb,bce0] [c.v.v.p.helper.CredentialHelper,om-exec-17] Unable to find credentials for entityIds, status 502, exception com.vmware.cloud.foundation.rest.commonsvcs.runtime.ApiException: Bad Gateway com.vmware.cloud.foundation.rest.commonsvcs.runtime.ApiException: Bad Gateway
2024-08-30T12:21:39.099+0000 INFO [vcf_om,5e0579eb89d84c97,e2a1] [c.v.v.o.s.OperationsManagerAppInfo,om-exec-7] Going to update operationsmanager app info with { "id": "b5186f97-fb5a-4fc5-b43b-b29cd625089a", "name": "OPERATIONS_MANAGER", "version": "5.1.0-vcf5100RELEASE-22688321", "status": "ACTIVE", "serviceUrl": "http://127.0.0.1/operationsmanager", "description": "Operations Manager"}
2024-08-30T12:21:44.162+0000 INFO [vcf_om,5e0579eb89d84c97,e2a1] [c.v.v.o.s.OperationsManagerAppInfo,om-exec-7] Retrieving operationsmanager app info
2024-08-30T12:21:44.162+0000 INFO [vcf_om,5e0579eb89d84c97,e2a1] [c.v.v.o.s.OperationsManagerAppInfo,om-exec-7] Going to update operationsmanager app info with { "id": "b5186f97-fb5a-4fc5-b43b-b29cd625089a", "name": "OPERATIONS_MANAGER", "version": "5.1.0-vcf5100RELEASE-22688321", "status": "ACTIVE", "serviceUrl": "http://127.0.0.1/operationsmanager", "description": "Operations Manager"}
2024-08-30T12:21:45.872+0000 ERROR [vcf_om,0000000000000000,0000] [c.v.v.t.ceip.service.CronService,Thread-5] Unable to fetch SddcManager details from commonsvcs, as it might be restarting or down
From the logs it was observed that there is some issue with the commonsvc service, checked the service status was ACTIVE and RUNNING.
root@sddc-01a [ /home/vcf ]# systemctl status commonsvcs
● commonsvcs.service - VMware Cloud Foundation Platform Services
Loaded: loaded (/etc/systemd/system/commonsvcs.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2024-08-30 12:12:38 UTC; 3min 40s ago
Process: 17995 ExecStartPre=systemctl is-active commonsvcs-db.service (code=exited, status=0/SUCCESS)
Process: 17998 ExecStartPost=/usr/bin/sh -c /usr/bin/sudo /bin/test -f /var/log/vmware/vcf/commonsvcs/commonsvcs.out && /bin/chown vcf_commonsvcs:vcf /var/log/vmware/vcf/commonsvcs/cted, status=0/SUCCESS)
Process: 18004 ExecStartPost=/usr/bin/sh -c /usr/bin/sudo /bin/test -f /var/log/vmware/vcf/commonsvcs/commonsvcs.err && /bin/chown vcf_commonsvcs:vcf /var/log/vmware/vcf/commonsvcs/cted, status=0/SUCCESS)
Main PID: 17997 (java)
Tasks: 21 (limit: 19149)
Memory: 539.6M
CGroup: /system.slice/commonsvcs.service
└─17997 /usr/lib/jvm/openjdk-java17-headless.x86_64/bin/java -XX:+UseParallelGC --add-opens java.base/java.lang.module=ALL-UNNAMED --add-opens java.base/jdk.internal.reflect
Then it was time to dig into the commonsvc logs, and I noticed the below errors:
2024-08-30T02:09:14.367+0000 INFO [common,c525bd218568fe0e,0362] [liquibase.lockservice,main] Waiting for changelog lock....
2024-08-30T02:09:24.388+0000 INFO [common,c525bd218568fe0e,0362] [liquibase.lockservice,main] Waiting for changelog lock....
2024-08-30T02:09:34.400+0000 INFO [common,c525bd218568fe0e,0362] [liquibase.lockservice,main] Waiting for changelog lock....
2024-08-30T02:09:44.408+0000 INFO [common,c525bd218568fe0e,0362] [liquibase.lockservice,main] Waiting for changelog lock....
2024-08-30T02:19:15.316+0000 INFO [common,b1a0c0dc6c7eb7f2,aaf0] [liquibase.command,main] Command execution complete
2024-08-30T02:19:15.318+0000 WARN [common,0000000000000000,0000] [o.s.b.w.s.c.AnnotationConfigServletWebServerApplicationContext,main] Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'liquibase' defined in class path resource [com/vmware/evo/sddc/sqldatabasemgmt/config/PlatformDbConfig.class]: Invocation of init method failed; nested exception is liquibase.exception.CommandExecutionException: liquibase.exception.LockException: Could not acquire change log lock. Currently locked by sddc-01a.corp.local (192.168.xx.1) since 8/30/24, 1:39 AM
From the above logs, it was confirmed that there is a lock held by a workflow, that needs to be cleared.
So now it was time to login to the Postgres DB and check which table and identify which db service and table was holding the lock.
All 4 services (Domain Manager, Operation Manager, LCM, and Platform) had a db lock table
So checked all four tables and Bingo found the lock on the platform table:
platform=# select * from databasechangeloglock;
id | locked | lockgranted | lockedby
----+--------+---------------------------+-------------------------------- 1 | t | 2024-08-30 01:39:03.05498 | sddc-01a.corp.local (192.168.xx.x)
(1 row)
Followed the below steps to fix this issue:
Took a snapshot of the SDDC Manager VM before proceeding
Open an SSH to the SDDC Manager VM, and change to root:
Run the following command to update the databasechangeloglock table status to false:
platform=# update databasechangeloglock set locked=false, lockgranted=null, lockedby=null where id=1;
UPDATE 1
Checked the table status again and the lock was released successfully:
platform=# select * from databasechangeloglock;
id | locked | lockgranted | lockedby
----+--------+-------------+----------
1 | f | |
(1 row)
After this, I restarted the services again using the script (sddcmanager_restart_services.sh) and post which reloaded the UI and was able to view the information successfully.
Happy learning.....
Commenti