Aria Automation pods fail to start
- Mohammed Bilal
- Sep 24
- 3 min read
A few days ago, I ran into an interesting issue in my lab setup. Out of nowhere, all my appliances suddenly went into read-only mode due to a backend storage issue. At first, it appeared to be a complete outage, but after a series of reboots, most of the appliances recovered and came back online without much issue.
The real trouble started with the Aria Automation appliance. Unlike the others, it didn’t bounce back after the reboot. Instead, I noticed that the PODs were consistently failing to start and showing the error status "ImageInspectError".
Symptoms:
- The Aria Automation UI was inaccessible.
- The deploy.sh script failed to start the services.
- Executing kubectl describe pod <name-of-pod> returned: Error from server (NotFound): pods <name-of-pod> not found.
- Running kubectl -n prelude get pods showed the pod status as ImageInspectError (see the commands right after this list).
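For quick reference, both checks from the appliance console look like the lines below. The NotFound error is most likely just a namespace issue, since the Aria Automation pods live in the prelude namespace; pointing describe at that namespace is an assumption worth verifying in your own environment:

# List the Aria Automation pods and their status
kubectl -n prelude get pods
# Describe a failing pod; without -n prelude, kubectl searches the default namespace and reports NotFound
kubectl -n prelude describe pod <name-of-pod>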

This turned out to be the key challenge I had to troubleshoot further.
- Inventory sync failed in LCM with error code "LCMVRAVACONFIG590093": Failed to trigger import for the associated vRealize Automation SaltStack Config in vRealize Automation.
- Start services failed in LCM with error code "LCMVRAVACONFIG590070": Failed to start services on vRealize Automation. Refer to the vRealize Suite Lifecycle Manager log for additional details. Failed to start services on vRA vrlautomation.austro.grpfin. For more information, log in to the vRA and check the /var/log/deploy.log file.

When checking the services, we can see there are postgres errors like the one below:
/services-logs/prelude/provisioning-service-app/file-logs/provisioning-service-app.log
2022-12-04T06:16:40.521Z ERROR provisioning [host='provisioning-service-app-c976b74f8-t6zbx' thread='main' user='' org='' trace='' parent='' span=''] o.s.boot.SpringApplication.reportFailure:824 - Application run failed
org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'dataSourceScriptDatabaseInitializer' defined in class path resource [org/springframework/boot/autoconfigure/sql/init/DataSourceInitializationConfiguration.class]:
Caused by: org.postgresql.util.PSQLException: The connection attempt failed.
Caused by: java.net.UnknownHostException: pgpool
2022-12-04T06:16:40.522Z WARN provisioning [host='provisioning-service-app-c976b74f8-t6zbx' thread='main' user='' org='' trace='' parent='' span=''] c.v.a.h.ProvisioningSpringApplication.main:125 - Failed to start ProvisioningSpringApplication
org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'dataSourceScriptDatabaseInitializer' defined in class path resource [org/springframework/boot/autoconfigure/sql/init/DataSourceInitializationConfiguration.class]: Unsatisfied dependency expressed through method 'dataSourceScriptDatabaseInitializer' parameter 0; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'dataSource' defined in com.vmware.admiral.host.ProvisioningSpringApplicationConfiguration: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [javax.sql.DataSource]: Factory method 'dataSource' threw exception; nested exception is com.zaxxer.hikari.pool.HikariPool$PoolInitializationException: Failed to initialize pool: The connection attempt failed.
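The java.net.UnknownHostException: pgpool indicates the provisioning service could not resolve the database host inside the cluster. Assuming the database sits behind a pgpool service in the prelude namespace (which the hostname in the stack trace suggests), a few quick checks would be:

# Check whether the postgres/pgpool pods are up
kubectl -n prelude get pods | grep -iE 'postgres|pgpool'
# Confirm the pgpool service and its endpoints exist
kubectl -n prelude get svc pgpool
kubectl -n prelude get endpoints pgpool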
I tried rebooting the node multiple times, but the issue persisted, and I re-ran the deploy.sh script several times without success.
After further discussions with colleagues, we suspected possible corruption in the underlying VM storage. Another possibility was that an abrupt interruption had prevented certain Docker containers from loading properly.
Since traditional restarts weren’t resolving the problem, I followed the steps below to recover the environment.
Pre-req:
Take non-memory snapshots of each virtual appliance in the cluster.
Procedure:
1. SSH into the virtual appliance experiencing the issue.
2. Shut down the services:
/opt/scripts/deploy.sh --shutdown
3. Remove the cleanup files (on affected nodes). In my case, I had just a single node:
rm -f /var/vmware/prelude/docker/last-cleanup
4. Execute the script below on the affected node to refresh the Docker images:
/opt/scripts/restore_docker_images.sh
5. Start the services again:
/opt/scripts/deploy.sh
6. Validate that the PODs are running successfully by executing the following command:
kubectl get pods -n prelude
Following the above steps, the services were brought back online, and the environment stabilized.
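For convenience, the same recovery sequence can be collected into a small script. This is only a sketch of the steps above, assuming a single-node deployment and that the commands are run as root on the affected appliance:

#!/bin/bash
# Sketch of the recovery steps described above (single-node assumption)
set -e
/opt/scripts/deploy.sh --shutdown              # shut down the services
rm -f /var/vmware/prelude/docker/last-cleanup  # remove the cleanup marker file
/opt/scripts/restore_docker_images.sh          # refresh the Docker images
/opt/scripts/deploy.sh                         # start the services again
kubectl get pods -n prelude                    # confirm the PODs are running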
The key takeaway for me was that when symptoms point to container-level failures (like ImageInspectError), the root cause may involve corrupted state or interrupted processes, which a targeted cleanup and controlled restart can resolve.




