vCenter Server Appliance: Troubleshooting full database partition

Within six months, a customer of mine twice ran into a full database partition on a VMware vCenter Server Appliance (vCSA). After the first outage, the customer increased the size of the partition that is mounted at /storage/db. A few months later, just a few days ago, the vCSA became unresponsive again, once more because of a full database partition. The customer increased the size of the database partition again (~200 GB!), and today I had time to take a look at this nasty vCSA.

The issue

Patrick Terlisten/ vcloudnine.de/ Creative Commons CC0

Within two days, the storage usage of the database partition increased from 75% to 77%. First, I checked the size of the database:

vcsa:/opt/vmware/vpostgres/current/bin # /opt/vmware/vpostgres/current/bin/psql -h localhost -U vc VCDB
psql.bin (9.0.17)
Type "help" for help.

VCDB=> SELECT pg_database.datname, pg_size_pretty(pg_database_size(pg_database.datname)) AS size FROM pg_database;
  datname  |  size
-----------+---------
 template1 | 5353 kB
 template0 | 5345 kB
 postgres  | 5449 kB
 VCDB      | 2007 MB
(4 rows)

VCDB=>

As you can see, the database itself was only about 2 GB in size. The pg_log directory was more interesting:

vcsa:/storage/db/vpostgres # du -shc /storage/db/vpostgres/*
4.0K    /storage/db/vpostgres/PG_VERSION
2.0G    /storage/db/vpostgres/base
704K    /storage/db/vpostgres/global
47M     /storage/db/vpostgres/pg_clog
4.0K    /storage/db/vpostgres/pg_hba.conf
4.0K    /storage/db/vpostgres/pg_ident.conf
141G    /storage/db/vpostgres/pg_log
252K    /storage/db/vpostgres/pg_multixact
12K     /storage/db/vpostgres/pg_notify
324K    /storage/db/vpostgres/pg_stat_tmp
20K     /storage/db/vpostgres/pg_subtrans
4.0K    /storage/db/vpostgres/pg_tblspc
4.0K    /storage/db/vpostgres/pg_twophase
81M     /storage/db/vpostgres/pg_xlog
20K     /storage/db/vpostgres/postgresql.conf
4.0K    /storage/db/vpostgres/postmaster.opts
4.0K    /storage/db/vpostgres/postmaster.pid
0       /storage/db/vpostgres/serverlog
143G    total
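When a single directory dominates like this, it helps to know how many files it holds and which ones are the largest. The following is only a minimal sketch: the function name pg_log_summary is made up for illustration, and the default path is simply the pg_log directory from the du output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the vCSA): summarize a log directory.
# The default path is the pg_log directory from the du output above.
pg_log_summary() {
    dir="${1:-/storage/db/vpostgres/pg_log}"

    # Count the *.log files in the directory tree.
    count=$(find "$dir" -type f -name '*.log' | wc -l | tr -d ' ')
    echo "log files: $count"

    # List the five largest log files, biggest first (sizes in KB).
    find "$dir" -type f -name '*.log' -exec du -k {} + | sort -rn | head -n 5
}
```

Called without an argument, pg_log_summary inspects /storage/db/vpostgres/pg_log; any other directory can be passed as the first argument.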

The directory was full of log files. The log files contained only one message, repeated over and over:

vcsa:/storage/db/vpostgres/pg_log # more postgresql-2015-03-04_090525.log
 123462 tm:2015-03-04 09:05:25.488 UTC db:VCDB pid:1527 WARNING:  there is already a transaction in progress
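To confirm that the logs really contain nothing but this warning, the matching and non-matching lines can be counted in a single pass. This is a rough sketch: the function name tx_warning_stats is invented for this example, and the default directory is assumed to be the pg_log path shown earlier. With well over 100 GB of logs, expect it to run for a while.

```shell
#!/bin/sh
# Hypothetical helper: count how many log lines carry the warning
# and how many carry anything else.
tx_warning_stats() {
    dir="${1:-/storage/db/vpostgres/pg_log}"
    msg='there is already a transaction in progress'

    # index() does a plain substring search, so no regex escaping is needed.
    awk -v msg="$msg" '
        index($0, msg) { m++ }
        END { printf "matching: %d other: %d\n", m, NR - m }
    ' "$dir"/*.log
}
```

If "other" stays at or near zero, the growth of pg_log is caused by this one warning alone.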

The solution

This led me to VMware KB2092127 (After upgrading to vCenter Server Appliance 5.5 Update 2, pg_log file reports this error: WARNING: there is already a transaction in progress). And yes, this appliance had most likely been upgraded to Update 2. The fix is described in KB2092127 and is really easy to implement. Please note that this is only a workaround. As mentioned in the article, there is currently no permanent solution.
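Independent of the KB workaround, the space already consumed by old log files has to be reclaimed somehow. The sketch below is not taken from KB2092127; it is merely one possible way to do it, with assumptions of my own: the function name purge_old_pg_logs, the 7-day default, and the file name pattern postgresql-*.log (derived from the log file name shown above). It only lists files unless "delete" is passed explicitly.

```shell
#!/bin/sh
# Hypothetical cleanup helper, NOT the workaround from KB2092127.
# Usage: purge_old_pg_logs [dir] [days] [delete]
purge_old_pg_logs() {
    dir="${1:-/storage/db/vpostgres/pg_log}"
    days="${2:-7}"          # assumed retention period
    mode="${3:-dry-run}"    # anything other than "delete" only lists files

    if [ "$mode" = "delete" ]; then
        # Remove log files last modified more than $days days ago.
        find "$dir" -type f -name 'postgresql-*.log' -mtime +"$days" -exec rm -f {} +
    else
        # Dry run: print the files that would be removed.
        find "$dir" -type f -name 'postgresql-*.log' -mtime +"$days" -print
    fi
}
```

Running purge_old_pg_logs without arguments shows the candidates first; purge_old_pg_logs /storage/db/vpostgres/pg_log 7 delete actually removes them. Because only files older than the given number of days match, the log file currently being written is left alone.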