In our VCD event logs we see this message every minute:
User 'clusterauthor' (83137b58-3506-416b-a31d-e68962bce07b) authorize
together with this event:
Hi,
I have a similar issue: vCloud Director reports "database full".
In my case, the "audit_trail" table in the "vcloud" database of the vCloud Director Postgres instance is now 160 GB. The database size has been increased multiple times, but this can't be a long-term solution.
Stopping the vApps associated with the Tanzu Kubernetes cluster "beta" stops the generation of new logs.
I also see many "Access token created ..." events, but I am not sure whether they are related to the beta or to legacy CSE clusters.
I think the ones consuming the most space are the events of type "definedEntity/modify ''beta006'' (9ebee87d-9d05-4c3f-b8e7-01ea477ac48c)", because their details are very large (beta006 is one of the clusters created with the CSE beta). See attached file.
Solution attempted so far:
I have already reduced both the history to keep and the history to show to 20 days in "Administration" > "General settings" > "Activity logs", but entries older than 20 days in audit_trail do not seem to be removed.
I assume there is a script responsible for cleaning old events; if so, and someone knows how to start it manually, please let me know.
I am not even sure it would help, because I was under the assumption that these settings apply to the "audit_event" table, and when I looked at that table it contained no rows.
Questions:
Is it expected that clusters created with the CSE beta generate events, and therefore many rows in the "audit_trail" table?
(Note: it is possible that Cloud Director is or was configured with advanced settings I am not aware of, such as extra logging added during a previous support call.)
What is the best way to clean the "audit_trail" table?
Are the "Activity logs" settings supposed to affect the "audit_trail" table? If so, how do I start the cleaning script manually?
What would be the impact of deleting the oldest rows from "audit_trail" with SQL commands?
If it can be done without breaking Cloud Director, it would be an easy workaround.
Thanks for bringing this up. We are looking into it and may come back with some questions. Could you share the VCD version you are using?
Hi @ccalvetbeta @rickvanvliet,
Thanks for this report; we will fix the repeated logins as a top priority.
Based on our understanding, an audit trail entry for a login should be about 1 KB, so even with this many logins the database should not grow to that extent. We therefore suspect that something else is driving the size up to the 160 GB mentioned.
Do you have a sense of which tables in this database could be large? What operations do you perform frequently, and at what scale do you run?
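To find out which tables are large, a query along these lines against PostgreSQL's standard statistics views may help. It is only a sketch: it assumes you are connected to the vcloud database as a user that can see its tables, and the reported size includes indexes and TOAST data.

```sql
-- List the ten largest user tables in the current database
-- (e.g. vcloud), counting table data, indexes and TOAST storage.
SELECT relname AS table_name,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
```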
Hi @akrishnakuma @agoel
As mentioned, I doubt the login events are the ones consuming the most space in my case.
Database: vcloud
Table: audit_trail
(I am wondering whether this audit_trail table is expected, or whether it is a leftover of previous troubleshooting on this Cloud Director instance; I don't know its full history.)
The weird thing is that the first rows are always the same: they are never purged, so the database keeps growing.
If I look at the latest rows (I have just built a new cluster),
I see a lot of "modify" events, and I think they are the ones filling the database because their payload is large. (I can't display it with the query because it breaks all formatting.)
Example of such an event:
(Details of a similar event can be seen in the attachment to my previous post.)
My immediate concern is how to clean this database. Could I just run a query to remove the first (oldest) 1000 rows, for example?
Note: this Cloud Director database is no longer in a cluster.
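On removing the oldest rows in batches: PostgreSQL's DELETE has no LIMIT clause, so a common workaround is to pick the physical row identifiers (ctid) of the oldest rows in a subquery. The sketch below assumes the event_time column used elsewhere in this thread and an arbitrary batch size of 1000; it is not a VMware-documented procedure, so take a database backup first. Ordering by event_time may be slow if that column is not indexed.

```sql
-- Run in a transaction so the result can be checked before committing.
BEGIN;

-- Delete the 1000 oldest rows (batch size is an arbitrary example).
DELETE FROM audit_trail
WHERE ctid IN (
    SELECT ctid
    FROM audit_trail
    ORDER BY event_time
    LIMIT 1000
);

-- Check how far back the table now reaches.
SELECT min(event_time) FROM audit_trail;

COMMIT;  -- or ROLLBACK; to undo the delete
```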
Update: I just discovered with another SQL query that there are older logs in this table.
So using "limit 5" without an ORDER BY does not return the first entries.
Instead, by using
SELECT id, event_type, event_time, org_member_id, tenant_id FROM audit_trail ORDER BY event_time limit 100;
I get the real first events, which already seem to be related to the beta. So this probably started when the beta and its clusters were first deployed.
select event_type, count(*) as num from audit_trail group by event_type order by num desc
This should give you the aggregates without having to rely on limits, and it should be pretty fast: I can run it in under 10 seconds on a 30 GB database.
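Since the suspicion is that a few event types with large payloads dominate, a variant that weighs rows by size rather than by count could confirm it. This is a sketch: pg_column_size over the whole row gives only an approximate per-row size, and unlike the count-only query it has to read every row, so on a table of this size it will take considerably longer.

```sql
-- Approximate storage used by each event type in audit_trail.
SELECT event_type,
       count(*) AS num,
       pg_size_pretty(sum(pg_column_size(t.*))) AS approx_size
FROM audit_trail AS t
GROUP BY event_type
ORDER BY sum(pg_column_size(t.*)) DESC;
```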
KB to clean up the audit table:
https://kb.vmware.com/s/article/2106123
The KB is a little old; for Postgres the following should work:
DELETE from audit_trail WHERE event_time < '2022-09-01 06:00:00.000';
Hi,
I did manage to remove rows.
However, the table is still displayed at the same size.
Could you please let me know what the next step should be?
Is it normal that the CSE beta uses audit_trail and not only audit_event?
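On the table size not changing: in PostgreSQL, DELETE only marks rows as dead. A plain VACUUM makes that space reusable for new rows but generally does not shrink the table files, whereas VACUUM FULL rewrites the table and returns the freed space to the operating system. A sketch, assuming it is run in a maintenance window (ideally with the Cloud Director cells stopped, which is an assumption rather than a documented requirement), since VACUUM FULL locks audit_trail exclusively for the whole rewrite and needs enough free disk for the new copy:

```sql
-- Size of audit_trail including indexes and TOAST data.
SELECT pg_size_pretty(pg_total_relation_size('audit_trail'));

-- Rewrite the table to release space freed by DELETE.
-- Holds an ACCESS EXCLUSIVE lock on audit_trail until it finishes.
VACUUM FULL audit_trail;

-- Size afterwards.
SELECT pg_size_pretty(pg_total_relation_size('audit_trail'));
```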
Hi, any update on this topic?
Hi, I have this problem too. How can I fix it? VCD version 10.4.1,
CSE version 4.0.1.
I believe it's fixed in a newer version of CSE