Database maintenance is essential for ensuring the optimal performance, reliability, and security of your database systems.
All databases require some kind of maintenance to be performed regularly to achieve optimum performance.
Maintenance tasks are ongoing and repetitive; ideally they are automated and scheduled via cron scripts or a task scheduler. The main tasks are:
- Routine vacuuming
- Routine reindexing
- Log File Maintenance
Vacuuming Basics : PostgreSQL databases require periodic maintenance known as vacuuming.
PostgreSQL includes an auto vacuum process that automatically runs vacuum and analyze operations.
The VACUUM command has to process each table on a regular basis for several reasons:
- To recover or reuse disk space occupied by updated or deleted rows.
- To update data statistics used by the PostgreSQL query planner.
- To update the visibility map, which speeds up index-only scans.
- To protect against loss of very old data due to transaction ID wraparound or multixact ID wraparound.
Note : A deleted row is no longer visible, but it still occupies disk space; the same applies to an update, which leaves the old row version behind. This is where vacuuming comes into the picture: it removes all of these dead rows.
Variants of Vacuum :
There are two variants of VACUUM:
- VACUUM
- VACUUM FULL
Vacuum : Basic operation to reclaim storage for reuse and update basic table statistics; it runs without blocking normal reads and writes.
Syntax : VACUUM table_name;
Vacuum Full : A more intensive operation that rewrites the entire table, reclaiming more space but requiring more resources and locking the table.
Syntax : VACUUM FULL table_name;
Create a test table and disable autovacuum so we control when cleanup happens :
CREATE TABLE sample1 (stri varchar(12), cpu int, memory bigint);
ALTER TABLE sample1 SET (autovacuum_enabled = false);
INSERT INTO sample1
SELECT 'Hello World', random() * 100 AS cpu, random() * 10 AS memory
FROM generate_series(1, 10000);
\dt+ sample1                         -- about 536 kB
DELETE FROM sample1 WHERE cpu < 40;  -- the deleted rows become dead tuples; their space is not released
SELECT count(*) FROM sample1;
EXPLAIN SELECT * FROM sample1;       -- Note : the row estimate is stale because statistics were never updated
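To finish the demo, run a plain VACUUM manually (here combined with ANALYZE). A minimal sketch continuing the sample1 example above :
VACUUM ANALYZE sample1;              -- marks dead-row space reusable and refreshes planner statistics
\dt+ sample1                         -- plain VACUUM rarely shrinks the file; freed space is reused by future inserts
EXPLAIN SELECT * FROM sample1;       -- the row estimate now reflects the rows that survived the DELETE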
Vacuum Freeze:
- Vacuum freeze is a special kind of vacuum which marks rows as frozen.
- Vacuum Freeze marks a table's contents with a special frozen transaction ID that tells PostgreSQL those rows are visible to all transactions and never need to be frozen again.
- Vacuum Freeze reduces autovacuum overhead : since frozen rows don't need future wraparound cleanup, PostgreSQL spends less time running autovacuum, freeing up resources for other tasks.
- Queries run faster because PostgreSQL doesn't have to check transaction IDs for frozen rows, reducing unnecessary computations.
- Think of it like preserving old documents : once archived, they don't need further updates, making retrieval faster and safer.
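A minimal sketch of freezing the sample1 table from the earlier demo and checking the result; age(relfrozenxid) should drop to a very small number right after the freeze :
VACUUM (FREEZE, VERBOSE) sample1;
-- relfrozenxid is the oldest unfrozen transaction ID in the table;
-- its age shows how close the table is to wraparound territory.
SELECT relname, relfrozenxid, age(relfrozenxid)
FROM pg_class
WHERE relname = 'sample1';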
Autovacuum Daemon :
- The autovacuum feature is used to automate the execution of the VACUUM and ANALYZE commands.
- Autovacuum checks for tables that have had a large number of inserted, updated, or deleted tuples, based on statistics collection.
- The autovacuum launcher is in charge of starting autovacuum worker processes for all databases.
- The launcher will start one worker within each database every autovacuum_naptime seconds.
- Workers check for inserts, updates, and deletes and execute VACUUM and ANALYZE as needed.
View autovacuum settings :
select * from pg_settings where name like '%autovacuum%';
Best Practices for Autovacuum in PostgreSQL :
Enable Autovacuum : PostgreSQL's autovacuum feature is designed to automatically manage dead tuples and ensure tables remain efficient. It is crucial to verify that this feature is enabled and properly configured in your database.
- Default Configuration : Autovacuum is enabled by default in PostgreSQL, but confirming its activation is essential. Check your postgresql.conf or use the SHOW autovacuum; command to verify.
- Advantages : Autovacuum reduces the need for manual intervention and ensures continuous maintenance, particularly for busy databases.
- Configuration Tips : Fine-tuning settings like autovacuum_max_workers ensures that multiple tables can be vacuumed concurrently, preventing resource contention.
- Monitor Table Bloat : Table bloat, caused by the accumulation of dead tuples, can severely impact database performance. Regularly monitoring table bloat allows you to identify when manual vacuuming is necessary.
- Use Built-in Tools : PostgreSQL provides the pg_stat_all_tables view to check the presence of dead tuples in tables. By querying this view, you can pinpoint tables needing attention, as shown below.
- Third-Party Utilities : Tools such as pg_bloat_check or pgstattuple offer more detailed insights into bloated tables.
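For example, the following query against pg_stat_user_tables (a per-user-table variant of pg_stat_all_tables) lists the tables with the most dead tuples; the column names are standard, only the LIMIT is a matter of taste :
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) AS dead_pct,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;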
- Adjust Autovaccuum setting : Fine-tuning autovacuum parameters ensure that the vacuuming processing aligns with your database workload and performance requirments.
- Thresholds and Scale Factors: Two critical settings, Autovaccum_vacuum_threshold autovacuum_vacuum_scale_factor determine when autovacuum triggers. Lower values make autovacuum more aggressive , suitable for frequently updated tables.
- Example : autovaccum_vacuum_threshold = 100
- autovacuum_vacuum_scale_factor = 0.2
- table size = 10,000 rows
- Then 100 + (0.2 * 10000 ) = 100 + 2000 = 2100
- Autovacuum will trigger only if the number of dead tuples execeeds 2100
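These thresholds can also be tuned per table rather than globally. A minimal sketch, assuming a hypothetical hot table named orders :
-- Make autovacuum roughly 4x more aggressive than the 0.2 default scale factor.
ALTER TABLE orders SET (
    autovacuum_vacuum_threshold    = 100,
    autovacuum_vacuum_scale_factor = 0.05
);
-- Per-table overrides show up in pg_class.reloptions.
SELECT relname, reloptions FROM pg_class WHERE relname = 'orders';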
Use VACUUM FULL Sparingly :
- VACUUM FULL offers a complete cleanup by rewriting tables and removing all bloat, but it locks the table during the operation, potentially disrupting database activity.
- Appropriate Use Cases : Reserve VACUUM FULL for situations requiring drastic cleanup, such as after mass deletions.
- Impact : Since it locks the table, plan its execution during maintenance periods or downtime to minimize disruption.
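A minimal sketch of a guarded VACUUM FULL run on the sample1 demo table, measuring the size before and after; VACUUM FULL takes an ACCESS EXCLUSIVE lock, so nothing else can read or write the table while it runs :
SELECT pg_size_pretty(pg_total_relation_size('sample1')) AS before_size;
VACUUM FULL sample1;  -- rewrites the table; blocks all access until it finishes
SELECT pg_size_pretty(pg_total_relation_size('sample1')) AS after_size;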
- Periodic Execution : Schedule vacuuming tasks during low-traffic periods to avoid performance degradation.
- Benefits of ANALYZE : Running ANALYZE collects updated statistics, enabling the query planner to make better decisions for query optimization.
- Configuration : Adjust the maintenance_work_mem parameter in postgresql.conf to allocate sufficient memory, especially for large tables; a session-level override is shown below.
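For a one-off manual run, maintenance_work_mem can also be raised for the current session only, without touching postgresql.conf; big_table here is a hypothetical name :
SET maintenance_work_mem = '512MB';  -- session-local override
VACUUM ANALYZE big_table;
RESET maintenance_work_mem;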
Analyze Table Usage : Understanding the activity patterns of your tables helps in tailoring vacuuming strategies.
- Frequent Updates : Tables that undergo frequent insertions, deletions, or updates may require more aggressive vacuum policies.
- Rarely Modified Tables : For tables with minimal changes, less frequent vacuuming is adequate. The statistics views make this easy to check, as shown below.
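A minimal sketch of checking activity patterns via pg_stat_user_tables; the counters are cumulative since the last statistics reset :
SELECT relname,
       n_tup_ins AS inserts,
       n_tup_upd AS updates,
       n_tup_del AS deletes
FROM pg_stat_user_tables
ORDER BY (n_tup_ins + n_tup_upd + n_tup_del) DESC
LIMIT 10;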
How to Calculate Query Cost and Read an EXPLAIN Plan :
1. First, check how many rows (reltuples) and pages (relpages) the table occupies according to pg_class :
TestDB=# select reltuples,relpages from pg_class where relname='cropsmooth_annual_2025';
reltuples | relpages
-----------+----------
849 | 41
(1 row)
TestDB=# explain select * from cropsmooth_annual_2025;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Seq Scan on cropsmooth_annual_2025 (cost=0.00..49.49 rows=849 width=153051)
(1 row)
Sorting Data : Using ORDER BY with an index
TestDB=# explain select * from cropsmooth_annual_2025 order by vill_id ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using cropsmooth_annual_2025 on cropsmooth_annual_2025 (cost=0.42..140845.93 rows=481418 width=222614)
(1 row)
Note : Sorting increases execution time. To reduce it, create an index on the ORDER BY column; an index speeds up the query by returning rows already in sorted order.
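The plan above already uses an index on vill_id. For reference, a sketch of how such an index would be created (the index name is hypothetical) :
CREATE INDEX idx_cropsmooth_vill_id
    ON cropsmooth_annual_2025 (vill_id);
-- With the index in place, the planner can return rows already in
-- vill_id order and skip an explicit Sort step.
EXPLAIN SELECT * FROM cropsmooth_annual_2025 ORDER BY vill_id;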
How the query cost is calculated : Cost of query execution
Cost = number of pages * seq_page_cost + number of rows * cpu_tuple_cost
cpu_tuple_cost is a PostgreSQL configuration parameter that tells the query planner the estimated cost of processing each row (tuple) during a query execution step (like reading, filtering, or sorting).
Step 1 : Find reltuples and relpages :
TestDB=# select reltuples,relpages from pg_class where relname='cropsmooth_annual_2025';
reltuples | relpages
-----------+----------
849 | 41
(1 row)
Step 2 : Find seq_page_cost :
TestDB=# show seq_page_cost;
seq_page_cost
---------------
1
Step 3 : Find the number of rows :
TestDB=# select count(*) from cropsmooth_annual_2025;
count
-------
849
Step 4 : Find cpu_tuple_cost, then verify the cost against the EXPLAIN output :
TestDB=# show cpu_tuple_cost;
cpu_tuple_cost
----------------
0.01
TestDB=# explain select * from cropsmooth_annual_2025;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Seq Scan on cropsmooth_annual_2025 (cost=0.00..49.49 rows=849 width=153051)
(1 row)
TestDB=# select 41 * 1 + 849 * 0.01;
?column?
----------
49.49
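Note that EXPLAIN shows estimates only. EXPLAIN ANALYZE actually executes the query and reports the real row counts and timing next to the estimated cost, which is useful for checking how accurate the statistics are :
EXPLAIN ANALYZE SELECT * FROM cropsmooth_annual_2025;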
Routine Reindexing :
- Insert, update, and delete operations fragment indexes over a period of time.
- A fragmented index has pages where the logical order based on key values differs from the physical ordering inside the data file.
- Heavily fragmented indexes can degrade query performance because additional I/O is required to locate the data to which the index points.
- REINDEX rebuilds an index from the data stored in the index's table and eliminates empty space between pages.
- If performance is stable, you can afford to let fragmentation go higher (e.g. 20%).
- If queries are slowing down, keeping the threshold at 5% ensures indexes stay optimized.
- If storage space is a concern, a looser threshold like 20% might be better to avoid unnecessary reindexing.
- High-write databases (frequent inserts/updates) : fragmentation accumulates faster, so 5% might be preferable.
- Syntax : REINDEX INDEX <index_name>;
- If you notice performance degradation, stick to 5% to keep indexes optimal.
- If your workload is light and reindexing takes too much time, 20% could be fine. One way to measure fragmentation is shown below.
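To measure fragmentation, the pgstattuple extension mentioned earlier provides pgstatindex(); leaf_fragmentation is the percentage of out-of-order leaf pages, which can be compared against the 5% / 20% thresholds above. The index name below is hypothetical :
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('idx_cropsmooth_vill_id');
-- On PostgreSQL 12+, REINDEX CONCURRENTLY rebuilds without blocking writes:
REINDEX INDEX CONCURRENTLY idx_cropsmooth_vill_id;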
Log File Maintenance :
- PostgreSQL logs are records of the database server's activity, errors, and system messages.
- These logs provide detailed information about database activity and are essential for monitoring, debugging, and auditing.
- They are crucial for diagnosing problems in SQL queries, connection issues, and performance bottlenecks.
- Message severity levels categorize log messages by their importance, helping administrators prioritize and address issues effectively.
- The log verbosity settings control how much information is captured in the logs.
- Log File Rotation : To prevent logs from consuming excessive disk space, PostgreSQL supports log rotation. The two primary methods are :
- Time-Based Rotation : PostgreSQL rotates logs at regular intervals using parameters like log_rotation_age.
- Size-Based Rotation : Logs are rotated when they reach a specific size (log_rotation_size).
Time-Based Rotation :
#Enable logging
logging_collector = on
#Set log destination
log_destination = 'stderr'
#Define log storage directory
log_directory = 'C:/PostgreSQL/logs'
#Naming convention for log files
log_filename ='postgresql-%Y-%m-%d.log'
#Rotate logs every 24 hours (log_rotation_age is in minutes by default)
log_rotation_age = 1440
Size-Based Rotation :
logging_collector = on
log_destination = 'stderr'
log_directory = '/var/log/postgresql'
log_filename = 'postgresql.log'
#Rotate logs when they reach 100MB
log_rotation_size = 100MB
#Overwrite old logs after rotation
log_truncate_on_rotation = on
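After editing postgresql.conf, keep in mind that logging_collector requires a server restart, while the rotation parameters only need a reload. A quick sanity check from psql :
SHOW logging_collector;
SHOW log_rotation_size;
-- Path of the log file currently in use (PostgreSQL 10+).
SELECT pg_current_logfile();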