Database maintenance is essential for ensuring the optimal performance, reliability, and security of your database systems.
All databases require some kind of maintenance to be performed regularly to achieve optimum performance.
Maintenance tasks are ongoing and repetitive; ideally they are automated and scheduled via cron scripts or a task scheduler. The main tasks are:
- Routine vacuuming
- Routine reindexing
- Log File Maintenance
Vacuuming Basics : PostgreSQL databases require periodic maintenance known as vacuuming.
PostgreSQL includes an auto vacuum process that automatically runs vacuum and analyze operations.
The VACUUM command has to process each table on a regular basis for several reasons:
- To recover or reuse disk space occupied by updated or deleted rows.
- To update data statistics used by the PostgreSQL query planner.
- To update the visibility map, which speeds up index-only scans.
- To protect against loss of very old data due to transaction ID wraparound or multixact ID wraparound.
Note : A deleted row is no longer visible, but it still occupies disk space; the same applies to an update, which leaves the old row version behind. This is where vacuuming comes into the picture: it removes all of these dead rows.
Variants of Vacuum :
There are two variants of VACUUM:
- VACUUM
- VACUUM FULL
Vacuum : Basic operation to reclaim storage for reuse and update basic table statistics; it runs without blocking normal reads and writes.
Syntax : VACUUM table_name;
Vacuum Full : A more intensive operation that rewrites the entire table, reclaiming more space but requiring more resources and locking the table.
Syntax : VACUUM FULL table_name;
Create a test table and disable autovacuum so we control when cleanup happens :
CREATE TABLE sample1 (stri varchar(12), cpu int, memory bigint);
ALTER TABLE sample1 SET (autovacuum_enabled = false);
INSERT INTO sample1
SELECT 'Hello World', random() * 100 AS cpu, random() * 10 AS memory
FROM generate_series(1, 10000);
\dt+ sample1                         -- about 536 kB
DELETE FROM sample1 WHERE cpu < 40;  -- the deleted rows become dead tuples; their space is not released
SELECT count(*) FROM sample1;
EXPLAIN SELECT * FROM sample1;       -- Note : the row estimate is stale because statistics were never updated
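To finish the demo, run a plain VACUUM manually (here combined with ANALYZE). A minimal sketch continuing the sample1 example above :
VACUUM ANALYZE sample1;              -- marks dead-row space reusable and refreshes planner statistics
\dt+ sample1                         -- plain VACUUM rarely shrinks the file; freed space is reused by future inserts
EXPLAIN SELECT * FROM sample1;       -- the row estimate now reflects the rows that survived the DELETE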
Vacuum Freeze:
- Vacuum freeze is a special kind of vacuum which marks rows as frozen.
- Vacuum Freeze marks a table's contents with a special frozen transaction ID that tells PostgreSQL those rows are visible to all transactions and never need to be frozen again.
- Vacuum Freeze reduces autovacuum overhead : since frozen rows don't need future wraparound cleanup, PostgreSQL spends less time running autovacuum, freeing up resources for other tasks.
- Queries run faster because PostgreSQL doesn't have to check transaction IDs for frozen rows, reducing unnecessary computations.
- Think of it like preserving old documents : once archived, they don't need further updates, making retrieval faster and safer.
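A minimal sketch of freezing the sample1 table from the earlier demo and checking the result; age(relfrozenxid) should drop to a very small number right after the freeze :
VACUUM (FREEZE, VERBOSE) sample1;
-- relfrozenxid is the oldest unfrozen transaction ID in the table;
-- its age shows how close the table is to wraparound territory.
SELECT relname, relfrozenxid, age(relfrozenxid)
FROM pg_class
WHERE relname = 'sample1';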
Autovacuum Daemon :
- The autovacuum feature is used to automate the execution of the VACUUM and ANALYZE commands.
- Autovacuum checks for tables that have had a large number of inserted, updated, or deleted tuples, based on statistics collection.
- The autovacuum launcher is in charge of starting autovacuum worker processes for all databases.
- The launcher will start one worker within each database every autovacuum_naptime seconds.
- Workers check for inserts, updates, and deletes and execute VACUUM and ANALYZE as needed.
View autovacuum settings :
select * from pg_settings where name like '%autovacuum%';
Best Practices for Autovacuum in PostgreSQL :
Enable Autovacuum : PostgreSQL's autovacuum feature is designed to automatically manage dead tuples and ensure tables remain efficient. It is crucial to verify that this feature is enabled and properly configured in your database.
- Default Configuration : Autovacuum is enabled by default in PostgreSQL, but confirming its activation is essential. Check your postgresql.conf or use the SHOW autovacuum; command to verify.
- Advantages : Autovacuum reduces the need for manual intervention and ensures continuous maintenance, particularly for busy databases.
- Configuration Tips : Fine-tuning settings like autovacuum_max_workers ensures that multiple tables can be vacuumed concurrently, preventing resource contention.
- Monitor Table Bloat : Table bloat, caused by the accumulation of dead tuples, can severely impact database performance. Regularly monitoring table bloat allows you to identify when manual vacuuming is necessary.
- Use Built-in Tools : PostgreSQL provides the pg_stat_all_tables view to check the presence of dead tuples in tables. By querying this view, you can pinpoint tables needing attention, as shown below.
- Third-Party Utilities : Tools such as pg_bloat_check or pgstattuple offer more detailed insights into bloated tables.
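For example, the following query against pg_stat_user_tables (a per-user-table variant of pg_stat_all_tables) lists the tables with the most dead tuples; the column names are standard, only the LIMIT is a matter of taste :
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) AS dead_pct,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;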
- Adjust Autovaccuum setting : Fine-tuning autovacuum parameters ensure that the vacuuming processing aligns with your database workload and performance requirments.
- Thresholds and Scale Factors: Two critical settings, Autovaccum_vacuum_threshold autovacuum_vacuum_scale_factor determine when autovacuum triggers. Lower values make autovacuum more aggressive , suitable for frequently updated tables.
- Example : autovaccum_vacuum_threshold = 100
- autovacuum_vacuum_scale_factor = 0.2
- table size = 10,000 rows
- Then 100 + (0.2 * 10000 ) = 100 + 2000 = 2100
- Autovacuum will trigger only if the number of dead tuples execeeds 2100
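These thresholds can also be tuned per table rather than globally. A minimal sketch, assuming a hypothetical hot table named orders :
-- Make autovacuum roughly 4x more aggressive than the 0.2 default scale factor.
ALTER TABLE orders SET (
    autovacuum_vacuum_threshold    = 100,
    autovacuum_vacuum_scale_factor = 0.05
);
-- Per-table overrides show up in pg_class.reloptions.
SELECT relname, reloptions FROM pg_class WHERE relname = 'orders';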
Use VACUUM FULL Sparingly :
- VACUUM FULL offers a complete cleanup by rewriting tables and removing all bloat, but it locks the table during the operation, potentially disrupting database activity.
- Appropriate Use Cases : Reserve VACUUM FULL for situations requiring drastic cleanup, such as after mass deletions.
- Impact : Since it locks the table, plan its execution during maintenance periods or downtime to minimize disruption.
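A minimal sketch of a guarded VACUUM FULL run on the sample1 demo table, measuring the size before and after; VACUUM FULL takes an ACCESS EXCLUSIVE lock, so nothing else can read or write the table while it runs :
SELECT pg_size_pretty(pg_total_relation_size('sample1')) AS before_size;
VACUUM FULL sample1;  -- rewrites the table; blocks all access until it finishes
SELECT pg_size_pretty(pg_total_relation_size('sample1')) AS after_size;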
- Periodic Execution : Schedule vacuuming tasks during low-traffic periods to avoid performance degradation.
- Benefits of ANALYZE : Running ANALYZE collects updated statistics, enabling the query planner to make better decisions for query optimization.
- Configuration : Adjust the maintenance_work_mem parameter in postgresql.conf to allocate sufficient memory, especially for large tables; a session-level override is shown below.
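For a one-off manual run, maintenance_work_mem can also be raised for the current session only, without touching postgresql.conf; big_table here is a hypothetical name :
SET maintenance_work_mem = '512MB';  -- session-local override
VACUUM ANALYZE big_table;
RESET maintenance_work_mem;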
Analyze Table Usage : Understanding the activity patterns of your tables helps in tailoring vacuuming strategies.
- Frequent Updates : Tables that undergo frequent insertions, deletions, or updates may require more aggressive vacuum policies.
- Rarely Modified Tables : For tables with minimal changes, less frequent vacuuming is adequate. The statistics views make this easy to check, as shown below.
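A minimal sketch of checking activity patterns via pg_stat_user_tables; the counters are cumulative since the last statistics reset :
SELECT relname,
       n_tup_ins AS inserts,
       n_tup_upd AS updates,
       n_tup_del AS deletes
FROM pg_stat_user_tables
ORDER BY (n_tup_ins + n_tup_upd + n_tup_del) DESC
LIMIT 10;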
How to Calculate Query Cost and Read an EXPLAIN Plan :
1. First, check how many rows (reltuples) and pages (relpages) the table occupies according to pg_class :
TestDB=# select reltuples,relpages from pg_class where relname='cropsmooth_annual_2025';
reltuples | relpages
-----------+----------
849 | 41
(1 row)
TestDB=# explain select * from cropsmooth_annual_2025;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Seq Scan on cropsmooth_annual_2025 (cost=0.00..49.49 rows=849 width=153051)
(1 row)
Sorting Data : Using ORDER BY with an index
TestDB=# explain select * from cropsmooth_annual_2025 order by vill_id ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using cropsmooth_annual_2025 on cropsmooth_annual_2025 (cost=0.42..140845.93 rows=481418 width=222614)
(1 row)
Note : Sorting increases execution time. To reduce it, create an index on the ORDER BY column; an index speeds up the query by returning rows already in sorted order.
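The plan above already uses an index on vill_id. For reference, a sketch of how such an index would be created (the index name is hypothetical) :
CREATE INDEX idx_cropsmooth_vill_id
    ON cropsmooth_annual_2025 (vill_id);
-- With the index in place, the planner can return rows already in
-- vill_id order and skip an explicit Sort step.
EXPLAIN SELECT * FROM cropsmooth_annual_2025 ORDER BY vill_id;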
How the query cost is calculated : Cost of query execution
Cost = number of pages * seq_page_cost + number of rows * cpu_tuple_cost
cpu_tuple_cost is a PostgreSQL configuration parameter that tells the query planner the estimated cost of processing each row (tuple) during a query execution step (like reading, filtering, or sorting).
Step 1 : Find reltuples and relpages :
TestDB=# select reltuples,relpages from pg_class where relname='cropsmooth_annual_2025';
reltuples | relpages
-----------+----------
849 | 41
(1 row)
Step 2 : Find seq_page_cost :
TestDB=# show seq_page_cost;
seq_page_cost
---------------
1
Step 3 : Find the number of rows :
TestDB=# select count(*) from cropsmooth_annual_2025;
count
-------
849
Step 4 : Find cpu_tuple_cost, then verify the cost against the EXPLAIN output :
TestDB=# show cpu_tuple_cost;
cpu_tuple_cost
----------------
0.01
TestDB=# explain select * from cropsmooth_annual_2025;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Seq Scan on cropsmooth_annual_2025 (cost=0.00..49.49 rows=849 width=153051)
(1 row)
TestDB=# select 41 * 1 + 849 * 0.01;
?column?
----------
49.49
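Note that EXPLAIN shows estimates only. EXPLAIN ANALYZE actually executes the query and reports the real row counts and timing next to the estimated cost, which is useful for checking how accurate the statistics are :
EXPLAIN ANALYZE SELECT * FROM cropsmooth_annual_2025;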
Routine Reindexing :
- Insert, update, and delete operations fragment indexes over a period of time.
- A fragmented index has pages where the logical order based on key values differs from the physical ordering inside the data file.
- Heavily fragmented indexes can degrade query performance because additional I/O is required to locate the data to which the index points.
- REINDEX rebuilds an index from the data stored in the index's table and eliminates empty space between pages.
- If performance is stable, you can afford to let fragmentation go higher (e.g. 20%).
- If queries are slowing down, keeping the threshold at 5% ensures indexes stay optimized.
- If storage space is a concern, a looser threshold like 20% might be better to avoid unnecessary reindexing.
- High-write databases (frequent inserts/updates) : fragmentation accumulates faster, so 5% might be preferable.
- Syntax : REINDEX INDEX <index_name>;
- If you notice performance degradation, stick to 5% to keep indexes optimal.
- If your workload is light and reindexing takes too much time, 20% could be fine. One way to measure fragmentation is shown below.
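To measure fragmentation, the pgstattuple extension mentioned earlier provides pgstatindex(); leaf_fragmentation is the percentage of out-of-order leaf pages, which can be compared against the 5% / 20% thresholds above. The index name below is hypothetical :
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('idx_cropsmooth_vill_id');
-- On PostgreSQL 12+, REINDEX CONCURRENTLY rebuilds without blocking writes:
REINDEX INDEX CONCURRENTLY idx_cropsmooth_vill_id;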
Log File Maintenance :
- PostgreSQL logs are records of the database server's activity, errors, and system messages.
- These logs provide detailed information about database activity and are essential for monitoring, debugging, and auditing.
- They are crucial for diagnosing problems in SQL queries, connection issues, and performance bottlenecks.
- Message severity levels categorize log messages by their importance, helping administrators prioritize and address issues effectively.
- The log verbosity settings control how much information is captured in the logs.
- Log File Rotation : To prevent logs from consuming excessive disk space, PostgreSQL supports log rotation. The two primary methods are :
- Time-Based Rotation : PostgreSQL rotates logs at regular intervals using parameters like log_rotation_age.
- Size-Based Rotation : Logs are rotated when they reach a specific size (log_rotation_size).
Time-Based Rotation :
#Enable logging
logging_collector = on
#Set log destination
log_destination = 'stderr'
#Define log storage directory
log_directory = 'C:/PostgreSQL/logs'
#Naming convention for log files
log_filename ='postgresql-%Y-%m-%d.log'
#Rotate logs every 24 hours (log_rotation_age is in minutes by default)
log_rotation_age = 1440
Size-Based Rotation :
logging_collector = on
log_destination = 'stderr'
log_directory = '/var/log/postgresql'
log_filename = 'postgresql.log'
#Rotate logs when they reach 100MB
log_rotation_size = 100MB
#Overwrite old logs after rotation
log_truncate_on_rotation = on
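After editing postgresql.conf, keep in mind that logging_collector requires a server restart, while the rotation parameters only need a reload. A quick sanity check from psql :
SHOW logging_collector;
SHOW log_rotation_size;
-- Path of the log file currently in use (PostgreSQL 10+).
SELECT pg_current_logfile();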