Wednesday, August 13, 2025

PostgreSQL Maintenance

Database maintenance is essential for ensuring the optimal performance, reliability, and security of your database systems.

All databases require some maintenance tasks to be performed regularly to achieve optimum performance.

Maintenance tasks are ongoing and repetitive; ideally they are automated and scheduled via cron scripts or a task scheduler:

  • Routine vacuuming
  • Reindexing
  • Log file maintenance
Vacuuming Basics: PostgreSQL databases require periodic maintenance known as vacuuming.

PostgreSQL includes an autovacuum process that automatically runs VACUUM and ANALYZE operations.

The VACUUM command has to process each table on a regular basis for several reasons:

  • To recover or reuse disk space occupied by updated or deleted rows.
  • To update data statistics used by the PostgreSQL query planner.
  • To update the visibility map, which speeds up index-only scans.
  • To protect against loss of very old data due to transaction ID wraparound or multixact ID wraparound.
Note: A deleted row is no longer visible, but it still occupies space, and the same is true for the old version of an updated row. This is where vacuuming comes into the picture: it removes all these dead rows.

Variants of Vacuum:
There are two variants of VACUUM:
  • VACUUM
  • VACUUM FULL
VACUUM: Basic operation to reclaim storage and update statistics.
                    Syntax: VACUUM table_name;
VACUUM FULL: More intensive operation that rewrites the entire table, reclaiming more space but requiring more resources and locking the table.
                    Syntax: VACUUM FULL table_name;



Demo: Vacuuming

create table sample1(stri varchar(12), cpu int, memory bigint);

ALTER TABLE sample1 SET (autovacuum_enabled = false);

insert into sample1 select 'Hello World', random() * 100 as cpu, random() * 10 as memory from generate_series(1,10000);

\dt+ sample1 -- 536 kB

delete from sample1 where cpu < 40;

select count(*) from sample1;

explain select * from sample1; -- Note: the planner's row estimate is stale until the table is vacuumed/analyzed

aws=# \x
Expanded display is on.

aws=# select * from pg_stat_user_tables where relname='sample1';

aws=# vacuum(full, analyze, verbose) sample1;
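After the VACUUM FULL, it's worth re-checking the table; a quick follow-up sketch (the exact size will vary, but it should drop well below 536 kB, and the row estimate should now match count(*)):

aws=# \dt+ sample1

aws=# select n_live_tup, n_dead_tup, last_vacuum, last_analyze from pg_stat_user_tables where relname='sample1';

aws=# explain select * from sample1;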

Vacuum Freeze:
  • VACUUM FREEZE is a special kind of vacuum which marks rows as frozen.
  • It marks a table's contents with a special transaction ID that tells Postgres those rows never need to be vacuumed for wraparound again.
  • It reduces autovacuum overhead: since frozen rows don't need future cleanup, PostgreSQL spends less time running autovacuum, freeing up resources for other tasks.
  • Queries run faster because PostgreSQL doesn't have to check transaction IDs for frozen rows, reducing unnecessary computations.
  • Think of it like preserving old documents: once archived, they don't need further updates, making retrieval faster and safer.
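A minimal sketch of a manual freeze and of checking wraparound risk (age(relfrozenxid) reports how old a table's oldest unfrozen transaction ID is):

vacuum (freeze) sample1;

-- Tables with the highest transaction ID age are closest to wraparound
select relname, age(relfrozenxid) as xid_age
from pg_class
where relkind = 'r'
order by age(relfrozenxid) desc
limit 5;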

Autovacuum Daemon:
  • The autovacuum feature automates the execution of the VACUUM and ANALYZE commands.
  • Autovacuum checks for tables that have had a large number of inserted, updated, or deleted tuples, based on statistics collection.
  • The autovacuum launcher is in charge of starting autovacuum worker processes for all databases.
  • The launcher will start one worker within each database every autovacuum_naptime seconds.
  • Workers check for inserts, updates, and deletes and execute VACUUM and ANALYZE if needed.
View autovacuum settings:

    select * from pg_settings where name like '%autovacuum%';

Best Practices for Autovacuum in PostgreSQL:

Enable Autovacuum: PostgreSQL's autovacuum feature is designed to automatically manage dead tuples and keep tables efficient. It is crucial to verify that this feature is enabled and properly configured in your database.

  • Default Configuration: Autovacuum is enabled by default in PostgreSQL, but confirming its activation is essential. Check your postgresql.conf or use the SHOW autovacuum; command to verify.
  • Advantages: Autovacuum reduces the need for manual intervention and ensures continuous maintenance, particularly for busy databases.
  • Configuration Tips: Fine-tuning settings like autovacuum_max_workers ensures that multiple tables can be vacuumed concurrently, preventing resource contention.
  • Monitor Table Bloat: Table bloat, caused by the accumulation of dead tuples, can severely impact database performance. Regularly monitoring table bloat allows you to identify when manual vacuuming is necessary.
  • Use Built-in Tools: PostgreSQL provides the pg_stat_all_tables view to check for dead tuples in tables. By querying this view, you can pinpoint tables needing attention (see the sketch after this list).
  • Third-Party Utilities: pg_bloat_check or pgstattuple offer more detailed insights into bloated tables.
  • Adjust Autovacuum Settings: Fine-tuning autovacuum parameters ensures that the vacuuming process aligns with your database workload and performance requirements.
  • Thresholds and Scale Factors: Two critical settings, autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor, determine when autovacuum triggers. Lower values make autovacuum more aggressive, suitable for frequently updated tables.
  • Example: autovacuum_vacuum_threshold = 100
  • autovacuum_vacuum_scale_factor = 0.2
  • table size = 10,000 rows
  • Then 100 + (0.2 * 10000) = 100 + 2000 = 2100
  • Autovacuum will trigger only if the number of dead tuples exceeds 2100
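A short sketch combining the two checks above: spotting tables with many dead tuples via pg_stat_user_tables, and tightening autovacuum per table with storage parameters (my_busy_table is a placeholder name):

-- Find the tables with the most dead tuples
select relname, n_live_tup, n_dead_tup, last_autovacuum
from pg_stat_user_tables
order by n_dead_tup desc
limit 10;

-- Make autovacuum more aggressive for a frequently updated table (placeholder name)
ALTER TABLE my_busy_table SET (
    autovacuum_vacuum_threshold = 100,
    autovacuum_vacuum_scale_factor = 0.05
);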
Use VACUUM FULL Sparingly:
  • VACUUM FULL offers a complete cleanup by rewriting tables and removing all bloat, but it locks the tables during the operation, potentially disrupting database activity.
  • Appropriate Use Cases: Reserve VACUUM FULL for situations requiring drastic cleanup, such as after mass deletions.
  • Impact: Since it locks the table, plan its execution during maintenance periods or downtime to minimize disruption.
Schedule Regular Maintenance: Even with autovacuum enabled, supplementing it with manual VACUUM ANALYZE ensures optimal database performance.
  • Periodic Execution: Schedule vacuuming tasks during low-traffic periods to avoid performance degradation.
  • Benefits of ANALYZE: Running ANALYZE collects updated statistics, enabling the query planner to make better decisions for query optimization.
Increase maintenance_work_mem: Allocating more memory to vacuum operations can significantly speed up the process (see the sketch below).
  • Configuration: Adjust the maintenance_work_mem parameter in postgresql.conf to allocate sufficient memory, especially for large tables.
  • Performance Gains: Higher memory allocation reduces the time needed to process larger datasets during vacuuming.
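A hedged sketch of raising the memory budget for one manual vacuum session (the value and table name are illustrative; SET affects only the current session, so no restart is needed):

-- Give this session a larger vacuum memory budget, then vacuum a big table
SET maintenance_work_mem = '1GB';
VACUUM (VERBOSE, ANALYZE) my_big_table;  -- placeholder table name
RESET maintenance_work_mem;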
Analyze Table Usage: Understanding the activity patterns of your tables helps in tailoring vacuuming strategies.
  • Frequent Updates: Tables that undergo frequent inserts, deletes, or updates may require more aggressive vacuum policies.
  • Rarely Modified Tables: For tables with minimal changes, less frequent vacuuming is adequate.

How to Calculate Query Cost and Read the Explain Plan:

1. Check how many rows (reltuples) and pages (relpages) your table has:


TestDB=# select reltuples,relpages from pg_class where relname='cropsmooth_annual_2025';
 reltuples | relpages
-----------+----------
       849 |       41
(1 row)


TestDB=# explain select * from cropsmooth_annual_2025;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Seq Scan on cropsmooth_annual_2025  (cost=0.00..49.49 rows=849 width=153051)
(1 row)


Sorting Data: Using ORDER BY and CREATE INDEX


TestDB=# explain select * from cropsmooth_annual_2025  order by  vill_id  ;
                                                                         QUERY PLAN                                     
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using cropsmooth_annual_2025 on cropsmooth_annual_2025  (cost=0.42..140845.93 rows=481418 width=222614)
(1 row)

Note: Sorting without an index increases execution time. To reduce it, create an index on the sort column; the index speeds up the query because rows can be read in sorted order (a sketch follows below).
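A minimal sketch, assuming vill_id is the sort column (the index name is just a placeholder); after creating the index, re-run EXPLAIN to confirm the planner switches to an index scan:

CREATE INDEX idx_cropsmooth_vill_id ON cropsmooth_annual_2025 (vill_id);
EXPLAIN SELECT * FROM cropsmooth_annual_2025 ORDER BY vill_id;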

How the query cost is calculated: Cost of query execution

Cost = number of pages * seq_page_cost + number of rows * cpu_tuple_cost

cpu_tuple_cost is a PostgreSQL configuration parameter that tells the query planner the estimated cost of processing each row (tuple) during a query execution step (like reading, filtering, or sorting).

Step 1: Find reltuples and relpages:

TestDB=# select reltuples,relpages from pg_class where relname='cropsmooth_annual_2025';
 reltuples | relpages
-----------+----------
       849 |       41
(1 row)

Step 2: Find seq_page_cost:
TestDB=# show seq_page_cost;
 seq_page_cost
---------------
 1

Step 3: Find the number of rows:
TestDB=# select count(*) from cropsmooth_annual_2025;
 count
-------
   849

Step 4: Find cpu_tuple_cost:
TestDB=# show cpu_tuple_cost;
 cpu_tuple_cost
----------------
 0.01

TestDB=# explain select * from cropsmooth_annual_2025;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Seq Scan on cropsmooth_annual_2025  (cost=0.00..49.49 rows=849 width=153051)
(1 row)

TestDB=# select 41 * 1 + 849 * 0.01;
 ?column?
----------
    49.49
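The four steps can also be folded into a single query using current_setting(); for this table it should return the same 49.49, and it keeps tracking the formula above as the table changes:

select relpages * current_setting('seq_page_cost')::numeric
     + reltuples::numeric * current_setting('cpu_tuple_cost')::numeric as estimated_cost
from pg_class
where relname = 'cropsmooth_annual_2025';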


Routine Reindexing:
  • Insert, update, and delete operations fragment an index over a period of time.
  • A fragmented index has pages where the logical order based on key values differs from the physical ordering inside the data file.
  • Heavily fragmented indexes can degrade query performance because additional I/O is required to locate data to which the index points.
  • REINDEX rebuilds an index using the data stored in the indexed table and eliminates empty space between pages.
  • If performance is stable, you can afford to let fragmentation go higher (e.g. 20%).
  • If queries are slowing down, keeping the threshold at 5% ensures indexes stay optimized.
  • If storage space is a concern, a looser threshold like 20% might be better to avoid unnecessary reindexing.
  • High-write databases (frequent inserts/updates): more fragmentation happens faster, so 5% might be preferable.
  • Syntax: REINDEX INDEX <index_name>;
Recommended Strategy:
  • If you notice performance degradation, stick to 5% to keep indexes optimal.
  • If your workload is light and reindexing takes too much time, 20% could be fine (a sketch for measuring fragmentation follows below).
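One way to measure B-tree index fragmentation is the pgstattuple extension's pgstatindex() function; a sketch, assuming the placeholder index from the sorting section exists:

CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- leaf_fragmentation is a percentage; reindex once it crosses your chosen threshold (5% or 20%)
select leaf_fragmentation from pgstatindex('idx_cropsmooth_vill_id');

-- On PostgreSQL 12+, CONCURRENTLY avoids blocking writes during the rebuild
REINDEX INDEX CONCURRENTLY idx_cropsmooth_vill_id;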
Log File Maintenance:
  • PostgreSQL logs are records of the database server's activity, errors, and system messages.
  • These logs provide detailed information about database activity and are essential for monitoring, debugging, and auditing.
  • They are crucial for diagnosing problems in SQL queries, connection issues, and performance bottlenecks.
  • Message severity levels categorize log messages by their importance, helping administrators prioritize and address issues effectively.
  • The log verbosity setting controls how much information is captured in the logs.
Log Management:
  • Log File Rotation: To prevent logs from consuming excessive disk space, PostgreSQL supports log rotation. The two primary methods are:
  • Time-Based Rotation: PostgreSQL rotates logs at regular intervals using parameters like log_rotation_age.
  • Size-Based Rotation: Logs are rotated when they reach a specific size (log_rotation_size).
Time-Based Rotation:
# Enable logging
logging_collector = on
# Set log destination
log_destination = 'stderr'
# Define log storage directory
log_directory = 'C:/PostgreSQL/logs'
# Naming convention for log files
log_filename = 'postgresql-%Y-%m-%d.log'
# Rotate logs every 24 hours (log_rotation_age is in minutes)
log_rotation_age = 1440

Size-Based Rotation:
logging_collector = on
log_destination = 'stderr'
log_directory = '/var/log/postgresql'
log_filename = 'postgresql.log'
# Rotate logs when they reach 100MB
log_rotation_size = 100MB
# Overwrite old logs after rotation
log_truncate_on_rotation = on
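Most of these parameters take effect with a configuration reload; logging_collector itself requires a server restart. A quick sketch of reloading and verifying from psql:

-- Reload postgresql.conf without restarting the server
select pg_reload_conf();

-- Confirm the rotation settings took effect
show log_rotation_age;
show log_rotation_size;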


