0 Preface
Parts 1 and 2 analyze how XLOG files are generated and cleaned up. To deal with a runaway XLOG directory, skip straight to part 3.
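1 WAL archiving
# Maximum number of log file segments between automatic WAL checkpoints
checkpoint_segments =
# Maximum time between automatic WAL checkpoints
checkpoint_timeout =
# Spreads checkpoint writes over this fraction of the checkpoint interval to ease I/O pressure
checkpoint_completion_target =
# Minimum number of past log file segments to keep, so that standbys have more segments available
wal_keep_segments =
# When enabled, completed WAL segments are sent to archive storage via archive_command
archive_mode =
# Forces a switch to a new WAL segment file after this timeout
archive_timeout =
# From PostgreSQL 9.5 on, these two replace checkpoint_segments
max_wal_size =
min_wal_size =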
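1.1 With archiving disabled
The number of segment files is governed by the parameters above and normally does not exceed
(2 + checkpoint_completion_target) * checkpoint_segments + 1
or
checkpoint_segments + wal_keep_segments + 1
files.
When an old segment file is no longer needed, it is renamed and then overwritten for reuse. If a short-term peak in log output pushes the count above
3 * checkpoint_segments + 1
files, the unneeded files are deleted outright instead of being recycled.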
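The two bounds above are easy to evaluate for a concrete configuration. The following is a minimal sketch in C; the function names are hypothetical, and taking the larger of the two formulas as the effective bound is an assumption about how "or" should be read, not something the original text states.

```c
/*
 * Hypothetical helper: estimated upper bound on the number of WAL segment
 * files when archiving is off. Assumes the effective bound is the larger
 * of the two formulas quoted in section 1.1.
 */
double
estimated_max_wal_files(int checkpoint_segments,
                        double checkpoint_completion_target,
                        int wal_keep_segments)
{
    double by_checkpoints =
        (2.0 + checkpoint_completion_target) * checkpoint_segments + 1.0;
    double by_keep =
        (double) checkpoint_segments + wal_keep_segments + 1.0;

    return by_checkpoints > by_keep ? by_checkpoints : by_keep;
}

/*
 * Hypothetical helper: above this many files, unneeded segments are
 * deleted instead of being renamed for reuse.
 */
int
recycle_ceiling(int checkpoint_segments)
{
    return 3 * checkpoint_segments + 1;
}
```

For example, with checkpoint_segments = 3, checkpoint_completion_target = 0.5 and wal_keep_segments = 0, the first formula dominates and gives 8.5 segments. Raising wal_keep_segments to 100 makes the second formula dominate at 104 segments.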
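1.2 With archiving enabled
File count: segment files are removed once they have been archived successfully.
In the abstract, a running PG instance produces an indefinitely long sequence of WAL records. The sequence is split into 16 MB segments, and each segment file gets a numeric name that reflects its position in the WAL sequence. Without WAL archiving, the system normally creates only a few segment files and then recycles them by renaming segment files that are no longer needed to higher segment numbers.
The archive command must return zero if and only if it succeeds. On a zero result, PostgreSQL assumes the WAL segment file has been archived successfully and removes the segment file later. A nonzero result tells PostgreSQL that the file was not archived, and it retries periodically until it succeeds.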
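To make the "numeric name reflects the position" point concrete, here is a minimal sketch in C of how a 24-hex-digit segment file name maps to a segment number. It assumes the default 16 MB segment size (so 256 segments per 4 GB "xlogid") and the naming used by PostgreSQL 9.3 and later; the function names are hypothetical and not part of PostgreSQL.

```c
#include <stdio.h>
#include <stdint.h>

/* Assumption: 16 MB segments, so 0x100000000 / 16 MB = 256 segments per xlogid. */
#define SEGMENTS_PER_XLOGID 256u

/*
 * Writes the segment file name for (timeline, segno) into buf.
 * Returns 0 on success, -1 if buf is too small for 24 characters plus NUL.
 */
int
wal_file_name(char *buf, size_t buflen, uint32_t timeline, uint64_t segno)
{
    int n = snprintf(buf, buflen, "%08X%08X%08X",
                     (unsigned) timeline,
                     (unsigned) (segno / SEGMENTS_PER_XLOGID),
                     (unsigned) (segno % SEGMENTS_PER_XLOGID));

    return (n == 24 && (size_t) n < buflen) ? 0 : -1;
}

/*
 * Parses a segment file name back into timeline and segment number.
 * Returns 0 on success, -1 if the name does not contain three hex fields.
 */
int
wal_segno_from_name(const char *name, uint32_t *timeline, uint64_t *segno)
{
    unsigned tli, hi, lo;

    if (sscanf(name, "%8X%8X%8X", &tli, &hi, &lo) != 3)
        return -1;

    *timeline = tli;
    *segno = (uint64_t) hi * SEGMENTS_PER_XLOGID + lo;
    return 0;
}
```

Under these assumptions, segment 255 on timeline 1 is named 0000000100000000000000FF, and the next segment rolls over to 000000010000000100000000.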
2 PG source code analysis
2.1 Removal logic
Callers that trigger removal
RemoveOldXlogFiles
> CreateCheckPoint
> CreateRestartPoint
wal_keep_segments check (this function is called to adjust _logSegNo, which is then passed to RemoveOldXlogFiles)
static void
KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
{
    XLogSegNo   segno;
    XLogRecPtr  keep;

    XLByteToSeg(recptr, segno);
    keep = XLogGetReplicationSlotMinimumLSN();

    /* compute limit for wal_keep_segments first */
    if (wal_keep_segments > 0)
    {
        /* avoid underflow, don't go below 1 */
        if (segno <= wal_keep_segments)
            segno = 1;
        else
            segno = segno - wal_keep_segments;
    }

    /* then check whether slots limit removal further */
    if (max_replication_slots > 0 && keep != InvalidXLogRecPtr)
    {
        XLogSegNo   slotSegNo;

        XLByteToSeg(keep, slotSegNo);

        if (slotSegNo <= 0)
            segno = 1;
        else if (slotSegNo < segno)
            segno = slotSegNo;
    }

    /* don't delete WAL segments newer than the calculated segment */
    if (segno < *logSegNo)
        *logSegNo = segno;
}
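The arithmetic in KeepLogSeg can be tried out in isolation. The following is a simplified standalone model in C, not the PostgreSQL function itself: LSNs are replaced by plain segment numbers, and the has_slot flag (a hypothetical parameter) stands in for the "max_replication_slots > 0 and a valid minimum slot LSN" condition.

```c
#include <stdint.h>

/*
 * Simplified model of KeepLogSeg: returns the oldest segment number that
 * must be kept.
 *
 *   redo_segno        - segment the caller initially planned to keep from
 *   insert_segno      - segment containing the current WAL position
 *   wal_keep_segments - value of the setting of the same name
 *   has_slot          - nonzero if a replication slot restricts removal
 *   slot_segno        - segment containing the oldest slot restart position
 */
uint64_t
oldest_segment_to_keep(uint64_t redo_segno, uint64_t insert_segno,
                       uint64_t wal_keep_segments,
                       int has_slot, uint64_t slot_segno)
{
    uint64_t segno = insert_segno;

    /* step back by wal_keep_segments, but never below segment 1 */
    if (wal_keep_segments > 0)
    {
        if (segno <= wal_keep_segments)
            segno = 1;
        else
            segno -= wal_keep_segments;
    }

    /* a slot that is further behind moves the cutoff back again */
    if (has_slot)
    {
        if (slot_segno == 0)
            segno = 1;
        else if (slot_segno < segno)
            segno = slot_segno;
    }

    /* only ever lower the caller's cutoff, never raise it */
    return segno < redo_segno ? segno : redo_segno;
}
```

With redo_segno = 100 and insert_segno = 101, setting wal_keep_segments = 32 lowers the cutoff to 69, and a slot stuck at segment 40 lowers it further to 40. This is why a stalled slot can hold back removal without any limit.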
Removal logic
static void
RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr)
{
    ...
    ...
    while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
    {
        /* Ignore files that are not XLOG segments */
        if (strlen(xlde->d_name) != 24 ||
            strspn(xlde->d_name, "0123456789ABCDEF") != 24)
            continue;

        /*
         * We ignore the timeline part of the XLOG segment identifiers in
         * deciding whether a segment is still needed. This ensures that we
         * won't prematurely remove a segment from a parent timeline. We could
         * probably be a little more proactive about removing segments of
         * non-parent timelines, but that would be a whole lot more
         * complicated.
         *
         * We use the alphanumeric sorting property of the filenames to decide
         * which ones are earlier than the lastoff segment.
         */
        if (strcmp(xlde->d_name + 8, lastoff + 8) <= 0)
        {
            /*
             * XLogArchiveCheckDone:
             *   archiving disabled           -> returns true
             *   .done file exists            -> returns true
             *   .ready file exists           -> returns false
             *   recheck finds a .done file   -> returns true
             *   otherwise recreates .ready   -> returns false
             */
            if (XLogArchiveCheckDone(xlde->d_name))
            {
                /* Update the last removed location in shared memory first */
                UpdateLastRemovedPtr(xlde->d_name);

                /* recycle or delete the segment, and clean up its .done and .ready files */
                RemoveXlogFile(xlde->d_name, endptr);
            }
        }
    }
    ...
    ...
}
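The file name test and the timeline-ignoring comparison in the loop above can be exercised on their own. This is a minimal sketch in C with hypothetical function names; it reproduces only the two string checks, not the archive status check or the removal itself.

```c
#include <string.h>

/* Returns 1 if name looks like a WAL segment file: exactly 24 uppercase hex digits. */
int
is_wal_segment_name(const char *name)
{
    return strlen(name) == 24 &&
           strspn(name, "0123456789ABCDEF") == 24;
}

/*
 * Returns 1 if name sorts at or before lastoff once the 8-digit timeline
 * prefix is skipped, i.e. the segment is a removal candidate.
 */
int
is_old_segment(const char *name, const char *lastoff)
{
    if (!is_wal_segment_name(name) || !is_wal_segment_name(lastoff))
        return 0;

    return strcmp(name + 8, lastoff + 8) <= 0;
}
```

Because the first 8 characters are skipped, a segment such as 000000020000000000000005 on timeline 2 still counts as older than 000000010000000000000010 on timeline 1. Files such as 00000001.history fail the name test and are never considered.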
2.2 Archiving logic
static void
pgarch_ArchiverCopyLoop(void)
{
    char        xlog[MAX_XFN_CHARS + 1];

    /* get the name of the oldest xlog file that has not been archived yet */
    while (pgarch_readyXlog(xlog))
    {
        int         failures = 0;

        for (;;)
        {
            /*
             * Do not initiate any more archive commands after receiving
             * SIGTERM, nor after the postmaster has died unexpectedly. The
             * first condition is to try to keep from having init SIGKILL the
             * command, and the second is to avoid conflicts with another
             * archiver spawned by a newer postmaster.
             */
            if (got_SIGTERM || !PostmasterIsAlive())
                return;

            /*
             * Check for config update. This is so that we'll adopt a new
             * setting for archive_command as soon as possible, even if there
             * is a backlog of files to be archived.
             */
            if (got_SIGHUP)
            {
                got_SIGHUP = false;
                ProcessConfigFile(PGC_SIGHUP);
            }

            /*
             * If archive_command is not set, nothing more is executed.
             * Our archive_command is empty, so this is the branch we take.
             */
            if (!XLogArchiveCommandSet())
            {
                /*
                 * Change WARNING to DEBUG1, since we will left archive_command empty to
                 * let external tools to manage archive
                 */
                ereport(DEBUG1,
                        (errmsg("archive_mode enabled, yet archive_command is not set")));
                return;
            }

            /* run the archive command */
            if (pgarch_archiveXlog(xlog))
            {
                /* success: rename .ready to .done */
                pgarch_archiveDone(xlog);

                /*
                 * Tell the collector about the WAL file that we successfully
                 * archived
                 */
                pgstat_send_archiver(xlog, false);

                break;          /* out of inner retry loop */
            }
            else
            {
                /*
                 * Tell the collector about the WAL file that we failed to
                 * archive
                 */
                pgstat_send_archiver(xlog, true);

                if (++failures >= NUM_ARCHIVE_RETRIES)
                {
                    ereport(WARNING,
                            (errmsg("archiving transaction log file \"%s\" failed too many times, will try again later",
                                    xlog)));
                    return;     /* give up archiving for now */
                }
                pg_usleep(1000000L);    /* wait a bit before retrying */
            }
        }
    }
}
2.3 .ready file generation logic
static void
XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
{
    ...
    if (finishing_seg)
    {
        issue_xlog_fsync(openLogFile, openLogSegNo);

        /* signal that we need to wakeup walsenders later */
        WalSndWakeupRequest();

        LogwrtResult.Flush = LogwrtResult.Write; /* end of page */

        /* archiving is active: archive_mode is on and wal_level >= archive */
        if (XLogArchivingActive())
            /* create the .ready file */
            XLogArchiveNotifySeg(openLogSegNo);

        XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
        ...
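2.4 Summary
A .ready file is always generated as long as archive_mode=on and wal_level>=archive (it is created from within XLogWrite).
Because archive_command is left empty, consuming the .ready files is entirely up to the external program.
.done files are handled by PG. Two events trigger that handling: checkpoints and restartpoints.
How many .done files get processed is limited by wal_keep_segments and by replication slots (the KeepLogSeg function).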
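The interaction between the marker files and removal can be summarized as a small decision function. This is a minimal sketch in C that models the outcomes listed for XLogArchiveCheckDone in section 2.1. The function name and boolean inputs are hypothetical, and the race-condition recheck of .done is omitted for brevity.

```c
#include <stdbool.h>

/*
 * Simplified model of the "may this segment be removed?" decision.
 *
 *   archiving_active - archive_mode is on and wal_level >= archive
 *   done_exists      - archive_status/<segment>.done is present
 *   ready_exists     - archive_status/<segment>.ready is present
 */
bool
segment_is_removable(bool archiving_active, bool done_exists, bool ready_exists)
{
    if (!archiving_active)
        return true;            /* no archiving: nothing to wait for */

    if (done_exists)
        return true;            /* archived, safe to recycle or delete */

    if (ready_exists)
        return false;           /* still waiting to be archived */

    /*
     * Neither marker exists: the real function recreates the .ready file
     * and reports the segment as not yet removable.
     */
    return false;
}
```

With archive_command left empty, nothing inside PG turns .ready into .done, so every finished segment stays in the "not removable" state until the external program renames its marker.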
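3 Why WAL segments accumulate
Note: whatever the situation, never delete xlog files by hand.
Note: the segment written by a checkpoint does not get its .ready file immediately; the .ready file is created together with the one for the next xlog segment.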
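3.1 ReplicationSlot
Streaming replication slots are in use.
-- Streaming replication slots
-- If restart_lsn is a very large number of bytes behind the current XLOG position, check whether the
-- slot's subscriber can still receive XLOG and whether the subscriber itself is healthy.
-- If a slot's data is not consumed for a long time, the pg_xlog directory may fill up.
select pg_xlog_location_diff(pg_current_xlog_location(),restart_lsn), *
from pg_replication_slots;
To drop a slot:
select pg_drop_replication_slot('xxx');
After the slot is dropped, PG cleans up the xlog files at the next checkpoint.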
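The byte difference returned by the lag query is plain arithmetic on the two LSNs. The sketch below, in C with hypothetical function names, shows the calculation for LSNs in their usual text form, two hexadecimal numbers separated by a slash. It illustrates the arithmetic only and is not how the server function is implemented.

```c
#include <stdio.h>
#include <stdint.h>

/* Parses an LSN written as "hi/lo" in hexadecimal. Returns 0 on success, -1 on error. */
int
parse_lsn(const char *text, uint64_t *lsn)
{
    unsigned hi, lo;

    if (sscanf(text, "%X/%X", &hi, &lo) != 2)
        return -1;

    *lsn = ((uint64_t) hi << 32) | lo;
    return 0;
}

/*
 * Stores in *diff the number of WAL bytes between current and restart.
 * Returns 0 on success, -1 if either LSN cannot be parsed.
 */
int
lsn_diff_bytes(const char *current, const char *restart, int64_t *diff)
{
    uint64_t a, b;

    if (parse_lsn(current, &a) != 0 || parse_lsn(restart, &b) != 0)
        return -1;

    *diff = (int64_t) (a - b);
    return 0;
}
```

For instance, a current position of 0/3000000 against a restart_lsn of 0/1000000 gives 33554432 bytes, which is two 16 MB segments held back by the slot.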
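3.2 A large wal_keep_segments
Check the parameter setting. Note that enabling this parameter introduces some delay between xlog segments and their .ready files.
3.3 Recycling problems
If PG's built-in recycling is not used and the database relies on an external program to modify the .ready files, check that the recycling process is working
(archive_mode=on archive_command='')
3.4 Checkpoint interval too long
Check the parameter settings.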