perf: retry optimization for occasional deadlocks during archiving #3403

Open
jsonwan opened this issue Feb 6, 2025 · 1 comment
Assignees
Labels
backlog (initial requirement state, awaiting product evaluation), kind/enhancement (feature improvement)

Comments

@jsonwan
Collaborator

jsonwan commented Feb 6, 2025

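The alert threshold on the archive service's error logs is quite sensitive, so the occasional deadlock error log easily triggers false alarms.
Error stack trace: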

[2025-02-06 02:05:14.867] ERROR [job-backup,c5072912f08fa141d508cf58bdcc4d79,390a40979cb6b5ff] 7 --- [ArchiveWorker-1:20250106:4:0:ds_standalone] t.b.j.b.a.AbstractJobInstanceArchiveTask : [1:20250106:4:0:ds_standalone] Error while execute archive task

org.jooq.exception.DataAccessException: SQL [delete from `job_execute`.`gse_script_execute_obj_task` where `job_execute`.`gse_script_execute_obj_task`.`task_instance_id` in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 
?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) limit ?]; Deadlock found when trying to get lock; try restarting transaction
at org.jooq_3.14.8.MYSQL.debug(Unknown Source)
at org.jooq.impl.Tools.translate(Tools.java:2903)
at org.jooq.impl.DefaultExecuteContext.sqlException(DefaultExecuteContext.java:757)
at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:389)
at org.jooq.impl.AbstractDelegatingQuery.execute(AbstractDelegatingQuery.java:119)
at com.tencent.bk.job.backup.archive.dao.impl.AbstractJobInstanceHotRecordDAO.deleteWithLimit(AbstractJobInstanceHotRecordDAO.java:85)
at com.tencent.bk.job.backup.archive.dao.impl.AbstractJobInstanceHotRecordDAO.deleteRecords(AbstractJobInstanceHotRecordDAO.java:69)
at com.tencent.bk.job.backup.archive.impl.AbstractJobInstanceSubTableArchiver.deleteRecords(AbstractJobInstanceSubTableArchiver.java:109)
at com.tencent.bk.job.backup.archive.JobInstanceMainDataArchiveTask.lambda$deleteJobInstanceHotData$1(JobInstanceMainDataArchiveTask.java:114)
at java.util.ArrayList.forEach(ArrayList.java:1259)
at com.tencent.bk.job.backup.archive.JobInstanceMainDataArchiveTask.deleteJobInstanceHotData(JobInstanceMainDataArchiveTask.java:113)
at com.tencent.bk.job.backup.archive.AbstractJobInstanceArchiveTask.backupAndDelete(AbstractJobInstanceArchiveTask.java:283)
at com.tencent.bk.job.backup.archive.AbstractJobInstanceArchiveTask.archive(AbstractJobInstanceArchiveTask.java:210)
at com.tencent.bk.job.backup.archive.AbstractJobInstanceArchiveTask.execute(AbstractJobInstanceArchiveTask.java:134)
at com.tencent.bk.job.backup.archive.ArchiveTaskWorker.run(ArchiveTaskWorker.java:52)
Caused by: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:123)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:916)
at com.mysql.cj.jdbc.ClientPreparedStatement.execute(ClientPreparedStatement.java:354)
at io.opentelemetry.instrumentation.jdbc.internal.OpenTelemetryStatement.wrapCall(OpenTelemetryStatement.java:294)
at io.opentelemetry.instrumentation.jdbc.internal.OpenTelemetryPreparedStatement.execute(OpenTelemetryPreparedStatement.java:70)
at com.zaxxer.hikari.pool.ProxyPreparedStatement.execute(ProxyPreparedStatement.java:44)
at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.execute(HikariProxyPreparedStatement.java)
at org.jooq.tools.jdbc.DefaultPreparedStatement.execute(DefaultPreparedStatement.java:214)
at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:458)
at org.jooq.impl.AbstractDMLQuery.execute(AbstractDMLQuery.java:947)
at org.jooq.impl.AbstractQuery.execute(AbstractQuery.java:375)
... 11 common frames omitted

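DBA feedback: this is not a true deadlock, and a retry is enough. At that point in time a verification job was running, which briefly locks the table to obtain a consistent snapshot. Increasing the lock wait timeout and adding retries should be sufficient.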
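
A minimal sketch of how such transient lock failures could be told apart from other errors before deciding to retry. The class and method names are hypothetical and not taken from the job-backup code. The sketch relies only on the MySQL server error codes 1213 (`ER_LOCK_DEADLOCK`) and 1205 (`ER_LOCK_WAIT_TIMEOUT`), and assumes the driver's `SQLException` is reachable through the cause chain, as it is in the stack trace above.

```java
import java.sql.SQLException;

/** Hypothetical helper: decides whether a failure looks like a transient MySQL lock error. */
final class LockErrorClassifier {
    // MySQL server error codes.
    static final int ER_LOCK_WAIT_TIMEOUT = 1205;
    static final int ER_LOCK_DEADLOCK = 1213;

    private LockErrorClassifier() {}

    /** Walks the cause chain looking for a SQLException that carries one of the lock error codes. */
    static boolean isTransientLockError(Throwable error) {
        Throwable t = error;
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (t instanceof SQLException) {
                int code = ((SQLException) t).getErrorCode();
                if (code == ER_LOCK_DEADLOCK || code == ER_LOCK_WAIT_TIMEOUT) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Matching on the numeric error code rather than on the message text keeps the check independent of wrapper exceptions such as `DataAccessException`.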

@jsonwan jsonwan added the backlog (initial requirement state, awaiting product evaluation) and kind/enhancement (feature improvement) labels Feb 6, 2025
@jsonwan jsonwan self-assigned this Feb 6, 2025
@jsonwan
Collaborator Author

jsonwan commented Feb 24, 2025

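MySQL does not automatically retry a deadlocked transaction; the application has to catch the error and decide whether to retry.
The retry logic should be designed with care: make sure the operation is idempotent, cap the maximum number of attempts, and avoid introducing knock-on problems.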
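
A minimal sketch of a bounded retry along these lines, not the project's actual implementation. `DeadlockRetry`, `callWithRetry`, `maxAttempts` and `baseDelayMillis` are hypothetical names introduced for illustration. The deadlock check assumes MySQL error code 1213 (`ER_LOCK_DEADLOCK`) on a `SQLException` somewhere in the cause chain, and the linear backoff is one arbitrary choice among several.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

/** Hypothetical helper: re-runs an action a limited number of times when MySQL reports a deadlock. */
final class DeadlockRetry {
    private static final int ER_LOCK_DEADLOCK = 1213; // MySQL server error code

    private DeadlockRetry() {}

    /** True if a SQLException with error code 1213 appears in the cause chain. */
    static boolean isDeadlock(Throwable error) {
        Throwable t = error;
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (t instanceof SQLException && ((SQLException) t).getErrorCode() == ER_LOCK_DEADLOCK) {
                return true;
            }
        }
        return false;
    }

    /**
     * Calls the action up to maxAttempts times. Only deadlock failures are retried;
     * any other failure, or a deadlock on the last attempt, is rethrown unchanged.
     * The action must be idempotent, because it may run more than once.
     */
    static <T> T callWithRetry(Callable<T> action, int maxAttempts, long baseDelayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !isDeadlock(e)) {
                    throw e;
                }
                // Linear backoff: wait a little longer after each failed attempt.
                Thread.sleep(baseDelayMillis * attempt);
            }
        }
    }
}
```

A call site might wrap a single batched delete, for example `DeadlockRetry.callWithRetry(() -> dao.deleteBatch(ids), 3, 200L)`, where `dao.deleteBatch` is again a placeholder. Because the final failure is rethrown unchanged, the caller can log intermediate retries at a lower level and reserve ERROR for the case where all attempts are exhausted.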

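@jsonwan jsonwan changed the title from "perf: error log optimization for occasional deadlocks during archiving" to "perf: retry optimization for occasional deadlocks during archiving" Feb 24, 2025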