Version: Doris 2.1.2

Environment: 2 Doris FE nodes, 4 Doris BE nodes

Detailed Doris cluster setup tutorial: Apache Doris 2.x 版本【保姆级】安装+使用教程 (a step-by-step installation and usage tutorial, CSDN blog)

With server resources confirmed to be healthy, the following problem occurred:

I. Reproducing the problem:

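In April 2024, the Doris database was upgraded to version 2.1.2. The upgrade appears to have introduced a problem that affected dynamic partition creation. Normally, the system automatically creates partitions for the next two months according to my settings, but by July the partitions had not been created, so data could not be written. In short: in April, my dynamically partitioned tables had already created the May and June partitions based on "dynamic_partition.end" = "2", and then I upgraded to 2.1.2. After the upgrade, no further dynamic partitions were created during May and June, so by July data could no longer be written.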


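II. Cause of the problem:

Tracing the problem:

This problem involves two key properties used when creating a partitioned table: dynamic_partition.start and dynamic_partition.history_partition_num. Last year, dynamic_partition.start was set to an offset too close to the then-current time, which could cause partitions to expire. For example, I had set dynamic_partition.start = -10, which means that partitions whose range falls before this offset are dropped; in other words, -10 keeps only the 10 most recent historical partitions. After noticing this, I changed start to -656521 in an attempt to work around it.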
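To make the retention rule concrete, here is a minimal Python sketch of how a negative start offset maps to the oldest month that is still kept. It assumes monthly partitions, and the helper names shift_month and oldest_kept_month are hypothetical; they are not part of Doris and only mirror the rule described above.

```python
from typing import Tuple


def shift_month(year: int, month: int, offset: int) -> Tuple[int, int]:
    """Shift a (year, month) pair by `offset` months using integer math."""
    index = year * 12 + (month - 1) + offset
    shifted_year, zero_based_month = divmod(index, 12)
    return shifted_year, zero_based_month + 1


def oldest_kept_month(year: int, month: int, start: int) -> Tuple[int, int]:
    """Hypothetical helper: the first month still retained for a given start."""
    return shift_month(year, month, start)


# With start = -10 in July 2024, everything before September 2023 is dropped.
print(oldest_kept_month(2024, 7, -10))  # (2023, 9)
```

Anything older than the returned month would be eligible for deletion under this reading of the property.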

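Root cause:

In Doris 2.1.2, the problem is caused by "dynamic_partition.start" being set to "-656521", which is too small. In this version, "dynamic_partition.start" must not be too small: if the current time plus this offset falls before 1970-01-01, the system may malfunction. The same applies to "dynamic_partition.history_partition_num". Since start and history_partition_num serve similar purposes, we recommend keeping only one of them.
Because the start value was too small, the dynamic partition polling thread terminated abnormally and stopped processing the other tables. To fix this, adjust these values to a reasonable range so that they are not too small, and normal operation resumes.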
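As a rough check of how far back -656521 reaches, the sketch below applies the offset to July 2024 with plain integer arithmetic (Python's datetime cannot represent years before 1). It assumes a monthly time unit; the function names are hypothetical. The result is proleptic year -52686, i.e. 52687 BCE, which appears consistent with the literal [+52687-06-01 00:00:00] in the FE stack trace if that literal is printed as a year-of-era, though that reading is an inference rather than something confirmed from the Doris source.

```python
def shift_month(year, month, offset):
    """Shift a (year, month) pair by `offset` months using integer math."""
    index = year * 12 + (month - 1) + offset
    shifted_year, zero_based_month = divmod(index, 12)
    return shifted_year, zero_based_month + 1


def year_of_era(proleptic_year):
    """Map a proleptic year to (era, year): year 0 is 1 BCE, -1 is 2 BCE, ..."""
    if proleptic_year >= 1:
        return "CE", proleptic_year
    return "BCE", 1 - proleptic_year


year, month = shift_month(2024, 7, -656521)
print(year, month)        # -52686 6
print(year_of_era(year))  # ('BCE', 52687)
```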

As for the data that failed to be written, the Doris FE log fe.warn.log contains the detailed errors:

2024-07-04 11:13:37,845 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIME]; keys: [3438-04-01 00:00:00]; ..types: [DATETIME]; keys: [2024-10-01 00:00:00]; ), db: gxy_history, table: gxy_job
2024-07-04 11:13:37,857 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIME]; keys: [3438-04-01 00:00:00]; ..types: [DATETIME]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: person_educations_tmp
2024-07-04 11:13:37,864 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIME]; keys: [3438-04-01 00:00:00]; ..types: [DATETIME]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: persons_20240701
2024-07-04 11:13:37,866 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIME]; keys: [3438-04-01 00:00:00]; ..types: [DATETIME]; keys: [2024-10-01 00:00:00]; ), db: gxy_history, table: gxy_students
2024-07-04 11:13:37,874 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIMEV2]; keys: [3438-04-01 00:00:00]; ..types: [DATETIMEV2]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: person_educations_20240703
2024-07-04 11:13:37,883 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIMEV2]; keys: [3438-04-01 00:00:00]; ..types: [DATETIMEV2]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: persons_demo
2024-07-04 11:13:37,889 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIMEV2]; keys: [3438-04-01 00:00:00]; ..types: [DATETIMEV2]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: persons_20240628
2024-07-04 11:13:37,896 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIMEV2]; keys: [3438-04-01 00:00:00]; ..types: [DATETIMEV2]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: person_jobs
2024-07-04 11:13:37,934 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIME]; keys: [3438-04-01 00:00:00]; ..types: [DATETIME]; keys: [2024-10-01 00:00:00]; ), db: gxy_history, table: gxy_students_old
2024-07-04 11:13:37,940 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIMEV2]; keys: [3438-04-01 00:00:00]; ..types: [DATETIMEV2]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: person_jobs_old
2024-07-04 11:13:37,947 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIME]; keys: [3438-04-01 00:00:00]; ..types: [DATETIME]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: person_educations_20240628
2024-07-04 11:13:37,966 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIME]; keys: [3438-04-01 00:00:00]; ..types: [DATETIME]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: enterprise_bak
2024-07-04 11:13:37,980 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIMEV2]; keys: [3438-04-01 00:00:00]; ..types: [DATETIMEV2]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: person_jobs_20240701
2024-07-04 11:13:37,994 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIMEV2]; keys: [3438-04-01 00:00:00]; ..types: [DATETIMEV2]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: person_educations
2024-07-04 11:13:37,998 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIMEV2]; keys: [3438-04-01 00:00:00]; ..types: [DATETIMEV2]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: persons_2
2024-07-04 11:13:38,005 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIME]; keys: [3438-04-01 00:00:00]; ..types: [DATETIME]; keys: [2024-10-01 00:00:00]; ), db: gxy_history, table: gxy_plan_teacher_student
2024-07-04 11:13:38,017 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIMEV2]; keys: [3438-04-01 00:00:00]; ..types: [DATETIMEV2]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: persons_20240703
2024-07-04 11:13:38,019 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIMEV2]; keys: [3438-04-01 00:00:00]; ..types: [DATETIMEV2]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: person_jobs_20240703
2024-07-04 11:13:38,024 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIMEV2]; keys: [3438-04-01 00:00:00]; ..types: [DATETIMEV2]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: persons
2024-07-04 11:13:38,033 WARN (DynamicPartitionScheduler|40) [DynamicPartitionScheduler.getDropPartitionClause():440] Error in gen reservePartitionKeyRange. Error=Invalid range: [types: [DATETIME]; keys: [3438-04-01 00:00:00]; ..types: [DATETIME]; keys: [2024-10-01 00:00:00]; ), db: core_db, table: persons_candidate
2024-07-04 11:13:38,036 ERROR (DynamicPartitionScheduler|40) [Daemon.run():118] daemon thread got exception. name: DynamicPartitionScheduler
org.apache.doris.nereids.exceptions.AnalysisException: date/datetime literal [+52687-06-01 00:00:00] is invalid
	at org.apache.doris.nereids.trees.expressions.literal.DateLiteral.normalize(DateLiteral.java:202) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.nereids.trees.expressions.literal.DateTimeLiteral.determineScale(DateTimeLiteral.java:107) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.nereids.types.DateTimeV2Type.forTypeFromString(DateTimeV2Type.java:90) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.nereids.trees.expressions.literal.DateTimeV2Literal.<init>(DateTimeV2Literal.java:38) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.catalog.PartitionKey.getDateTimeLiteral(PartitionKey.java:121) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.catalog.PartitionKey.createPartitionKey(PartitionKey.java:99) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.clone.DynamicPartitionScheduler.getDropPartitionClause(DynamicPartitionScheduler.java:431) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.clone.DynamicPartitionScheduler.executeDynamicPartition(DynamicPartitionScheduler.java:555) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.clone.DynamicPartitionScheduler.runAfterCatalogReady(DynamicPartitionScheduler.java:641) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.common.util.Daemon.run(Daemon.java:116) ~[doris-fe.jar:1.2-SNAPSHOT]


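The log shows that, because "dynamic_partition.start" was too small, the dynamic partition polling thread failed (daemon thread got exception. name: DynamicPartitionScheduler) and stopped processing the remaining tables. To fix this, adjust these values so that they are not too small, which avoids this kind of exception and the interrupted operations that follow.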
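When the log is long, it can help to pull out just the database and table names from the WARN lines. The following is a small sketch that does this with a regular expression; the pattern is written against the line format shown in the log above, and the function name affected_tables is hypothetical.

```python
import re

# Matches the "db: ..., table: ..." tail of the WARN lines shown above.
PATTERN = re.compile(
    r"Error in gen reservePartitionKeyRange\..*"
    r"db: (?P<db>[^,\s]+), table: (?P<table>[^,\s]+)"
)

SAMPLE_LINES = [
    "2024-07-04 11:13:37,845 WARN (DynamicPartitionScheduler|40) "
    "[DynamicPartitionScheduler.getDropPartitionClause():440] "
    "Error in gen reservePartitionKeyRange. Error=Invalid range: "
    "[types: [DATETIME]; keys: [3438-04-01 00:00:00]; ..types: [DATETIME]; "
    "keys: [2024-10-01 00:00:00]; ), db: gxy_history, table: gxy_job",
    "2024-07-04 11:13:38,024 WARN (DynamicPartitionScheduler|40) "
    "[DynamicPartitionScheduler.getDropPartitionClause():440] "
    "Error in gen reservePartitionKeyRange. Error=Invalid range: "
    "[types: [DATETIMEV2]; keys: [3438-04-01 00:00:00]; ..types: [DATETIMEV2]; "
    "keys: [2024-10-01 00:00:00]; ), db: core_db, table: persons",
    "2024-07-04 11:13:38,036 ERROR (DynamicPartitionScheduler|40) "
    "[Daemon.run():118] daemon thread got exception. "
    "name: DynamicPartitionScheduler",
]


def affected_tables(lines):
    """Return unique (db, table) pairs in the order they first appear."""
    seen = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            pair = (match.group("db"), match.group("table"))
            if pair not in seen:
                seen.append(pair)
    return seen


print(affected_tables(SAMPLE_LINES))
```

To run it against a real log, pass the lines of an opened fe.warn.log file instead of SAMPLE_LINES; the file's location depends on the deployment.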

The following command shows that, at this point, most of my tables had dynamic_partition.start set to -656521.

-- Show the dynamic partition status of all tables
SHOW DYNAMIC PARTITION TABLES;

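The essence of the problem:

In essence, one particular table (call it table A) had its start property set to -656521, and that error set off a chain reaction that prevented partition creation and deletion for another table (table B), even though table B's start was not -656521. Judging from the source code, the flow is as follows:

  1. Assume table A is scheduled before table B;
  2. Table A creates its new partitions successfully;
  3. Table A fails to drop its historical partitions;
  4. Table B is affected: because of the failure in step 3, the scheduler does not go on to run partition creation and deletion for later tables such as B;
  5. Table B's partition operations are skipped: both the creation of new partitions and the deletion of old partitions that B should have performed never run.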
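The steps above can be sketched as a toy model of a single scheduler pass. This is not the Doris source: the exception class, the MIN_SAFE_START cut-off, and the function names are all invented for illustration, and the only behavior modeled is that an uncaught failure on one table ends the pass for every table after it.

```python
class InvalidDateLiteral(Exception):
    """Stand-in for the AnalysisException seen in the stack trace."""


MIN_SAFE_START = -100000  # arbitrary cut-off used only by this model


def process_table(name, start, log):
    """Create new partitions, then drop old ones; the drop fails if start is too small."""
    log.append(f"create partitions for {name}")
    if start < MIN_SAFE_START:
        raise InvalidDateLiteral(name)
    log.append(f"drop old partitions for {name}")


def run_cycle(tables):
    """Run one pass over all tables; the first failure aborts the whole pass."""
    log = []
    try:
        for name, start in tables:
            process_table(name, start, log)
    except InvalidDateLiteral as exc:
        log.append(f"scheduler pass aborted at {exc}")
    return log


# Table A is scheduled first and fails, so table B is never reached.
print(run_cycle([("A", -656521), ("B", -240)]))
```

In this model, table B only gets its partitions when it happens to be scheduled before table A, which matches the observation that tables with a reasonable start value were affected too.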

Fix:

Use the following command to reset "dynamic_partition.start" on every table, across all databases in Doris, whose start value is too small:

ALTER TABLE table_name SET ("dynamic_partition.start" = "-240"); 
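With many tables to fix, the ALTER statements can be generated rather than typed by hand. Below is a minimal sketch that assumes the (database, table, start) triples have already been collected, for example from the SHOW DYNAMIC PARTITION TABLES output of each database. The function name and the threshold default are assumptions; -240 is simply the value used in the command above.

```python
def build_alter_statements(tables, new_start=-240, threshold=-10000):
    """Build one ALTER TABLE statement per table whose start is below `threshold`.

    `tables` is an iterable of (db, table, start) tuples. `threshold` is an
    arbitrary cut-off chosen for this sketch, not a limit documented by Doris.
    """
    statements = []
    for db, table, start in tables:
        if start < threshold:
            statements.append(
                f'ALTER TABLE `{db}`.`{table}` '
                f'SET ("dynamic_partition.start" = "{new_start}");'
            )
    return statements


collected = [
    ("core_db", "persons", -656521),
    ("core_db", "person_jobs", -240),
    ("gxy_history", "gxy_job", -656521),
]

for statement in build_alter_statements(collected):
    print(statement)
```

Review the generated statements before running them, since the right start value depends on how much history each table needs to keep.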

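Checking the result:

After the fix, no tables with an excessively small start value remain.

Inspecting the table structure shows that the partitions for the next two months have been created automatically. The problem is fixed.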
