[improve](routine load) introduce routine load abnormal job monitor metrics #48171

Merged (2 commits) on Mar 13, 2025

@sollhui sollhui commented Feb 21, 2025

What problem does this PR solve?

related #48511

Introduce some metrics so that abnormal routine load jobs can be monitored.

metrics:

  1. In addition to the existing job states, add two states: USER_PAUSED and ABNORMAL_PAUSED
{
        "tags":
        {
                "metric":"doris_fe_job",
                "job":"load",
                "type":"ROUTINE_LOAD",
                "state":"ABNORMAL_PAUSED"
        },
        "unit":"nounit",
        "value":1
},

{
        "tags":
        {
                "metric":"doris_fe_job",
                "job":"load",
                "type":"ROUTINE_LOAD",
                "state":"USER_PAUSED"
        },
        "unit":"nounit",
        "value":1
},
  2. Sum of all progress for the routine load job
doris_fe_routine_load_progress
  3. Sum of all lag for the routine load job
doris_fe_routine_load_lag
  4. Total number of aborted tasks for the routine load job
doris_fe_routine_load_abort_task_num
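
A minimal sketch of how a monitoring script might consume the `doris_fe_job` entries shown above. It assumes the entries have already been collected into a JSON array (how they are fetched is not covered here), and the helper names `routine_load_jobs_by_state` and `has_abnormal_paused` are hypothetical, not part of this PR.

```python
import json
from collections import Counter

# Two entries in the shape shown above. Wrapping them in a JSON array is an
# assumption made for this sketch; the PR only shows the individual objects.
SAMPLE = """
[
  {"tags": {"metric": "doris_fe_job", "job": "load",
            "type": "ROUTINE_LOAD", "state": "ABNORMAL_PAUSED"},
   "unit": "nounit", "value": 1},
  {"tags": {"metric": "doris_fe_job", "job": "load",
            "type": "ROUTINE_LOAD", "state": "USER_PAUSED"},
   "unit": "nounit", "value": 1}
]
"""


def routine_load_jobs_by_state(entries):
    """Sum doris_fe_job values per state for ROUTINE_LOAD jobs."""
    counts = Counter()
    for entry in entries:
        tags = entry.get("tags", {})
        if tags.get("metric") == "doris_fe_job" and tags.get("type") == "ROUTINE_LOAD":
            counts[tags.get("state")] += entry.get("value", 0)
    return counts


def has_abnormal_paused(entries, threshold=0):
    """Hypothetical alert rule: True when more than `threshold` jobs are ABNORMAL_PAUSED."""
    return routine_load_jobs_by_state(entries)["ABNORMAL_PAUSED"] > threshold


entries = json.loads(SAMPLE)
print(dict(routine_load_jobs_by_state(entries)))
print(has_abnormal_paused(entries))
```

Separating ABNORMAL_PAUSED from USER_PAUSED lets an alert of this kind ignore jobs that a user paused on purpose.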

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

sollhui commented Feb 21, 2025

run buildall

@sollhui force-pushed the rl_abnormal_job_monitor branch from d640074 to 983a4cc on February 21, 2025 03:41

sollhui commented Feb 21, 2025

run buildall

@sollhui changed the title from "[improve](routine load)(observability) introduce routine load abnormal job monitor" to "[improve](routine load) introduce routine load abnormal job monitor" on Feb 21, 2025
@doris-robot

TPC-H: Total hot run time: 31586 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 983a4cc990c8c6ca386856f404544742ebaafd5d, data reload: false

------ Round 1 ----------------------------------
q1	17642	5194	5114	5114
q2	2044	299	168	168
q3	10413	1305	755	755
q4	10208	1023	531	531
q5	7524	2455	2265	2265
q6	190	169	135	135
q7	902	757	609	609
q8	9296	1345	1173	1173
q9	5054	4660	4652	4652
q10	6835	2350	1924	1924
q11	481	267	252	252
q12	346	362	219	219
q13	17748	3718	3072	3072
q14	223	225	219	219
q15	498	465	461	461
q16	616	623	590	590
q17	563	873	344	344
q18	6552	6210	6176	6176
q19	1222	950	532	532
q20	316	318	185	185
q21	2880	2160	1912	1912
q22	360	336	298	298
Total cold run time: 101913 ms
Total hot run time: 31586 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5093	5140	5117	5117
q2	233	329	233	233
q3	2179	2674	2272	2272
q4	1414	1831	1348	1348
q5	4227	4180	4180	4180
q6	207	171	126	126
q7	1884	1844	1659	1659
q8	2632	2576	2504	2504
q9	7389	7173	7074	7074
q10	2985	3229	2729	2729
q11	585	524	479	479
q12	681	728	632	632
q13	3513	3910	3268	3268
q14	273	305	295	295
q15	507	476	475	475
q16	637	665	642	642
q17	1115	1570	1335	1335
q18	7614	7316	7267	7267
q19	805	799	954	799
q20	1981	2013	1894	1894
q21	5430	4987	4922	4922
q22	626	577	568	568
Total cold run time: 52010 ms
Total hot run time: 49818 ms
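
Reading the Round 1 table above: the columns are not labeled, but the reported totals are consistent with column 1 being the cold run, columns 2 and 3 being two hot runs, and column 4 being the faster of the two hot runs. That column interpretation is an inference from the numbers, not something the bot states. A small sketch that recomputes both totals under that assumption (the `summarize` helper is hypothetical):

```python
# Rows copied from Round 1 of the TPC-H result above (times in ms).
ROUND_1 = """\
q1 17642 5194 5114 5114
q2 2044 299 168 168
q3 10413 1305 755 755
q4 10208 1023 531 531
q5 7524 2455 2265 2265
q6 190 169 135 135
q7 902 757 609 609
q8 9296 1345 1173 1173
q9 5054 4660 4652 4652
q10 6835 2350 1924 1924
q11 481 267 252 252
q12 346 362 219 219
q13 17748 3718 3072 3072
q14 223 225 219 219
q15 498 465 461 461
q16 616 623 590 590
q17 563 873 344 344
q18 6552 6210 6176 6176
q19 1222 950 532 532
q20 316 318 185 185
q21 2880 2160 1912 1912
q22 360 336 298 298
"""


def summarize(table):
    """Return (total_cold_ms, total_hot_ms) for a five-column benchmark table.

    Assumed columns: name, cold run, hot run 1, hot run 2, best hot run.
    """
    total_cold = 0
    total_hot = 0
    for line in table.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        _name, cold, hot1, hot2, _best = fields
        total_cold += int(cold)
        total_hot += min(int(hot1), int(hot2))
    return total_cold, total_hot


total_cold, total_hot = summarize(ROUND_1)
print(total_cold, total_hot)
```

Under this reading the script reproduces the reported Round 1 totals of 101913 ms cold and 31586 ms hot.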

@doris-robot

TPC-DS: Total hot run time: 183367 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 983a4cc990c8c6ca386856f404544742ebaafd5d, data reload: false

query1	951	362	383	362
query2	6526	1935	1833	1833
query3	6802	213	205	205
query4	26635	23746	22896	22896
query5	4305	689	473	473
query6	301	191	187	187
query7	4611	499	307	307
query8	289	246	233	233
query9	8617	2561	2582	2561
query10	489	313	259	259
query11	15365	15106	14983	14983
query12	174	108	105	105
query13	1651	500	382	382
query14	8991	6243	6101	6101
query15	206	194	176	176
query16	7127	621	446	446
query17	896	701	533	533
query18	1947	383	289	289
query19	178	193	162	162
query20	117	114	116	114
query21	207	120	97	97
query22	4155	4313	4446	4313
query23	34260	33460	33255	33255
query24	7703	2385	2401	2385
query25	534	466	376	376
query26	1240	261	148	148
query27	2581	473	332	332
query28	4339	2404	2394	2394
query29	771	536	421	421
query30	232	181	157	157
query31	931	833	809	809
query32	68	60	59	59
query33	543	383	296	296
query34	768	836	503	503
query35	799	801	750	750
query36	973	1014	922	922
query37	117	95	74	74
query38	4249	4172	4094	4094
query39	1447	1409	1407	1407
query40	215	116	107	107
query41	54	54	50	50
query42	122	104	106	104
query43	507	519	469	469
query44	1294	778	773	773
query45	173	172	158	158
query46	864	1030	657	657
query47	1752	1797	1739	1739
query48	401	416	305	305
query49	784	521	415	415
query50	677	738	430	430
query51	4227	4207	4098	4098
query52	109	103	94	94
query53	224	261	180	180
query54	477	480	403	403
query55	87	82	77	77
query56	258	261	232	232
query57	1142	1149	1089	1089
query58	237	242	238	238
query59	2635	2758	2544	2544
query60	274	274	257	257
query61	118	122	113	113
query62	786	726	651	651
query63	244	188	187	187
query64	4471	985	641	641
query65	3216	3108	3133	3108
query66	1114	398	316	316
query67	15650	15497	15270	15270
query68	2214	777	544	544
query69	434	307	284	284
query70	1215	1157	1056	1056
query71	320	298	328	298
query72	5913	3589	3686	3589
query73	652	745	350	350
query74	9058	9166	8949	8949
query75	3114	3249	2712	2712
query76	2245	1151	750	750
query77	350	371	288	288
query78	9954	10170	9287	9287
query79	1127	937	598	598
query80	650	546	464	464
query81	487	329	236	236
query82	1272	129	99	99
query83	229	169	150	150
query84	287	92	73	73
query85	729	334	310	310
query86	329	314	283	283
query87	4470	4487	4421	4421
query88	3002	2231	2235	2231
query89	404	319	296	296
query90	1734	204	198	198
query91	138	146	105	105
query92	62	63	58	58
query93	1101	991	577	577
query94	487	401	305	305
query95	354	269	265	265
query96	507	546	296	296
query97	2767	2904	2734	2734
query98	231	204	205	204
query99	1451	1440	1253	1253
Total cold run time: 261494 ms
Total hot run time: 183367 ms

@doris-robot

ClickBench: Total hot run time: 30.83 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 983a4cc990c8c6ca386856f404544742ebaafd5d, data reload: false

query1	0.04	0.04	0.04
query2	0.07	0.03	0.03
query3	0.24	0.07	0.08
query4	1.61	0.10	0.10
query5	0.42	0.42	0.40
query6	1.16	0.65	0.66
query7	0.02	0.02	0.01
query8	0.04	0.03	0.03
query9	0.59	0.52	0.54
query10	0.58	0.58	0.57
query11	0.15	0.10	0.10
query12	0.14	0.11	0.11
query13	0.62	0.60	0.62
query14	2.68	2.70	2.70
query15	0.90	0.83	0.83
query16	0.37	0.38	0.40
query17	1.04	1.02	1.02
query18	0.21	0.20	0.20
query19	1.87	1.80	1.99
query20	0.02	0.01	0.02
query21	15.38	0.91	0.55
query22	0.75	1.14	0.62
query23	15.03	1.41	0.66
query24	7.52	1.16	1.00
query25	0.55	0.24	0.11
query26	0.63	0.17	0.14
query27	0.05	0.05	0.05
query28	10.04	0.87	0.42
query29	12.55	3.98	3.33
query30	0.25	0.09	0.06
query31	2.84	0.57	0.37
query32	3.22	0.55	0.48
query33	3.04	3.01	3.00
query34	15.78	5.16	4.51
query35	4.57	4.57	4.59
query36	0.67	0.49	0.48
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.03	0.02
query40	0.17	0.14	0.12
query41	0.09	0.02	0.02
query42	0.04	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 106.14 s
Total hot run time: 30.83 s

@sollhui force-pushed the rl_abnormal_job_monitor branch from 983a4cc to 96abe50 on February 21, 2025 06:12

sollhui commented Feb 21, 2025

run buildall

@sollhui marked this pull request as draft on February 21, 2025 06:17
@doris-robot

TPC-H: Total hot run time: 31192 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 96abe50dd47182a565e6f57e7a16258da71d0e53, data reload: false

------ Round 1 ----------------------------------
q1	17614	5504	5084	5084
q2	2047	284	166	166
q3	10429	1231	765	765
q4	10211	1025	539	539
q5	7538	2463	2250	2250
q6	186	185	139	139
q7	896	743	609	609
q8	9304	1309	1171	1171
q9	4825	4555	4519	4519
q10	6836	2298	1877	1877
q11	504	279	261	261
q12	352	357	219	219
q13	17761	3712	3054	3054
q14	223	227	211	211
q15	507	464	447	447
q16	624	595	577	577
q17	579	847	332	332
q18	6618	6027	6113	6027
q19	1848	942	541	541
q20	301	319	187	187
q21	2858	2122	1918	1918
q22	366	326	299	299
Total cold run time: 102427 ms
Total hot run time: 31192 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5155	5097	5109	5097
q2	230	323	231	231
q3	2156	2660	2271	2271
q4	1429	1811	1365	1365
q5	4263	4143	4125	4125
q6	208	160	125	125
q7	1851	1806	1687	1687
q8	2556	2517	2547	2517
q9	7150	7166	7093	7093
q10	3022	3199	2774	2774
q11	574	510	500	500
q12	717	783	605	605
q13	3551	3768	3344	3344
q14	288	300	273	273
q15	521	474	459	459
q16	619	675	626	626
q17	1120	1536	1369	1369
q18	7555	7369	7223	7223
q19	798	822	873	822
q20	1967	1979	1854	1854
q21	5394	5291	4753	4753
q22	616	563	532	532
Total cold run time: 51740 ms
Total hot run time: 49645 ms

@doris-robot

TPC-DS: Total hot run time: 191826 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 96abe50dd47182a565e6f57e7a16258da71d0e53, data reload: false

query1	1299	948	981	948
query2	6211	1916	1847	1847
query3	10973	4465	4293	4293
query4	53104	25847	23667	23667
query5	5216	564	494	494
query6	372	210	191	191
query7	5131	509	300	300
query8	330	237	220	220
query9	6300	2774	2763	2763
query10	424	306	262	262
query11	15538	15100	14974	14974
query12	162	109	116	109
query13	1144	559	417	417
query14	10337	6920	6467	6467
query15	198	194	180	180
query16	7096	645	490	490
query17	1096	772	619	619
query18	1510	436	324	324
query19	235	203	188	188
query20	131	126	124	124
query21	209	129	105	105
query22	4549	4497	4272	4272
query23	34167	33480	33424	33424
query24	5679	2427	2494	2427
query25	476	478	405	405
query26	678	305	167	167
query27	1892	503	357	357
query28	2949	2571	2499	2499
query29	571	572	440	440
query30	207	187	152	152
query31	909	841	801	801
query32	76	66	83	66
query33	476	389	324	324
query34	774	877	552	552
query35	822	862	763	763
query36	996	1026	904	904
query37	126	104	70	70
query38	4306	4374	4461	4374
query39	1477	1454	1456	1454
query40	213	115	107	107
query41	56	51	48	48
query42	124	110	108	108
query43	516	524	511	511
query44	1389	869	890	869
query45	186	176	169	169
query46	922	1079	669	669
query47	1837	1856	1775	1775
query48	414	430	337	337
query49	693	549	462	462
query50	717	762	439	439
query51	4351	4283	4249	4249
query52	113	107	109	107
query53	250	271	194	194
query54	498	499	432	432
query55	88	88	91	88
query56	282	283	268	268
query57	1179	1184	1105	1105
query58	257	253	253	253
query59	2785	2953	2865	2865
query60	283	284	281	281
query61	125	115	130	115
query62	722	758	708	708
query63	241	207	197	197
query64	1447	1026	688	688
query65	3194	3140	3112	3112
query66	733	396	293	293
query67	15833	15450	15335	15335
query68	5323	800	544	544
query69	521	361	268	268
query70	1233	1126	1119	1119
query71	429	304	269	269
query72	6281	3657	3509	3509
query73	1038	756	369	369
query74	9204	9128	9143	9128
query75	3217	3183	2700	2700
query76	3829	1174	752	752
query77	540	387	288	288
query78	9934	10149	9236	9236
query79	2410	863	640	640
query80	604	536	463	463
query81	506	277	240	240
query82	475	133	99	99
query83	176	170	163	163
query84	279	94	74	74
query85	758	341	306	306
query86	377	309	284	284
query87	4459	4468	4515	4468
query88	3904	2404	2366	2366
query89	407	317	302	302
query90	1805	196	195	195
query91	140	133	112	112
query92	70	58	55	55
query93	1949	1005	574	574
query94	696	401	305	305
query95	347	277	271	271
query96	516	579	294	294
query97	2825	2851	2748	2748
query98	242	206	199	199
query99	1656	1398	1269	1269
Total cold run time: 293744 ms
Total hot run time: 191826 ms

@doris-robot

ClickBench: Total hot run time: 30.52 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 96abe50dd47182a565e6f57e7a16258da71d0e53, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.03	0.04
query3	0.24	0.06	0.06
query4	1.67	0.10	0.10
query5	0.41	0.42	0.41
query6	1.15	0.66	0.66
query7	0.02	0.02	0.02
query8	0.04	0.04	0.03
query9	0.60	0.50	0.52
query10	0.56	0.57	0.57
query11	0.15	0.11	0.11
query12	0.14	0.11	0.11
query13	0.62	0.60	0.60
query14	2.72	2.74	2.72
query15	0.92	0.83	0.85
query16	0.37	0.37	0.38
query17	1.03	1.02	1.04
query18	0.22	0.20	0.19
query19	1.91	1.77	2.01
query20	0.01	0.02	0.01
query21	15.35	0.93	0.55
query22	0.76	1.23	0.64
query23	14.98	1.39	0.60
query24	7.54	1.58	0.76
query25	0.52	0.20	0.13
query26	0.57	0.15	0.14
query27	0.06	0.05	0.05
query28	9.96	0.82	0.44
query29	12.52	3.96	3.29
query30	0.26	0.09	0.06
query31	2.83	0.60	0.38
query32	3.22	0.54	0.47
query33	3.15	3.06	3.05
query34	15.79	5.15	4.53
query35	4.54	4.50	4.57
query36	0.66	0.51	0.48
query37	0.08	0.06	0.06
query38	0.06	0.04	0.04
query39	0.03	0.02	0.02
query40	0.17	0.13	0.13
query41	0.09	0.03	0.02
query42	0.03	0.02	0.02
query43	0.04	0.02	0.03
Total cold run time: 106.1 s
Total hot run time: 30.52 s

sollhui commented Feb 26, 2025

run buildall

@sollhui marked this pull request as ready for review on February 26, 2025 12:05
@doris-robot

TPC-H: Total hot run time: 31668 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f67d0d0d1bfcfccc3f7308b9d4a41bb67249f456, data reload: false

------ Round 1 ----------------------------------
q1	17612	5108	5121	5108
q2	2047	294	168	168
q3	10511	1271	696	696
q4	10288	1002	528	528
q5	8565	2414	2366	2366
q6	187	173	134	134
q7	895	730	585	585
q8	9310	1260	1116	1116
q9	5060	4857	4647	4647
q10	6812	2284	1881	1881
q11	470	273	261	261
q12	342	350	220	220
q13	17765	3684	3077	3077
q14	217	245	202	202
q15	506	463	479	463
q16	632	610	590	590
q17	578	866	350	350
q18	6966	6259	6284	6259
q19	1535	953	568	568
q20	323	327	195	195
q21	2790	2291	1951	1951
q22	366	336	303	303
Total cold run time: 103777 ms
Total hot run time: 31668 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5195	5185	5124	5124
q2	233	337	230	230
q3	2167	2694	2258	2258
q4	1439	1852	1404	1404
q5	4232	4132	4132	4132
q6	201	162	124	124
q7	1849	1787	1780	1780
q8	2585	2575	2540	2540
q9	7249	7185	7252	7185
q10	3037	3205	2788	2788
q11	575	506	480	480
q12	692	748	612	612
q13	3502	3824	3178	3178
q14	272	326	284	284
q15	513	469	464	464
q16	677	706	640	640
q17	1125	1582	1394	1394
q18	7475	7283	7222	7222
q19	799	786	826	786
q20	1929	2021	1886	1886
q21	5394	5000	4919	4919
q22	604	598	550	550
Total cold run time: 51744 ms
Total hot run time: 49980 ms

@doris-robot

TPC-DS: Total hot run time: 183377 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f67d0d0d1bfcfccc3f7308b9d4a41bb67249f456, data reload: false

query1	967	407	386	386
query2	6510	1914	1856	1856
query3	6816	206	204	204
query4	26859	23405	23342	23342
query5	4330	688	489	489
query6	307	194	193	193
query7	4604	511	289	289
query8	300	230	226	226
query9	8603	2582	2571	2571
query10	459	326	256	256
query11	15725	15009	14794	14794
query12	156	104	102	102
query13	1652	517	406	406
query14	9683	6245	6238	6238
query15	217	182	181	181
query16	7194	638	475	475
query17	1203	720	564	564
query18	1956	413	303	303
query19	194	193	163	163
query20	121	114	113	113
query21	207	124	106	106
query22	4109	4152	4418	4152
query23	34456	33493	32815	32815
query24	7779	2385	2350	2350
query25	526	442	380	380
query26	1234	270	153	153
query27	2451	466	320	320
query28	4219	2455	2394	2394
query29	792	529	412	412
query30	229	182	156	156
query31	955	841	752	752
query32	71	63	63	63
query33	553	387	303	303
query34	777	851	484	484
query35	798	814	717	717
query36	977	1011	916	916
query37	120	101	75	75
query38	4113	4110	4059	4059
query39	1423	1367	1399	1367
query40	211	110	124	110
query41	58	54	53	53
query42	124	98	102	98
query43	479	518	477	477
query44	1271	780	772	772
query45	173	175	162	162
query46	845	1022	630	630
query47	1798	1826	1738	1738
query48	373	400	294	294
query49	770	513	410	410
query50	707	735	402	402
query51	4237	4205	4163	4163
query52	103	103	88	88
query53	226	263	178	178
query54	476	481	407	407
query55	83	80	80	80
query56	274	263	246	246
query57	1136	1145	1053	1053
query58	253	240	241	240
query59	2726	2851	2746	2746
query60	281	302	285	285
query61	125	122	137	122
query62	812	736	661	661
query63	225	192	181	181
query64	4460	1004	666	666
query65	3185	3163	3098	3098
query66	1149	385	304	304
query67	15781	15484	15258	15258
query68	8383	895	488	488
query69	470	297	263	263
query70	1232	1168	1110	1110
query71	470	298	254	254
query72	5549	3548	3771	3548
query73	800	723	348	348
query74	9292	8886	8988	8886
query75	3867	3234	2691	2691
query76	3758	1161	739	739
query77	796	398	273	273
query78	10120	10142	9293	9293
query79	2513	833	591	591
query80	599	530	459	459
query81	533	277	249	249
query82	674	127	97	97
query83	178	171	152	152
query84	251	90	75	75
query85	803	355	309	309
query86	384	295	295	295
query87	4449	4660	4254	4254
query88	3619	2190	2184	2184
query89	399	315	284	284
query90	1867	193	194	193
query91	139	141	113	113
query92	80	60	58	58
query93	1698	1052	565	565
query94	649	425	297	297
query95	346	259	259	259
query96	481	554	264	264
query97	3351	3457	3253	3253
query98	239	253	204	204
query99	1352	1406	1282	1282
Total cold run time: 275327 ms
Total hot run time: 183377 ms

@doris-robot

ClickBench: Total hot run time: 30.84 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f67d0d0d1bfcfccc3f7308b9d4a41bb67249f456, data reload: false

query1	0.04	0.05	0.03
query2	0.07	0.03	0.04
query3	0.24	0.06	0.07
query4	1.61	0.10	0.10
query5	0.56	0.55	0.57
query6	1.18	0.73	0.72
query7	0.03	0.02	0.01
query8	0.04	0.03	0.03
query9	0.57	0.54	0.51
query10	0.57	0.57	0.57
query11	0.15	0.10	0.11
query12	0.15	0.11	0.11
query13	0.61	0.61	0.60
query14	2.66	2.82	2.66
query15	0.92	0.85	0.84
query16	0.38	0.38	0.39
query17	1.06	1.02	1.06
query18	0.21	0.19	0.20
query19	1.91	1.83	1.98
query20	0.02	0.01	0.02
query21	15.38	0.90	0.55
query22	0.74	1.22	0.64
query23	14.90	1.41	0.65
query24	7.32	1.32	0.82
query25	0.50	0.17	0.13
query26	0.64	0.17	0.14
query27	0.05	0.05	0.05
query28	9.16	0.85	0.45
query29	12.54	3.94	3.28
query30	0.24	0.08	0.07
query31	2.82	0.61	0.38
query32	3.23	0.54	0.47
query33	2.99	3.02	3.00
query34	15.84	5.15	4.55
query35	4.54	4.53	4.50
query36	0.66	0.49	0.49
query37	0.10	0.06	0.06
query38	0.04	0.04	0.04
query39	0.03	0.02	0.02
query40	0.17	0.13	0.13
query41	0.09	0.03	0.02
query42	0.03	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 105.02 s
Total hot run time: 30.84 s

sollhui commented Feb 27, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 31727 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b14ccec4481d58b7355a7445e77420f3ce5aa877, data reload: false

------ Round 1 ----------------------------------
q1	17589	5207	5110	5110
q2	2066	290	168	168
q3	10412	1225	746	746
q4	10212	1010	527	527
q5	7545	2326	2340	2326
q6	191	170	131	131
q7	932	724	604	604
q8	9290	1236	1108	1108
q9	4852	4847	4853	4847
q10	6809	2305	1882	1882
q11	465	282	246	246
q12	352	355	222	222
q13	17758	3659	3032	3032
q14	227	222	208	208
q15	523	464	456	456
q16	638	613	587	587
q17	569	857	354	354
q18	6840	6163	6207	6163
q19	1218	954	533	533
q20	305	329	193	193
q21	2868	2188	1985	1985
q22	368	329	299	299
Total cold run time: 102029 ms
Total hot run time: 31727 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5136	5143	5110	5110
q2	237	329	235	235
q3	2156	2699	2282	2282
q4	1466	1866	1368	1368
q5	4234	4114	4174	4114
q6	200	162	126	126
q7	1871	1793	1660	1660
q8	2585	2616	2452	2452
q9	7268	7290	7239	7239
q10	2982	3200	2751	2751
q11	575	526	496	496
q12	691	778	649	649
q13	3504	3840	3353	3353
q14	275	287	281	281
q15	518	482	472	472
q16	646	669	660	660
q17	1135	1617	1310	1310
q18	7543	7351	7429	7351
q19	836	833	869	833
q20	1959	2059	1876	1876
q21	5436	5112	4762	4762
q22	636	601	539	539
Total cold run time: 51889 ms
Total hot run time: 49919 ms

@doris-robot

TPC-DS: Total hot run time: 190321 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b14ccec4481d58b7355a7445e77420f3ce5aa877, data reload: false

query1	1355	978	933	933
query2	6186	1876	1872	1872
query3	10979	4451	4481	4451
query4	55976	26770	23215	23215
query5	5053	513	474	474
query6	330	191	191	191
query7	4876	504	289	289
query8	314	252	239	239
query9	5494	2597	2602	2597
query10	411	340	261	261
query11	15141	15192	14845	14845
query12	163	111	112	111
query13	1048	526	386	386
query14	10701	6312	6257	6257
query15	221	198	180	180
query16	7219	704	445	445
query17	1087	730	548	548
query18	1717	428	303	303
query19	196	197	153	153
query20	129	126	118	118
query21	218	126	102	102
query22	4535	4568	4530	4530
query23	33885	33312	33483	33312
query24	5725	2421	2423	2421
query25	464	467	390	390
query26	720	284	155	155
query27	1918	474	329	329
query28	2937	2484	2420	2420
query29	580	579	423	423
query30	212	204	159	159
query31	877	898	787	787
query32	70	60	60	60
query33	458	360	300	300
query34	759	858	503	503
query35	836	868	753	753
query36	950	1011	871	871
query37	127	97	73	73
query38	4106	4169	4140	4140
query39	1653	1430	1438	1430
query40	207	123	107	107
query41	54	52	51	51
query42	124	109	105	105
query43	499	515	487	487
query44	1303	805	810	805
query45	195	177	172	172
query46	890	1061	664	664
query47	1845	1867	1802	1802
query48	399	421	314	314
query49	741	535	452	452
query50	715	759	423	423
query51	4371	4355	4239	4239
query52	122	103	103	103
query53	239	293	188	188
query54	507	502	445	445
query55	96	80	82	80
query56	291	283	261	261
query57	1194	1188	1140	1140
query58	246	250	245	245
query59	2668	2649	2750	2649
query60	290	272	274	272
query61	142	127	115	115
query62	763	752	680	680
query63	225	200	192	192
query64	1798	1031	675	675
query65	3263	3256	3230	3230
query66	707	390	314	314
query67	16101	15704	15284	15284
query68	8245	882	494	494
query69	541	304	271	271
query70	1171	1137	1131	1131
query71	502	305	257	257
query72	5937	3634	3787	3634
query73	1534	761	348	348
query74	9059	9177	9024	9024
query75	3664	3174	2677	2677
query76	4116	1187	752	752
query77	682	374	286	286
query78	10176	10219	9199	9199
query79	2376	822	589	589
query80	605	531	442	442
query81	525	276	240	240
query82	525	125	93	93
query83	174	173	158	158
query84	290	91	71	71
query85	790	347	315	315
query86	413	299	282	282
query87	4397	4540	4479	4479
query88	3776	2223	2208	2208
query89	413	319	285	285
query90	1793	194	197	194
query91	137	140	114	114
query92	80	73	56	56
query93	1775	1054	568	568
query94	667	417	289	289
query95	388	267	261	261
query96	482	558	274	274
query97	3399	3406	3329	3329
query98	221	203	200	200
query99	1405	1413	1281	1281
Total cold run time: 299879 ms
Total hot run time: 190321 ms

@doris-robot

ClickBench: Total hot run time: 31.24 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b14ccec4481d58b7355a7445e77420f3ce5aa877, data reload: false

query1	0.03	0.03	0.04
query2	0.07	0.04	0.04
query3	0.24	0.07	0.07
query4	1.62	0.10	0.10
query5	0.56	0.55	0.55
query6	1.20	0.72	0.72
query7	0.02	0.02	0.01
query8	0.04	0.04	0.03
query9	0.59	0.54	0.53
query10	0.58	0.58	0.59
query11	0.15	0.11	0.11
query12	0.14	0.11	0.12
query13	0.61	0.60	0.60
query14	2.80	2.68	2.80
query15	0.93	0.86	0.85
query16	0.38	0.38	0.38
query17	1.02	1.01	1.01
query18	0.21	0.19	0.20
query19	1.98	1.96	1.83
query20	0.01	0.01	0.01
query21	15.35	0.88	0.53
query22	0.75	1.16	0.77
query23	14.88	1.40	0.60
query24	6.47	2.38	1.18
query25	0.50	0.27	0.12
query26	0.58	0.16	0.13
query27	0.06	0.05	0.05
query28	10.32	0.79	0.43
query29	12.54	3.94	3.32
query30	0.25	0.09	0.06
query31	2.85	0.60	0.38
query32	3.23	0.54	0.45
query33	3.09	2.96	3.00
query34	15.78	5.12	4.53
query35	4.50	4.51	4.51
query36	0.68	0.51	0.48
query37	0.09	0.06	0.06
query38	0.05	0.04	0.03
query39	0.04	0.02	0.03
query40	0.16	0.14	0.14
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.03	0.03	0.02
Total cold run time: 105.5 s
Total hot run time: 31.24 s

@sollhui force-pushed the rl_abnormal_job_monitor branch from b14ccec to 0ab9771 on February 28, 2025 10:33

sollhui commented Feb 28, 2025

run buildall

@sollhui force-pushed the rl_abnormal_job_monitor branch from 0ab9771 to 0b1ccf5 on March 1, 2025 02:36

sollhui commented Mar 1, 2025

run buildall

@sollhui force-pushed the rl_abnormal_job_monitor branch from 0b1ccf5 to 5156681 on March 1, 2025 02:54

sollhui commented Mar 1, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 31608 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 515668184526994b7231d0dfb670c6bc46d68cbb, data reload: false

------ Round 1 ----------------------------------
q1	17592	5163	5065	5065
q2	2059	310	168	168
q3	10659	1224	757	757
q4	10224	1021	507	507
q5	7604	2428	2343	2343
q6	194	168	131	131
q7	912	754	605	605
q8	9288	1255	1094	1094
q9	4952	4883	4871	4871
q10	6822	2288	1867	1867
q11	505	277	260	260
q12	349	350	219	219
q13	17783	3723	3050	3050
q14	228	230	209	209
q15	507	455	452	452
q16	621	616	596	596
q17	595	860	344	344
q18	7051	6216	6139	6139
q19	1665	947	522	522
q20	313	323	189	189
q21	2823	2131	1911	1911
q22	378	342	309	309
Total cold run time: 103124 ms
Total hot run time: 31608 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5116	5119	5102	5102
q2	239	324	234	234
q3	2199	2669	2358	2358
q4	1428	1807	1345	1345
q5	4287	4117	4156	4117
q6	210	161	124	124
q7	1884	1834	1725	1725
q8	2639	2567	2574	2567
q9	7216	7214	7243	7214
q10	2987	3196	2815	2815
q11	572	536	494	494
q12	692	780	635	635
q13	3441	3892	3289	3289
q14	288	293	280	280
q15	524	477	459	459
q16	636	695	650	650
q17	1179	1621	1355	1355
q18	7687	7371	7323	7323
q19	793	808	892	808
q20	1971	2119	1863	1863
q21	5508	5018	4853	4853
q22	659	592	568	568
Total cold run time: 52155 ms
Total hot run time: 50178 ms

@sollhui force-pushed the rl_abnormal_job_monitor branch from e4a4825 to c8a0866 on March 12, 2025 03:11
@sollhui changed the title from "[improve](routine load) introduce routine load abnormal job monitor" to "[improve](routine load) introduce routine load abnormal job monitor metrics" on Mar 12, 2025
@sollhui force-pushed the rl_abnormal_job_monitor branch from c8a0866 to 5b8ad51 on March 12, 2025 06:27

sollhui commented Mar 12, 2025

run buildall

sollhui commented Mar 12, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 32415 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 60f4a4958688e3ea23912f2f73cc9d6fdb8d1f80, data reload: false

------ Round 1 ----------------------------------
q1	17573	5344	5071	5071
q2	2046	302	172	172
q3	10397	1287	775	775
q4	10198	1037	529	529
q5	7484	2429	2331	2331
q6	191	166	134	134
q7	923	751	605	605
q8	9305	1285	1049	1049
q9	5035	4975	4685	4685
q10	6825	2310	1881	1881
q11	482	267	254	254
q12	354	358	216	216
q13	17778	3646	3059	3059
q14	242	229	213	213
q15	544	488	491	488
q16	634	622	599	599
q17	575	881	339	339
q18	6913	6554	6338	6338
q19	1566	979	564	564
q20	320	337	204	204
q21	2704	2115	1923	1923
q22	1074	1038	986	986
Total cold run time: 103163 ms
Total hot run time: 32415 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5184	5139	5173	5139
q2	240	338	234	234
q3	2153	2653	2272	2272
q4	1449	1833	1381	1381
q5	4208	4144	4154	4144
q6	208	173	124	124
q7	1881	1898	1776	1776
q8	2644	2597	2593	2593
q9	7173	7235	7131	7131
q10	3036	3287	2790	2790
q11	574	498	478	478
q12	692	774	577	577
q13	3379	3989	3350	3350
q14	296	301	284	284
q15	544	484	491	484
q16	654	688	673	673
q17	1163	1636	1350	1350
q18	7832	7648	7612	7612
q19	840	816	829	816
q20	2053	2034	1876	1876
q21	5425	5133	4855	4855
q22	1113	1094	1059	1059
Total cold run time: 52741 ms
Total hot run time: 50998 ms

@doris-robot

TPC-DS: Total hot run time: 191982 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 60f4a4958688e3ea23912f2f73cc9d6fdb8d1f80, data reload: false

query1	1405	1030	1015	1015
query2	6238	1980	1906	1906
query3	11133	4759	4445	4445
query4	55333	24921	23198	23198
query5	5087	660	493	493
query6	343	199	185	185
query7	4886	508	299	299
query8	309	246	235	235
query9	5438	2630	2624	2624
query10	412	338	265	265
query11	15211	15291	14972	14972
query12	163	111	104	104
query13	1045	509	406	406
query14	11321	6432	6599	6432
query15	204	191	198	191
query16	7125	672	486	486
query17	1048	712	566	566
query18	1592	396	330	330
query19	194	203	163	163
query20	125	127	126	126
query21	209	121	102	102
query22	4324	4312	4289	4289
query23	34033	33279	33281	33279
query24	6077	2439	2443	2439
query25	460	460	441	441
query26	720	269	155	155
query27	2186	513	330	330
query28	2746	2482	2467	2467
query29	602	606	427	427
query30	277	222	195	195
query31	884	869	782	782
query32	79	63	64	63
query33	438	349	334	334
query34	778	856	524	524
query35	802	837	751	751
query36	941	1016	902	902
query37	125	108	85	85
query38	4210	4354	4188	4188
query39	1508	1449	1440	1440
query40	221	121	106	106
query41	56	59	54	54
query42	130	107	109	107
query43	498	518	491	491
query44	1364	806	810	806
query45	235	180	171	171
query46	846	1044	654	654
query47	1836	1867	1801	1801
query48	398	417	306	306
query49	689	522	426	426
query50	725	749	429	429
query51	4277	4328	4320	4320
query52	108	107	96	96
query53	242	267	198	198
query54	484	495	435	435
query55	85	78	80	78
query56	305	251	298	251
query57	1183	1198	1102	1102
query58	253	246	239	239
query59	2806	2794	2786	2786
query60	283	286	276	276
query61	136	130	122	122
query62	749	733	680	680
query63	235	196	192	192
query64	2141	1100	695	695
query65	4576	4469	4436	4436
query66	749	409	293	293
query67	15870	15346	15177	15177
query68	6808	875	500	500
query69	532	299	266	266
query70	1237	1118	1161	1118
query71	499	296	260	260
query72	5992	3754	3806	3754
query73	1517	743	354	354
query74	9146	8926	8626	8626
query75	4086	3165	2724	2724
query76	4228	1207	753	753
query77	783	359	274	274
query78	10189	10177	9212	9212
query79	2379	828	572	572
query80	767	582	438	438
query81	490	254	232	232
query82	657	123	94	94
query83	215	173	154	154
query84	288	91	78	78
query85	779	358	332	332
query86	353	280	279	279
query87	4491	4448	4358	4358
query88	3497	2313	2314	2313
query89	425	323	283	283
query90	1985	227	228	227
query91	152	143	109	109
query92	162	60	55	55
query93	1243	1039	581	581
query94	673	427	311	311
query95	349	278	270	270
query96	500	577	278	278
query97	3371	3378	3310	3310
query98	233	211	210	210
query99	1464	1407	1296	1296
Total cold run time: 300981 ms
Total hot run time: 191982 ms

@doris-robot

ClickBench: Total hot run time: 30.76 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 60f4a4958688e3ea23912f2f73cc9d6fdb8d1f80, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.03	0.04
query3	0.24	0.07	0.06
query4	1.62	0.11	0.11
query5	0.56	0.53	0.54
query6	1.19	0.73	0.73
query7	0.02	0.02	0.01
query8	0.05	0.05	0.03
query9	0.59	0.51	0.54
query10	0.58	0.60	0.58
query11	0.16	0.11	0.11
query12	0.15	0.11	0.10
query13	0.61	0.60	0.59
query14	2.66	2.70	2.69
query15	0.91	0.87	0.86
query16	0.38	0.37	0.38
query17	1.03	1.03	1.06
query18	0.21	0.20	0.20
query19	1.90	1.87	1.99
query20	0.02	0.02	0.01
query21	15.35	0.91	0.54
query22	0.75	1.32	0.73
query23	14.72	1.38	0.62
query24	6.96	1.78	0.65
query25	0.52	0.08	0.19
query26	0.73	0.16	0.14
query27	0.05	0.05	0.05
query28	8.88	0.87	0.42
query29	12.61	4.00	3.33
query30	0.26	0.09	0.06
query31	2.82	0.60	0.37
query32	3.23	0.54	0.49
query33	2.99	2.98	2.98
query34	15.83	5.14	4.52
query35	4.60	4.57	4.54
query36	0.65	0.49	0.48
query37	0.09	0.07	0.06
query38	0.04	0.04	0.03
query39	0.03	0.02	0.02
query40	0.17	0.13	0.13
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 104.43 s
Total hot run time: 30.76 s

@sollhui sollhui force-pushed the rl_abnormal_job_monitor branch from 60f4a49 to 8d8907e Compare March 12, 2025 15:10
@sollhui
Contributor Author

sollhui commented Mar 12, 2025

run buildall

@sollhui sollhui force-pushed the rl_abnormal_job_monitor branch from 68015ea to b95ba3c Compare March 13, 2025 03:20
@sollhui
Contributor Author

sollhui commented Mar 13, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 32443 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ea05fe3400bd9691caae6e12358f7c789b68828d, data reload: false

------ Round 1 ----------------------------------
q1	17595	5168	5216	5168
q2	2043	281	162	162
q3	10436	1235	730	730
q4	10239	1002	524	524
q5	7909	2393	2313	2313
q6	193	162	136	136
q7	907	742	620	620
q8	9431	1298	1048	1048
q9	4942	4675	4706	4675
q10	6859	2283	1872	1872
q11	478	283	257	257
q12	349	355	217	217
q13	17767	3639	3125	3125
q14	227	229	206	206
q15	528	486	470	470
q16	612	623	596	596
q17	581	871	339	339
q18	6845	6589	6400	6400
q19	1831	964	543	543
q20	313	320	188	188
q21	2747	2151	1888	1888
q22	1035	1023	966	966
Total cold run time: 103867 ms
Total hot run time: 32443 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5186	5119	5142	5119
q2	227	329	236	236
q3	2171	2684	2311	2311
q4	1420	1797	1368	1368
q5	4192	4143	4162	4143
q6	204	165	121	121
q7	1857	1897	1770	1770
q8	2603	2656	2539	2539
q9	7224	7270	7027	7027
q10	3010	3222	2789	2789
q11	571	512	487	487
q12	698	784	601	601
q13	3319	3957	3300	3300
q14	286	290	293	290
q15	534	507	508	507
q16	645	658	639	639
q17	1128	1614	1332	1332
q18	7716	7729	7483	7483
q19	830	826	812	812
q20	1963	2017	1864	1864
q21	5437	5097	4636	4636
q22	1078	1053	1009	1009
Total cold run time: 52299 ms
Total hot run time: 50383 ms

@doris-robot

TPC-DS: Total hot run time: 191807 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ea05fe3400bd9691caae6e12358f7c789b68828d, data reload: false

query1	1403	999	970	970
query2	6193	1834	1880	1834
query3	10949	4637	4423	4423
query4	52449	25528	22948	22948
query5	5159	599	489	489
query6	339	202	195	195
query7	4898	511	291	291
query8	314	256	247	247
query9	5615	2641	2640	2640
query10	432	319	251	251
query11	15236	15106	15091	15091
query12	159	110	105	105
query13	1044	513	388	388
query14	10676	6491	6997	6491
query15	211	185	175	175
query16	7009	631	461	461
query17	1051	719	551	551
query18	1517	400	311	311
query19	192	198	160	160
query20	132	123	119	119
query21	242	123	107	107
query22	4436	4467	4343	4343
query23	34109	33296	33314	33296
query24	6244	2458	2417	2417
query25	464	459	403	403
query26	713	277	156	156
query27	1738	493	332	332
query28	2875	2465	2470	2465
query29	605	596	460	460
query30	279	232	201	201
query31	905	908	786	786
query32	74	63	66	63
query33	465	372	319	319
query34	763	863	506	506
query35	793	883	766	766
query36	974	994	911	911
query37	129	109	78	78
query38	4225	4214	4201	4201
query39	1495	1427	1435	1427
query40	203	114	97	97
query41	53	51	52	51
query42	122	104	111	104
query43	499	531	469	469
query44	1300	792	799	792
query45	179	177	169	169
query46	858	1035	631	631
query47	1793	1835	1838	1835
query48	389	417	305	305
query49	691	489	440	440
query50	697	772	411	411
query51	4288	4292	4282	4282
query52	109	115	101	101
query53	236	257	196	196
query54	512	498	419	419
query55	87	83	79	79
query56	260	289	281	281
query57	1195	1186	1080	1080
query58	285	251	240	240
query59	2652	2962	2783	2783
query60	294	297	277	277
query61	120	117	125	117
query62	723	761	667	667
query63	227	191	195	191
query64	1756	1035	740	740
query65	4573	4435	4469	4435
query66	780	390	309	309
query67	15910	15450	15238	15238
query68	8350	865	501	501
query69	549	292	265	265
query70	1204	1111	1133	1111
query71	497	311	275	275
query72	5957	3611	3768	3611
query73	1447	751	354	354
query74	9023	9134	8777	8777
query75	4006	3161	2706	2706
query76	4250	1208	792	792
query77	673	364	281	281
query78	10008	10017	9263	9263
query79	2736	838	598	598
query80	697	550	460	460
query81	473	262	227	227
query82	641	128	99	99
query83	257	245	150	150
query84	282	97	71	71
query85	804	364	315	315
query86	366	285	290	285
query87	4444	4496	4318	4318
query88	3649	2291	2287	2287
query89	416	317	294	294
query90	1921	225	228	225
query91	145	138	109	109
query92	77	64	55	55
query93	1572	1072	584	584
query94	682	431	318	318
query95	345	278	269	269
query96	502	562	280	280
query97	3389	3385	3347	3347
query98	250	210	211	210
query99	1484	1424	1254	1254
Total cold run time: 298393 ms
Total hot run time: 191807 ms

@doris-robot

ClickBench: Total hot run time: 30.7 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ea05fe3400bd9691caae6e12358f7c789b68828d, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.04	0.03
query3	0.24	0.06	0.06
query4	1.61	0.10	0.11
query5	0.55	0.55	0.56
query6	1.19	0.73	0.72
query7	0.02	0.02	0.01
query8	0.04	0.03	0.04
query9	0.58	0.52	0.53
query10	0.59	0.61	0.59
query11	0.15	0.11	0.11
query12	0.15	0.12	0.11
query13	0.63	0.61	0.61
query14	2.66	2.67	2.79
query15	0.93	0.87	0.85
query16	0.39	0.38	0.38
query17	1.01	1.02	1.01
query18	0.21	0.20	0.19
query19	1.95	1.85	2.03
query20	0.02	0.01	0.02
query21	15.35	0.89	0.55
query22	0.74	1.21	0.63
query23	14.95	1.37	0.60
query24	6.74	1.92	0.63
query25	0.51	0.21	0.07
query26	0.59	0.15	0.13
query27	0.06	0.05	0.05
query28	9.33	0.83	0.43
query29	12.54	4.07	3.40
query30	0.25	0.09	0.06
query31	2.82	0.60	0.40
query32	3.22	0.53	0.46
query33	3.00	3.09	3.08
query34	15.75	5.10	4.50
query35	4.54	4.48	4.56
query36	0.66	0.50	0.48
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.02	0.02
query40	0.17	0.14	0.14
query41	0.08	0.02	0.02
query42	0.04	0.02	0.03
query43	0.04	0.03	0.02
Total cold run time: 104.58 s
Total hot run time: 30.7 s

Contributor

@dataroaring dataroaring left a comment

LGTM

Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 13, 2025
Contributor

PR approved by anyone and no changes requested.

Contributor

@liaoxin01 liaoxin01 left a comment

LGTM

@dataroaring dataroaring merged commit 678f179 into apache:master Mar 13, 2025
28 checks passed
github-actions bot pushed a commit that referenced this pull request Mar 13, 2025
…etrics (#48171)

### What problem does this PR solve?

related #48511

Introduce some metrics so that abnormal routine load jobs can be
monitored.

**metrics:**

1. Extend the existing job state metric with two new states, `USER_PAUSED` and
`ABNORMAL_PAUSED`
```
{
        "tags":
        {
                "metric":"doris_fe_job",
                "job":"load",
                "type":"ROUTINE_LOAD",
                "state":"ABNORMAL_PAUSED"
        },
        "unit":"nounit",
        "value":1
},

{
        "tags":
        {
                "metric":"doris_fe_job",
                "job":"load",
                "type":"ROUTINE_LOAD",
                "state":"USER_PAUSED"
        },
        "unit":"nounit",
        "value":1
},
```
2. Sum of all progress of the routine load job
```
doris_fe_routine_load_progress
```
3. Sum of all lags for the routine load job
```
doris_fe_routine_load_lag
```
4. Total number of aborted tasks for the routine load job
```
doris_fe_routine_load_abort_task_num
```
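
As a rough illustration of how the new states could feed an alert, here is a minimal Python sketch. It assumes the FE metrics are already available as a JSON array of entries shaped like the `doris_fe_job` examples above; how that JSON is fetched is out of scope. The names `routine_load_state_counts` and `should_alert`, the sample values, and the third sample entry are hypothetical and not part of this PR.

```python
import json
from collections import Counter

# Sample payload shaped like the entries in the description above.
# Values are illustrative; the third entry is a made-up non-routine-load
# job included only to show that it is filtered out.
SAMPLE = """
[
  {"tags": {"metric": "doris_fe_job", "job": "load",
            "type": "ROUTINE_LOAD", "state": "ABNORMAL_PAUSED"},
   "unit": "nounit", "value": 1},
  {"tags": {"metric": "doris_fe_job", "job": "load",
            "type": "ROUTINE_LOAD", "state": "USER_PAUSED"},
   "unit": "nounit", "value": 2},
  {"tags": {"metric": "doris_fe_job", "job": "load",
            "type": "OTHER_LOAD", "state": "USER_PAUSED"},
   "unit": "nounit", "value": 5}
]
"""


def routine_load_state_counts(entries):
    """Sum `value` per state for doris_fe_job entries of type ROUTINE_LOAD."""
    counts = Counter()
    for entry in entries:
        tags = entry.get("tags", {})
        if tags.get("metric") == "doris_fe_job" and tags.get("type") == "ROUTINE_LOAD":
            counts[tags.get("state")] += entry.get("value", 0)
    return dict(counts)


def should_alert(entries, threshold=0):
    """Return True when more than `threshold` jobs are in ABNORMAL_PAUSED."""
    return routine_load_state_counts(entries).get("ABNORMAL_PAUSED", 0) > threshold


entries = json.loads(SAMPLE)
print(routine_load_state_counts(entries))
print(should_alert(entries))
```

Separating `USER_PAUSED` from `ABNORMAL_PAUSED` is what makes a check like this useful: a job paused deliberately by a user does not trip the alert, while a job paused by an error does.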
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.x dev/2.1.x-conflict dev/3.0.x reviewed
5 participants