[Fix](parquet-reader) Fix definition level rle decode dead loop in parquet-reader. #39523

Merged 1 commit on Aug 26, 2024

kaka11chen
Contributor

Proposed changes

[Fix](parquet-reader) Fix a dead loop (infinite loop) in the RLE decoding of definition levels in the Parquet reader.
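
The description does not say what triggered the loop, so the following is only an illustrative sketch and not the Doris code (which is C++). It shows one common way a level-decoding loop can spin forever: the loop waits for `remaining` to reach zero, but an exhausted or corrupt RLE source keeps returning zero values. All names here (`RleRuns`, `next_run`, `read_levels`) are hypothetical. The guarded version treats a zero-progress read as an error instead of retrying.

```python
from typing import List, Tuple


class RleRuns:
    """Toy RLE source: a list of (run_length, value) pairs.

    Hypothetical helper for illustration; not a real Parquet decoder.
    """

    def __init__(self, runs: List[Tuple[int, int]]) -> None:
        self._runs = list(runs)
        self._pos = 0  # index of the current run
        self._left = runs[0][0] if runs else 0  # values left in the current run

    def next_run(self, max_count: int) -> Tuple[int, int]:
        """Return (count, value) for up to max_count values; count == 0 means exhausted."""
        # Skip over finished (or zero-length) runs.
        while self._pos < len(self._runs) and self._left == 0:
            self._pos += 1
            if self._pos < len(self._runs):
                self._left = self._runs[self._pos][0]
        if self._pos >= len(self._runs):
            return 0, 0
        count = min(self._left, max_count)
        self._left -= count
        return count, self._runs[self._pos][1]


def read_levels(src: RleRuns, num_values: int) -> List[int]:
    """Decode exactly num_values levels, failing fast if the source makes no progress."""
    levels: List[int] = []
    remaining = num_values
    while remaining > 0:
        count, value = src.next_run(remaining)
        if count == 0:
            # Without this check the loop never terminates: `remaining`
            # stays positive while next_run keeps returning 0.
            raise ValueError(
                f"RLE data exhausted: expected {num_values} levels, got {len(levels)}"
            )
        levels.extend([value] * count)
        remaining -= count
    return levels
```

For example, `read_levels(RleRuns([(3, 1), (2, 0)]), 5)` decodes five levels, while asking the same source for six raises `ValueError` rather than hanging.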

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the documentation has been moved to the doris-website repository.
See the Doris documentation.

@kaka11chen force-pushed the fix_parquet_reader_rle_decoder branch from 6ed42fd to f8aaa25 on August 17, 2024 14:03
morningman previously approved these changes Aug 17, 2024
Contributor

@morningman left a comment

LGTM

PR approved by at least one committer and no changes requested.

@github-actions bot added the "approved" label on Aug 17, 2024

PR approved by anyone and no changes requested.

@kaka11chen
Contributor Author

run buildall

@kaka11chen force-pushed the fix_parquet_reader_rle_decoder branch from f8aaa25 to 0e2bd4d on August 17, 2024 15:20
@github-actions bot removed the "approved" label on Aug 17, 2024
@kaka11chen
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 38131 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 0e2bd4da4a373a7966c2c1e2a413ccbf6c76c59b, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17856	4517	4347	4347
q2	2063	208	206	206
q3	11712	972	1123	972
q4	10535	754	763	754
q5	7769	2833	2823	2823
q6	269	161	162	161
q7	1026	657	648	648
q8	9591	2146	2101	2101
q9	8674	6558	6553	6553
q10	7065	2309	2201	2201
q11	495	284	258	258
q12	420	247	250	247
q13	17778	3002	3003	3002
q14	306	277	247	247
q15	548	520	520	520
q16	517	423	407	407
q17	983	664	655	655
q18	7453	6765	6877	6765
q19	6852	1042	1018	1018
q20	688	341	360	341
q21	4243	2878	2889	2878
q22	1130	1047	1027	1027
Total cold run time: 117973 ms
Total hot run time: 38131 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4623	4346	4298	4298
q2	394	300	308	300
q3	2841	2673	2714	2673
q4	2042	1718	1699	1699
q5	5666	5694	5689	5689
q6	248	155	156	155
q7	2228	1814	1730	1730
q8	3287	3470	3486	3470
q9	8843	8763	8720	8720
q10	3583	3328	3334	3328
q11	634	518	563	518
q12	829	668	652	652
q13	17120	3234	3128	3128
q14	337	295	283	283
q15	558	524	523	523
q16	508	474	473	473
q17	1875	1529	1567	1529
q18	8602	7990	7695	7695
q19	9524	1863	1749	1749
q20	2998	1860	1869	1860
q21	13508	5215	5349	5215
q22	1192	1090	1064	1064
Total cold run time: 91440 ms
Total hot run time: 56751 ms

@kaka11chen marked this pull request as ready for review on August 26, 2024 05:13
@kaka11chen force-pushed the fix_parquet_reader_rle_decoder branch from 0e2bd4d to 1d1ade5 on August 26, 2024 05:16
@kaka11chen
Contributor Author

run buildall

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 37788 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1d1ade546d9fa6541b771ae5dbcabddda566eaff, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17914	4615	4329	4329
q2	2027	182	187	182
q3	11653	948	1148	948
q4	10448	709	741	709
q5	7745	2908	2875	2875
q6	221	139	134	134
q7	971	620	608	608
q8	9310	2098	2092	2092
q9	7199	6555	6573	6555
q10	7015	2134	2193	2134
q11	440	234	242	234
q12	390	220	224	220
q13	17782	3045	2995	2995
q14	276	242	233	233
q15	530	482	492	482
q16	507	396	384	384
q17	992	685	683	683
q18	7602	6747	6827	6747
q19	1394	1056	1069	1056
q20	703	335	332	332
q21	4062	2843	3068	2843
q22	1112	1033	1013	1013
Total cold run time: 110293 ms
Total hot run time: 37788 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4377	4302	4276	4276
q2	385	281	269	269
q3	2884	2653	2614	2614
q4	2012	1702	1697	1697
q5	5637	5662	5727	5662
q6	222	141	138	138
q7	2290	1808	1810	1808
q8	3322	3416	3484	3416
q9	8914	8850	8801	8801
q10	3583	3406	3351	3351
q11	624	503	509	503
q12	851	660	654	654
q13	13805	3112	3209	3112
q14	313	281	290	281
q15	540	498	473	473
q16	504	453	443	443
q17	1852	1575	1570	1570
q18	8085	7687	7818	7687
q19	1758	1677	1452	1452
q20	2163	1927	1884	1884
q21	5717	5434	5520	5434
q22	1118	1024	1044	1024
Total cold run time: 70956 ms
Total hot run time: 56549 ms

@doris-robot

TPC-DS: Total hot run time: 191313 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1d1ade546d9fa6541b771ae5dbcabddda566eaff, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	1263	863	853	853
query2	6323	1907	1934	1907
query3	10597	4083	4036	4036
query4	59421	24081	23127	23127
query5	5389	500	498	498
query6	405	159	162	159
query7	5871	293	302	293
query8	283	208	204	204
query9	8901	2464	2455	2455
query10	476	276	262	262
query11	17327	14937	15083	14937
query12	157	113	109	109
query13	1558	380	386	380
query14	11063	7188	7502	7188
query15	234	235	179	179
query16	7176	494	531	494
query17	1143	568	575	568
query18	1930	303	297	297
query19	299	146	153	146
query20	113	111	116	111
query21	214	104	106	104
query22	4829	4378	4389	4378
query23	34170	33699	33267	33267
query24	5943	2787	2830	2787
query25	545	410	399	399
query26	705	152	155	152
query27	1796	278	279	278
query28	3550	2054	2031	2031
query29	658	398	396	396
query30	212	148	146	146
query31	925	757	734	734
query32	79	55	57	55
query33	471	289	281	281
query34	848	471	473	471
query35	845	723	714	714
query36	1041	958	925	925
query37	134	80	81	80
query38	3979	3848	3836	3836
query39	1422	1395	1402	1395
query40	195	113	113	113
query41	47	49	45	45
query42	115	94	93	93
query43	513	469	485	469
query44	1107	735	745	735
query45	195	169	166	166
query46	1085	769	741	741
query47	1912	1828	1833	1828
query48	367	304	289	289
query49	769	444	475	444
query50	820	417	403	403
query51	7219	7078	7005	7005
query52	97	87	87	87
query53	250	180	177	177
query54	565	464	448	448
query55	78	76	77	76
query56	283	270	256	256
query57	1178	1059	1054	1054
query58	231	227	234	227
query59	3010	2761	2868	2761
query60	296	275	267	267
query61	121	120	119	119
query62	763	666	668	666
query63	221	183	186	183
query64	3427	1821	1780	1780
query65	3196	3177	3120	3120
query66	699	347	344	344
query67	15428	15145	15120	15120
query68	3247	569	582	569
query69	417	283	289	283
query70	1131	1128	1066	1066
query71	353	278	278	278
query72	2828	2254	2331	2254
query73	715	324	320	320
query74	9252	8661	8707	8661
query75	3333	2697	2695	2695
query76	1409	943	966	943
query77	523	321	308	308
query78	10294	9336	9035	9035
query79	1252	533	533	533
query80	967	503	514	503
query81	552	230	228	228
query82	559	139	132	132
query83	233	146	144	144
query84	259	77	73	73
query85	912	289	285	285
query86	400	295	306	295
query87	4384	4240	4431	4240
query88	2989	2347	2324	2324
query89	389	291	286	286
query90	1878	194	191	191
query91	117	96	101	96
query92	63	49	48	48
query93	1175	526	532	526
query94	829	290	293	290
query95	355	262	259	259
query96	588	270	265	265
query97	3194	3076	3078	3076
query98	214	201	198	198
query99	1752	1281	1261	1261
Total cold run time: 303045 ms
Total hot run time: 191313 ms

@doris-robot

ClickBench: Total hot run time: 30.53 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1d1ade546d9fa6541b771ae5dbcabddda566eaff, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.04	0.04
query2	0.08	0.04	0.04
query3	0.23	0.05	0.05
query4	1.68	0.08	0.08
query5	0.50	0.48	0.49
query6	1.13	0.74	0.72
query7	0.02	0.02	0.02
query8	0.05	0.05	0.05
query9	0.55	0.47	0.50
query10	0.53	0.52	0.53
query11	0.16	0.12	0.12
query12	0.14	0.12	0.12
query13	0.60	0.58	0.58
query14	0.77	0.80	0.79
query15	0.85	0.83	0.81
query16	0.37	0.37	0.38
query17	0.99	1.06	0.97
query18	0.23	0.21	0.21
query19	1.87	1.81	1.74
query20	0.01	0.00	0.01
query21	15.39	0.65	0.64
query22	4.79	6.92	1.75
query23	18.22	1.33	1.28
query24	2.07	0.23	0.22
query25	0.15	0.08	0.07
query26	0.27	0.18	0.18
query27	0.08	0.07	0.08
query28	13.27	1.03	1.00
query29	12.61	3.31	3.31
query30	0.24	0.05	0.05
query31	2.87	0.41	0.42
query32	3.24	0.47	0.47
query33	2.94	3.06	3.01
query34	16.97	4.40	4.39
query35	4.44	4.45	4.43
query36	0.66	0.47	0.47
query37	0.19	0.16	0.16
query38	0.16	0.15	0.16
query39	0.04	0.04	0.03
query40	0.16	0.12	0.12
query41	0.09	0.06	0.05
query42	0.06	0.05	0.05
query43	0.05	0.04	0.05
Total cold run time: 109.76 s
Total hot run time: 30.53 s

@morningman force-pushed the fix_parquet_reader_rle_decoder branch from 1d1ade5 to 5ecc1b5 on August 26, 2024 10:42
@morningman
Contributor

run buildall

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 37785 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5ecc1b5d230a69e0af537b522cd229ae000afaf2, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17637	4590	4312	4312
q2	2031	182	180	180
q3	11513	938	1094	938
q4	10512	669	776	669
q5	7763	2830	2838	2830
q6	226	140	142	140
q7	981	636	610	610
q8	9323	2077	2071	2071
q9	7205	6569	6561	6561
q10	7009	2202	2261	2202
q11	444	245	249	245
q12	389	223	222	222
q13	18994	3024	3042	3024
q14	274	235	239	235
q15	531	477	487	477
q16	509	406	388	388
q17	1007	700	682	682
q18	7374	6769	6945	6769
q19	1391	1033	996	996
q20	663	329	328	328
q21	3929	2942	2920	2920
q22	1106	986	1032	986
Total cold run time: 110811 ms
Total hot run time: 37785 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4405	4296	4342	4296
q2	396	287	269	269
q3	2921	2660	2676	2660
q4	1954	1688	1686	1686
q5	5671	5709	5780	5709
q6	229	139	146	139
q7	2241	1900	1835	1835
q8	3284	3439	3524	3439
q9	8842	8838	8843	8838
q10	3586	3371	3368	3368
q11	637	508	529	508
q12	841	666	647	647
q13	14747	3153	3242	3153
q14	331	275	295	275
q15	543	493	487	487
q16	506	449	440	440
q17	1852	1543	1537	1537
q18	8211	7756	7684	7684
q19	1737	1520	1614	1520
q20	2136	1914	1903	1903
q21	5770	5497	5420	5420
q22	1120	1065	1013	1013
Total cold run time: 71960 ms
Total hot run time: 56826 ms

@doris-robot

TPC-DS: Total hot run time: 191673 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5ecc1b5d230a69e0af537b522cd229ae000afaf2, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	1225	904	873	873
query2	6327	1952	1887	1887
query3	10613	4001	3893	3893
query4	59999	23833	23209	23209
query5	5580	511	493	493
query6	413	172	171	171
query7	5786	296	288	288
query8	283	199	202	199
query9	8662	2453	2451	2451
query10	476	265	252	252
query11	18236	14999	15091	14999
query12	156	104	110	104
query13	1515	392	392	392
query14	10924	7310	7271	7271
query15	231	173	187	173
query16	7669	491	509	491
query17	1166	572	582	572
query18	2066	301	298	298
query19	295	149	153	149
query20	116	108	109	108
query21	209	104	102	102
query22	4489	4414	4397	4397
query23	34407	33343	33392	33343
query24	5933	2909	2846	2846
query25	538	422	378	378
query26	686	158	155	155
query27	1784	285	276	276
query28	3851	2049	2029	2029
query29	698	391	393	391
query30	244	153	154	153
query31	929	765	739	739
query32	78	55	57	55
query33	485	296	279	279
query34	861	468	474	468
query35	835	711	715	711
query36	1069	912	939	912
query37	141	93	82	82
query38	3889	3902	3892	3892
query39	1446	1386	1399	1386
query40	199	118	115	115
query41	44	47	44	44
query42	121	101	95	95
query43	517	460	473	460
query44	1075	741	739	739
query45	200	167	166	166
query46	1112	787	746	746
query47	1848	1807	1809	1807
query48	381	295	302	295
query49	777	457	439	439
query50	828	428	412	412
query51	7266	7093	7029	7029
query52	105	90	88	88
query53	255	183	184	183
query54	564	464	464	464
query55	76	76	78	76
query56	291	271	272	271
query57	1200	1073	1075	1073
query58	240	228	246	228
query59	3036	2821	2853	2821
query60	302	281	286	281
query61	119	121	120	120
query62	750	667	669	667
query63	217	183	185	183
query64	3435	1851	1798	1798
query65	3225	3167	3147	3147
query66	701	342	344	342
query67	15298	15187	15024	15024
query68	3062	580	603	580
query69	405	296	282	282
query70	1115	1135	1055	1055
query71	361	277	270	270
query72	6302	2449	2237	2237
query73	759	322	316	316
query74	9261	8768	8946	8768
query75	3386	2673	2681	2673
query76	1427	953	909	909
query77	501	315	325	315
query78	11386	9545	9000	9000
query79	2077	533	527	527
query80	861	508	499	499
query81	564	223	224	223
query82	313	138	138	138
query83	255	145	146	145
query84	264	80	73	73
query85	704	326	280	280
query86	399	293	281	281
query87	4391	4283	4322	4283
query88	3634	2341	2322	2322
query89	378	294	290	290
query90	1947	192	191	191
query91	122	96	97	96
query92	63	48	51	48
query93	1106	536	534	534
query94	792	309	308	308
query95	356	257	261	257
query96	594	268	266	266
query97	3229	3107	3084	3084
query98	216	210	201	201
query99	1543	1276	1242	1242
Total cold run time: 310265 ms
Total hot run time: 191673 ms

@doris-robot

ClickBench: Total hot run time: 30.93 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 5ecc1b5d230a69e0af537b522cd229ae000afaf2, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.05	0.04	0.04
query2	0.08	0.04	0.04
query3	0.22	0.05	0.05
query4	1.67	0.08	0.09
query5	0.50	0.50	0.50
query6	1.14	0.73	0.74
query7	0.02	0.01	0.02
query8	0.05	0.04	0.05
query9	0.54	0.49	0.49
query10	0.53	0.53	0.53
query11	0.15	0.11	0.12
query12	0.15	0.12	0.11
query13	0.60	0.59	0.59
query14	0.76	0.82	0.78
query15	0.88	0.82	0.82
query16	0.36	0.36	0.38
query17	1.03	1.01	1.02
query18	0.23	0.21	0.20
query19	1.93	1.80	1.80
query20	0.02	0.01	0.01
query21	15.39	0.65	0.64
query22	4.41	6.93	1.86
query23	18.27	1.51	1.37
query24	2.11	0.23	0.23
query25	0.16	0.08	0.08
query26	0.27	0.19	0.18
query27	0.07	0.08	0.07
query28	13.17	1.04	1.00
query29	12.63	3.42	3.39
query30	0.25	0.05	0.05
query31	2.88	0.39	0.39
query32	3.28	0.48	0.48
query33	2.98	2.98	3.04
query34	17.01	4.40	4.40
query35	4.42	4.42	4.45
query36	0.66	0.49	0.49
query37	0.20	0.16	0.16
query38	0.16	0.15	0.16
query39	0.04	0.03	0.04
query40	0.16	0.13	0.14
query41	0.09	0.04	0.04
query42	0.05	0.05	0.05
query43	0.05	0.04	0.04
Total cold run time: 109.62 s
Total hot run time: 30.93 s

Contributor

@morningman left a comment

LGTM

@github-actions bot added the "approved" label on Aug 26, 2024

PR approved by at least one committer and no changes requested.

@morningman merged commit d2fe243 into apache:master on Aug 26, 2024
27 of 30 checks passed
morningman pushed a commit to morningman/doris that referenced this pull request Aug 26, 2024
morningman added a commit that referenced this pull request Aug 27, 2024
dataroaring pushed a commit that referenced this pull request Sep 3, 2024
Labels: approved, dev/2.1.6-merged, dev/3.0.2-merged, reviewed
5 participants