Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix](multi table) do not use strlen to calculate the length of msg #40367

Merged
merged 1 commit into from
Sep 5, 2024
Merged

[fix](multi table) do not use strlen to calculate the length of msg #40367

merged 1 commit into from
Sep 5, 2024

Conversation

sollhui
Copy link
Contributor

@sollhui sollhui commented Sep 4, 2024

Meet code dump when using single stream multi table load:

SUMMARY: AddressSanitizer: heap-buffer-overflow /root/doris/be/src/io/fs/multi_table_pipe.cpp:99:22 in doris::io::MultiTablePipe::dispatch(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, char const*, unsigned long, doris::Status (doris::io::KafkaConsumerPipe::*)(char const*, unsigned long))
  1. It is hard to guaranteed that msg is a C-style string ending in '\0' character. If not, it may cause the core dump to access memory out of bounds.
  2. It is not need to calculate the length of msg twice.

Therefore, deleting the logic that using strlen to calculate the length of msg.

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@sollhui
Copy link
Contributor Author

sollhui commented Sep 4, 2024

run buildall

Copy link
Contributor

github-actions bot commented Sep 4, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 36.84% (9393/25495)
Line Coverage: 28.28% (77477/273937)
Region Coverage: 27.67% (39968/144434)
Branch Coverage: 24.32% (20339/83640)
Coverage Report: http://coverage.selectdb-in.cc/coverage/434994907262cf5c00209eb9f5f2d9201ca2389b_434994907262cf5c00209eb9f5f2d9201ca2389b/report/index.html

@sollhui
Copy link
Contributor Author

sollhui commented Sep 5, 2024

run buildall

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 36.84% (9394/25497)
Line Coverage: 28.28% (77459/273917)
Region Coverage: 27.67% (39965/144437)
Branch Coverage: 24.31% (20334/83644)
Coverage Report: http://coverage.selectdb-in.cc/coverage/434994907262cf5c00209eb9f5f2d9201ca2389b_434994907262cf5c00209eb9f5f2d9201ca2389b/report/index.html

@doris-robot
Copy link

TPC-H: Total hot run time: 38003 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 434994907262cf5c00209eb9f5f2d9201ca2389b, data reload: false

------ Round 1 ----------------------------------
q1	17598	4341	4301	4301
q2	2024	188	174	174
q3	11869	939	1099	939
q4	10522	771	770	770
q5	7753	2831	2812	2812
q6	227	142	142	142
q7	961	631	608	608
q8	9354	2074	2070	2070
q9	7003	6471	6479	6471
q10	6987	2205	2210	2205
q11	446	240	248	240
q12	392	230	231	230
q13	17761	3115	3072	3072
q14	282	226	243	226
q15	545	492	488	488
q16	576	504	513	504
q17	975	682	718	682
q18	7270	6914	6973	6914
q19	1394	1020	945	945
q20	674	342	339	339
q21	3848	3106	2838	2838
q22	1135	1046	1033	1033
Total cold run time: 109596 ms
Total hot run time: 38003 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4346	4263	4352	4263
q2	379	274	277	274
q3	2861	2664	2674	2664
q4	1899	1594	1741	1594
q5	5558	5669	5719	5669
q6	229	142	143	142
q7	2266	1832	1816	1816
q8	3310	3417	3405	3405
q9	8736	8808	8864	8808
q10	3544	3371	3419	3371
q11	620	508	514	508
q12	822	669	687	669
q13	14041	3305	3206	3206
q14	305	288	296	288
q15	526	498	497	497
q16	633	578	602	578
q17	1842	1545	1533	1533
q18	8191	7820	7872	7820
q19	1722	1557	1665	1557
q20	2202	1915	1885	1885
q21	5880	5555	5517	5517
q22	1132	1041	1036	1036
Total cold run time: 71044 ms
Total hot run time: 57100 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 193052 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 434994907262cf5c00209eb9f5f2d9201ca2389b, data reload: false

query1	1250	896	886	886
query2	6298	1995	1860	1860
query3	10608	3960	4011	3960
query4	59606	23707	23078	23078
query5	5377	498	506	498
query6	420	164	165	164
query7	5778	305	298	298
query8	293	218	222	218
query9	9029	2511	2499	2499
query10	508	306	274	274
query11	18123	15018	15274	15018
query12	176	111	122	111
query13	1590	423	421	421
query14	11155	7164	7686	7164
query15	239	172	175	172
query16	7564	496	475	475
query17	1123	568	588	568
query18	1731	294	298	294
query19	291	151	152	151
query20	122	108	107	107
query21	210	106	101	101
query22	4494	4494	4525	4494
query23	34641	33160	33558	33160
query24	5932	2876	2820	2820
query25	523	382	376	376
query26	694	150	178	150
query27	1783	276	282	276
query28	3635	2061	2058	2058
query29	654	408	421	408
query30	237	157	155	155
query31	934	755	786	755
query32	82	52	51	51
query33	427	315	288	288
query34	863	485	475	475
query35	844	713	732	713
query36	1068	934	918	918
query37	149	93	86	86
query38	3898	3884	3870	3870
query39	1439	1391	1392	1391
query40	195	115	116	115
query41	47	48	45	45
query42	116	96	94	94
query43	511	469	469	469
query44	1094	742	744	742
query45	197	167	169	167
query46	1097	727	731	727
query47	1866	1796	1804	1796
query48	404	306	304	304
query49	758	425	433	425
query50	828	425	420	420
query51	7082	6992	6904	6904
query52	99	88	86	86
query53	249	183	187	183
query54	582	465	454	454
query55	76	75	78	75
query56	278	249	264	249
query57	1174	1070	1042	1042
query58	218	229	239	229
query59	2931	2678	2859	2678
query60	296	268	274	268
query61	108	100	101	100
query62	750	612	670	612
query63	243	189	184	184
query64	2792	690	674	674
query65	3202	3136	3170	3136
query66	688	335	351	335
query67	15493	15226	15402	15226
query68	2961	588	593	588
query69	398	281	285	281
query70	1166	1181	1090	1090
query71	354	272	275	272
query72	5242	4042	4028	4028
query73	738	331	378	331
query74	9234	8811	8876	8811
query75	3350	2672	2686	2672
query76	1341	962	943	943
query77	528	337	326	326
query78	9708	10100	9794	9794
query79	1027	562	536	536
query80	689	511	516	511
query81	463	238	238	238
query82	254	152	151	151
query83	176	148	149	148
query84	264	77	81	77
query85	703	302	295	295
query86	312	305	273	273
query87	4405	4277	4306	4277
query88	3366	2383	2382	2382
query89	380	289	291	289
query90	1960	196	197	196
query91	142	118	113	113
query92	63	53	50	50
query93	1036	533	535	533
query94	731	301	310	301
query95	416	253	259	253
query96	587	265	266	265
query97	3177	3089	3124	3089
query98	218	205	212	205
query99	1485	1234	1292	1234
Total cold run time: 304036 ms
Total hot run time: 193052 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 32.71 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 434994907262cf5c00209eb9f5f2d9201ca2389b, data reload: false

query1	0.04	0.04	0.04
query2	0.09	0.04	0.04
query3	0.23	0.04	0.05
query4	1.68	0.09	0.08
query5	0.52	0.50	0.50
query6	1.12	0.72	0.72
query7	0.02	0.02	0.02
query8	0.06	0.05	0.04
query9	0.55	0.48	0.48
query10	0.54	0.53	0.54
query11	0.16	0.12	0.10
query12	0.14	0.12	0.12
query13	0.60	0.59	0.59
query14	2.05	2.04	2.05
query15	0.89	0.82	0.84
query16	0.38	0.38	0.38
query17	1.04	1.06	1.10
query18	0.21	0.22	0.22
query19	1.97	1.92	1.87
query20	0.01	0.00	0.01
query21	15.43	0.67	0.65
query22	4.50	6.58	2.60
query23	18.26	1.33	1.20
query24	2.07	0.22	0.22
query25	0.15	0.08	0.08
query26	0.26	0.18	0.18
query27	0.08	0.07	0.07
query28	13.29	1.02	1.00
query29	12.60	3.32	3.27
query30	0.24	0.06	0.05
query31	2.90	0.40	0.40
query32	3.25	0.50	0.49
query33	2.97	3.02	2.99
query34	17.02	4.38	4.36
query35	4.44	4.41	4.45
query36	0.66	0.46	0.49
query37	0.18	0.17	0.15
query38	0.15	0.15	0.14
query39	0.05	0.04	0.05
query40	0.16	0.13	0.13
query41	0.09	0.05	0.05
query42	0.06	0.05	0.04
query43	0.04	0.04	0.05
Total cold run time: 111.15 s
Total hot run time: 32.71 s

Copy link
Contributor

@liaoxin01 liaoxin01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

github-actions bot commented Sep 5, 2024

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Sep 5, 2024
Copy link
Contributor

github-actions bot commented Sep 5, 2024

PR approved by anyone and no changes requested.

Copy link
Contributor

@dataroaring dataroaring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dataroaring dataroaring merged commit ed37c81 into apache:master Sep 5, 2024
25 of 29 checks passed
dataroaring pushed a commit that referenced this pull request Sep 7, 2024
…40367) (#40500)

pick #40367

Meet code dump when using single stream multi table load:
```
SUMMARY: AddressSanitizer: heap-buffer-overflow /root/doris/be/src/io/fs/multi_table_pipe.cpp:99:22 in doris::io::MultiTablePipe::dispatch(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, char const*, unsigned long, doris::Status (doris::io::KafkaConsumerPipe::*)(char const*, unsigned long))
```

1. It is hard to guaranteed that msg is a C-style string ending in '\0'
character. If not, it may cause the core dump to access memory out of
bounds.
2. It is not need to calculate the length of msg twice.

Therefore, deleting the logic that using strlen to calculate the length
of msg.
yiguolei pushed a commit that referenced this pull request Sep 9, 2024
…40367) (#40511)

pick #40367

Meet code dump when using single stream multi table load:
```
SUMMARY: AddressSanitizer: heap-buffer-overflow /root/doris/be/src/io/fs/multi_table_pipe.cpp:99:22 in doris::io::MultiTablePipe::dispatch(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, char const*, unsigned long, doris::Status (doris::io::KafkaConsumerPipe::*)(char const*, unsigned long))
```

1. It is hard to guaranteed that msg is a C-style string ending in '\0'
character. If not, it may cause the core dump to access memory out of
bounds.
2. It is not need to calculate the length of msg twice.

Therefore, deleting the logic that using strlen to calculate the length
of msg.
dataroaring pushed a commit that referenced this pull request Sep 23, 2024
…40367)

Meet code dump when using single stream multi table load:
```
SUMMARY: AddressSanitizer: heap-buffer-overflow /root/doris/be/src/io/fs/multi_table_pipe.cpp:99:22 in doris::io::MultiTablePipe::dispatch(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, char const*, unsigned long, doris::Status (doris::io::KafkaConsumerPipe::*)(char const*, unsigned long))
```

1. It is hard to guaranteed that msg is a C-style string ending in '\0'
character. If not, it may cause the core dump to access memory out of
bounds.
2. It is not need to calculate the length of msg twice.

Therefore, deleting the logic that using strlen to calculate the length
of msg.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants