[fix](group commit) fix group commit core if be inject FragmentMgr.exec_plan_fragment.failed #39339

Merged: 1 commit into apache:master, Aug 15, 2024

Contributor

@mymeiyi mymeiyi commented Aug 14, 2024

*** SIGSEGV address not mapped to object (@0x0) received by PID 1898955 (TID 1900522 OR 0x7f4f94abc640) from PID 0; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_branch-3.0.2-tmp/doris/be/src/common/signal_handler.h:421
 1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 3# 0x00007F5335001520 in /lib/x86_64-linux-gnu/libc.so.6
 4# brpc::Socket::Write(brpc::SocketMessagePtr<void>&, brpc::Socket::WriteOptions const*) in /mnt/disk1/STRESS_ENV/be/lib/doris_be
 5# brpc::policy::HttpResponseSender::~HttpResponseSender() in /mnt/disk1/STRESS_ENV/be/lib/doris_be
 6# brpc::policy::HttpResponseSenderAsDone::~HttpResponseSenderAsDone() in /mnt/disk1/STRESS_ENV/be/lib/doris_be
 7# std::_Function_handler<void (), doris::PInternalService::group_commit_insert(google::protobuf::RpcController*, doris::PGroupCommitInsertRequest const*, doris::PGroupCommitInsertResponse*, google::protobuf::Closure*)::$_0>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
 8# doris::WorkThreadPool<false>::work_thread(int) at /home/zcp/repo_center/doris_branch-3.0.2-tmp/doris/be/src/util/work_thread_pool.hpp:159

@github-actions github-actions bot added the doing label Aug 14, 2024
Collaborator

@Yukang-Lian Yukang-Lian left a comment

LGTM

Contributor

PR approved by anyone and no changes requested.

Contributor Author

mymeiyi commented Aug 14, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 37794 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a0651e33369016b034e879c2df5015f9b12c10bd, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	18732	4524	4405	4405
q2	2478	178	183	178
q3	10987	1182	1079	1079
q4	11228	774	666	666
q5	7767	2846	2857	2846
q6	228	146	152	146
q7	953	606	594	594
q8	9320	2045	2060	2045
q9	7264	6554	6522	6522
q10	7060	2220	2177	2177
q11	456	245	252	245
q12	394	223	222	222
q13	18987	2966	2960	2960
q14	293	243	247	243
q15	527	475	490	475
q16	521	386	377	377
q17	993	708	705	705
q18	7617	6877	6842	6842
q19	5812	1009	945	945
q20	702	340	326	326
q21	4367	2872	2798	2798
q22	1106	1002	998	998
Total cold run time: 117792 ms
Total hot run time: 37794 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4530	4240	4261	4240
q2	379	270	258	258
q3	2865	2637	2651	2637
q4	1879	1650	1604	1604
q5	5379	5394	5388	5388
q6	220	131	129	129
q7	2061	1727	1669	1669
q8	3206	3344	3343	3343
q9	8516	8420	8393	8393
q10	3414	3165	3162	3162
q11	597	492	493	492
q12	763	605	588	588
q13	16431	2981	3015	2981
q14	300	275	273	273
q15	524	485	473	473
q16	464	408	421	408
q17	1797	1507	1481	1481
q18	7839	7615	7446	7446
q19	1658	1550	1611	1550
q20	1996	1833	1835	1833
q21	5342	4960	5044	4960
q22	1090	986	1024	986
Total cold run time: 71250 ms
Total hot run time: 54294 ms
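
The totals in the benchmark comments can be reproduced from the per-query rows. Judging from the numbers only (not from the benchmark scripts, which are not shown here), the cold total looks like the sum of the first numeric column, and the hot total looks like the sum of the smaller of the two hot runs for each query. The sketch below assumes that row layout; `summarize` and `sample` are hypothetical names used only for illustration.

```python
def summarize(rows: str) -> tuple[float, float]:
    """Return (total_cold, total_hot) for whitespace-separated benchmark rows.

    Assumed row layout: query name, cold run, then one or more hot runs.
    A trailing "best hot" column does not change the result, because the
    minimum over the hot columns is the same value.
    """
    total_cold = 0.0
    total_hot = 0.0
    for line in rows.strip().splitlines():
        fields = line.split()
        cold = float(fields[1])
        hot_runs = [float(value) for value in fields[2:]]
        total_cold += cold
        total_hot += min(hot_runs)
    return total_cold, total_hot


# Three rows copied from TPC-H Round 1 (q1, q2, q6).
sample = """
q1 18732 4524 4405 4405
q2 2478 178 183 178
q6 228 146 152 146
"""

cold, hot = summarize(sample)
print(f"cold={cold:.0f} ms, hot={hot:.0f} ms")
```

Feeding all 22 rows of Round 1 into `summarize` should give the reported 117792 ms cold and 37794 ms hot, if the assumption about the columns holds.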

@doris-robot

TPC-DS: Total hot run time: 184675 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a0651e33369016b034e879c2df5015f9b12c10bd, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	909	384	364	364
query2	6461	1887	1788	1788
query3	6638	207	210	207
query4	33764	23232	22974	22974
query5	4206	520	507	507
query6	266	162	156	156
query7	4585	287	287	287
query8	230	214	199	199
query9	8667	2435	2400	2400
query10	424	271	261	261
query11	17816	14958	15038	14958
query12	148	99	99	99
query13	1631	369	360	360
query14	9631	7228	7192	7192
query15	231	166	170	166
query16	7681	489	490	489
query17	1184	565	556	556
query18	1910	292	289	289
query19	203	150	144	144
query20	118	107	108	107
query21	203	112	102	102
query22	4206	4090	4105	4090
query23	34038	33189	33219	33189
query24	12474	2863	2791	2791
query25	664	378	387	378
query26	1782	156	153	153
query27	2970	273	271	271
query28	7879	2054	2045	2045
query29	1098	424	401	401
query30	301	149	150	149
query31	1001	737	762	737
query32	92	55	53	53
query33	752	282	277	277
query34	1003	453	459	453
query35	841	719	695	695
query36	1112	925	942	925
query37	276	86	81	81
query38	3877	3830	3787	3787
query39	1444	1375	1367	1367
query40	273	118	117	117
query41	48	48	44	44
query42	110	93	97	93
query43	497	447	449	447
query44	1267	727	730	727
query45	196	163	163	163
query46	1109	770	770	770
query47	1829	1729	1713	1713
query48	356	290	281	281
query49	1181	417	413	413
query50	809	409	407	407
query51	6810	6735	6716	6716
query52	107	93	92	92
query53	264	180	185	180
query54	1030	452	444	444
query55	75	75	74	74
query56	267	255	279	255
query57	1162	1060	1043	1043
query58	234	250	220	220
query59	2959	2757	2643	2643
query60	293	257	267	257
query61	97	95	94	94
query62	844	624	643	624
query63	222	181	184	181
query64	6363	2289	1746	1746
query65	3257	3159	3170	3159
query66	1361	341	317	317
query67	15199	14993	14773	14773
query68	4512	536	546	536
query69	399	276	272	272
query70	1164	1096	1063	1063
query71	438	277	275	275
query72	6441	2244	1993	1993
query73	735	324	316	316
query74	9229	8753	8706	8706
query75	3467	2743	2727	2727
query76	2744	997	1048	997
query77	520	322	313	313
query78	11103	9411	8894	8894
query79	3205	526	524	524
query80	1746	498	506	498
query81	590	231	225	225
query82	655	140	139	139
query83	325	148	148	148
query84	276	80	77	77
query85	715	289	269	269
query86	481	302	299	299
query87	4391	4240	4290	4240
query88	4304	2297	2411	2297
query89	431	283	280	280
query90	2008	199	199	199
query91	123	95	94	94
query92	68	49	52	49
query93	4584	534	534	534
query94	850	297	293	293
query95	352	265	261	261
query96	614	268	265	265
query97	3195	2997	3048	2997
query98	233	203	198	198
query99	1686	1268	1257	1257
Total cold run time: 302519 ms
Total hot run time: 184675 ms

@doris-robot

ClickBench: Total hot run time: 30.51 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a0651e33369016b034e879c2df5015f9b12c10bd, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.05	0.04	0.04
query2	0.08	0.04	0.04
query3	0.23	0.05	0.05
query4	1.68	0.07	0.08
query5	0.50	0.49	0.47
query6	1.13	0.74	0.74
query7	0.02	0.01	0.01
query8	0.05	0.05	0.04
query9	0.56	0.48	0.47
query10	0.52	0.55	0.53
query11	0.15	0.11	0.12
query12	0.15	0.13	0.13
query13	0.60	0.61	0.59
query14	0.76	0.79	0.77
query15	0.85	0.81	0.81
query16	0.37	0.36	0.38
query17	0.97	1.03	1.04
query18	0.22	0.21	0.22
query19	1.85	1.79	1.69
query20	0.01	0.02	0.01
query21	15.40	0.75	0.66
query22	4.66	7.61	1.72
query23	18.26	1.50	1.29
query24	1.88	0.27	0.22
query25	0.15	0.09	0.08
query26	0.29	0.21	0.22
query27	0.46	0.23	0.22
query28	13.31	1.03	1.00
query29	12.60	3.39	3.35
query30	0.24	0.05	0.05
query31	2.90	0.41	0.39
query32	3.25	0.49	0.48
query33	2.94	2.93	2.96
query34	17.08	4.37	4.33
query35	4.43	4.38	4.42
query36	0.66	0.49	0.47
query37	0.18	0.17	0.15
query38	0.16	0.15	0.14
query39	0.05	0.03	0.04
query40	0.16	0.12	0.13
query41	0.09	0.04	0.05
query42	0.06	0.05	0.05
query43	0.05	0.04	0.03
Total cold run time: 110.01 s
Total hot run time: 30.51 s

Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions github-actions bot added the approved label Aug 15, 2024
Contributor

PR approved by at least one committer and no changes requested.

@dataroaring dataroaring merged commit d0d3ad0 into apache:master Aug 15, 2024
26 of 28 checks passed
mymeiyi added a commit to mymeiyi/doris that referenced this pull request Aug 15, 2024
…ec_plan_fragment.failed (apache#39339)

dataroaring pushed a commit that referenced this pull request Aug 15, 2024
dataroaring pushed a commit that referenced this pull request Aug 17, 2024
…ec_plan_fragment.failed (#39339)

Labels
approved, dev/2.1.6-merged, dev/3.0.2-merged, doing, reviewed

4 participants