[Fix](merge-on-write) Fix duplicate key problem after adding sequence column for merge-on-write table #39958

Merged

bobhan1
Contributor

@bobhan1 bobhan1 commented Aug 27, 2024

Proposed changes

Currently, BaseTablet::lookup_row_key() uses the tablet_meta's schema to decide whether a tablet has a sequence column. However, users can run ALTER TABLE tbl ENABLE FEATURE "SEQUENCE_LOAD" WITH ... to add a hidden sequence column to a MOW (merge-on-write) table. This is a light schema change, which does not update the BE's tablet meta, so BaseTablet::lookup_row_key() behaves incorrectly afterwards.
This PR uses the schema of the current load, which is the latest schema, to decide whether a tablet has a sequence column, and corrects the lookup procedure in BaseTablet::lookup_row_key() and Segment::lookup_row_key().
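
The effect of consulting a stale schema can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the Doris implementation: `SchemaSketch`, `make_entry`, `lookup_row_key_sketch`, and the fixed-width sequence suffix are all hypothetical names and simplifications introduced here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a tablet schema; not the Doris TabletSchema class.
struct SchemaSketch {
    int sequence_col_idx;  // -1 when the schema has no sequence column
    explicit SchemaSketch(int idx) : sequence_col_idx(idx) {}
    bool has_sequence_col() const { return sequence_col_idx >= 0; }
};

// ASSUMPTION: a fixed-width sequence suffix, chosen only to keep the toy simple.
constexpr std::size_t kSeqSuffixLen = 4;

// Builds a toy index entry: the primary key followed by the sequence value.
std::string make_entry(const std::string& key, const std::string& seq) {
    assert(seq.size() == kSeqSuffixLen);
    return key + seq;
}

// Returns true when `key` matches an entry in `index`.
// `schema` decides whether stored entries are treated as carrying a suffix.
bool lookup_row_key_sketch(const std::vector<std::string>& index,
                           const std::string& key,
                           const SchemaSketch& schema) {
    for (const std::string& entry : index) {
        std::string stored = entry;
        if (schema.has_sequence_col() && stored.size() >= kSeqSuffixLen) {
            stored.resize(stored.size() - kSeqSuffixLen);  // strip the sequence suffix
        }
        if (stored == key) {
            return true;
        }
    }
    return false;
}
```

In this toy, an index holding `make_entry("user1", "0007")` is searched for `"user1"`. With `SchemaSketch(-1)` (standing in for the stale tablet meta schema) the lookup misses, while with `SchemaSketch(2)` (standing in for the current load's schema) it hits. A missed lookup of this kind is how an existing row could go undetected and leave a duplicate key.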

branch-2.1-pick: #40010
branch-2.0-pick: #40015

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@bobhan1
Contributor Author

bobhan1 commented Aug 27, 2024

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 38196 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 8207b0d452639081d207d750f49442c8d8872c33, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17621	4735	4326	4326
q2	2024	186	175	175
q3	10472	1163	1165	1163
q4	10128	729	821	729
q5	7784	2958	2883	2883
q6	239	137	134	134
q7	985	653	612	612
q8	9687	2113	2093	2093
q9	7290	6575	6588	6575
q10	7001	2253	2247	2247
q11	468	245	246	245
q12	396	221	224	221
q13	17764	3060	2991	2991
q14	295	228	233	228
q15	528	493	479	479
q16	494	394	392	392
q17	997	669	670	669
q18	7574	6801	6837	6801
q19	1393	1056	1037	1037
q20	692	339	333	333
q21	3994	3083	2871	2871
q22	1091	1013	992	992
Total cold run time: 108917 ms
Total hot run time: 38196 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4363	4324	4283	4283
q2	395	284	270	270
q3	2925	2700	2705	2700
q4	1939	1652	1692	1652
q5	5413	5390	5421	5390
q6	214	129	130	129
q7	2090	1749	1768	1749
q8	3222	3400	3364	3364
q9	8506	8449	8480	8449
q10	3428	3208	3206	3206
q11	589	501	506	501
q12	819	625	619	619
q13	12972	3063	3085	3063
q14	316	268	286	268
q15	514	495	494	494
q16	468	412	432	412
q17	1796	1500	1461	1461
q18	8038	7502	7547	7502
q19	1648	1594	1575	1575
q20	2032	1846	1817	1817
q21	5441	5316	5302	5302
q22	1126	1008	1033	1008
Total cold run time: 68254 ms
Total hot run time: 55214 ms

@doris-robot

TPC-DS: Total hot run time: 187418 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 8207b0d452639081d207d750f49442c8d8872c33, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	921	371	369	369
query2	6477	2031	1918	1918
query3	6651	210	217	210
query4	29759	23401	23226	23226
query5	4178	516	504	504
query6	263	170	185	170
query7	4595	309	304	304
query8	258	201	209	201
query9	8661	2490	2465	2465
query10	447	286	271	271
query11	16024	15031	14963	14963
query12	151	104	108	104
query13	1642	405	374	374
query14	9977	8097	7143	7143
query15	286	171	179	171
query16	7663	477	493	477
query17	1601	580	572	572
query18	1670	297	299	297
query19	237	152	156	152
query20	125	114	114	114
query21	220	112	102	102
query22	4420	4053	4190	4053
query23	34204	33195	33581	33195
query24	11224	2918	2836	2836
query25	645	407	415	407
query26	1347	162	170	162
query27	2682	290	285	285
query28	7240	2056	2049	2049
query29	872	441	425	425
query30	315	158	148	148
query31	980	775	785	775
query32	99	60	63	60
query33	775	310	296	296
query34	957	485	480	480
query35	869	728	738	728
query36	1066	948	929	929
query37	160	95	90	90
query38	3970	3964	3882	3882
query39	1432	1400	1391	1391
query40	279	128	124	124
query41	51	51	49	49
query42	124	101	102	101
query43	514	483	479	479
query44	1200	755	764	755
query45	201	169	173	169
query46	1119	739	731	731
query47	1884	1791	1773	1773
query48	378	304	303	303
query49	1101	452	469	452
query50	819	432	428	428
query51	7096	7075	7082	7075
query52	107	95	93	93
query53	263	188	183	183
query54	1067	478	473	473
query55	82	83	81	81
query56	298	282	275	275
query57	1207	1056	1071	1056
query58	249	238	267	238
query59	3044	2839	2823	2823
query60	321	295	291	291
query61	125	125	124	124
query62	824	654	646	646
query63	238	196	186	186
query64	6607	2345	1745	1745
query65	3206	3183	3140	3140
query66	1294	344	378	344
query67	15635	15153	15260	15153
query68	5460	551	546	546
query69	597	366	305	305
query70	1120	1068	1054	1054
query71	456	290	285	285
query72	6577	2352	2070	2070
query73	808	322	324	322
query74	9324	8823	8819	8819
query75	3697	2652	2737	2652
query76	3609	1049	977	977
query77	683	327	313	313
query78	9645	9164	9453	9164
query79	2120	529	528	528
query80	874	530	553	530
query81	585	227	229	227
query82	1040	144	141	141
query83	250	154	152	152
query84	239	76	80	76
query85	1524	299	292	292
query86	468	304	301	301
query87	4287	4251	4205	4205
query88	3905	2344	2331	2331
query89	391	292	289	289
query90	1829	205	203	203
query91	128	108	147	108
query92	67	53	56	53
query93	1998	556	548	548
query94	822	296	301	296
query95	366	273	276	273
query96	597	277	271	271
query97	3213	3036	3063	3036
query98	221	207	206	206
query99	1509	1257	1290	1257
Total cold run time: 292805 ms
Total hot run time: 187418 ms

@doris-robot

ClickBench: Total hot run time: 30.92 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 8207b0d452639081d207d750f49442c8d8872c33, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.05	0.04	0.05
query2	0.08	0.04	0.04
query3	0.22	0.06	0.06
query4	1.65	0.08	0.09
query5	0.52	0.51	0.49
query6	1.13	0.72	0.72
query7	0.02	0.02	0.01
query8	0.06	0.05	0.05
query9	0.54	0.50	0.48
query10	0.55	0.55	0.54
query11	0.16	0.11	0.11
query12	0.16	0.13	0.13
query13	0.61	0.59	0.59
query14	0.78	0.81	0.80
query15	0.89	0.84	0.82
query16	0.36	0.38	0.38
query17	1.07	1.06	1.04
query18	0.22	0.22	0.21
query19	1.95	1.80	1.79
query20	0.01	0.02	0.01
query21	15.41	0.68	0.67
query22	3.94	6.95	1.93
query23	18.27	1.42	1.26
query24	2.15	0.23	0.21
query25	0.14	0.08	0.08
query26	0.27	0.19	0.18
query27	0.08	0.07	0.08
query28	13.28	1.03	1.02
query29	12.59	3.34	3.34
query30	0.25	0.05	0.06
query31	2.86	0.40	0.41
query32	3.25	0.48	0.48
query33	2.98	3.00	2.99
query34	16.91	4.38	4.36
query35	4.45	4.42	4.40
query36	0.67	0.50	0.49
query37	0.19	0.16	0.15
query38	0.17	0.15	0.15
query39	0.05	0.04	0.04
query40	0.17	0.13	0.13
query41	0.10	0.05	0.05
query42	0.06	0.04	0.05
query43	0.05	0.05	0.04
Total cold run time: 109.32 s
Total hot run time: 30.92 s

Contributor

@zhannngchen zhannngchen left a comment

LGTM

Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the "approved" (indicates a PR has been approved by one committer) and "reviewed" labels Aug 27, 2024
Contributor

PR approved by anyone and no changes requested.

@zhannngchen zhannngchen merged commit 69ec030 into apache:master Aug 28, 2024
31 of 34 checks passed
dataroaring pushed a commit that referenced this pull request Aug 28, 2024
…fter adding sequence column for merge-on-write table #39958" (#40010)

## Proposed changes

picks #39958
xiaokang pushed a commit that referenced this pull request Aug 29, 2024
dataroaring pushed a commit that referenced this pull request Sep 3, 2024
… column for merge-on-write table (#39958)
bobhan1 added a commit to bobhan1/doris that referenced this pull request Jan 21, 2025