Update ISSUE and Pull Request templates #3

Merged: 1 commit merged into pr from dj-pr on Jun 15, 2023
Conversation

tuhaihe (Member) commented on Jun 15, 2023

For a better user and developer experience, update the Issue and PR templates.

Files changed:

  • Add bug-report.yml
  • Modify pull_request_template.md
  • Delete bug-report.md
  • Delete enhancement.md
  • Delete feature-request.md

tuhaihe merged commit 8fe9843 into pr on Jun 15, 2023
tuhaihe deleted the dj-pr branch on June 15, 2023 at 03:10
HuSen8891 added a commit to HuSen8891/cloudberrydb that referenced this pull request Mar 1, 2024
The following test case:

create table t0(c0 inet) distributed randomly;
create table t2(c0 inet) distributed randomly;
create table t3(c0 inet) distributed randomly;

SELECT ALL t2.c0, t3.c0, t0.c0 FROM t0, ONLY t3 FULL OUTER JOIN t2 ON ((t2.c0)=(t3.c0))
WHERE (((('0.5496844753539182')||(t3.c0)))LIKE(CAST((0.13292931)::MONEY AS VARCHAR(971))))
UNION ALL SELECT t2.c0, t3.c0, t0.c0 FROM t0, ONLY t3 FULL OUTER JOIN t2 ON ((t2.c0)=(t3.c0))
WHERE NOT ((((('0.5496844753539182')||(t3.c0)))LIKE((CAST(0.13292931 AS MONEY))::VARCHAR(971))))
UNION ALL SELECT ALL t2.c0, t3.c0, t0.c0 FROM t0*, ONLY t3 FULL OUTER JOIN t2 ON ((t2.c0)=(t3.c0))
WHERE ((((('0.5496844753539182')||(t3.c0)))LIKE((CAST(0.13292931 AS MONEY))::VARCHAR(971)))) ISNULL;

crashes because of an assertion failure in 'create_plan_recurse':

#3  0x00007fe94eccf476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007fe94ecb57f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007fe94fcdd548 in ExceptionalCondition (conditionName=0x7fe95043dcd0 "best_path->parallel_workers == best_path->locus.parallel_workers",
    errorType=0x7fe95043db06 "FailedAssertion", fileName=0x7fe95043dbdb "createplan.c", lineNumber=623) at assert.c:48
#6  0x00007fe94f94918f in create_plan_recurse (root=0x55d7cbe96f78, best_path=0x55d7cbec0380, flags=1) at createplan.c:623
#7  0x00007fe94f94a1f8 in create_append_plan (root=0x55d7cbe96f78, best_path=0x55d7cbec0700, flags=1) at createplan.c:1380
#8  0x00007fe94f948d37 in create_plan_recurse (root=0x55d7cbe96f78, best_path=0x55d7cbec0700, flags=1) at createplan.c:481
#9  0x00007fe94f94e2d1 in create_motion_plan (root=0x55d7cbe96f78, path=0x55d7cbec0e50) at createplan.c:3316
#10 0x00007fe94f9490dc in create_plan_recurse (root=0x55d7cbe96f78, best_path=0x55d7cbec0e50, flags=1) at createplan.c:608
#11 0x00007fe94f948ba3 in create_plan (root=0x55d7cbe96f78, best_path=0x55d7cbec0e50, curSlice=0x55d7cbe96f20) at createplan.c:392

The parallel_workers should be set to zero because parallel full join is not supported yet.
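
A minimal sketch of that idea follows (an assumption-laden illustration, not the actual patch: `joinpath` and the exact planner call site are hypothetical, while `JOIN_FULL`, `parallel_aware`, and `parallel_workers` are standard planner fields, and `locus.parallel_workers` is the Cloudberry-specific field named in the assertion above):

```c
/* Sketch only: a FULL join cannot run in parallel yet, so keep the path's
 * worker counts consistent with its locus before create_plan_recurse()
 * asserts on them. */
if (jointype == JOIN_FULL)
{
    joinpath->path.parallel_aware = false;
    joinpath->path.parallel_workers = 0;
    joinpath->path.locus.parallel_workers = 0;  /* CBDB-specific field, per the assert */
}
```
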
my-ship-it pushed a commit that referenced this pull request Mar 4, 2024
foreyes pushed a commit to foreyes/cloudberrydb that referenced this pull request Jul 8, 2024
robozmey added a commit to robozmey/cloudberrydb that referenced this pull request Dec 13, 2024
Dumps CdbExplain_NodeSummary struct in EXPLAIN ANALYZE

`set gp_enable_explain_node_summary=on;`

`drop table if exists tt; create table tt (a int, b int) distributed randomly;`

`explain (analyze,verbose) insert into tt select * from generate_series(1,1000)a,generate_series(1,1000)b;`

`explain (analyze,verbose) select * from tt where a > b;`

```
                                                                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..431.00 rows=1 width=8) (actual time=33.612..78.159 rows=1000 loops=1)
   Output: a, b
   Node Summary:                  vmax       vsum       vcnt       imax
     ntuples:                     1000       1000          1         -1
     execmemused:                    0          0          0          0
     workmemused:                    0          0          0          0
     workmemwanted:                  0          0          0          0
     totalWorkfileCreated:           0          0          0          0
     peakMemBalance:             44592      44592          1         -1
     totalPartTableScanned:          0          0          0          0
     segindex0: -1
     ninst: 1
     StatInsts:
       (segN) pstype starttime counter firsttuple startup total ntuples nloops execmemused workmemused workmemwanted workfileCreated firststart peakMemBalance numPartScanned sortMethod sortSpaceType sortSpaceUsed bnotes enotes
       (seg-1) 241 0.00 0.03 0.08 1000.00 1.00 0 0 0 0 0 0 44592.00 0 0 0 0 0 0 0
   ->  Seq Scan on public.tt  (cost=0.00..431.00 rows=1 width=8) (actual time=2.512..77.640 rows=353 loops=1)
         Output: a, b
         Filter: (tt.a = 100)
         Node Summary:                  vmax       vsum       vcnt       imax
           ntuples:                      353       1000          3          0
           execmemused:                    0          0          0          0
           workmemused:                    0          0          0          0
           workmemwanted:                  0          0          0          0
           totalWorkfileCreated:           0          0          0          0
           peakMemBalance:             19072      57216          3          0
           totalPartTableScanned:          0          0          0          0
           segindex0: 0
           ninst: 3
           StatInsts:
             (segN) pstype starttime counter firsttuple startup total ntuples nloops execmemused workmemused workmemwanted workfileCreated firststart peakMemBalance numPartScanned sortMethod sortSpaceType sortSpaceUsed bnotes enotes
             (seg0) 211 0.00 0.00 0.08 353.00 1.00 0 0 0 0 0 0 19072.00 0 0 0 0 0 0 0
             (seg1) 211 0.00 0.00 0.03 330.00 1.00 0 0 0 0 0 0 19072.00 0 0 0 0 0 0 0
             (seg2) 211 0.00 0.00 0.05 317.00 1.00 0 0 0 0 0 0 19072.00 0 0 0 0 0 0 0
 Planning time: 4.514 ms
   (slice0)    Executor memory: 119K bytes.
   (slice1)    Executor memory: 27K bytes avg x 3 workers, 27K bytes max (seg0).
 Memory used:  128000kB
 Optimizer: Pivotal Optimizer (GPORCA)
 Execution time: 78.584 ms
(39 rows)
```
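
For orientation, here is a minimal sketch of how one row of this per-node summary could be written into the EXPLAIN text output. `show_node_summary_row` is a hypothetical helper and the formatting is illustrative; `ExplainState`, `appendStringInfoSpaces`, and `appendStringInfo` are existing PostgreSQL interfaces, and the layout of `CdbExplain_NodeSummary` itself is not reproduced here.

```c
#include "postgres.h"
#include "commands/explain.h"
#include "lib/stringinfo.h"

/* Hypothetical helper: emit one "name  vmax  vsum  vcnt  imax" line, indented
 * to line up with the owning plan node, into the EXPLAIN output buffer. */
static void
show_node_summary_row(ExplainState *es, const char *name,
                      double vmax, double vsum, int vcnt, int imax)
{
    appendStringInfoSpaces(es->str, es->indent * 2);
    appendStringInfo(es->str, "  %-24s %10.0f %10.0f %10d %10d\n",
                     name, vmax, vsum, vcnt, imax);
}
```
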
yjhjstz pushed a commit to yjhjstz/cloudberry that referenced this pull request Dec 20, 2024
We've heard a couple of reports of people having trouble with
multi-gigabyte-sized query-texts files.  It occurred to me that on
32-bit platforms, there could be an issue with integer overflow
of calculations associated with the total query text size.
Address that with several changes:

1. Limit pg_stat_statements.max to INT_MAX / 2 not INT_MAX.
The hashtable code will bound it to that anyway unless "long"
is 64 bits.  We still need overflow guards on its use, but
this helps.

2. Add a check to prevent extending the query-texts file to
more than MaxAllocHugeSize.  If it got that big, qtext_load_file
would certainly fail, so there's not much point in allowing it.
Without this, we'd need to consider whether extent, query_offset,
and related variables shouldn't be off_t not size_t.

3. Adjust the comparisons in need_gc_qtexts() to be done in 64-bit
arithmetic on all platforms.  It appears possible that under duress
those multiplications could overflow 32 bits, yielding a false
conclusion that we need to garbage-collect the texts file, which
could lead to repeatedly garbage-collecting after every hash table
insertion.

Per report from Bruno da Silva.  I'm not convinced that these
issues fully explain his problem; there may be some other bug that's
contributing to the query-texts file becoming so large in the first
place.  But it did get that big, so #2 is a reasonable defense,
and #3 could explain the reported performance difficulties.

(See also commit 8bbe4cb, which addressed some related bugs.
The second Discussion: link is the thread that led up to that.)

This issue is old, and is primarily a problem for old platforms,
so back-patch.

Discussion: https://postgr.es/m/CAB+Nuk93fL1Q9eLOCotvLP07g7RAv4vbdrkm0cVQohDVMpAb9A@mail.gmail.com
Discussion: https://postgr.es/m/[email protected]
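
To make point 3 concrete, here is a small self-contained C demonstration (not code from pg_stat_statements; the variable names and values are invented) of how a 32-bit product can wrap and make the "do we need to garbage-collect?" comparison give the wrong answer, and how widening to 64 bits first avoids it.

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical values, not taken from a real pg_stat_statements run. */
    int32_t mean_query_len = 70000;                              /* avg stored query text length */
    int32_t max_entries    = 20000;                              /* pg_stat_statements.max */
    int64_t extent         = (int64_t) 2 * 1024 * 1024 * 1024;   /* 2 GB query-texts file */

    /* 70000 * 20000 * 2 exceeds INT32_MAX, so a 32-bit product wraps
     * (computed via unsigned here only to keep the demo well defined). */
    int32_t wrapped = (int32_t) ((uint32_t) mean_query_len * (uint32_t) max_entries * 2u);

    /* Widening before multiplying keeps the threshold meaningful. */
    int64_t widened = (int64_t) mean_query_len * max_entries * 2;

    printf("32-bit product: %d (wrapped)\n", wrapped);
    printf("64-bit product: %lld\n", (long long) widened);

    /* The GC heuristic is roughly "is the file much larger than expected?" */
    printf("gc with wrapped math: %s\n", extent >= wrapped ? "yes (wrong)" : "no");
    printf("gc with widened math: %s\n", extent >= widened ? "yes" : "no (correct)");
    return 0;
}
```
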
yjhjstz pushed a commit to yjhjstz/cloudberry that referenced this pull request Dec 22, 2024
yjhjstz pushed a commit to yjhjstz/cloudberry that referenced this pull request Dec 24, 2024
my-ship-it pushed a commit that referenced this pull request Dec 24, 2024
yjhjstz pushed a commit to yjhjstz/cloudberry that referenced this pull request Dec 24, 2024
yjhjstz pushed a commit to yjhjstz/cloudberry that referenced this pull request Dec 24, 2024
avamingli pushed a commit that referenced this pull request Jan 23, 2025
## Problem
An error occurs in the Python library when a plpython function is executed.
Our analysis showed that, in the user's cluster, a plpython UDF was running
over an unstable network and hit a timeout error:
`failed to acquire resources on one or more segments`.
A plpython UDF run afterwards in the same session then failed with a GC error.

Here is the stack trace from the core dump:
```
2023-11-24 10:15:18.945507 CST,,,p2705198,th2081832064,,,,0,,,seg-1,,,,,"LOG","00000","3rd party error log:
    #0 0x7f7c68b6d55b in frame_dealloc /home/cc/repo/cpython/Objects/frameobject.c:509:5
    #1 0x7f7c68b5109d in gen_send_ex /home/cc/repo/cpython/Objects/genobject.c:108:9
    #2 0x7f7c68af9ddd in PyIter_Next /home/cc/repo/cpython/Objects/abstract.c:3118:14
    #3 0x7f7c78caa5c0 in PLy_exec_function /home/cc/repo/gpdb6/src/pl/plpython/plpy_exec.c:134:11
    #4 0x7f7c78cb5ffb in plpython_call_handler /home/cc/repo/gpdb6/src/pl/plpython/plpy_main.c:387:13
    #5 0x562f5e008bb5 in ExecMakeTableFunctionResult /home/cc/repo/gpdb6/src/backend/executor/execQual.c:2395:13
    #6 0x562f5e0dddec in FunctionNext_guts /home/cc/repo/gpdb6/src/backend/executor/nodeFunctionscan.c:142:5
    #7 0x562f5e0da094 in FunctionNext /home/cc/repo/gpdb6/src/backend/executor/nodeFunctionscan.c:350:11
    #8 0x562f5e03d4b0 in ExecScanFetch /home/cc/repo/gpdb6/src/backend/executor/execScan.c:84:9
    #9 0x562f5e03cd8f in ExecScan /home/cc/repo/gpdb6/src/backend/executor/execScan.c:154:10
    #10 0x562f5e0da072 in ExecFunctionScan /home/cc/repo/gpdb6/src/backend/executor/nodeFunctionscan.c:380:9
    #11 0x562f5e001a1c in ExecProcNode /home/cc/repo/gpdb6/src/backend/executor/execProcnode.c:1071:13
    #12 0x562f5dfe6377 in ExecutePlan /home/cc/repo/gpdb6/src/backend/executor/execMain.c:3202:10
    #13 0x562f5dfe5bf4 in standard_ExecutorRun /home/cc/repo/gpdb6/src/backend/executor/execMain.c:1171:5
    #14 0x562f5dfe4877 in ExecutorRun /home/cc/repo/gpdb6/src/backend/executor/execMain.c:992:4
    #15 0x562f5e857e69 in PortalRunSelect /home/cc/repo/gpdb6/src/backend/tcop/pquery.c:1164:4
    #16 0x562f5e856d3f in PortalRun /home/cc/repo/gpdb6/src/backend/tcop/pquery.c:1005:18
    #17 0x562f5e84607a in exec_simple_query /home/cc/repo/gpdb6/src/backend/tcop/postgres.c:1848:10
```

## Reproduce
We can use a simple procedure to reproduce the above problem:
- set the timeout GUC: `gpconfig -c gp_segment_connect_timeout -v 5`, then restart with `gpstop -ari`
- prepare the function:
```
CREATE EXTENSION plpythonu;
CREATE OR REPLACE FUNCTION test_func() RETURNS SETOF int AS
$$
plpy.execute("select pg_backend_pid()")

for i in range(0, 5):
    yield (i)

$$ LANGUAGE plpythonu;
```
- exit the current psql session.
- stop the segment's postmaster by attaching gdb: `gdb -p "the pid of segment postmaster"`
- enter a psql session.
- call `SELECT test_func();` and get an error:
```
gpadmin=# select test_func();
ERROR:  function "test_func" error fetching next item from iterator (plpy_elog.c:121)
DETAIL:  Exception: failed to acquire resources on one or more segments
CONTEXT:  Traceback (most recent call last):
PL/Python function "test_func"
```
- quit gdb so the postmaster can run again.
- call `SELECT test_func();` again and get a PANIC:
```
gpadmin=# SELECT test_func();
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> 
```

## Analysis
- There is an SPI call in test_func(): `plpy.execute()`.
- The coordinator then starts a subtransaction via PLy_spi_subtransaction_begin().
- If a segment cannot receive the instruction from the coordinator,
  starting the subtransaction fails.
- However, the Python interpreter does not know that an error happened and
  does not clean up its state.
- The next plpython UDF in the same session then fails because of the
  corrupted Python environment.

## Solution
- Use try-catch (PG_TRY/PG_CATCH) to catch the error raised by PLy_spi_subtransaction_begin(), as sketched below.
- Set the Python error indicator via PLy_spi_exception_set().


Co-authored-by: Chen Mulong <[email protected]>
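
As a rough illustration of the pattern described in the Solution above, here is a hedged sketch rather than the actual patch: the enclosing function, its local variables, and the assumption that it returns a `PyObject *` are invented, while `PG_TRY`/`PG_CATCH`, `CopyErrorData`, `FlushErrorState`, and the `PLy_*` names come from PostgreSQL and the plpython sources referenced in the message.

```c
/* Sketch only: wrap the subtransaction start so a failure is surfaced to
 * Python as an SPI error instead of leaving the interpreter state dirty. */
MemoryContext oldcontext = CurrentMemoryContext;
ResourceOwner oldowner = CurrentResourceOwner;

PG_TRY();
{
    PLy_spi_subtransaction_begin(oldcontext, oldowner);
}
PG_CATCH();
{
    ErrorData *edata;

    /* Capture the error, then report it through the plpython SPI error class
     * so the session's Python state stays consistent for the next UDF call. */
    MemoryContextSwitchTo(oldcontext);
    edata = CopyErrorData();
    FlushErrorState();
    PLy_spi_exception_set(PLy_exc_spi_error, edata);
    return NULL;                /* assumes the enclosing helper returns PyObject * */
}
PG_END_TRY();
```
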