Update ISSUE and Pull Request templates #3

Merged: 1 commit merged into pr from dj-pr on Jun 15, 2023
Conversation

tuhaihe (Member) commented on Jun 15, 2023

For a better user and developer experience, update the Issue and PR templates.

Files changed:

  • Add bug-report.yml
  • Modify pull_request_template.md
  • Delete bug-report.md
  • Delete enhancement.md
  • Delete feature-request.md

tuhaihe merged commit 8fe9843 into pr on Jun 15, 2023
tuhaihe deleted the dj-pr branch on June 15, 2023 at 03:10
HuSen8891 added a commit to HuSen8891/cloudberrydb that referenced this pull request Mar 1, 2024
The following test case:

create table t0(c0 inet) distributed randomly;
create table t2(c0 inet) distributed randomly;
create table t3(c0 inet) distributed randomly;

SELECT ALL t2.c0, t3.c0, t0.c0 FROM t0, ONLY t3 FULL OUTER JOIN t2 ON ((t2.c0)=(t3.c0))
WHERE (((('0.5496844753539182')||(t3.c0)))LIKE(CAST((0.13292931)::MONEY AS VARCHAR(971))))
UNION ALL SELECT t2.c0, t3.c0, t0.c0 FROM t0, ONLY t3 FULL OUTER JOIN t2 ON ((t2.c0)=(t3.c0))
WHERE NOT ((((('0.5496844753539182')||(t3.c0)))LIKE((CAST(0.13292931 AS MONEY))::VARCHAR(971))))
UNION ALL SELECT ALL t2.c0, t3.c0, t0.c0 FROM t0*, ONLY t3 FULL OUTER JOIN t2 ON ((t2.c0)=(t3.c0))
WHERE ((((('0.5496844753539182')||(t3.c0)))LIKE((CAST(0.13292931 AS MONEY))::VARCHAR(971)))) ISNULL;

crashes because of an assertion failure in 'create_plan_recurse':

#3  0x00007fe94eccf476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007fe94ecb57f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007fe94fcdd548 in ExceptionalCondition (conditionName=0x7fe95043dcd0 "best_path->parallel_workers == best_path->locus.parallel_workers",
    errorType=0x7fe95043db06 "FailedAssertion", fileName=0x7fe95043dbdb "createplan.c", lineNumber=623) at assert.c:48
#6  0x00007fe94f94918f in create_plan_recurse (root=0x55d7cbe96f78, best_path=0x55d7cbec0380, flags=1) at createplan.c:623
#7  0x00007fe94f94a1f8 in create_append_plan (root=0x55d7cbe96f78, best_path=0x55d7cbec0700, flags=1) at createplan.c:1380
#8  0x00007fe94f948d37 in create_plan_recurse (root=0x55d7cbe96f78, best_path=0x55d7cbec0700, flags=1) at createplan.c:481
#9  0x00007fe94f94e2d1 in create_motion_plan (root=0x55d7cbe96f78, path=0x55d7cbec0e50) at createplan.c:3316
#10 0x00007fe94f9490dc in create_plan_recurse (root=0x55d7cbe96f78, best_path=0x55d7cbec0e50, flags=1) at createplan.c:608
#11 0x00007fe94f948ba3 in create_plan (root=0x55d7cbe96f78, best_path=0x55d7cbec0e50, curSlice=0x55d7cbe96f20) at createplan.c:392

The parallel_workers should be set to zero because parallel full join is not supported yet.
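
A minimal sketch of that idea follows (an assumption-laden illustration, not the actual patch: `joinpath` and the exact planner call site are hypothetical, while `JOIN_FULL`, `parallel_aware`, and `parallel_workers` are standard planner fields, and `locus.parallel_workers` is the Cloudberry-specific field named in the assertion above):

```c
/* Sketch only: a FULL join cannot run in parallel yet, so keep the path's
 * worker counts consistent with its locus before create_plan_recurse()
 * asserts on them. */
if (jointype == JOIN_FULL)
{
    joinpath->path.parallel_aware = false;
    joinpath->path.parallel_workers = 0;
    joinpath->path.locus.parallel_workers = 0;  /* CBDB-specific field, per the assert */
}
```
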
my-ship-it pushed a commit that referenced this pull request Mar 4, 2024
foreyes pushed a commit to foreyes/cloudberrydb that referenced this pull request Jul 8, 2024
robozmey added a commit to robozmey/cloudberrydb that referenced this pull request Dec 13, 2024
Dumps CdbExplain_NodeSummary struct in EXPLAIN ANALYZE

`set gp_enable_explain_node_summary=on;`

`drop table if exists tt; create table tt (a int, b int) distributed randomly;`

`explain (analyze,verbose) insert into tt select * from generate_series(1,1000)a,generate_series(1,1000)b;`

`explain (analyze,verbose) select * from tt where a > b;`

```
                                                                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..431.00 rows=1 width=8) (actual time=33.612..78.159 rows=1000 loops=1)
   Output: a, b
   Node Summary:                  vmax       vsum       vcnt       imax
     ntuples:                     1000       1000          1         -1
     execmemused:                    0          0          0          0
     workmemused:                    0          0          0          0
     workmemwanted:                  0          0          0          0
     totalWorkfileCreated:           0          0          0          0
     peakMemBalance:             44592      44592          1         -1
     totalPartTableScanned:          0          0          0          0
     segindex0: -1
     ninst: 1
     StatInsts:
       (segN) pstype starttime counter firsttuple startup total ntuples nloops execmemused workmemused workmemwanted workfileCreated firststart peakMemBalance numPartScanned sortMethod sortSpaceType sortSpaceUsed bnotes enotes
       (seg-1) 241 0.00 0.03 0.08 1000.00 1.00 0 0 0 0 0 0 44592.00 0 0 0 0 0 0 0
   ->  Seq Scan on public.tt  (cost=0.00..431.00 rows=1 width=8) (actual time=2.512..77.640 rows=353 loops=1)
         Output: a, b
         Filter: (tt.a = 100)
         Node Summary:                  vmax       vsum       vcnt       imax
           ntuples:                      353       1000          3          0
           execmemused:                    0          0          0          0
           workmemused:                    0          0          0          0
           workmemwanted:                  0          0          0          0
           totalWorkfileCreated:           0          0          0          0
           peakMemBalance:             19072      57216          3          0
           totalPartTableScanned:          0          0          0          0
           segindex0: 0
           ninst: 3
           StatInsts:
             (segN) pstype starttime counter firsttuple startup total ntuples nloops execmemused workmemused workmemwanted workfileCreated firststart peakMemBalance numPartScanned sortMethod sortSpaceType sortSpaceUsed bnotes enotes
             (seg0) 211 0.00 0.00 0.08 353.00 1.00 0 0 0 0 0 0 19072.00 0 0 0 0 0 0 0
             (seg1) 211 0.00 0.00 0.03 330.00 1.00 0 0 0 0 0 0 19072.00 0 0 0 0 0 0 0
             (seg2) 211 0.00 0.00 0.05 317.00 1.00 0 0 0 0 0 0 19072.00 0 0 0 0 0 0 0
 Planning time: 4.514 ms
   (slice0)    Executor memory: 119K bytes.
   (slice1)    Executor memory: 27K bytes avg x 3 workers, 27K bytes max (seg0).
 Memory used:  128000kB
 Optimizer: Pivotal Optimizer (GPORCA)
 Execution time: 78.584 ms
(39 rows)
```
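
For orientation, here is a minimal sketch of how one row of this per-node summary could be written into the EXPLAIN text output. `show_node_summary_row` is a hypothetical helper and the formatting is illustrative; `ExplainState`, `appendStringInfoSpaces`, and `appendStringInfo` are existing PostgreSQL interfaces, and the layout of `CdbExplain_NodeSummary` itself is not reproduced here.

```c
#include "postgres.h"
#include "commands/explain.h"
#include "lib/stringinfo.h"

/* Hypothetical helper: emit one "name  vmax  vsum  vcnt  imax" line, indented
 * to line up with the owning plan node, into the EXPLAIN output buffer. */
static void
show_node_summary_row(ExplainState *es, const char *name,
                      double vmax, double vsum, int vcnt, int imax)
{
    appendStringInfoSpaces(es->str, es->indent * 2);
    appendStringInfo(es->str, "  %-24s %10.0f %10.0f %10d %10d\n",
                     name, vmax, vsum, vcnt, imax);
}
```
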
yjhjstz pushed a commit to yjhjstz/cloudberry that referenced this pull request Dec 20, 2024
We've heard a couple of reports of people having trouble with
multi-gigabyte-sized query-texts files.  It occurred to me that on
32-bit platforms, there could be an issue with integer overflow
of calculations associated with the total query text size.
Address that with several changes:

1. Limit pg_stat_statements.max to INT_MAX / 2 not INT_MAX.
The hashtable code will bound it to that anyway unless "long"
is 64 bits.  We still need overflow guards on its use, but
this helps.

2. Add a check to prevent extending the query-texts file to
more than MaxAllocHugeSize.  If it got that big, qtext_load_file
would certainly fail, so there's not much point in allowing it.
Without this, we'd need to consider whether extent, query_offset,
and related variables shouldn't be off_t not size_t.

3. Adjust the comparisons in need_gc_qtexts() to be done in 64-bit
arithmetic on all platforms.  It appears possible that under duress
those multiplications could overflow 32 bits, yielding a false
conclusion that we need to garbage-collect the texts file, which
could lead to repeatedly garbage-collecting after every hash table
insertion.

Per report from Bruno da Silva.  I'm not convinced that these
issues fully explain his problem; there may be some other bug that's
contributing to the query-texts file becoming so large in the first
place.  But it did get that big, so #2 is a reasonable defense,
and #3 could explain the reported performance difficulties.

(See also commit 8bbe4cb, which addressed some related bugs.
The second Discussion: link is the thread that led up to that.)

This issue is old, and is primarily a problem for old platforms,
so back-patch.

Discussion: https://postgr.es/m/CAB+Nuk93fL1Q9eLOCotvLP07g7RAv4vbdrkm0cVQohDVMpAb9A@mail.gmail.com
Discussion: https://postgr.es/m/[email protected]
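
To make point 3 concrete, here is a small self-contained C demonstration (not code from pg_stat_statements; the variable names and values are invented) of how a 32-bit product can wrap and make the "do we need to garbage-collect?" comparison give the wrong answer, and how widening to 64 bits first avoids it.

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical values, not taken from a real pg_stat_statements run. */
    int32_t mean_query_len = 70000;                              /* avg stored query text length */
    int32_t max_entries    = 20000;                              /* pg_stat_statements.max */
    int64_t extent         = (int64_t) 2 * 1024 * 1024 * 1024;   /* 2 GB query-texts file */

    /* 70000 * 20000 * 2 exceeds INT32_MAX, so a 32-bit product wraps
     * (computed via unsigned here only to keep the demo well defined). */
    int32_t wrapped = (int32_t) ((uint32_t) mean_query_len * (uint32_t) max_entries * 2u);

    /* Widening before multiplying keeps the threshold meaningful. */
    int64_t widened = (int64_t) mean_query_len * max_entries * 2;

    printf("32-bit product: %d (wrapped)\n", wrapped);
    printf("64-bit product: %lld\n", (long long) widened);

    /* The GC heuristic is roughly "is the file much larger than expected?" */
    printf("gc with wrapped math: %s\n", extent >= wrapped ? "yes (wrong)" : "no");
    printf("gc with widened math: %s\n", extent >= widened ? "yes" : "no (correct)");
    return 0;
}
```
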
yjhjstz pushed a commit to yjhjstz/cloudberry that referenced this pull request Dec 22, 2024
yjhjstz pushed a commit to yjhjstz/cloudberry that referenced this pull request Dec 24, 2024
my-ship-it pushed a commit that referenced this pull request Dec 24, 2024
yjhjstz pushed a commit to yjhjstz/cloudberry that referenced this pull request Dec 24, 2024
yjhjstz pushed a commit to yjhjstz/cloudberry that referenced this pull request Dec 24, 2024
avamingli pushed a commit that referenced this pull request Jan 23, 2025
## Problem
An error occurs in the Python library when a plpython function is executed.
Our analysis showed that, in the user's cluster, a plpython UDF was running
over an unstable network and hit a timeout error:
`failed to acquire resources on one or more segments`.
A plpython UDF run afterwards in the same session then failed with a GC error.

Here is the stack trace from the core dump:
```
2023-11-24 10:15:18.945507 CST,,,p2705198,th2081832064,,,,0,,,seg-1,,,,,"LOG","00000","3rd party error log:
    #0 0x7f7c68b6d55b in frame_dealloc /home/cc/repo/cpython/Objects/frameobject.c:509:5
    #1 0x7f7c68b5109d in gen_send_ex /home/cc/repo/cpython/Objects/genobject.c:108:9
    #2 0x7f7c68af9ddd in PyIter_Next /home/cc/repo/cpython/Objects/abstract.c:3118:14
    #3 0x7f7c78caa5c0 in PLy_exec_function /home/cc/repo/gpdb6/src/pl/plpython/plpy_exec.c:134:11
    #4 0x7f7c78cb5ffb in plpython_call_handler /home/cc/repo/gpdb6/src/pl/plpython/plpy_main.c:387:13
    #5 0x562f5e008bb5 in ExecMakeTableFunctionResult /home/cc/repo/gpdb6/src/backend/executor/execQual.c:2395:13
    #6 0x562f5e0dddec in FunctionNext_guts /home/cc/repo/gpdb6/src/backend/executor/nodeFunctionscan.c:142:5
    #7 0x562f5e0da094 in FunctionNext /home/cc/repo/gpdb6/src/backend/executor/nodeFunctionscan.c:350:11
    #8 0x562f5e03d4b0 in ExecScanFetch /home/cc/repo/gpdb6/src/backend/executor/execScan.c:84:9
    #9 0x562f5e03cd8f in ExecScan /home/cc/repo/gpdb6/src/backend/executor/execScan.c:154:10
    #10 0x562f5e0da072 in ExecFunctionScan /home/cc/repo/gpdb6/src/backend/executor/nodeFunctionscan.c:380:9
    #11 0x562f5e001a1c in ExecProcNode /home/cc/repo/gpdb6/src/backend/executor/execProcnode.c:1071:13
    #12 0x562f5dfe6377 in ExecutePlan /home/cc/repo/gpdb6/src/backend/executor/execMain.c:3202:10
    #13 0x562f5dfe5bf4 in standard_ExecutorRun /home/cc/repo/gpdb6/src/backend/executor/execMain.c:1171:5
    #14 0x562f5dfe4877 in ExecutorRun /home/cc/repo/gpdb6/src/backend/executor/execMain.c:992:4
    #15 0x562f5e857e69 in PortalRunSelect /home/cc/repo/gpdb6/src/backend/tcop/pquery.c:1164:4
    #16 0x562f5e856d3f in PortalRun /home/cc/repo/gpdb6/src/backend/tcop/pquery.c:1005:18
    #17 0x562f5e84607a in exec_simple_query /home/cc/repo/gpdb6/src/backend/tcop/postgres.c:1848:10
```

## Reproduce
We can use a simple procedure to reproduce the above problem:
- set the timeout GUC: `gpconfig -c gp_segment_connect_timeout -v 5`, then restart with `gpstop -ari`
- prepare the function:
```
CREATE EXTENSION plpythonu;
CREATE OR REPLACE FUNCTION test_func() RETURNS SETOF int AS
$$
plpy.execute("select pg_backend_pid()")

for i in range(0, 5):
    yield (i)

$$ LANGUAGE plpythonu;
```
- exit the current psql session.
- stop the segment's postmaster by attaching gdb: `gdb -p "the pid of segment postmaster"`
- enter a psql session.
- call `SELECT test_func();` and get an error:
```
gpadmin=# select test_func();
ERROR:  function "test_func" error fetching next item from iterator (plpy_elog.c:121)
DETAIL:  Exception: failed to acquire resources on one or more segments
CONTEXT:  Traceback (most recent call last):
PL/Python function "test_func"
```
- quit gdb so the postmaster can run again.
- call `SELECT test_func();` again and get a PANIC:
```
gpadmin=# SELECT test_func();
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> 
```

## Analysis
- There is an SPI call in test_func(): `plpy.execute()`.
- The coordinator then starts a subtransaction via PLy_spi_subtransaction_begin().
- If a segment cannot receive the instruction from the coordinator,
  starting the subtransaction fails.
- However, the Python interpreter does not know that an error happened and
  does not clean up its state.
- The next plpython UDF in the same session then fails because of the
  corrupted Python environment.

## Solution
- Use try-catch (PG_TRY/PG_CATCH) to catch the error raised by PLy_spi_subtransaction_begin(), as sketched below.
- Set the Python error indicator via PLy_spi_exception_set().


Co-authored-by: Chen Mulong <[email protected]>
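
As a rough illustration of the pattern described in the Solution above, here is a hedged sketch rather than the actual patch: the enclosing function, its local variables, and the assumption that it returns a `PyObject *` are invented, while `PG_TRY`/`PG_CATCH`, `CopyErrorData`, `FlushErrorState`, and the `PLy_*` names come from PostgreSQL and the plpython sources referenced in the message.

```c
/* Sketch only: wrap the subtransaction start so a failure is surfaced to
 * Python as an SPI error instead of leaving the interpreter state dirty. */
MemoryContext oldcontext = CurrentMemoryContext;
ResourceOwner oldowner = CurrentResourceOwner;

PG_TRY();
{
    PLy_spi_subtransaction_begin(oldcontext, oldowner);
}
PG_CATCH();
{
    ErrorData *edata;

    /* Capture the error, then report it through the plpython SPI error class
     * so the session's Python state stays consistent for the next UDF call. */
    MemoryContextSwitchTo(oldcontext);
    edata = CopyErrorData();
    FlushErrorState();
    PLy_spi_exception_set(PLy_exc_spi_error, edata);
    return NULL;                /* assumes the enclosing helper returns PyObject * */
}
PG_END_TRY();
```
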