Leung dust scheme causes CAM7 runs to fail with 1 task #2962

Open
peverwhee opened this issue Feb 12, 2025 · 1 comment
Labels
bfb (bit-for-bit); bug (something is working incorrectly); investigation (needs to be verified and more investigation into what's going on); next (this should get some attention in the next week or two, normally each Thursday SE meeting)

Comments

@peverwhee

Brief summary of bug

Running our coarsest CAM7 SE grid with 1 task on izumi/GNU results in a segfault very early in the run (before anything is logged to the atm logfile).

General bug information

CTSM version you are using: ctsm5.3.024

Does this bug cause significantly incorrect results in the model's science? No

Configurations affected: CAM7 configurations (which are the ones that use the Leung dust scheme)

Details of bug

We AMP SEs often run with our coarsest grids on a single task for testing, so this is impacting our ability to do that (and to use the TotalView debugger on izumi). Unfortunately, I cannot easily run a test with NAG because there's currently a known issue with running RRTMGP with NAG.

As the tests below show, the same configuration runs fine on derecho (with both GNU and Intel), so this looks like a memory issue.

@fvitt this might be relevant to you as well?

Important details of your setup / configuration so we can reproduce the bug

Here are the tests I have run:

| compset | compiler | machine | resolution | misc config | dust scheme | result |
| --- | --- | --- | --- | --- | --- | --- |
| FLTHIST | gnu | izumi | ne3pg3 | ./xmlchange NTASKS=1 | Leung_2023 | FAIL |
| FLTHIST | gnu | izumi | ne3pg3 | ./xmlchange NTASKS=2 | Leung_2023 | PASS |
| FLTHIST | gnu | izumi | ne3pg3 | ./xmlchange NTASKS=1 | Zender_2003 | PASS |
| FLTHIST | gnu | derecho | ne3pg3 | ./xmlchange NTASKS=1 | Leung_2023 | PASS |
| FLTHIST | intel | derecho | ne3pg3 | ./xmlchange NTASKS=1 | Leung_2023 | PASS |
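
To make the pattern in these tests easier to see, here is a small sketch that lists, for each passing run, which settings differ from the failing one. The dictionary keys and the helper name `differing_factors` are illustrative choices, not anything from CTSM or CIME; the values are transcribed from the tests above.

```python
# Test matrix transcribed from the issue; key names are illustrative only.
FAILING = {"compiler": "gnu", "machine": "izumi", "ntasks": 1, "dust_scheme": "Leung_2023"}
PASSING = [
    {"compiler": "gnu", "machine": "izumi", "ntasks": 2, "dust_scheme": "Leung_2023"},
    {"compiler": "gnu", "machine": "izumi", "ntasks": 1, "dust_scheme": "Zender_2003"},
    {"compiler": "gnu", "machine": "derecho", "ntasks": 1, "dust_scheme": "Leung_2023"},
    {"compiler": "intel", "machine": "derecho", "ntasks": 1, "dust_scheme": "Leung_2023"},
]


def differing_factors(reference, other):
    """Return the sorted names of settings whose values differ between two runs."""
    return sorted(key for key in reference if reference[key] != other[key])


for run in PASSING:
    print(differing_factors(FAILING, run))
# ['ntasks']
# ['dust_scheme']
# ['machine']
# ['compiler', 'machine']
```

Changing any one of task count, dust scheme, or machine is enough to make the run pass, so the failure seems to need all three together (izumi, 1 task, Leung_2023). That narrows where to look but does not by itself identify the cause.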

Important output or errors that show the problem

Error in cesm.log:

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 1536870 RUNNING AT i018.cgd.ucar.edu
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
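
For anyone reading the exit code rather than the exit string: 139 follows the common convention of 128 plus the signal number, which matches the "signal 11" reported above. A minimal sketch of that decoding, assuming the launcher uses this convention (the helper name `describe_exit_code` is made up for illustration):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode an exit status that follows the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited normally with status {code}"


print(describe_exit_code(139))  # killed by signal 11 (SIGSEGV)
```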

Last log in lnd.log:

(GETFIL): attempting to find local file
surfdata_ne3np4.pg3_hist_1850_78pfts_c240908.nc
(GETFIL): using /fs/cgd/csm/inputdata/lnd/clm2/surfdata_esmf/ctsm5.3.0/surfdata_ne3np4.pg3_hist_1850_78pfts_c240908.nc
 Opened existing file /fs/cgd/csm/inputdata/lnd/clm2/surfdata_esmf/ctsm5.3.0/surfdata_ne3np4.pg3_hist_1850_78pfts_c240908.nc         152
 ncd_inqvid: variable ETALAKE is not on dataset
 WARNING:: ETALAKE not found on surface data set. All lake columns will have eta set equal to default value as a function of depth.

For context, here's what a successful run looks like in the lnd log:

(GETFIL): attempting to find local file
surfdata_ne3np4.pg3_hist_1850_78pfts_c240908.nc
(GETFIL): using /fs/cgd/csm/inputdata/lnd/clm2/surfdata_esmf/ctsm5.3.0/surfdata_ne3np4.pg3_hist_1850_78pfts_c240908.nc
 Opened existing file /fs/cgd/csm/inputdata/lnd/clm2/surfdata_esmf/ctsm5.3.0/surfdata_ne3np4.pg3_hist_1850_78pfts_c240908.nc         152
 ncd_inqvid: variable ETALAKE is not on dataset
 WARNING:: ETALAKE not found on surface data set. All lake columns will have eta set equal to default value as a function of depth.
 ncd_inqvid: variable LAKEFETCH is not on dataset
 WARNING:: LAKEFETCH not found on surface data set. All lake columns will have fetch set equal to default value as a function of depth.
 Attempting to initialize time invariant variables for lakes
 Successfully initialized time invariant variables for lakes
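
Comparing the two log tails shows where the failing run stops. The sketch below finds the first line of the passing log that the failing log never reached; the helper name `first_missing_line` is hypothetical, and the log text is copied from the excerpts above (only the last lines, for brevity). Note that output buffering can hide lines written just before a crash, so this gives an approximate location, not an exact one.

```python
FAILING_TAIL = """
 ncd_inqvid: variable ETALAKE is not on dataset
 WARNING:: ETALAKE not found on surface data set. All lake columns will have eta set equal to default value as a function of depth.
"""

PASSING_TAIL = """
 ncd_inqvid: variable ETALAKE is not on dataset
 WARNING:: ETALAKE not found on surface data set. All lake columns will have eta set equal to default value as a function of depth.
 ncd_inqvid: variable LAKEFETCH is not on dataset
 WARNING:: LAKEFETCH not found on surface data set. All lake columns will have fetch set equal to default value as a function of depth.
 Attempting to initialize time invariant variables for lakes
 Successfully initialized time invariant variables for lakes
"""


def first_missing_line(failing_log, passing_log):
    """Return the first non-empty passing-log line absent from the failing log.

    Assumes both logs start at the same point and are compared line by line.
    """
    failing = [line.strip() for line in failing_log.splitlines() if line.strip()]
    passing = [line.strip() for line in passing_log.splitlines() if line.strip()]
    for index, line in enumerate(passing):
        if index >= len(failing) or failing[index] != line:
            return line
    return None


print(first_missing_line(FAILING_TAIL, PASSING_TAIL))
# ncd_inqvid: variable LAKEFETCH is not on dataset
```

So the last thing the failing run logs is the ETALAKE warning, and the next expected message is the LAKEFETCH lookup on the surface dataset.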

@ekluzek
Collaborator

ekluzek commented Feb 13, 2025

Hmmm. We do test simulations with one processor. And they work. But we must not do enough with the ne3 grid.

The case that's failing for you is an F compset case, and we need to try the equivalent I compset.

ekluzek added the investigation, bug, next, and bfb labels on Mar 10, 2025