mpi2prv: merging ends with segmentation fault #69

Open
vineetsoni opened this issue Jul 25, 2022 · 2 comments


@vineetsoni

Environment:

Extrae-4.0.1 built with GCC 10.2.1, Intel MPI 2021.5, PAPI 6.0.0.1 and Libunwind-1.6.2 on AlmaLinux 8.5

Execution command

mpi2prv -syn -f TRACE.mpits -e <exe> -o <output>.prv

Error log

mpi2prv: Error! File -syn does not contain a valid extension!. Skipping.
mpi2prv: Retrieving hardware counters definitions for ptask 1 from global SYM.
mpi2prv: A total of 6 symbols were imported from TRACE.sym file
mpi2prv: 0 function symbols imported
mpi2prv: 6 HWC counter descriptions imported
merger: Output trace format is: Paraver
merger: Extrae 4.0.1
mpi2prv: Assigned nodes < myhostname >
mpi2prv: Assigned size per processor < <1 Mbyte >
mpi2prv: File /u/vinson3z/Downloads/Gearbox_explicit_20220722/extrae-results/set-0/[email protected] is object 1.37.1 on node myhostname assigned to processor 0
mpi2prv: Time synchronization has been turned off
mpi2prv: Checking for target directory existence... exists, ok!
mpi2prv: Selected output trace format is Paraver
mpi2prv: Stored trace format is Paraver
mpi2prv: Enabling Time Synchronization (Node).
WARNING: TimeSync_CalculateLatencies: Task 0 was not initialized. Synchronization disabled!
mpi2prv: Circular buffer enabled at tracing time? NO
mpi2prv: Parsing intermediate files
mpi2prv: Progress 1 of 2 ... 5% 10% 15% 20% 25% 30% 35% 40% 45% 50% 55% 60% 65% 70% 75% 80% 85% 90% 95% done
mpi2prv: Processor 0 succeeded to translate its assigned files
mpi2prv: Elapsed time translating files: 0 hours 0 minutes 0 seconds
mpi2prv: Elapsed time sorting addresses: 0 hours 0 minutes 0 seconds
mpi2prv: Generating tracefile (intermediate buffers of 6710784 events)
         This process can take a while. Please, be patient.
mpi2prv: Progress 2 of 2 ... 5% 10% 15% 20% 25% 30% 35% 40% 45% 50% 55% 60% 65% 70% 75% 80% 85% 90% 95% done
mpi2prv: Elapsed time merge step: 0 hours 0 minutes 0 seconds
mpi2prv: Resulting tracefile occupies 126494 bytes
mpi2prv: Removing temporal files... done
mpi2prv: Elapsed time removing temporal files: 0 hours 0 minutes 0 seconds
Segmentation fault (core dumped)

Error backtrace (gdb)

0x000000000041a614 in ObjectTable_dumpAddresses (fd=fd@entry=0x5e3210, eventstart=41000001, eventstart@entry=41000000) at ../../../src/merger/common/object_tree.c:294
294                             for (_address = 0; _address < task_info->binary_objects[0].nDataSymbols; _address++)
Missing separate debuginfos, use: dnf debuginfo-install zlib-1.2.11-18.el8_5.x86_64
(gdb) bt
#0  0x000000000041a614 in ObjectTable_dumpAddresses (fd=fd@entry=0x5e3210, eventstart=41000001, eventstart@entry=41000000) at ../../../src/merger/common/object_tree.c:294
#1  0x000000000040d922 in Labels_GeneratePCFfile (name=name@entry=0x7fffffff73a0 "sphflow.pcf", options=options@entry=1041) at ../../../src/merger/paraver/labels.c:1066
#2  0x0000000000410fa0 in Paraver_ProcessTraceFiles (nfiles=1, files=0x5d42f0, num_appl=<optimized out>, NodeCPUinfo=NodeCPUinfo@entry=0x5d5b60, numtasks=numtasks@entry=1,
    taskid=taskid@entry=0) at ../../../src/merger/paraver/trace_to_prv.c:678
#3  0x00000000004046c3 in merger_post (numtasks=numtasks@entry=1, taskid=taskid@entry=0) at ../../../src/merger/common/mpi2out.c:1485
#4  0x0000000000406337 in merger_post (numtasks=numtasks@entry=1, taskid=taskid@entry=0) at ../../../src/merger/common/mpi2out.c:1366
#5  0x0000000000403a6e in main (argc=8, argv=0x7fffffff8d68) at ../../../src/merger/merger.c:69

Info

The traces are generated without any error. The exact same error is also observed using Extrae-3.8.3.

Is there a fix for this problem, or am I using mpi2prv incorrectly?

@vineetsoni

More info on the generation of traces:

No extrae.xml file was used, but using it does not change the outcome.

The following environment variables were set before launching the trace generation:

export EXTRAE_HOME=/u/vinson3z/tools/install/extrae-4.0.1_gcc10
export EXTRAE_ON=1
export EXTRAE_COUNTERS=PAPI_L2_DCA,PAPI_L2_DCM,PAPI_L3_TCA,PAPI_L3_TCM,PAPI_TOT_CYC,PAPI_TOT_INS
export EXTRAE_INITIAL_MODE=detail
export EXTRAE_MPI_COUNTERS_ON=1
export EXTRAE_FUNCTIONS_COUNTERS_ON=1

@vineetsoni

I think the problem originates elsewhere. In my older Extrae runs, $EXTRAE_FINAL_DIR/TRACE.mpits had as many lines as there were MPI processes.

In the recent runs, however, $EXTRAE_FINAL_DIR/TRACE.mpits has only 1 line, even though $EXTRAE_DIR contains as many .mpit and .sym files as there are MPI processes.

Bug

$EXTRAE_FINAL_DIR/TRACE.mpits is not generated correctly from $EXTRAE_DIR/*.mpit: it ends up with a single line instead of one line per MPI process.

Manual fix

mpi2prv works once the missing lines are added to $EXTRAE_FINAL_DIR/TRACE.mpits manually.
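
The mismatch described above can be detected before running mpi2prv by comparing the number of entries in TRACE.mpits with the number of .mpit files on disk. The following is a minimal sketch in Python, not part of Extrae. It assumes that each non-empty line of TRACE.mpits refers to exactly one .mpit file and that the .mpit files sit somewhere below a single directory (for example in set-0/). The function name `check_mpits` and the example paths are hypothetical.

```python
from pathlib import Path


def check_mpits(mpits_file, trace_dir):
    """Return (listed, found): entries in the .mpits list vs. .mpit files on disk."""
    # Assumption: one non-empty line per intermediate trace file.
    lines = Path(mpits_file).read_text().splitlines()
    listed = sum(1 for line in lines if line.strip())
    # Assumption: .mpit files may sit in nested subdirectories such as set-0/.
    # The pattern "*.mpit" does not match the TRACE.mpits list file itself.
    found = sum(1 for _ in Path(trace_dir).rglob("*.mpit"))
    return listed, found


# Example use (paths are placeholders):
#   listed, found = check_mpits("TRACE.mpits", "extrae-results")
#   if listed != found:
#       print(f"TRACE.mpits lists {listed} files, but {found} .mpit files exist")
```

If the two counts differ, TRACE.mpits is likely incomplete and the merge may fail as shown in the backtrace above. The sketch only reports the mismatch; it does not rewrite TRACE.mpits.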
