added non-blocking root communicator #1478
base: develop
Unit testing and documentation will be added to this PR in follow-up commits.
Branch updated from 7921ec5 to 926fd00.
LGTM!
…or to NonCollectiveRootCommunicator
… and added unit testing
Branch updated from 1b09d37 to 0c0da25.
Review thread on src/axom/lumberjack/tests/lumberjack_NonCollectiveRootCommunicator.hpp (outdated, resolved)
Co-authored-by: Chris White <[email protected]>
Co-authored-by: Chris White <[email protected]>
…ator.hpp Co-authored-by: Chris White <[email protected]>
The axom team needs to discuss the implications of this further before this is merged.
@white238 @bmhan12 I talked with @gunney1, and we decided the best path forward is to duplicate the MPI communicator passed to the initialize() call and have the Lumberjack communicator object own that duplicate. With this change, we no longer need to create a separate MPI tag for each non-collective communicator object; instead, each communicator object has its own MPI communicator and uses the same default MPI tag. Please let me know if you have any further concerns.
Thanks @gberg617
Sounds like we'll need to discuss some details before merging this. I've added a few minor comments in the meantime.
Update on the failing tests: the multiple_communicators test I added fails sporadically on Azure. This test is important because it captures certain behavior when multiple communicators are sending and receiving messages. It sometimes fails when run alongside the other unit tests (and only on Azure), but it consistently passes when run by itself. I have heard that gtest+MPI can be fragile in some cases, so I attempted to remove gtest from the tests I added by converting the ASSERT macros to my own version and making each test a function that is called within
For reference, this is the error seen on Azure Pipelines and when run locally in Docker:
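A minimal sketch of the gtest-free approach described above: a hand-rolled assertion macro plus tests written as plain functions. `LJ_ASSERT`, `runTests`, and `testAddition` are hypothetical names chosen for illustration; the macros and driver actually used in this PR are not shown here.

```cpp
#include <cstdio>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for gtest's ASSERT_TRUE: print the failed expression
// with its location, then return false from the enclosing test function.
#define LJ_ASSERT(cond)                                                 \
  do                                                                    \
  {                                                                     \
    if(!(cond))                                                         \
    {                                                                   \
      std::fprintf(stderr, "%s:%d: assertion failed: %s\n", __FILE__, \
                   __LINE__, #cond);                                    \
      return false;                                                     \
    }                                                                   \
  } while(false)

using TestFn = std::function<bool()>;

// Runs each named test in order and returns the number of failures.
int runTests(const std::vector<std::pair<std::string, TestFn>>& tests)
{
  int failures = 0;
  for(const auto& t : tests)
  {
    const bool ok = t.second();
    std::printf("[%s] %s\n", ok ? "PASS" : "FAIL", t.first.c_str());
    if(!ok)
    {
      ++failures;
    }
  }
  return failures;
}

// Example test written as a plain function instead of a TEST() macro.
bool testAddition()
{
  LJ_ASSERT(1 + 1 == 2);
  return true;
}
```

Returning `false` instead of aborting lets the remaining tests still run. In an MPI program, though, a rank that returns early and skips a collective call can still hang the other ranks, so this sketch does not by itself address that failure mode.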
Review thread on src/axom/lumberjack/tests/lumberjack_NonCollectiveRootCommunicator.hpp (outdated, resolved)
…ator.hpp attempting to fix sporadic failures on Azure. Co-authored-by: Brian Han <[email protected]>
All the major issues have been resolved. Let me know if you have any other thoughts on this PR.
@@ -33,7 +34,8 @@ const char* mpiBlockingReceiveMessages(MPI_Comm comm);
/*!
*****************************************************************************
* \brief Receives any Message sent to this rank, if there are any messages
- * that are sent. Returns null if no messages are sent.
+ * that have arrived. Returns null if no messages are sent. Caller is
Another place where "are sent" should be replaced with "have arrived".
Please first address the change in MPIUtility.hpp.
Summary
This PR adds a communicator that sends messages from any rank to the root rank non-collectively. This is useful when an arbitrary rank throws an error that must be sent to the root rank so it can be written to a file.