Add num_workers to GPT dataloader #48

Merged 5 commits on Feb 9, 2023

szhengac (Contributor) commented Feb 9, 2023

The current GPT dataloader does not prefetch batches. This PR enables prefetching by setting num_workers=2, and it removes the CUDA calls from collate_fn, which would otherwise reinitialize the CUDA context in each worker subprocess.
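For readers without the diff at hand, here is a minimal sketch of the pattern described above. It is not the code from this PR: `ToyTokenDataset`, `build_loader`, and `to_device` are hypothetical names, and the dataset is a toy stand-in for the real GPT data. It only illustrates the two ideas, assuming a standard `torch.utils.data.DataLoader`: worker processes load batches ahead of the training loop, and `collate_fn` stays on the CPU so the transfer to the GPU happens in the main process.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyTokenDataset(Dataset):
    """Hypothetical stand-in for the GPT dataset: fixed-length token sequences."""

    def __init__(self, num_samples=8, seq_len=16, vocab_size=100):
        gen = torch.Generator().manual_seed(0)
        self.data = torch.randint(0, vocab_size, (num_samples, seq_len), generator=gen)

    def __len__(self):
        return self.data.size(0)

    def __getitem__(self, idx):
        return self.data[idx]


def collate_fn(samples):
    # With num_workers > 0 this runs inside worker subprocesses,
    # so it only builds CPU tensors and never touches CUDA.
    input_ids = torch.stack(samples)
    return {"input_ids": input_ids, "labels": input_ids.clone()}


def build_loader(dataset, batch_size=4, num_workers=2):
    # num_workers > 0 lets worker processes prepare upcoming batches
    # while the main process is busy with the current one.
    return DataLoader(
        dataset,
        batch_size=batch_size,
        collate_fn=collate_fn,
        num_workers=num_workers,
        pin_memory=torch.cuda.is_available(),
    )


def to_device(batch, device):
    # The host-to-device copy happens here, in the main process.
    return {k: v.to(device, non_blocking=True) for k, v in batch.items()}


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    for batch in build_loader(ToyTokenDataset()):
        batch = to_device(batch, device)
        print(batch["input_ids"].shape)
```

The `if __name__ == "__main__":` guard matters on platforms that start workers with spawn rather than fork, since each worker re-imports the module.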

  • PR's title starts with a category (e.g. [Bugfix], [Model], [Tutorial], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

comaniac (Contributor) left a comment

LGTM. Just nits.

comaniac merged commit d916110 into awslabs:main on Feb 9, 2023

comaniac (Contributor) commented Feb 9, 2023

Thanks @szhengac

szhengac deleted the data branch on February 9, 2023 at 01:35