-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gerrychain.random sets random seed globally #364
Comments
It turns out this is a more serious problem than it looks at first glance (in other words, I encountered a real-world use case where this matters). In particular, when searching for seed plans, the
This leads to |
Ah, so it looks like my bug with Line 5 in 3c42e99
|
Regarding the deletion of the random.py file: this is a fix for issue #364. The problem that the inclusion of the random.py file was supposed to address was the issue of reproducibility of Markov chain, and the idea was to set the global random seed to 2018 at any point where we would like to import the random module internally within the gerrychain package. However, this approach also causes the trees that are generated by the tree.py file to be fixed if the user does not set the random seed after the import of the gerrychain package. So an import pattern of import random random.seed(0) import gerrychain print(random.random()) print(random.random()) will output 0.5331579307274593 0.02768951210200299 as opposed to the expected 0.8444218515250481 0.7579544029403025 will actually force the random seed to be 2018 rather than the expected 0. This can often cause issues in jupyter notebooks where the user is not aware that the random seed has been forcibly set to 2018 after the import of gerrychain. Instead, it is best to allow to user to set the random seed themselves, and to not forcibly set the random seed within the gerrychain package since that can affect the execution of other packages and can cause the chain to hang when the 2018 seed does not produce a valid tree. This issue does not appear if we remove the random.py file and instead use the random module from the standard library within the tree.py and accept.py files. This is because of how python handles successive imports of the same module. Consider the following snipit: import random random.seed(0) import random print(random.random()) print(random.random()) This will output 0.8444218515250481 0.7579544029403025 as expected. This is because the random module is only imported once and then places its name in the internal list of imported modules. Subsequent imports of the random module within the same python session will not will simply retrieve the module from the list and will not re-execute the code contained within the module. Thus, the random seed is only set once and not reset when the random module is imported again. In terms of reproducibility, this means that the user will be required to set the random seed themselves if they want to reproduce the same chain, but this is a relatively standard expectation, and will be required when we move the package over to a rust backend in the future.
gerrychain.random
sets a default random seed (2018
) globally:GerryChain/gerrychain/random.py
Line 5 in 198f43d
This is bad—for starters, the state of other packages could be affected without users immediately noticing.
The text was updated successfully, but these errors were encountered: