Rjf/determinism #203

robfitzgerald · 2023-05-24T21:27:15Z

some efforts here to address non-determinism in hive. i'm still seeing it after these changes. so far:

- removed any direct iteration over sim.(vehicles|bases|stations|requests), replacing it with calls to
  - sim.get_x_ids() in place of sim.x.keys()
  - sim.get_xs() in place of sim.x.values()
- set the python and numpy random seeds at the "run" entry points (app run, cosim run), fed from a config.sim.seed value
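
for illustration, a minimal sketch of what seeding at an entry point could look like. `set_seeds` is a hypothetical helper name, not hive's actual API, and the `seed` argument stands in for the config.sim.seed value. numpy is imported optionally only so the sketch runs without it installed.

```python
import random
from typing import Optional

try:
    import numpy as np  # optional here so the sketch runs without numpy
except ImportError:
    np = None


def set_seeds(seed: Optional[int]) -> None:
    """Seed python's and numpy's global RNGs; do nothing if no seed is given.

    Hypothetical helper for illustration; `seed` stands in for config.sim.seed.
    """
    if seed is None:
        return
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)


# re-seeding with the same value reproduces the same draws
set_seeds(42)
first = [random.random() for _ in range(3)]
set_seeds(42)
second = [random.random() for _ in range(3)]
```

seeding only covers calls that draw from these global RNGs. it does nothing for ordering differences that come from iterating unordered or unsorted collections.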

any thoughts on what else i could do here?

edit: i just went through and added a ton more to this. details:

- rewrote sim entity getters to use shared code
- any sort by a key that is not unique now has the entity id added as a tiebreaker (see doc note)
- identified + updated many additional iterators that could be sorted
- added developer documentation entry explaining deterministic sorts
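
a rough sketch of what a shared, deterministic getter could look like. the names `get_entities` and `get_entity_ids` and the dict-of-entities layout are assumptions made for this example; the real getters in hive may be shaped differently.

```python
from typing import Any, Callable, Dict, Optional, Tuple, TypeVar

E = TypeVar("E")


def get_entity_ids(entities: Dict[str, E]) -> Tuple[str, ...]:
    """Return entity ids in sorted order (hypothetical helper)."""
    return tuple(sorted(entities.keys()))


def get_entities(
    entities: Dict[str, E],
    sort_key: Optional[Callable[[E], Any]] = None,
) -> Tuple[E, ...]:
    """Return entities in a reproducible order (hypothetical helper).

    `entities` maps entity id -> entity. Without a sort_key, order by id.
    With a sort_key, order by (sort_key(entity), id) so ties are broken by id.
    """
    if sort_key is None:
        ordered = sorted(entities.items(), key=lambda kv: kv[0])
    else:
        ordered = sorted(entities.items(), key=lambda kv: (sort_key(kv[1]), kv[0]))
    return tuple(entity for _, entity in ordered)


# toy data: v1 and v2 tie on km, so the id decides their order
vehicles = {
    "v2": {"id": "v2", "km": 5.0},
    "v1": {"id": "v1", "km": 5.0},
    "v3": {"id": "v3", "km": 1.0},
}
ids = get_entity_ids(vehicles)
by_km = [v["id"] for v in get_entities(vehicles, sort_key=lambda v: v["km"])]
```

putting the tiebreak inside one shared function means callers that pass a sort key get the id fallback without having to remember it themselves.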

i was able to observe 50 denver_demo runs in a row that had the same result. i'm also seeing those runs take 15 seconds which i believe is a bit longer than before. we may see a bigger performance hit with larger runs.

note: when i run hive at the command line twice in a row, i'm still seeing different results. maybe we're not controlling for randomness as well when calling it there for some reason. to provide an example, i've added examples/test_for_determinism.py, which runs a quick set of 5 iterations of denver and confirms whether all results match or not.
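
the comparison step of a check like this can be sketched as follows. this is not the contents of examples/test_for_determinism.py; `check_metrics` is a hypothetical function, and the numbers are made up purely to show one matching and one mismatching metric.

```python
from typing import Dict, List


def check_metrics(runs: List[Dict[str, float]]) -> Dict[str, bool]:
    """For each metric in the first run, report whether all runs agree exactly."""
    report = {}
    for name in runs[0]:
        values = [run[name] for run in runs]
        report[name] = all(v == values[0] for v in values)
    return report


# made-up summary stats for three runs, for illustration only
runs = [
    {"mean_final_soc": 0.5, "total_vkt": 100.0},
    {"mean_final_soc": 0.5, "total_vkt": 100.0},
    {"mean_final_soc": 0.5, "total_vkt": 101.0},
]
report = check_metrics(runs)
for name, ok in report.items():
    if ok:
        print(f"{name} is good, all values match")
    else:
        print(f"{name} differs between runs")
```

the comparison uses exact equality on purpose: a deterministic run should reproduce the same floats, so a tolerance would only hide small divergences.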

robfitzgerald · 2023-06-05T21:33:52Z

@nreinicke i think i've got it! can you run python examples/test_for_determinism.py in your hive conda env and confirm that everything comes out "good" after 5 iterations of denver demo? it's just checking that high-level stats like soc and vkt are exactly the same, but i figure that should be good enough to confirm we're getting deterministic runs.

nreinicke · 2023-06-05T21:52:23Z

Just pulled and ran and it looks like it's checking out!

finished iteration 4
mean_final_soc is good, all values match
requests_served_percent is good, all values match
total_vkt is good, all values match
total_kwh_expended is good, all values match
total_gge_expended is good, all values match
total_kwh_dispensed is good, all values match
total_gge_dispensed is good, all values match

What was the main cause?

robfitzgerald · 2023-06-05T21:59:11Z

maybe a terrible idea, but, i set up the github actions to call the determinism test at PR. it will run denver demo twice and confirm high-level metrics match. that'll slow us down a little but maybe worth it to make sure we don't backslide on determinism.

robfitzgerald · 2023-06-05T22:07:09Z

> What was the main cause?

i went beyond SimulationState and did a search for .items(), .values(), .keys() and sorted(... across the codebase. there were a number of places where we were sorting by something without a fallback. i think a big one was sorting ChargeQueueing agents by queue time. copying the example i added here:

vs: List[Vehicle] = ...
sorted(vs, key=lambda v: v.distance_traveled_km)          # bad: ties keep input order
sorted(vs, key=lambda v: (v.distance_traveled_km, v.id))  # good: id breaks ties

that's a problem when two vehicles have traveled the same distance: nothing in the key determines their relative order, so (python's sort being stable) they come out in whatever order the input happened to have. it gets worse when the sort is by "queue time": those are integers, so ties are more likely, and any time two agents have the same queue time there's no tiebreaker to act as a second-tier sort. i went through and added id sorts in those cases.
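
a small self-contained demonstration of the effect. the `Vehicle` dataclass here is a stand-in with just the two fields used in the sort, not hive's Vehicle class, and the two input lists simulate the same vehicles arriving in different orders.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Vehicle:  # stand-in for illustration, not hive's Vehicle
    id: str
    distance_traveled_km: float


a = Vehicle("v1", 10.0)
b = Vehicle("v2", 10.0)  # ties with a on distance
c = Vehicle("v3", 3.0)

# same vehicles, two different input orders
order_1: List[Vehicle] = [a, b, c]
order_2: List[Vehicle] = [b, a, c]


def no_tiebreak(vs: List[Vehicle]) -> List[str]:
    """Sort by distance only; tied vehicles keep their input order."""
    return [v.id for v in sorted(vs, key=lambda v: v.distance_traveled_km)]


def with_tiebreak(vs: List[Vehicle]) -> List[str]:
    """Sort by (distance, id); the id settles any tie."""
    return [v.id for v in sorted(vs, key=lambda v: (v.distance_traveled_km, v.id))]


print(no_tiebreak(order_1), no_tiebreak(order_2))      # the two results differ
print(with_tiebreak(order_1), with_tiebreak(order_2))  # the two results match
```

the tuple key works because ids are unique, so no two keys ever compare equal and the result no longer depends on input order.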

a similar problem would happen with calls to get_{vehicles|stations|bases|requests} methods that supplied sort_keys. and then there were just a few other iterations that didn't sort by id in absence of any requested sort values.

tl;dr: sortsortsortsortsortsortsortsortsortsortsortsortsortsortsort

nreinicke · 2023-06-05T22:30:58Z

Aha that makes a lot of sense, nice work getting deep into this one to figure it out!!

With respect to the determinism test in the action I think that's a good idea.

robfitzgerald · 2023-06-06T16:12:38Z

@nreinicke the determinism check passed, yay! it replaces the 'hive denver_demo.yaml' action. ready for your review.

nreinicke

Looks great!

robfitzgerald added 5 commits May 24, 2023 15:23

set random seeds at entry point

d209c5c

config.sim.seed value

b36a76d

remove random.seed call

3df3f74

provide sorted getter methods for entity ids

607bf52

use entity get methods

375644b

robfitzgerald requested a review from nreinicke May 24, 2023 21:27

robfitzgerald added 11 commits May 24, 2023 15:28

only set seed when requested

d7cea8c

black formatter

3fa1a21

add methods for orderly iterating on Maps

f03dab7

fixes to Map iteration

4803545

document Map iteration for devs

0ba8076

explain use of tuple sorts for specialized iterators

665a7da

deterministic iterators

bee2bd5

remove test stub

764b263

black formatter

a18bf3f

shows multiple runs can have same result

d0e9794

comments

726a1dd

wire determinism test into github action

495fcc3

install pandas github action dependency

6e69c0d

nreinicke approved these changes Jun 6, 2023

robfitzgerald merged commit c0bcf01 into main Jun 6, 2023

robfitzgerald deleted the rjf/determinism branch June 6, 2023 20:45
