Code debt #68

Merged
merged 35 commits into from
Feb 9, 2017

fscottfoti
Collaborator

numerous aspects of paying down code debt, including:

  • setting up unit tests and pep8 for baus directory (still need more unit tests, of course)
  • moving a lot of the ad hoc code in datasources.py into preprocessing.py, which means some of these long-running tasks can be done up front
  • turn run.py into a proper command line utility which uses a python module to parse and document command line options
  • clean up a bunch of unused files
  • clean up travel model output variables
  • get rid of homesales dataframe
  • double check orca caching
  • remove building type ids in favor of string identifiers
  • remove any case of hardcoded scenario configuration and put it in settings.yaml
  • add lots of comments, especially to settings.yaml!

the computation is pretty repetitive (it happens on every run), so cache the tables back to the current hdf store so you don't have to do all the data tweaking every time
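
a minimal sketch of the "compute once, write back to the store, reuse on later runs" idea described above. this is not the code from this PR: it uses the stdlib `shelve` module as a stand-in for the hdf store, and `load_or_build`, `build_buildings` and the `buildings_preproc` key are hypothetical names made up for illustration.

```python
import os
import shelve
import tempfile


def load_or_build(store_path, key, build):
    """Return the value stored under key, building and saving it on first use."""
    with shelve.open(store_path) as store:
        if key not in store:
            # the expensive data tweaking only happens when the key is missing
            store[key] = build()
        return store[key]


calls = []


def build_buildings():
    # hypothetical stand-in for the repetitive, long-running computation
    calls.append(1)
    return [{"building_id": 1, "residential_units": 10}]


# a throwaway store location so the example does not touch real data
store_path = os.path.join(tempfile.mkdtemp(), "cache")

first = load_or_build(store_path, "buildings_preproc", build_buildings)
second = load_or_build(store_path, "buildings_preproc", build_buildings)
```

the second call reads the saved result instead of calling `build_buildings` again, which is the behavior the commit is after.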
still a work in progress
using buildings table for both estimation and simulation
many major preprocessing steps were performed when tables were accessed. this is a major benefit of orca - you can write transformations to data tables in code and perform them every time the table is accessed, and share the code via github.

there are limits to this, which mainly have to do with processing time. for longer-running processes, it's not efficient to rerun them every time a table is accessed, especially when doing interactive exploratory work. there's no hard and fast rule for what crosses the line, but for deed restricted units and adjustments of vacancy rates, we had probably crossed it.
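
to make the trade-off concrete, here is a toy sketch of a table that is recomputed on every access versus one that is computed once and reused. it does not use orca itself; `LazyTable` and `adjust_vacancy` are hypothetical names, and the vacancy number is made up.

```python
class LazyTable:
    """Toy stand-in for a table whose contents come from a function."""

    def __init__(self, func, cache=False):
        self.func = func
        self.cache = cache
        self._value = None
        self.computations = 0  # how many times func actually ran

    def access(self):
        # with caching on, reuse the stored value after the first computation
        if self.cache and self._value is not None:
            return self._value
        self.computations += 1
        self._value = self.func()
        return self._value


def adjust_vacancy():
    # hypothetical placeholder for a slow transformation
    return {"vacancy_rate": 0.05}


uncached = LazyTable(adjust_vacancy)
cached = LazyTable(adjust_vacancy, cache=True)

for _ in range(3):
    uncached.access()
    cached.access()
```

after three accesses the uncached table has run its function three times and the cached one only once. for a cheap transformation that difference doesn't matter; for a slow one it is the reason to move the work into a preprocessing step.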

there is now a "preprocessing" mode in run.py which runs the steps that currently fall under this rubric, as well as a preprocessing.py file where the code lives. when you get the .h5 file from s3, you will need to run this mode before running anything else.

as part of this change, run.py now has a proper usage statement using argparse to get arguments.  the "--mode preprocessing" argument is used to run preprocessing.
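
a rough sketch of what an argparse-based `--mode` option can look like. only the `--mode preprocessing` argument comes from this PR; the `simulation` mode, the default, the help text and the `main` function are assumptions for illustration, not the actual contents of run.py.

```python
import argparse

# "preprocessing" is the mode named in this PR; "simulation" is a hypothetical
# placeholder for whatever other modes run.py supports
MODES = ["simulation", "preprocessing"]


def build_parser():
    parser = argparse.ArgumentParser(
        description="Run the model in a given mode.")
    parser.add_argument(
        "--mode", choices=MODES, default="simulation",
        help="which set of steps to run (default: %(default)s)")
    return parser


def main(argv=None):
    # argv=None makes argparse read sys.argv, as in a normal command line run
    args = build_parser().parse_args(argv)
    if args.mode == "preprocessing":
        return "running preprocessing steps"
    return "running " + args.mode


result = main(["--mode", "preprocessing"])
```

using `choices` means a mistyped mode is rejected with a usage message, and `-h` prints the documented options for free.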
also modified parcel_is_allowed_func and a few other things
@fscottfoti fscottfoti merged commit 6b43485 into master Feb 9, 2017