UDST
diff --git a/.travis.yml b/.travis.yml
+30
diff --git a/Makefile b/Makefile
-55
diff --git a/README.md b/README.md
+37 -111
diff --git a/Vagrantfile b/Vagrantfile
-25
diff --git a/all.py b/all.py
+1 -1
diff --git a/all2.py b/all2.py
-27
@@ -0,0 +1,30 @@
+language: python
+sudo: false
+python:
+- '2.7'
+install:
+- if [[ "$TRAVIS_PYTHON_VERSION" == "2.7" ]]; then
+    wget http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh;
+  else
+    wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh;
+  fi
+- bash miniconda.sh -b -p $HOME/miniconda
+- export PATH="$HOME/miniconda/bin:$PATH"
+- hash -r
+- conda config --set always_yes yes --set changeps1 no
+- conda update -q conda
+# Useful for debugging any issues with conda
+- conda info -a
+# don't think we need all these packages, but copying from urbansim
+- >
+  conda create -q -c synthicity -n test-environment
+  python=$TRAVIS_PYTHON_VERSION
+  cytoolz ipython-notebook jinja2 matplotlib numpy pandas patsy pip scipy
+  statsmodels pytables pytest pyyaml pandana
+- source activate test-environment
+- pip install pep8
+- pip install -r requirements.txt
+script:
+- pep8 baus
+- pep8 scripts
+- py.test baus
@@ -1,126 +1,52 @@
-DRAFT Bay Area Urbansim Implementation
+DRAFT Bay Area UrbanSim (BAUS) Implementation
 =======
 
-This is the DRAFT UrbanSim implementation for the Bay Area. Documenation for the Bay Area model is available at http://metropolitantransportationcommission.github.io/baus_docs/ and documentation for the generic UrbanSim model is at https://udst.github.io/urbansim/index.html
+[![Build Status](https://travis-ci.org/MetropolitanTransportationCommission/bayarea_urbansim.svg?branch=master)](https://travis-ci.org/MetropolitanTransportationCommission/bayarea_urbansim)
 
-###Install Overview
-* https://mtcdrive.account.box.com/login
-* get anaconda (version as indicated in reqs below)
-* bash Anaconda2-4.0.0-Linux-x86_64.sh
-* yes to prepend install location to .bashrc
-* open new terminal
-* sudo apt-get update
-* sudo apt-get -y install git g++ python-dev unzip
-* git clone https://github.com/MetropolitanTransportationCommission/bayarea_urbansim.git
-* pip install -r requirements.txt (comment out pandana)
-* pip install pandana
-* get data
-* change RUNNUM so in 5000s etc
-* python run.py -s 4 & OR python all.py &
+This is the DRAFT UrbanSim implementation for the Bay Area. Policy documentation for the Bay Area model is available [here](http://data.mtc.ca.gov/bayarea_urbansim/) and documentation for the UrbanSim framework is available [here](https://udst.github.io/urbansim/).
 
-###Data
+### Install Overview
 
-We track the data for this project in the Makefile in this repository. The makefile will generally be the most up to date list of which data is needed, where it goes in the directory, etc.
+* Install Python for your OS ([Anaconda](https://www.continuum.io/downloads) highly suggested)
+* Clone this repository
+* Install dependencies using `pip install -r requirements.txt`
+* Get data using `python run.py -c --mode fetch-data` (you will need an appropriately configured AWS credentials file which you must get from your MTC contact)
+* Preprocess data using `python run.py -c --mode preprocessing`
+* Run a simulation using `python run.py -c` (default mode is simulation)
 
-To fetch data with [AWS CLI](https://aws.amazon.com/cli/) and Make, you can:
-`make data`.
+### An overview of run.py
+ 
+`run.py` is a command-line interface (CLI) used to run Bay Area UrbanSim in various modes. These modes currently include:
 
-Below we provide a list to links of the data in the Makefile for convenience, but in general the makefile is what is being used to run simulations. If you find that something below is out of date w/r/t the makefile, please feel free to update it and submit a pull request.
+* estimation, which runs a series of models to save parameter estimates for all statistical models
+* simulation, which runs all models to create a simulated regional growth forecast
+* fetch_data, which downloads large data files from Amazon S3 as inputs for BAUS
+* preprocessing, which performs long-running data cleaning steps and writes the newly cleaned data back to the binary h5 file for use in the other steps
+* baseyearsim, which runs a "base year simulation" that writes summaries of the data before any models have run (during a regular simulation, summaries are written after each year, so the first year's summaries reflect the state *after* the base year has finished)
 
-####Data necessary for run.py to run
+### Outputs from Simulation (written to the runs directory)
 
-These data should be in the data/ folder:
+ALL OUTPUT IN THIS DIRECTORY IS NOT OFFICIAL OUTPUT. PLEASE CONTACT MTC FOR OFFICIAL OUTPUTS OF THE LAST PLAN BAY AREA.
 
-https://s3.amazonaws.com/bayarea_urbansim/data/2015_06_01_osm_bayarea4326.h5  
-https://s3.amazonaws.com/bayarea_urbansim/data/2015_08_03_tmnet.h5  
-https://s3.amazonaws.com/bayarea_urbansim/data/2015_12_21_zoning_parcels.csv  
-https://s3.amazonaws.com/bayarea_urbansim/data/02_01_2016_parcels_geography.csv  
-https://s3.amazonaws.com/bayarea_urbansim/data/2015_08_29_costar.csv  
-https://s3.amazonaws.com/bayarea_urbansim/data/2015_09_01_bayarea_v3.h5  
-
-Because the hdf5 file used here contains one table with  proprietary data, you will need to enter credentials to download it. You can request them from Tom Buckley([email protected]). Or if you already have access to Box, you can download the hdf5 file from there. 
-
-####Data Description  
-
-
-How To 
-------
-####Set Up Simulation and Estimation  
-Install dependencies using standard [pip](https://pip.pypa.io/en/latest/user_guide.html#requirements-files) requirements install:
-`pip install -r requirements.txt`
-You may also need to install pandana
-`pip install pandana`
-
-####Set up using a Virtual Machine
-For convenience, there is a [Vagrantfile](https://www.vagrantup.com/) and a `scripts/vagrant/bootstrap.sh` file. This is the recommended way to set up and run `Simulation.py` on Windows. 
-
-####Enter Amazon Web Services credentials to fetch data.
-
-See [Installing](http://docs.aws.amazon.com/cli/latest/userguide/installing.html) and [configuring] (http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html) 
-
-Each of the following just runs a different set of models for a different set of years.
-
-####Run a Simulation  
-In the repository directory type `python run.py`  
-
-####Estimate Regressions used in the Simulation
-In the repository directory edit `run.py` and set `MODE` to "estimation" and type `python run.py`  
-
-####Run a Base Year Simulation
-In the repository directory edit `run.py` and set `MODE` to "baseyearsim" and type `python run.py`.  A base year simulation is used to run a few models and make sure everything matches the first year of the control totals but not to add any new buildings.  This is then used in comparison of the year 2040 to the base year for all future simulations (until the control totals change) and this mode is rerun.
-
-####Review Outputs from Simulation
-
-#####Runs Directory
-
-ALL OUTPUT IN THIS DIRECTORY IS CONSIDERED DRAFT. PLEASE CONTACT MTC FOR OFFICIAL FINAL OUTPUTS.
-
-`#` = a number that is updated in the RUNNUM file in the bayarea_urbansim directory each time you run Simulation.py.
+`[num]` = a positive integer used to identify each successive run.  This number usually starts at 1 and increments each time run.py is called.
 
 Many files are output to the `runs/` directory. They are described below.
 
 filename |description
 ----------------------------|-----------
-run#_topsheet_2040 | An overall summary of various housing, employment, etc by regional planning area types
-run#_parcel_output.csv 		|csv of parcels that are built for review in Explorer
-run#_parcel_data_diff.csv 			|A CSV with parcel level output for *all* parcels with lat, lng and includes change in total_residential_units and change in total_job_spaces, as well as zoned capacity measures
-run#_simulation_output.json |summary by TAZ for review in Explorer (unix only)
-run#_taz_summaries 			|A CSV for [input to the MTC travel model](http://analytics.mtc.ca.gov/foswiki/UrbanSimTwo/OutputToTravelModel)
-run#_urban_footprint_summary | A CSV with A Summary of how close the scenario is to meeting [Performance Target 4](http://planbayarea.org/the-plan/plan-details/goals-and-targets.html)
-
-
-Browse results [here](http://urbanforecast.com/runs/)   
-
-######Other Directories
-Below is an explanation of the directories in this repository not described above.
-
-configs/    
-
-The YAML files in this directory allow you to configure UrbanSim by changing the keys and values of arguments taken by urbansim functions. See the [UrbanSim Defaults](https://udst.github.io/urbansim_defaults/) docs for more details.
-
-Note that even the values taken by data can be and are configured with these config files (e.g. values in `settings.yaml`).
-
-data_regeneration/
-
-The scripts in here can be used to re-create the data in the `data/` folder from source (various local, state, and federal sources). Use these to re-create the data here when source data change fundamentally.
-
-scripts/
-This is a good place to put scripts that can exist independently of the analysis environment here.  
-
-####Parcel Geometries
-
-The parcel geometries are the basis of many operations in the simulation. For example, as one can see in [this pull request](https://github.com/MetropolitanTransportationCommission/bayarea_urbansim/pull/121), in order to add schedule real estate development projects to the list of projects that are included in the simulation, one must use an existing `geom_id`, which is a field on the parcels table added [here](https://github.com/MetropolitanTransportationCommission/bayarea_urbansim/blob/master/data_regeneration/match_aggregate.py#L775-L784).
-
-Parcel geometries are available at the following link:
-
-https://s3.amazonaws.com/bayarea_urbansim/data/09_01_2015_parcel_shareable.zip
-
-#####Geom ID
-
-Please be aware that many ArcGIS users have found that ArcGIS automatically converts and then rounds the `geom_id` column, effectively making it unusable. Therefore we recommend using QGIS, which does not exhibit this behavior with delimited files by default. 
-
-Also, in Microsoft Excel, you will need to make sure that the data type of the `geom_id` column is set to `number` and that the number of decimal points is set to 0. Otherwise when you save the CSV again the `geom_id`s will be unusable.
-
-What is the `geom_id` field and why does it exist? 
-
-In short, this is a legacy identifier. The `geom_id` field was introduced as a stable identifier for parcels across shapefiles, database tables, CSV's, and other data types. It is an integer because at some point there was a need to support integer only identifiers. It is not based on an Assessor's Parcel Numbers because there was a perception that those were inadequate. And it is based on the geometry of the parcel because many users have found that geometries are the most important feature of parcels.
+run[num]\_topsheet\_[year].csv | An overall summary of housing and employment outcomes by very coarse geographies.
+run[num]\_parcel\_output.csv | A CSV of all newly built space in the region. It has a few thousand rows and dozens of columns containing various inputs and outputs, plus debugging information that helps explain why UrbanSim picked each development.
+run[num]\_parcel\_data\_[year].csv | A CSV with parcel-level output for *all* parcels, including lat, lng, the change in total_residential_units, the change in total_job_spaces, and zoned capacity measures.
+run[num]\_building_data\_[year].csv 			|The same as above but for buildings.
+run[num]\_taz\_summaries\_[year].csv 			|A CSV for [input to the MTC travel model](http://analytics.mtc.ca.gov/foswiki/UrbanSimTwo/OutputToTravelModel)
+run[num]\_pda_summaries\_[year].csv, run[num]\_juris_summaries\_[year].csv, run[num]\_superdistrict_summaries\_[year].csv | Similar outputs to the taz summaries but for each of these geographies.  Used for understanding the UrbanSim forecast at an aggregate level.
+run[num]_dropped_buildings.csv     | A summary of buildings which were redeveloped during the simulated forecast.
+run[num]_simulation_output.json | Used by the web output viewer.
+
+
+### Directory structure
+
+* baus/ contains all the Python code which runs the BAUS model.
+* data/ contains BAUS inputs which are small enough to store and render in GitHub (large files are stored on Amazon S3) - this also contains lots of scenario inputs in the form of csv files.  See the readme in the data directory for detailed docs on each file.
+* configs/ contains the model configuration files used by UrbanSim.  This also contains settings.yaml which provides simulation inputs and settings in a non-tabular form. 
+* scripts/ contains one-off scripts used to perform various input munging and output analysis tasks.  See the docs in that directory for more information.
@@ -4,7 +4,7 @@
 
 # run a full package of scenarios
 
-for num in [0, 1, 2, 3, 4]:
+for num in [0, 1, 3, 4, 5]:
     os.system('python run.py -s %d' % num)
 
 with open('RUNNUM', 'r') as f:
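
The `all.py` hunk above shells out to `run.py -s <num>` once per scenario number and ignores the exit status that `os.system` returns. A minimal sketch of the same loop using `subprocess`, which makes each exit code available to the caller, might look like the following. It assumes only what the hunk shows (that `run.py` accepts `-s <num>`); `build_command` and `run_all` are hypothetical helper names and are not part of the repository.

```python
import subprocess
import sys

# Scenario numbers taken from the updated loop in all.py.
SCENARIOS = [0, 1, 3, 4, 5]


def build_command(scenario, python=sys.executable):
    """Return the argument list for one scenario run (hypothetical helper)."""
    return [python, "run.py", "-s", str(scenario)]


def run_all(scenarios=SCENARIOS, runner=subprocess.call):
    """Run each scenario in order and return {scenario: exit code}.

    `runner` defaults to subprocess.call; it is a parameter so the loop
    can be exercised without actually launching run.py.
    """
    return {num: runner(build_command(num)) for num in scenarios}
```

Calling `run_all()` from the repository root would launch the five scenarios sequentially, and any non-zero value in the returned dictionary identifies a scenario whose process reported failure.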