This project is for PDAS CA1, which tests basic competency in writing Python programs and using Python packages such as NumPy and Matplotlib for data analysis and visualization.
The objective of the project is:
Are the more popular courses in ITE, polytechnics, and universities correlated with better employment opportunities?
Notes: Courses are grouped into course clusters based on the Graduate Employment Survey provided by MOE. These clusters are:
Fresh Graduates
- Arts, Design & Media
- Built Environment
- Business
- Dentistry
- Education (NIE)
- Engineering
- Health Sciences
- Humanities & Social Sciences
- Information & Digital Technologies
- Music
- Sciences
- Yale-NUS
Follow-up Graduates
- Architecture
- Biomedical Sciences and Chinese Medicine
- Law
- Medicine
- Pharmacy
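For each of the clusters listed above, the objective reduces to asking whether a popularity measure moves together with an employment measure. Here is a minimal sketch of that check using NumPy's Pearson correlation. The helper name `pearson_r`, the choice of intake as the popularity measure, and the numbers are all illustrative placeholders, not values from the datasets or the method used in 'main.ipynb'.

```python
import numpy as np


def pearson_r(x, y):
    """Return the Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is r.
    return float(np.corrcoef(x, y)[0, 1])


# Placeholder values only (NOT taken from the MOE or Data.gov.sg datasets):
# one entry per course cluster.
intake = [1200, 800, 3000, 150, 2200]
employment_rate = [85.0, 88.5, 90.2, 97.0, 93.1]

r = pearson_r(intake, employment_rate)
print(f"Pearson r between intake and employment rate: {r:.3f}")
```

A value of r near +1 or -1 suggests a strong linear relationship and a value near 0 suggests little or none. With only a handful of clusters, any such figure should be read cautiously.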
The first two datasets are from MOE while the last three datasets are from Data.gov.sg
File structure:
- datasets_cleaned => contains the cleaned csv files
- datasets_src => contains the csv files for all the original uncleaned datasets
- datasets.zip => contains the backup zip file for the datasets
- clean.ipynb => cleans the data from datasets_src
- main.ipynb => contains the analysis and visualization code
- README.md => contains the sources for all the datasets
- Start by running:
pip3 install -r requirements.txt
- Delete the entire contents of the 'datasets_cleaned' directory, then run the cells inside 'clean.ipynb' to clean the data (this recreates the contents of 'datasets_cleaned')
- Run the cells inside 'main.ipynb' to see the analysis and visualization performed
- For a summary of the graphs, head to the PowerPoint slides
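The step that empties 'datasets_cleaned' before re-running 'clean.ipynb' can also be scripted. This is a minimal sketch using only the Python standard library. The helper name `clear_directory` is hypothetical and is not part of this repository.

```python
from pathlib import Path
import shutil


def clear_directory(directory):
    """Delete everything inside `directory` but keep the directory itself.

    Returns the number of top-level entries that were removed.
    """
    path = Path(directory)
    # Create the directory if it is missing so that later steps can write into it.
    path.mkdir(parents=True, exist_ok=True)
    removed = 0
    for entry in path.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a sub-directory and its contents
        else:
            entry.unlink()  # remove a regular file or a symlink
        removed += 1
    return removed
```

Calling `clear_directory("datasets_cleaned")` from the project root would leave an empty 'datasets_cleaned' directory, ready for 'clean.ipynb' to repopulate.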
Footnotes
- The 2019 figures in the 2017-2019 employment dataset differ slightly from those in the 2019-2021 employment dataset. This is likely due to the statistical noise added to protect graduates' privacy.