Lab 2: Working with a Data Set

Overview

This project works with a real-world dataset and explores how to prepare it for machine learning. The dataset is the Howell dataset, which records height, weight, age, and gender. The goal is to perform an initial exploration, prepare the data for modeling, and run several analyses on it.
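
As a rough illustration of the initial exploration described above, the sketch below reports a DataFrame's size and missing values. The `overview` helper is hypothetical and not taken from the notebook, and the commented `read_csv` call assumes Howell.csv is semicolon-delimited, which may not match the copy in this repository.

```python
import pandas as pd


def overview(df: pd.DataFrame) -> dict:
    """Return basic facts about a DataFrame: rows, columns, missing values."""
    return {
        "instances": len(df),
        "features": df.shape[1],
        "missing_per_column": df.isna().sum().to_dict(),
    }


# Hypothetical usage; the separator is an assumption about the file:
# howell = pd.read_csv("Howell.csv", sep=";")
# print(overview(howell))
```

If the file turns out to be comma-delimited, dropping the `sep` argument should be enough.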

Project Structure

The project consists of the following files:

dgraves_lab2_starter.ipynb: The Jupyter Notebook containing the code and analyses.
Howell.csv: The dataset used in this project.
README.md: This README file.
dgraves-ml-lab2-checkpoints.docx: The document containing screenshots and checkpoint analyses.

Getting Started

To get started with this project, you need to clone the repository to your local machine:

git clone https://github.com/yourusername/lab2-working-with-dataset.git
cd lab2-working-with-dataset

Requirements

pandas
matplotlib
scikit-learn
numpy

Virtual Environment

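Create a virtual environment and activate it:

# Create the environment
python -m venv .venv

# Activate the environment (Windows, Git Bash)
source .venv/Scripts/activate

# Activate the environment (macOS/Linux)
source .venv/bin/activate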

Install the required packages:

pip install pandas matplotlib scikit-learn numpy
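
To confirm that the install step worked, a small check such as the following can be run inside the activated environment. This is only a sketch: `missing_modules` and `REQUIRED_MODULES` are hypothetical names, and the list simply mirrors the packages under Requirements by their import names.

```python
from importlib.util import find_spec

# Import names for the packages listed under Requirements
# (scikit-learn is imported as "sklearn").
REQUIRED_MODULES = ["pandas", "matplotlib", "sklearn", "numpy"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", missing if missing else "none")
```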

Usage

Navigate to the project directory.
Open JupyterLab (install it with pip install jupyterlab if it is not already available) with the command:

jupyter lab

Follow the instructions in the starter notebook, alongside the course reference PDF, to run the code cells and perform the analyses.

Results

Data Overview: Displaying basic information about the dataset, including the number of instances, features, and missing values.
Data Distributions: Visualizing the distributions of height, weight, and age.
Correlation Analysis: Identifying the highest correlation between features.
Age vs. Weight Analysis: Exploring the relationship between age and weight.
Age Histogram: Comparing the age distribution in the dataset with modern populations.
BMI Calculation: Adding a new feature for BMI and categorizing it.
Stratified Data Split: Splitting the data into training and test sets while maintaining the ratio of males to females.
Male-to-Female Ratios: Computing the male-to-female ratios for the entire dataset, training set, and test set.
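
Several of the steps listed above (correlation, BMI, stratified split, and male-to-female ratios) could be sketched roughly as follows. This is not the notebook's code. It assumes columns named `height` (cm), `weight` (kg), and `male` (0/1), the BMI cut-offs are the common WHO adult thresholds rather than values confirmed by the notebook, and all helper names, the 20% test size, and the seed are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def add_bmi(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with BMI (kg / m^2) and a BMI category column.

    Assumes 'height' is in centimetres and 'weight' in kilograms.
    """
    out = df.copy()
    out["bmi"] = out["weight"] / (out["height"] / 100) ** 2
    # Cut-offs are the common WHO adult thresholds (an assumption here).
    out["bmi_category"] = pd.cut(
        out["bmi"],
        bins=[0, 18.5, 25, 30, np.inf],
        labels=["Underweight", "Normal", "Overweight", "Obese"],
        right=False,
    )
    return out


def strongest_correlation(df: pd.DataFrame):
    """Return the pair of numeric columns with the largest |correlation|."""
    corr = df.corr(numeric_only=True).abs()
    values = corr.to_numpy(copy=True)
    # Mask the diagonal and lower triangle so each pair is considered once.
    values[np.tril_indices_from(values)] = -1.0
    i, j = np.unravel_index(np.nanargmax(values), values.shape)
    return corr.columns[i], corr.columns[j]


def male_to_female_ratio(df: pd.DataFrame) -> float:
    """Ratio of rows with male == 1 to rows with male == 0."""
    males = (df["male"] == 1).sum()
    females = (df["male"] == 0).sum()
    return float(males / females)


def stratified_split(df: pd.DataFrame, test_size: float = 0.2, seed: int = 42):
    """Split into train/test sets, preserving the male/female proportions."""
    return train_test_split(
        df, test_size=test_size, stratify=df["male"], random_state=seed
    )
```

Calling `male_to_female_ratio` on the full DataFrame and on each half returned by `stratified_split` should give nearly identical values, which is the point of stratifying on `male`.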
