This project implements sentiment analysis on airline tweets using Natural Language Processing (NLP) and a Random Forest classifier.
The model classifies each tweet into one of three sentiment categories:
- Positive
- Neutral
- Negative

Technologies used:
- Python 3.13
- NLTK
- scikit-learn
- pandas
- numpy
- matplotlib
- Regular Expressions (re)

Model performance:
- Accuracy: 75.92%
- Algorithm: Random Forest Classifier
- Features: 2500 TF-IDF features

Installation:
- pip install nltk
- pip install scikit-learn
- pip install pandas
- pip install numpy

Text preprocessing:
- Special character removal
- Single character removal
- Multiple space removal
- Lowercase conversion
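
The four cleaning steps above can be sketched with the standard `re` module. This is a minimal illustration, not necessarily the exact code used in this project: the function name `preprocess_tweet` and the specific regular expressions are assumptions.

```python
import re


def preprocess_tweet(text: str) -> str:
    """Clean one tweet (hypothetical helper, patterns are assumptions)."""
    # Special character removal: replace every non-word character with a space.
    text = re.sub(r"\W", " ", text)
    # Single character removal: drop letters left standing alone,
    # such as the "s" that remains after "it's" loses its apostrophe.
    text = re.sub(r"\b[a-zA-Z]\b", " ", text)
    # Multiple space removal: collapse whitespace runs and trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    # Lowercase conversion.
    return text.lower()


cleaned = preprocess_tweet("@VirginAmerica it's a   GREAT flight!!")
print(cleaned)
```

Special characters are removed first so that stray single letters are exposed before the single character step runs.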

Feature extraction:
- TF-IDF Vectorization
- Stop words removal
- Feature selection (max_features=2500)
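
As a rough sketch of this step, scikit-learn's `TfidfVectorizer` can be configured with the documented values (`max_features=2500`, `min_df=7`, `max_df=0.8`). The toy corpus is made up for illustration, and the built-in `stop_words="english"` list is an assumption used to keep the example self-contained; the project itself may use the NLTK stop word list instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up corpus: two repeated tweets plus one rare document.
corpus = (
    ["airline flight delayed luggage lost"] * 10
    + ["airline friendly crew smooth landing"] * 10
    + ["refund"]
)

vectorizer = TfidfVectorizer(
    max_features=2500,     # keep at most the 2500 most frequent terms
    min_df=7,              # ignore terms found in fewer than 7 documents
    max_df=0.8,            # ignore terms found in more than 80% of documents
    stop_words="english",  # assumption: scikit-learn's built-in list
)
X = vectorizer.fit_transform(corpus)
vocabulary = vectorizer.vocabulary_

print(X.shape)
print(sorted(vocabulary))
```

In this corpus, "airline" appears in almost every document and is pruned by `max_df`, while "refund" appears only once and is pruned by `min_df`.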

Model training:
- 80-20 train-test split
- Random Forest with 200 estimators
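
The training step might look roughly like the following. Synthetic features from `make_classification` stand in for the real TF-IDF matrix, so the printed accuracy says nothing about the 75.92% reported for the tweet data. The `random_state=0` passed to `train_test_split` is an assumption; only the classifier's seed is documented here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic three-class data standing in for the TF-IDF features and labels.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_classes=3, random_state=0
)

# 80-20 train-test split (the split seed is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Random Forest with 200 estimators and random_state=0.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Test accuracy on synthetic data: {accuracy:.2%}")
```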

Usage:
- Load your dataset in CSV format
- Run the preprocessing pipeline
- Train the model
- Make predictions on new data
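
The usage steps above could be wired together as in the sketch below. The column names `text` and `sentiment` are hypothetical, and an in-memory CSV stands in for your own file. Because the sample has only six rows, `min_df` and `max_df` are left at their defaults here; with a full dataset you would pass the documented values.

```python
import io

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Stand-in for pd.read_csv("<your file>.csv"); column names are hypothetical.
csv_data = io.StringIO(
    "text,sentiment\n"
    "flight was delayed for hours,negative\n"
    "they lost my luggage again,negative\n"
    "friendly crew and smooth landing,positive\n"
    "great service on board,positive\n"
    "flight departs at noon,neutral\n"
    "checking in at the airport,neutral\n"
)
df = pd.read_csv(csv_data)

# Vectorizer and classifier chained so new tweets get the same transformation.
model = make_pipeline(
    TfidfVectorizer(max_features=2500, stop_words="english"),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
model.fit(df["text"], df["sentiment"])

# Make predictions on new data.
new_tweets = ["my bag was lost", "what a friendly crew"]
predictions = model.predict(new_tweets)
print(list(predictions))
```

Chaining the two stages in one pipeline means the vectorizer fitted on the training tweets is reused when predicting, which avoids a vocabulary mismatch.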

TF-IDF parameters:
- max_features: 2500
- min_df: 7
- max_df: 0.8

Random Forest parameters:
- n_estimators: 200
- random_state: 0