Sentiment and Emotion Analysis Pipeline

This project builds a full emotion-classification workflow from data preprocessing to model training, explainability outputs, and a FastAPI-powered web interface.

What this project includes

  • End-to-end ML pipeline across 4 scripts
  • Feature engineering with TF-IDF + structural text features
  • Multi-model training and weighted ensemble evaluation
  • Explainability artifacts with SHAP and LIME
  • FastAPI backend + web frontend for live text analysis
  • CLI validator for backend prediction quality checks
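
As an illustration of the weighted ensemble idea listed above, here is a minimal sketch in plain Python. It assumes each model produces per-class probabilities and that the ensemble is a weighted average of them; the model names, weights, and class labels are hypothetical and are not taken from this repository's training script.

```python
from typing import Dict, List, Sequence


def weighted_ensemble(
    model_probs: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Blend per-class probabilities from several models using normalized weights."""
    total = sum(weights[name] for name in model_probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(next(iter(model_probs.values())))
    blended = [0.0] * n_classes
    for name, probs in model_probs.items():
        if len(probs) != n_classes:
            raise ValueError(f"{name} returned {len(probs)} classes, expected {n_classes}")
        share = weights[name] / total  # this model's fraction of the total weight
        for i, p in enumerate(probs):
            blended[i] += share * p
    return blended


# Hypothetical example: three models scoring the classes (joy, sadness, anger).
labels = ["joy", "sadness", "anger"]
probs = {
    "logreg": [0.70, 0.20, 0.10],
    "svm": [0.60, 0.30, 0.10],
    "forest": [0.50, 0.25, 0.25],
}
weights = {"logreg": 2.0, "svm": 1.0, "forest": 1.0}

blended = weighted_ensemble(probs, weights)
best = max(range(len(labels)), key=lambda i: blended[i])
print(labels[best], round(blended[best], 3))
```

Normalizing the weights inside the function keeps the blended values a valid probability distribution whenever each model's input is one.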

Project structure

  • 1_data_preprocessing.py: dataset loading, cleaning, structural feature generation
  • 2_feature_engineering.py: TF-IDF, scaling, label encoding, feature saving
  • 3_model_training_evaluation.py: model training, metrics, plots, SHAP/LIME artifacts
  • 4_finalize_assets.py: final predictions, confidence/intensity summaries
  • app.py: FastAPI server and inference endpoint
  • backend_cli_validator.py: CLI test runner for backend prediction validation
  • results/: generated features, plots, trained models, and final outputs
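
To give a sense of what "structural feature generation" might involve, the sketch below computes a few simple counts and ratios from raw text. The feature names and the exact set of features are assumptions for illustration; the features actually produced by 1_data_preprocessing.py may differ.

```python
import re
from typing import Dict


def structural_features(text: str) -> Dict[str, float]:
    """Compute simple, hypothetical structural features from raw text."""
    words = re.findall(r"[A-Za-z']+", text)
    letters = [c for c in text if c.isalpha()]
    return {
        "char_count": float(len(text)),
        "word_count": float(len(words)),
        "exclamation_count": float(text.count("!")),
        "question_count": float(text.count("?")),
        # Share of alphabetic characters that are uppercase (0.0 for empty input).
        "uppercase_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
    }


features = structural_features("I feel very excited and happy today!")
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

Features like these are typically concatenated with the TF-IDF matrix after scaling, which is consistent with the scaling step listed for 2_feature_engineering.py.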

Prerequisites

  • Python 3.10+ (recommended: 3.11)
  • pip
  • Git (optional; needed only if you clone the repository)

Setup (Windows PowerShell)

  1. Clone and move into the project folder.
git clone https://github.com/devanshkadu2005/Sentiment.git
cd Sentiment
  2. Create and activate a virtual environment.
python -m venv .venv
.\.venv\Scripts\Activate.ps1
  3. Install Python dependencies.
pip install -r requirements.txt

Setup (macOS/Linux)

git clone https://github.com/devanshkadu2005/Sentiment.git
cd Sentiment
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run the full pipeline (step by step)

Run these scripts in order:

python 1_data_preprocessing.py
python 2_feature_engineering.py
python 3_model_training_evaluation.py
python 4_finalize_assets.py

After this, all generated assets will be available under results/.
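
If you prefer a single command, the four steps can be wrapped in a small runner that stops at the first failure. This is a sketch, not a file shipped with the repository: the helper names run_step, run_pipeline, and main are hypothetical, and it assumes each script signals failure through a non-zero exit code.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

PIPELINE_SCRIPTS = [
    "1_data_preprocessing.py",
    "2_feature_engineering.py",
    "3_model_training_evaluation.py",
    "4_finalize_assets.py",
]


def run_step(script: str) -> int:
    """Run one script with the current interpreter and return its exit code."""
    return subprocess.run([sys.executable, script]).returncode


def run_pipeline(
    scripts: Sequence[str],
    runner: Callable[[str], int] = run_step,
) -> List[str]:
    """Run scripts in order, stopping at the first failure.

    Returns the list of scripts that completed successfully.
    """
    completed: List[str] = []
    for script in scripts:
        print(f"Running {script} ...")
        code = runner(script)
        if code != 0:
            print(f"{script} exited with code {code}; stopping.")
            break
        completed.append(script)
    return completed


def main() -> int:
    """Return 0 if every step succeeded, 1 otherwise."""
    done = run_pipeline(PIPELINE_SCRIPTS)
    return 0 if len(done) == len(PIPELINE_SCRIPTS) else 1
```

To use it, save the block in the project root and call sys.exit(main()) from the activated virtual environment. The runner parameter exists so the ordering logic can be exercised without actually launching the scripts.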

Run the API + frontend

You can start the server in either of these ways:

python app.py

or

uvicorn app:app --host 127.0.0.1 --port 8501

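Then open the web interface in a browser at the server address, for example http://127.0.0.1:8501 when the server is started with the uvicorn command above.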

API usage

Analyze endpoint

  • Method: POST
  • URL: http://127.0.0.1:8501/analyze
  • JSON body:
{
  "text": "I feel very excited and happy today!"
}

Example (PowerShell)

Invoke-RestMethod -Method Post `
  -Uri "http://127.0.0.1:8501/analyze" `
  -ContentType "application/json" `
  -Body '{"text":"I feel very excited and happy today!"}'
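
Example (Python)

The same request can be sent from Python using only the standard library. This is a minimal sketch: the function names are hypothetical, and it assumes the endpoint returns a JSON body. The response fields are not documented here, so the example returns the decoded response without interpreting it.

```python
import json
import urllib.request

ANALYZE_URL = "http://127.0.0.1:8501/analyze"


def build_analyze_request(text: str, url: str = ANALYZE_URL) -> urllib.request.Request:
    """Build a POST request carrying {"text": ...} as a JSON body."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text: str, url: str = ANALYZE_URL, timeout: float = 10.0):
    """Send text to a running server and return the decoded JSON response.

    Assumes the server is already running and replies with JSON.
    """
    request = build_analyze_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, analyze("I feel very excited and happy today!") sends the same payload as the PowerShell example.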

Run backend CLI validator

Default test suite:

python backend_cli_validator.py

With custom minimum confidence:

python backend_cli_validator.py --min-confidence 0.6

With additional custom test cases loaded from a JSON file:

python backend_cli_validator.py --cases-file custom_cases.json

Main outputs

  • results/preprocessing/: processed train/val/test CSV and preprocessing plots
  • results/features/: feature matrices, encoders, vectorizer, feature metadata
  • results/training/: trained models, confusion matrices, ROC/F1 plots, SHAP/LIME outputs
  • results/final/: final prediction CSVs and summary visualizations

Notes

  • NLTK resources are downloaded automatically on first run.
  • If app.py reports missing model files, run steps 1-3 of the pipeline first.
  • Some steps (especially training + SHAP) may take significant time depending on hardware.
  • This repository ships with generated artifacts in results/. If those files are present in your checkout, you can run app.py without running the pipeline first.

Troubleshooting

  • ModuleNotFoundError: run pip install -r requirements.txt again in the active virtual environment.
  • FastAPI app starts but /analyze fails with missing files: re-run pipeline scripts in order.
  • If model training is too slow, start by running only preprocessing and feature engineering to verify setup first.
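
To find out quickly which pipeline outputs are missing before starting the server, a small check along these lines may help. It is a sketch that only looks at the four results/ subfolders named under "Main outputs"; the function name is hypothetical, and the specific files app.py requires are not listed in this README, so an empty or absent folder is used as a rough signal.

```python
from pathlib import Path
from typing import List

# Subfolders listed under "Main outputs" in this README.
EXPECTED_DIRS = ["preprocessing", "features", "training", "final"]


def missing_artifact_dirs(results_root: Path) -> List[str]:
    """Return the expected results/ subfolders that are absent or empty."""
    missing = []
    for name in EXPECTED_DIRS:
        folder = results_root / name
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


missing = missing_artifact_dirs(Path("results"))
if missing:
    print("Missing or empty:", ", ".join(missing))
else:
    print("All expected results/ subfolders contain files.")
```

Run it from the project root. Any folder it reports points to the pipeline step whose output needs to be regenerated.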
