This project builds a full emotion-classification workflow from data preprocessing to model training, explainability outputs, and a FastAPI-powered web interface.
- End-to-end ML pipeline across 4 scripts
- Feature engineering with TF-IDF + structural text features
- Multi-model training and weighted ensemble evaluation
- Explainability artifacts with SHAP and LIME
- FastAPI backend + web frontend for live text analysis
- CLI validator for backend prediction quality checks
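To make the "weighted ensemble" item concrete, here is a minimal sketch of weighted soft voting over per-model class probabilities. It is illustrative only: the label names, probabilities, and weights are hypothetical, and the ensemble in `3_model_training_evaluation.py` may combine models differently.

```python
from typing import List, Sequence


def weighted_soft_vote(
    probabilities: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Average per-model class probabilities, weighted by model weight."""
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = [0.0] * len(probabilities[0])
    for model_probs, weight in zip(probabilities, weights):
        for i, p in enumerate(model_probs):
            # Normalizing by the weight total keeps the result a distribution.
            combined[i] += weight * p / total
    return combined


def predict_label(combined: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label with the highest combined probability."""
    best = max(range(len(combined)), key=combined.__getitem__)
    return labels[best]


# Hypothetical labels, model outputs, and weights (not taken from the project).
LABELS = ["joy", "sadness", "anger"]
model_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.4, 0.1],
]
model_weights = [0.5, 0.2, 0.3]

combined = weighted_soft_vote(model_probs, model_weights)
print(predict_label(combined, LABELS))
```

Soft voting keeps each model's confidence in play, so one confident model can outweigh two hesitant ones, while the weights let stronger models count for more.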
- `1_data_preprocessing.py`: dataset loading, cleaning, structural feature generation
- `2_feature_engineering.py`: TF-IDF, scaling, label encoding, feature saving
- `3_model_training_evaluation.py`: model training, metrics, plots, SHAP/LIME artifacts
- `4_finalize_assets.py`: final predictions, confidence/intensity summaries
- `app.py`: FastAPI server and inference endpoint
- `backend_cli_validator.py`: CLI test runner for backend prediction validation
- `results/`: generated features, plots, trained models, and final outputs
- Python 3.10+ (recommended: 3.11)
- pip
- Git (optional for clone workflow)
Windows (PowerShell):

- Clone and move into the project folder.

      git clone https://github.com/devanshkadu2005/Sentiment.git
      cd Sentiment

- Create and activate a virtual environment.

      python -m venv .venv
      .\.venv\Scripts\Activate.ps1

- Install Python dependencies.

      pip install -r requirements.txt

macOS/Linux:

    git clone https://github.com/devanshkadu2005/Sentiment.git
    cd Sentiment
    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt

Run these scripts in order:

    python 1_data_preprocessing.py
    python 2_feature_engineering.py
    python 3_model_training_evaluation.py
    python 4_finalize_assets.py

After this, all generated assets will be available under `results/`.
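If you prefer a single command, the four steps can be chained from Python. This is a minimal sketch, not part of the repository: `build_commands` and `run_pipeline` are hypothetical helper names, and it assumes the scripts take no arguments and are run from the project root.

```python
import subprocess
import sys
from typing import List, Sequence

# Script names as listed in this README, in execution order.
PIPELINE_SCRIPTS = [
    "1_data_preprocessing.py",
    "2_feature_engineering.py",
    "3_model_training_evaluation.py",
    "4_finalize_assets.py",
]


def build_commands(
    scripts: Sequence[str], python: str = sys.executable
) -> List[List[str]]:
    """Return one argv list per script, using the given interpreter."""
    return [[python, script] for script in scripts]


def run_pipeline(project_dir: str = ".") -> None:
    """Run each script in order, stopping at the first failure."""
    for command in build_commands(PIPELINE_SCRIPTS):
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on top of a failed earlier step.
        subprocess.run(command, cwd=project_dir, check=True)
```

Calling `run_pipeline()` from the activated virtual environment uses the same interpreter that has the dependencies installed, because `sys.executable` points at it.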
You can start the server in either of these ways:

    python app.py

or

    uvicorn app:app --host 127.0.0.1 --port 8501

Then open the app in a browser at the host and port above (http://127.0.0.1:8501).
- Method: POST
- URL: http://127.0.0.1:8501/analyze
- JSON body:

      {
        "text": "I feel very excited and happy today!"
      }

PowerShell example:

    Invoke-RestMethod -Method Post `
      -Uri "http://127.0.0.1:8501/analyze" `
      -ContentType "application/json" `
      -Body '{"text":"I feel very excited and happy today!"}'

Default test suite:

    python backend_cli_validator.py

With custom minimum confidence:

    python backend_cli_validator.py --min-confidence 0.6

With additional custom test cases JSON file:

    python backend_cli_validator.py --cases-file custom_cases.json

- `results/preprocessing/`: processed train/val/test CSV and preprocessing plots
- `results/features/`: feature matrices, encoders, vectorizer, feature metadata
- `results/training/`: trained models, confusion matrices, ROC/F1 plots, SHAP/LIME outputs
- `results/final/`: final prediction CSVs and summary visualizations
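A quick way to confirm that the pipeline produced the folders listed above is to check for them from Python. This is a small sketch: `missing_result_dirs` is a hypothetical helper, and it only checks that the four subfolders exist, not that the files inside them are complete.

```python
from pathlib import Path
from typing import List

# Subfolder names as listed in this README.
EXPECTED_SUBDIRS = ["preprocessing", "features", "training", "final"]


def missing_result_dirs(results_root: str = "results") -> List[str]:
    """Return the expected subfolders that do not exist under results_root."""
    root = Path(results_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

An empty list from `missing_result_dirs()` means every expected folder is present; any names returned point to the pipeline step that probably still needs to run.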
- NLTK resources are downloaded automatically on first run.
- If `app.py` reports missing model files, run steps 1-3 of the pipeline first.
- Some steps (especially training + SHAP) may take significant time depending on hardware.
- This repository already includes generated artifacts in `results/`, so you can run `app.py` directly if files are present.
- `ModuleNotFoundError`: run `pip install -r requirements.txt` again in the active virtual environment.
- FastAPI app starts but `/analyze` fails with missing files: re-run pipeline scripts in order.
- If model training is too slow, start by running only preprocessing and feature engineering to verify setup first.
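When `/analyze` fails, it helps to see the status code and error body the server returns. The sketch below sends the same POST request described earlier, using only the Python standard library. `build_request` and `analyze` are hypothetical helper names, and the response is assumed to be JSON; its exact fields are not documented here, so the code returns the parsed body as-is.

```python
import json
import urllib.error
import urllib.request
from typing import Any

ANALYZE_URL = "http://127.0.0.1:8501/analyze"


def build_request(text: str, url: str = ANALYZE_URL) -> urllib.request.Request:
    """Build a POST request with the JSON body {"text": ...}."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text: str, url: str = ANALYZE_URL, timeout: float = 30.0) -> Any:
    """Send text to the running server and return the parsed JSON response."""
    request = build_request(text, url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Surface the server's error body, which usually names the problem.
        detail = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"/analyze returned HTTP {err.code}: {detail}") from err
```

With the server running, `print(analyze("I feel very excited and happy today!"))` shows the raw response, and a failed request raises an error that includes the server's message.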