Predicting Hospital Stay Length

Project Overview

This project applies explainable machine learning to healthcare operations by predicting hospital length of stay (LOS) from real-world inpatient data. Accurate LOS prediction supports bed management, staffing, cost control, and patient flow optimisation.

Traditional models struggle to capture complex, non-linear drivers of LOS, and many AI solutions lack transparency. This system balances performance with interpretability to support clinical trust and adoption.

Project Goal

To develop an accurate and explainable ML-based system that predicts hospital stay length while remaining interpretable and usable by healthcare professionals.

Project Results & Visual Output

The following visual outputs demonstrate the ML model’s predictive performance, feature importance for interpretability, and the interactive Streamlit application dashboard for real-time hospital stay predictions.

[Figure: Hospital Length of Stay Prediction Dashboard]

Key Features & Achievements

Built a full ML pipeline using the SPARCS Inpatient Discharge Dataset (2021)
Applied preprocessing: one-hot encoding of categorical features and standard scaling of numerical features
Trained multiple models (Random Forest, Decision Tree, AdaBoost) and selected the Random Forest Regressor
Achieved a mean absolute error (MAE) of 0.90 days
Extracted feature importance for transparency and clinical insight
Deployed the model as an interactive Streamlit web application

Technologies Used

Machine Learning

Random Forest Regressor
Decision Tree
AdaBoost
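
As a rough illustration of how the three model families listed above might be compared, the sketch below fits each one on synthetic data and ranks them by MAE on a held-out split. The data, the hyperparameters, and the use of MAE as the selection criterion are assumptions made for this example; they are not taken from the project's training code.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the real features and LOS target (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = np.abs(2.0 + 1.5 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.3, size=500))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Candidate regressors; hyperparameters are illustrative defaults.
models = {
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
}

# Fit each model and record its MAE on the held-out split.
mae_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mae_by_model[name] = mean_absolute_error(y_test, model.predict(X_test))

# The candidate with the lowest held-out MAE.
best_model = min(mae_by_model, key=mae_by_model.get)
print(mae_by_model, best_model)
```

Which model comes out ahead depends on the data, so the ranking on this toy set says nothing about the ranking on SPARCS.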

Explainability

Feature Importance
XAI-Ready Architecture
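
A minimal sketch of feature-importance extraction from a fitted Random Forest follows. The column names (`severity_code`, `num_procedures`, `noise_feature`) and the synthetic target are hypothetical and chosen only so that one feature clearly dominates; they are not the project's actual features.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical features; the target depends mostly on severity_code.
rng = np.random.default_rng(42)
n = 400
X = pd.DataFrame({
    "severity_code": rng.integers(1, 5, size=n),
    "num_procedures": rng.integers(0, 6, size=n),
    "noise_feature": rng.normal(size=n),
})
y = (
    1.0
    + 2.0 * X["severity_code"]
    + 0.5 * X["num_procedures"]
    + rng.normal(scale=0.2, size=n)
)

model = RandomForestRegressor(n_estimators=100, random_state=42).fit(X, y)

# Impurity-based importances, one per input column, summing to 1.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances describe what the model relied on, not causal drivers of LOS, so they are best read as a starting point for clinical review.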

Data Processing

Pandas, NumPy
OneHotEncoder
StandardScaler

Evaluation & Deployment

MAE, MSE, R² Score
Streamlit Web App
SPARCS Dataset (NY State)
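
To make the three evaluation metrics concrete, here is a small worked example using scikit-learn's metric functions. The four stay lengths and predictions are invented for illustration and are unrelated to the reported MAE of 0.90 days.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up stay lengths in days (assumption, not project data).
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# Absolute errors are 0.5, 0, 2, 1, so MAE = 3.5 / 4 = 0.875 days.
mae = mean_absolute_error(y_true, y_pred)

# Squared errors are 0.25, 0, 4, 1, so MSE = 5.25 / 4 = 1.3125 days^2.
mse = mean_squared_error(y_true, y_pred)

# R^2 = 1 - SS_res / SS_tot = 1 - 5.25 / 14.75, roughly 0.644.
r2 = r2_score(y_true, y_pred)

print(mae, mse, r2)
```

MAE is in the same unit as the target (days), which makes it the easiest of the three to communicate to operational staff; MSE penalises large misses more heavily, and R² reports the share of variance explained.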

Use Cases

Hospital bed and capacity planning
Patient flow and discharge optimisation
Healthcare cost and resource management
Clinical decision support systems
Health analytics and operations dashboards

What This Project Demonstrates

This project demonstrates the ability to build a production-ready, explainable healthcare ML system that bridges advanced machine learning with real-world clinical usability and trust.