Year
2024
Tech & Technique
Python, pandas, scikit-learn, Random Forest, Logistic Regression, KNN, Decision Tree, matplotlib, seaborn
Description
A machine learning classification project predicting whether a property will be purchased, based on housing
attributes including size, location, pricing, energy rating, and renovation needs.
Starting from 13,320 rows of raw real estate data, I built a full preprocessing pipeline, engineered 3 new features, and compared 4 ML models. A tuned Random Forest achieved 75.63% accuracy and 0.93 AUC.
What makes it stand out:
- Analytical rigour: IQR-based outlier handling and deliberate feature engineering
- Top predictors suggest buyers are driven by layout efficiency, sustainability credentials, and property readiness, not just price
- Findings have real commercial relevance for real estate platforms and agents
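For illustration, IQR-based outlier handling can be sketched as below. This is a minimal sketch rather than the project's actual code: it assumes outliers are clipped to the Tukey fences (Q1 - 1.5*IQR, Q3 + 1.5*IQR) instead of being dropped, and the helper name `clip_outliers_iqr`, the column name `size_sqft`, and the toy values are all hypothetical.

```python
import pandas as pd


def clip_outliers_iqr(df, columns, k=1.5):
    """Return a copy of df with each listed column clipped to
    [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper, not from the project)."""
    out = df.copy()
    for col in columns:
        q1 = out[col].quantile(0.25)
        q3 = out[col].quantile(0.75)
        iqr = q3 - q1
        # Values beyond the fences are pulled back to the fence value.
        out[col] = out[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out


# Toy data (assumption): one extreme value among otherwise typical sizes.
toy = pd.DataFrame({"size_sqft": [800.0, 950.0, 1000.0, 1100.0, 1200.0, 9000.0]})
clipped = clip_outliers_iqr(toy, ["size_sqft"])
print(clipped["size_sqft"].tolist())
```

Clipping keeps every row, which matters when the dataset is modest in size; dropping rows outside the fences is the other common choice.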
My Role
Sole Developer & Analyst
- Built full preprocessing pipeline: missing value imputation, IQR outlier handling, encoding
- Engineered 3 new features that improved model interpretability
- Trained and compared Logistic Regression, Random Forest, KNN, and Decision Tree models
- Performed hyperparameter tuning; achieved 75.63% accuracy and 0.93 AUC with Random Forest
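The model comparison and tuning steps above could look roughly like the following. This is a sketch under stated assumptions, not the project's code: the data is a synthetic stand-in generated with `make_classification`, the hyperparameter grid is illustrative, and the scores it produces are unrelated to the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed housing data (assumption).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The four model families named above; scale-sensitive ones get a scaler.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model and record accuracy and ROC AUC on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "auc": roc_auc_score(y_test, proba),
    }

# Tune the Random Forest with a small, illustrative grid (assumption).
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, scoring="roc_auc", cv=3
)
search.fit(X_train, y_train)
tuned_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])

for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.3f}, auc={scores['auc']:.3f}")
print("Best RF params:", search.best_params_, "| tuned AUC:", round(tuned_auc, 3))
```

Reporting both accuracy and AUC is useful because accuracy depends on the 0.5 decision threshold while AUC measures ranking quality across all thresholds, so the two can diverge.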