Tabular Classification
Scikit-learn
Joblib
English
random-forest
loan-default
credit-risk
finance
binary-classification
Loan Default Prediction Pipeline
A scikit-learn machine learning pipeline that predicts whether a loan applicant is at risk of defaulting, based on their financial profile and personal details.
Model: Random Forest Classifier
Task: Binary Classification (default risk: 0 = Low Risk, 1 = High Risk)
Validation Accuracy: ~89.7%
Model Description
The pipeline was trained on 252,000 loan applicant records. It chains a preprocessing stage with a tuned Random Forest Classifier, so raw applicant records can be passed to it directly. The preprocessing stage handles:
- Missing value imputation
- Snake-case column name formatting
- Boolean encoding (married/single status, car ownership, house ownership)
- Job stability ordinal encoding
- City tier ordinal encoding (Tier 1 / Tier 2 / Tier 3)
- State-level default rate target encoding
- Standard scaling of numerical features
- One-hot encoding of nominal categorical features
- Feature selection
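Two of the steps above can be illustrated in plain Python. This is only a sketch of the general techniques, not the code inside the serialized pipeline: the function names `to_snake_case`, `fit_state_default_rates`, and `encode_state` are hypothetical, the fallback to the global rate for unseen states is an assumption, and the toy rows are made up.

```python
import re
from collections import defaultdict


def to_snake_case(name: str) -> str:
    """Lower-case a column name and collapse non-alphanumeric runs into '_'."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name).strip("_").lower()


def fit_state_default_rates(states, labels):
    """Return ({state: mean default rate}, global mean rate) from training rows."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for state, label in zip(states, labels):
        totals[state] += 1
        defaults[state] += label
    rates = {state: defaults[state] / totals[state] for state in totals}
    global_rate = sum(labels) / len(labels)
    return rates, global_rate


def encode_state(state, rates, global_rate):
    """Map a state to its training default rate (assumed fallback: global rate)."""
    return rates.get(state, global_rate)


# Toy rows, invented for illustration only
train_states = ["Maharashtra", "Maharashtra", "Kerala", "Kerala", "Kerala", "Bihar"]
train_labels = [0, 1, 0, 0, 1, 0]

rates, global_rate = fit_state_default_rates(train_states, train_labels)

print(to_snake_case("Married/Single"))   # married_single
print(to_snake_case("CURRENT_JOB_YRS"))  # current_job_yrs
print(encode_state("Maharashtra", rates, global_rate))  # 0.5
```

The rates must be fitted on the training split only; computing them on validation or test rows would leak label information into the features.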
How to Use
Install dependencies
pip install scikit-learn joblib pandas huggingface_hub
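Pipelines saved with joblib are generally tied to the scikit-learn release that produced them, and the Training Details section lists scikit-learn 1.8.0. A small check like the one below can flag a mismatch before loading. It is a sketch: the helper names are hypothetical, and comparing only the major and minor version is an assumption about what counts as "close enough".

```python
from importlib.metadata import PackageNotFoundError, version

EXPECTED_SKLEARN = "1.8.0"  # taken from the Training Details table


def same_major_minor(installed: str, expected: str) -> bool:
    """Compare only the first two version components, e.g. '1.8'."""
    return installed.split(".")[:2] == expected.split(".")[:2]


def check_sklearn(expected: str = EXPECTED_SKLEARN) -> str:
    """Return a short message describing the installed scikit-learn version."""
    try:
        installed = version("scikit-learn")
    except PackageNotFoundError:
        return "scikit-learn is not installed"
    if same_major_minor(installed, expected):
        return f"scikit-learn {installed} matches the training release line"
    return (
        f"scikit-learn {installed} differs from {expected}; "
        "loading the pipeline may warn or fail"
    )


print(check_sklearn())
```

If the versions differ, installing the matching release with `pip install scikit-learn==1.8.0` is usually the simplest fix.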
Load and use the pipeline
import joblib
import pandas as pd
from huggingface_hub import hf_hub_download
# Download the pipeline from Hugging Face Hub
model_path = hf_hub_download(
    repo_id="amanbokaro/loan-default-prediction-pipeline",
    filename="loan_default_rf_pipeline.joblib",
)
# Load the pipeline (joblib files are pickles: only load them from sources you trust)
pipeline = joblib.load(model_path)
# Example input (single applicant)
applicant = pd.DataFrame([{
    "Income": 500000,
    "Age": 35,
    "Experience": 8,
    "Married/Single": "married",
    "House_Ownership": "rented",
    "Car_Ownership": "yes",
    "Profession": "Software_Developer",
    "CITY": "Mumbai",
    "STATE": "Maharashtra",
    "CURRENT_JOB_YRS": 4,
    "CURRENT_HOUSE_YRS": 3,
}])
# Predict
prediction = pipeline.predict(applicant)
probability = pipeline.predict_proba(applicant)
print("Prediction:", "High Risk" if prediction[0] == 1 else "Low Risk")
print(f"Default probability: {probability[0][1]:.2%}")
Input Features
| Feature | Type | Description |
|---|---|---|
| `Income` | Numerical | Annual income of the applicant (INR) |
| `Age` | Numerical | Age of the applicant (years) |
| `Experience` | Numerical | Years of work experience |
| `Married/Single` | Categorical | Marital status (married / single) |
| `House_Ownership` | Categorical | Housing status (rented / owned / norent_noown) |
| `Car_Ownership` | Categorical | Car ownership (yes / no) |
| `Profession` | Categorical | Applicant's profession/job title |
| `CITY` | Categorical | City of residence |
| `STATE` | Categorical | State of residence |
| `CURRENT_JOB_YRS` | Numerical | Years at current job |
| `CURRENT_HOUSE_YRS` | Numerical | Years at current residence |
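Because the pipeline expects the original column names exactly as listed in this table, it can help to validate a record before building the DataFrame. The sketch below is an illustration only: `validate_applicant` is a hypothetical helper, and it is stricter than the pipeline itself may be (the pipeline imputes missing values, so a missing entry is not necessarily an error).

```python
EXPECTED_COLUMNS = [
    "Income", "Age", "Experience", "Married/Single", "House_Ownership",
    "Car_Ownership", "Profession", "CITY", "STATE",
    "CURRENT_JOB_YRS", "CURRENT_HOUSE_YRS",
]

# Allowed values as documented in the Input Features table
ALLOWED_VALUES = {
    "Married/Single": {"married", "single"},
    "House_Ownership": {"rented", "owned", "norent_noown"},
    "Car_Ownership": {"yes", "no"},
}

NUMERIC_COLUMNS = ["Income", "Age", "Experience", "CURRENT_JOB_YRS", "CURRENT_HOUSE_YRS"]


def validate_applicant(record: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record looks fine."""
    problems = []
    missing = [col for col in EXPECTED_COLUMNS if col not in record]
    if missing:
        problems.append(f"missing columns: {missing}")
    for col, allowed in ALLOWED_VALUES.items():
        if col in record and record[col] not in allowed:
            problems.append(f"{col}={record[col]!r} is not one of {sorted(allowed)}")
    for col in NUMERIC_COLUMNS:
        if col in record:
            value = record[col]
            # bool is a subclass of int, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{col} should be numeric, got {type(value).__name__}")
    return problems


applicant = {
    "Income": 500000, "Age": 35, "Experience": 8,
    "Married/Single": "married", "House_Ownership": "rented",
    "Car_Ownership": "yes", "Profession": "Software_Developer",
    "CITY": "Mumbai", "STATE": "Maharashtra",
    "CURRENT_JOB_YRS": 4, "CURRENT_HOUSE_YRS": 3,
}
print(validate_applicant(applicant))  # []
```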
Output
| Value | Meaning |
|---|---|
| 0 | Low Risk |
| 1 | High Risk |
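When a missed default costs more than a false alarm, it may be preferable to derive the label from the default probability returned by `predict_proba` with a custom cut-off instead of relying on `predict`. The helper below is a minimal sketch: the name `classify` and the example thresholds are hypothetical, and a suitable threshold would have to be chosen on validation data.

```python
LABELS = {0: "Low Risk", 1: "High Risk"}


def classify(default_probability: float, threshold: float = 0.5) -> int:
    """Return 1 (High Risk) if the default probability reaches the threshold, else 0."""
    if not 0.0 <= default_probability <= 1.0:
        raise ValueError("default_probability must lie in [0, 1]")
    return int(default_probability >= threshold)


# A probability of 0.35 is Low Risk at the default cut-off,
# but High Risk under a more cautious (hypothetical) cut-off of 0.30.
print(LABELS[classify(0.35)])                  # Low Risk
print(LABELS[classify(0.35, threshold=0.30)])  # High Risk
```

With the usage example above, the input would be `probability[0][1]`, the probability of class 1.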
Training Details
| Detail | Value |
|---|---|
| Dataset size | 252,000 records |
| Train / Val / Test | 80% / 10% / 10% |
| Algorithm | Random Forest Classifier |
| Validation Accuracy | ~89.7% |
| Random Seed | 42 |
| Framework | scikit-learn 1.8.0 |
| Python | 3.11 |
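The card reports accuracy only and does not state the share of defaults in the data. For default prediction, where defaults are often the minority class, accuracy is easier to interpret next to a majority-class baseline and per-class metrics. The sketch below shows the arithmetic; the confusion-matrix counts are invented for illustration and are not results for this model.

```python
def summarize(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, majority-class baseline, precision, and recall from counts."""
    total = tp + fp + tn + fn
    positives = tp + fn  # actual defaults
    negatives = tn + fp  # actual non-defaults
    return {
        "accuracy": (tp + tn) / total,
        # Accuracy of always predicting the more frequent class
        "majority_baseline": max(positives, negatives) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / positives if positives else 0.0,
    }


# Made-up counts for 1,000 hypothetical validation rows
metrics = summarize(tp=60, fp=20, tn=860, fn=60)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this invented example the accuracy of 0.920 is only slightly above the 0.880 baseline, while recall shows that half of the defaults are missed.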
Files
| File | Description |
|---|---|
| `loan_default_rf_pipeline.joblib` | Serialized sklearn pipeline (405 MB) |
License
MIT License — see LICENSE for details.