Model Card for EchoHire Question Generator (DistilGPT-2 Fine-tuned)
This model is a fine-tuned DistilGPT-2 designed to generate interview questions based on a job title and associated skills. It was trained on a curated dataset of ~100 jobs, each with 17–19 questions, to produce relevant, structured interview questions for recruiters or learning platforms.
Model Details
Model Description
This model generates questions automatically given a prompt with the job title and skills. The outputs are intended to help recruiters, HR teams, or training platforms quickly generate relevant interview questions without manually writing each one.
- Developed by: Syed Zeeshan Shah
- Model type: Causal Language Model (GPT-2)
- Language(s) (NLP): English
- License: MIT
- Finetuned from model: distilgpt2
- Shared by: Zeeshan506
Model Sources
- Repository: https://huggingface.co/Zeeshan506/echohire-qgen-distilgpt2 (Hugging Face Model Hub)
- Demo: N/A
Uses
Direct Use
- Generate interview questions automatically by providing a job title and skills.
- Can be used for recruitment platforms, HR automation, or interview prep content.
Downstream Use
- Could be fine-tuned further on domain-specific roles (e.g., data science, embedded systems).
- Can be integrated into apps, bots, or SaaS platforms for HR automation.
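For integration into an app or service, one lightweight option is to wrap the checkpoint in a text-generation pipeline behind a small helper. This is an illustrative sketch only; the generate_questions helper and its sampling settings are assumptions, not part of the released model:

from transformers import pipeline

# Illustrative helper (assumption, not shipped with the model): wraps the
# fine-tuned checkpoint in a text-generation pipeline for use inside an app.
generator = pipeline("text-generation", model="Zeeshan506/echohire-qgen-distilgpt2")

def generate_questions(job_title, skills, max_new_tokens=300):
    # Build the prompt in the same format shown in the getting-started snippet.
    prompt = f"Job Title: {job_title}\nSkills: {', '.join(skills)}\nQuestions:"
    result = generator(
        prompt,
        max_new_tokens=max_new_tokens,
        num_beams=3,
        no_repeat_ngram_size=2,
    )
    return result[0]["generated_text"]

print(generate_questions("Backend Developer", ["Python", "FastAPI", "PostgreSQL"]))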
Out-of-Scope Use
- Not intended for generating biased or discriminatory questions.
- Should not be used as a sole source of assessment or evaluation for candidates.
- Outputs may not always reflect up-to-date technology trends or best practices.
Bias, Risks, and Limitations
- Trained on a limited dataset of ~100 jobs, so may not generalize to rare or niche roles.
- Model can produce long or merged questions; post-processing may be needed.
- Users should verify all generated questions for correctness and appropriateness.
Recommendations
- Review outputs before using in real interviews.
- Consider further fine-tuning for specific industries or technical domains.
- Post-process outputs for clean question formatting.
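As a starting point for such post-processing, merged output can be split into individual questions with a simple heuristic. This is a minimal sketch; the splitting rule below is an assumption, not the method used by the author:

import re

def split_questions(generated_text):
    # Drop the prompt portion up to and including the "Questions:" marker, if present.
    text = generated_text.split("Questions:", 1)[-1]
    # Heuristic: treat each span ending in "?" as one question and tidy whitespace.
    questions = [q.strip() for q in re.findall(r"[^?]+\?", text)]
    return [q for q in questions if q]

raw = "Job Title: Backend Developer\nSkills: Python\nQuestions: What is FastAPI? How do you index a PostgreSQL table?"
for i, question in enumerate(split_questions(raw), 1):
    print(f"{i}. {question}")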
How to Get Started with the Model
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Zeeshan506/echohire-qgen-distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Job Title: Backend Developer\nSkills: Python, FastAPI, PostgreSQL\nQuestions:"
inputs = tokenizer(prompt, return_tensors="pt")

# The tokenizer already returns an attention mask. GPT-2 tokenizers have no pad
# token by default, so pass the EOS token id as pad_token_id for generation.
output_ids = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=300,
    num_beams=3,
    no_repeat_ngram_size=2,
    pad_token_id=tokenizer.eos_token_id,
)

generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)
Training Details
Training Data
- Dataset of ~100 jobs with 17–19 interview questions each.
- Input: Job title + skills
- Output: Structured questions
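Each record pairs a prompt-style input with its reference questions, roughly in the layout below. This is a reconstruction based on the prompt format used at inference time; the exact serialization of the training records is an assumption:

Job Title: <job title>
Skills: <skill 1>, <skill 2>, ...
Questions:
<question 1>
<question 2>
...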
Training Procedure
- Fine-tuned using Hugging Face Trainer API
- Epochs: 2
- Batch size: 2 per GPU
- Learning rate: 5e-5
- Tokenizer: distilgpt2 tokenizer, max input length 512
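A minimal reconstruction of this setup with the Trainer API and the hyperparameters listed above might look like the following. It is a sketch, not the author's exact script: the single placeholder record stands in for the real ~100-job dataset, and dataset preprocessing details are assumed:

from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

# Placeholder record in the assumed "job title + skills -> questions" format;
# the real training data is not reproduced here.
records = [{"text": "Job Title: Backend Developer\nSkills: Python, FastAPI, PostgreSQL\nQuestions: ..."}]
dataset = Dataset.from_list(records)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="echohire-qgen-distilgpt2",
    num_train_epochs=2,
    per_device_train_batch_size=2,
    learning_rate=5e-5,
)

trainer = Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator)
trainer.train()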
Metrics
Training Loss: Cross-entropy loss tracked during fine-tuning.
Human Review: Qualitative evaluation for relevance, coherence, and completeness of generated questions.
Automated Evaluation (BERTScore):
We performed a BERTScore evaluation on a subset of the dataset (the first 10 examples) to measure semantic similarity between the generated questions and the reference completions using contextual embeddings.
BERTScore results (first 10 examples):
- Precision: 0.8691
- Recall: 0.8870
- F1 Score: 0.8780
These scores indicate that the generated questions are closely aligned with the reference completions on this sample, supporting the quality of the fine-tuned model.
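This kind of evaluation can be reproduced with the Hugging Face evaluate wrapper around BERTScore, along the lines of the sketch below. The predictions and references lists are placeholders for the generated questions and reference completions of the evaluated examples:

import evaluate

# Placeholders: generated questions and their reference completions.
predictions = ["What experience do you have with FastAPI?"]
references = ["Describe your experience building APIs with FastAPI."]

bertscore = evaluate.load("bertscore")
scores = bertscore.compute(predictions=predictions, references=references, lang="en")

# BERTScore returns per-example scores; report the averages.
for metric in ("precision", "recall", "f1"):
    print(metric, sum(scores[metric]) / len(scores[metric]))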
Results
- Model successfully generates relevant interview questions for multiple technical domains.
- Some outputs may merge multiple questions; minor formatting post-processing recommended.
Environmental Impact
- Hardware Type: NVIDIA Tesla T4 (Google Colab)
- Hours used: ~3–4 for training
- Compute Region: Colab USA/Europe
- Carbon Emitted: Minimal (single Colab GPU session)
Technical Specifications
- Architecture: DistilGPT-2, causal language model
- Objective: Generate text conditioned on job title + skills
- Software: Transformers library, PyTorch, Hugging Face Hub
Citation
If you use this model, please cite as:
APA:
Shah, S. Z. (2025). EchoHire Question Generator (DistilGPT-2 Fine-tuned). Hugging Face. https://huggingface.co/Zeeshan506/echohire-qgen-distilgpt2
BibTeX:
@misc{shah2025echohire,
title={EchoHire Question Generator (DistilGPT-2 Fine-tuned)},
author={Shah, Syed Zeeshan},
year={2025},
howpublished={\url{https://huggingface.co/Zeeshan506/echohire-qgen-distilgpt2}}
}