Model Card for EchoHire Question Generator (DistilGPT-2 Fine-tuned)

This model is a fine-tuned DistilGPT-2 designed to generate interview questions based on a job title and associated skills. It was trained on a curated dataset of ~100 jobs, each with 17–19 questions, to produce relevant, structured interview questions for recruiters or learning platforms.

Model Details

Model Description

This model generates questions automatically given a prompt with the job title and skills. The outputs are intended to help recruiters, HR teams, or training platforms quickly generate relevant interview questions without manually writing each one.

  • Developed by: Syed Zeeshan Shah
  • Model type: Causal Language Model (GPT-2)
  • Language(s) (NLP): English
  • License: MIT
  • Finetuned from model: distilgpt2
  • Shared by: Zeeshan506

Model Sources

  • Repository: https://huggingface.co/Zeeshan506/echohire-qgen-distilgpt2

Uses

Direct Use

  • Generate interview questions automatically by providing a job title and skills.
  • Can be used for recruitment platforms, HR automation, or interview prep content.

Downstream Use

  • Could be fine-tuned further on domain-specific roles (e.g., data science, embedded systems).
  • Can be integrated into apps, bots, or SaaS platforms for HR automation.

Out-of-Scope Use

  • Not intended for generating biased or discriminatory questions.
  • Should not be used as a sole source of assessment or evaluation for candidates.
  • Outputs may not always reflect up-to-date technology trends or best practices.

Bias, Risks, and Limitations

  • Trained on a limited dataset of ~100 jobs, so may not generalize to rare or niche roles.
  • Model can produce long or merged questions; post-processing may be needed.
  • Users should verify all generated questions for correctness and appropriateness.

Recommendations

  • Review outputs before using in real interviews.
  • Consider further fine-tuning for specific industries or technical domains.
  • Post-process outputs for clean question formatting (see the sketch below).
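
Because the model can merge several questions into one block of text, a minimal post-processing sketch is shown here. The split_questions helper is a hypothetical illustration, not part of the model or its training code; it simply assumes each question ends with a question mark.

def split_questions(generated_text: str) -> list[str]:
    """Split raw model output into individual questions (heuristic)."""
    # Keep only the text after the "Questions:" marker, if present
    _, _, body = generated_text.partition("Questions:")
    # Split on question marks, re-attach them, and drop empty fragments
    parts = [p.strip() for p in body.split("?")]
    return [p + "?" for p in parts if p]

# Example with a raw generation string
raw = "Questions: What is FastAPI? How would you design a PostgreSQL schema? How do you handle database migrations?"
for q in split_questions(raw):
    print("-", q)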

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Zeeshan506/echohire-qgen-distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Job Title: Backend Developer\nSkills: Python, FastAPI, PostgreSQL\nQuestions:"

inputs = tokenizer(prompt, return_tensors="pt")

output_ids = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # mask provided by the tokenizer
    max_new_tokens=300,
    num_beams=3,
    no_repeat_ngram_size=2,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)

generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)
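
The decoded text includes the original prompt followed by the newly generated tokens, so downstream code will usually want to strip everything up to and including the "Questions:" marker before further formatting (see the post-processing sketch under Recommendations).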

Training Details

Training Data

  • Dataset of ~100 jobs with 17–19 interview questions each.
  • Input: job title + skills
  • Output: structured interview questions (see the serialization sketch below)
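
The exact serialization used for fine-tuning is not published with this card. A plausible sketch, assuming each record is flattened into a single string that matches the inference prompt shown above, is:

# Hypothetical record; the real dataset schema is not published with the card.
record = {
    "job_title": "Backend Developer",
    "skills": ["Python", "FastAPI", "PostgreSQL"],
    "questions": [
        "What is dependency injection in FastAPI?",
        "How would you optimize a slow PostgreSQL query?",
    ],
}

# Flatten into the same "Job Title / Skills / Questions:" layout used at inference time.
text = (
    f"Job Title: {record['job_title']}\n"
    f"Skills: {', '.join(record['skills'])}\n"
    "Questions:\n" + "\n".join(record["questions"])
)
print(text)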

Training Procedure

  • Fine-tuned using the Hugging Face Trainer API (see the configuration sketch after this list)
  • Epochs: 2
  • Batch size: 2 per GPU
  • Learning rate: 5e-5
  • Tokenizer: distilgpt2 tokenizer, max input length 512
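
A minimal Trainer configuration matching the hyperparameters above might look like the following. The toy dataset, tokenization step, and output directory are assumptions for illustration; the original training script is not included with the card.

from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no dedicated pad token
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

# Toy stand-in for the real dataset: flattened prompt+completion strings
# built as in the serialization sketch under Training Data.
texts = [
    "Job Title: Backend Developer\nSkills: Python, FastAPI, PostgreSQL\n"
    "Questions:\nWhat is dependency injection in FastAPI?"
]
dataset = Dataset.from_dict({"text": texts})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

training_args = TrainingArguments(
    output_dir="echohire-qgen-distilgpt2",  # assumed output path
    num_train_epochs=2,
    per_device_train_batch_size=2,
    learning_rate=5e-5,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()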

Metrics

  • Training Loss: Cross-entropy loss tracked during fine-tuning.

  • Human Review: Qualitative evaluation for relevance, coherence, and completeness of generated questions.

  • Automated Evaluation (BERTScore):
    We performed BERTScore evaluation on a subset of the dataset (first 10 examples) to measure semantic similarity between the generated questions and reference completions using contextual embeddings.

    BERTScore Results (first 10 examples):

    • Precision: 0.8691
    • Recall: 0.8870
    • F1 Score: 0.8780

These scores indicate that, on this small subset, the generated questions are closely aligned with the reference completions. A broader evaluation across the full dataset would be needed to confirm overall quality.
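
The evaluation script itself is not included in the card. A minimal sketch of how a comparable BERTScore computation could be run with the Hugging Face evaluate library is shown below; the prediction and reference strings are placeholders, not the actual evaluation subset.

import evaluate

# Placeholder generated questions and their reference completions.
predictions = ["What experience do you have with FastAPI?"]
references = ["Describe your experience building APIs with FastAPI."]

bertscore = evaluate.load("bertscore")
results = bertscore.compute(predictions=predictions, references=references, lang="en")

# BERTScore returns per-example lists; report the averages.
print("Precision:", sum(results["precision"]) / len(results["precision"]))
print("Recall:", sum(results["recall"]) / len(results["recall"]))
print("F1:", sum(results["f1"]) / len(results["f1"]))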

Results

  • Model successfully generates relevant interview questions for multiple technical domains.
  • Some outputs may merge multiple questions; minor formatting post-processing recommended.

Environmental Impact

  • Hardware Type: NVIDIA Tesla T4 (Google Colab)
  • Hours used: ~3–4 for training
  • Compute Region: Google Colab (USA/Europe)
  • Carbon Emitted: Not formally measured; expected to be minimal for a single Colab GPU session

Technical Specifications

  • Architecture: DistilGPT-2 (81.9M parameters, FP32 weights), causal language model
  • Objective: Generate text conditioned on job title + skills
  • Software: Transformers library, PyTorch, Hugging Face Hub

Citation

If you use this model, please cite as:

APA:

Shah, S. Z. (2025). EchoHire Question Generator (DistilGPT-2 Fine-tuned). Hugging Face. https://huggingface.co/Zeeshan506/echohire-qgen-distilgpt2

BibTeX:

@misc{shah2025echohire,
  title={EchoHire Question Generator (DistilGPT-2 Fine-tuned)},
  author={Shah, Syed Zeeshan},
  year={2025},
  howpublished={\url{https://huggingface.co/Zeeshan506/echohire-qgen-distilgpt2}}
}