Model Card for EchoHire Question Generator (DistilGPT-2 Fine-tuned)

This model is a fine-tuned DistilGPT-2 designed to generate interview questions based on a job title and associated skills. It was trained on a curated dataset of ~100 jobs, each with 17–19 questions, to produce relevant, structured interview questions for recruiters or learning platforms.

Model Details

Model Description

This model generates questions automatically given a prompt with the job title and skills. The outputs are intended to help recruiters, HR teams, or training platforms quickly generate relevant interview questions without manually writing each one.

  • Developed by: Syed Zeeshan Shah
  • Model type: Causal Language Model (GPT-2)
  • Language(s) (NLP): English
  • License: MIT
  • Finetuned from model: distilgpt2
  • Shared by: Zeeshan506

Model Sources

  • Repository: https://huggingface.co/Zeeshan506/echohire-qgen-distilgpt2

Uses

Direct Use

  • Generate interview questions automatically by providing a job title and skills.
  • Can be used for recruitment platforms, HR automation, or interview prep content.

Downstream Use

  • Could be fine-tuned further on domain-specific roles (e.g., data science, embedded systems).
  • Can be integrated into apps, bots, or SaaS platforms for HR automation.

Out-of-Scope Use

  • Not intended for generating biased or discriminatory questions.
  • Should not be used as a sole source of assessment or evaluation for candidates.
  • Outputs may not always reflect up-to-date technology trends or best practices.

Bias, Risks, and Limitations

  • Trained on a limited dataset of ~100 jobs, so may not generalize to rare or niche roles.
  • Model can produce long or merged questions; post-processing may be needed.
  • Users should verify all generated questions for correctness and appropriateness.

Recommendations

  • Review outputs before using in real interviews.
  • Consider further fine-tuning for specific industries or technical domains.
  • Post-process outputs for clean question formatting (see the sketch below).
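
Because the model can merge several questions into one block of text, a minimal post-processing sketch is shown here. The split_questions helper is a hypothetical illustration, not part of the model or its training code; it simply assumes each question ends with a question mark.

def split_questions(generated_text: str) -> list[str]:
    """Split raw model output into individual questions (heuristic)."""
    # Keep only the text after the "Questions:" marker, if present
    _, _, body = generated_text.partition("Questions:")
    # Split on question marks, re-attach them, and drop empty fragments
    parts = [p.strip() for p in body.split("?")]
    return [p + "?" for p in parts if p]

# Example with a raw generation string
raw = "Questions: What is FastAPI? How would you design a PostgreSQL schema? How do you handle database migrations?"
for q in split_questions(raw):
    print("-", q)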

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Zeeshan506/echohire-qgen-distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Job Title: Backend Developer\nSkills: Python, FastAPI, PostgreSQL\nQuestions:"

inputs = tokenizer(prompt, return_tensors="pt")

output_ids = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # mask provided by the tokenizer
    max_new_tokens=300,
    num_beams=3,
    no_repeat_ngram_size=2,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)

generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)
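
The decoded text includes the original prompt followed by the newly generated tokens, so downstream code will usually want to strip everything up to and including the "Questions:" marker before further formatting (see the post-processing sketch under Recommendations).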

Training Details

Training Data

  • Dataset of ~100 jobs with 17–19 interview questions each.
  • Input: job title + skills
  • Output: structured interview questions (see the serialization sketch below)
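
The exact serialization used for fine-tuning is not published with this card. A plausible sketch, assuming each record is flattened into a single string that matches the inference prompt shown above, is:

# Hypothetical record; the real dataset schema is not published with the card.
record = {
    "job_title": "Backend Developer",
    "skills": ["Python", "FastAPI", "PostgreSQL"],
    "questions": [
        "What is dependency injection in FastAPI?",
        "How would you optimize a slow PostgreSQL query?",
    ],
}

# Flatten into the same "Job Title / Skills / Questions:" layout used at inference time.
text = (
    f"Job Title: {record['job_title']}\n"
    f"Skills: {', '.join(record['skills'])}\n"
    "Questions:\n" + "\n".join(record["questions"])
)
print(text)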

Training Procedure

  • Fine-tuned using the Hugging Face Trainer API (see the configuration sketch after this list)
  • Epochs: 2
  • Batch size: 2 per GPU
  • Learning rate: 5e-5
  • Tokenizer: distilgpt2 tokenizer, max input length 512
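
A minimal Trainer configuration matching the hyperparameters above might look like the following. The toy dataset, tokenization step, and output directory are assumptions for illustration; the original training script is not included with the card.

from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no dedicated pad token
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

# Toy stand-in for the real dataset: flattened prompt+completion strings
# built as in the serialization sketch under Training Data.
texts = [
    "Job Title: Backend Developer\nSkills: Python, FastAPI, PostgreSQL\n"
    "Questions:\nWhat is dependency injection in FastAPI?"
]
dataset = Dataset.from_dict({"text": texts})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

training_args = TrainingArguments(
    output_dir="echohire-qgen-distilgpt2",  # assumed output path
    num_train_epochs=2,
    per_device_train_batch_size=2,
    learning_rate=5e-5,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()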

Metrics

  • Training Loss: Cross-entropy loss tracked during fine-tuning.

  • Human Review: Qualitative evaluation for relevance, coherence, and completeness of generated questions.

  • Automated Evaluation (BERTScore):
    We performed BERTScore evaluation on a subset of the dataset (first 10 examples) to measure semantic similarity between the generated questions and reference completions using contextual embeddings.

    BERTScore Results (first 10 examples):

    • Precision: 0.8691
    • Recall: 0.8870
    • F1 Score: 0.8780

These scores indicate that, on this small subset, the generated questions are closely aligned with the reference completions. A broader evaluation across the full dataset would be needed to confirm overall quality.
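
The evaluation script itself is not included in the card. A minimal sketch of how a comparable BERTScore computation could be run with the Hugging Face evaluate library is shown below; the prediction and reference strings are placeholders, not the actual evaluation subset.

import evaluate

# Placeholder generated questions and their reference completions.
predictions = ["What experience do you have with FastAPI?"]
references = ["Describe your experience building APIs with FastAPI."]

bertscore = evaluate.load("bertscore")
results = bertscore.compute(predictions=predictions, references=references, lang="en")

# BERTScore returns per-example lists; report the averages.
print("Precision:", sum(results["precision"]) / len(results["precision"]))
print("Recall:", sum(results["recall"]) / len(results["recall"]))
print("F1:", sum(results["f1"]) / len(results["f1"]))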

Results

  • Model successfully generates relevant interview questions for multiple technical domains.
  • Some outputs may merge multiple questions; minor formatting post-processing recommended.

Environmental Impact

  • Hardware Type: NVIDIA Tesla T4 (Google Colab)
  • Hours used: ~3–4 for training
  • Compute Region: Google Colab (USA/Europe)
  • Carbon Emitted: Not formally measured; expected to be minimal for a single Colab GPU session

Technical Specifications

  • Architecture: DistilGPT-2 (81.9M parameters, FP32 weights), causal language model
  • Objective: Generate text conditioned on job title + skills
  • Software: Transformers library, PyTorch, Hugging Face Hub

Citation

If you use this model, please cite as:

APA:

Shah, S. Z. (2025). EchoHire Question Generator (DistilGPT-2 Fine-tuned). Hugging Face. https://huggingface.co/Zeeshan506/echohire-qgen-distilgpt2

BibTeX:

@misc{shah2025echohire,
  title={EchoHire Question Generator (DistilGPT-2 Fine-tuned)},
  author={Shah, Syed Zeeshan},
  year={2025},
  howpublished={\url{https://huggingface.co/Zeeshan506/echohire-qgen-distilgpt2}}
}