GRATH: Gradual Self-Truthifying for Large Language Models
Paper: arXiv:2401.12292
This is a gradually self-truthified model (one iteration) proposed in the paper GRATH: Gradual Self-Truthifying for Large Language Models.
Note: DPO was applied to this model twice, with the DPO reference model set to the current base model at each iteration.
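The truthifying step relies on the standard DPO objective, which scores a preferred ("truthful") answer against a rejected one relative to the reference model. A minimal sketch of the per-pair loss, using sequence log-probabilities directly (this helper is illustrative and not taken from the paper's code):

```python
import math

def dpo_loss(pi_logp_w: float, pi_logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * margin), where the margin is
    the policy-vs-reference log-ratio of the chosen answer minus that of
    the rejected answer. All inputs are sequence log-probabilities."""
    margin = (pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Example: the policy prefers the chosen answer more than the reference does,
# so the margin is positive and the loss is below log(2).
loss = dpo_loss(pi_logp_w=-1.0, pi_logp_l=-2.0,
                ref_logp_w=-1.5, ref_logp_l=-1.5)
```

Since the reference model here is reset to the current base model before each DPO round, the log-ratio terms restart from zero at every iteration, which is what makes the truthifying gradual.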
The following bitsandbytes quantization config was used during training:
PEFT 0.5.0