---
license: mit
tags:
- text-generation
- transformer
- tiny-shakespeare
- decoder-only
model-index:
- name: tiny_shakespeare_transformer
  results: []
---

# tiny_shakespeare_transformer

A small Transformer decoder model trained from scratch on the Tiny Shakespeare dataset.

## Training details

- Dataset: Tiny Shakespeare
- Epochs: 5
- Learning Rate: 0.0003
- Batch Size: 32
- Block Size: 128
- Optimizer: AdamW
- Loss Function: CrossEntropyLoss
- Dropout Rate: 0.1
- Embedding Dimension: 256
- Number of Layers: 6
- Number of Attention Heads: 8

## Usage

Load the model and tokenizer, then generate text:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("NataliaH/tiny_shakespeare_transformer")
tokenizer = AutoTokenizer.from_pretrained("NataliaH/tiny_shakespeare_transformer")

# Encode the prompt and generate a continuation
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Model Architecture

This model is a decoder-only Transformer for autoregressive text generation: 6 decoder layers, 8 attention heads per layer, a 256-dimensional embedding, and a context (block) size of 128 tokens. It was trained on the Tiny Shakespeare dataset to generate Shakespeare-like text. A minimal sketch of this architecture is included at the end of this card.

## Training Process

- Training was performed for 5 epochs.
- The model was optimized with AdamW at a learning rate of 0.0003.
- A dropout rate of 0.1 was applied during training to reduce overfitting.

A sketch of the corresponding training loop appears at the end of this card.

## License

This model is released under the MIT License.
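
## Architecture sketch

This card lists the hyperparameters but not the model definition itself, so the layout below is an assumption: the class names, the pre-norm residual arrangement, and the 4x feed-forward width are illustrative choices, while the layer count, head count, embedding dimension, block size, and dropout rate come from the training details above. A minimal PyTorch sketch:

```python
import torch
import torch.nn as nn


class DecoderBlock(nn.Module):
    """One pre-norm decoder layer: causal self-attention + feed-forward."""

    def __init__(self, d_model=256, n_heads=8, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),  # 4x width is an assumption
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
            nn.Dropout(dropout),
        )

    def forward(self, x, attn_mask):
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = x + attn_out
        x = x + self.ff(self.ln2(x))
        return x


class TinyShakespeareTransformer(nn.Module):
    """Decoder-only Transformer with the hyperparameters from the card."""

    def __init__(self, vocab_size, d_model=256, n_heads=8, n_layers=6,
                 block_size=128, dropout=0.1):
        super().__init__()
        self.block_size = block_size
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(block_size, d_model)
        self.drop = nn.Dropout(dropout)
        self.blocks = nn.ModuleList(
            [DecoderBlock(d_model, n_heads, dropout) for _ in range(n_layers)]
        )
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.drop(self.tok_emb(idx) + self.pos_emb(pos))
        # Causal mask: True entries are blocked, so position i only
        # attends to positions <= i.
        mask = torch.triu(
            torch.ones(T, T, device=idx.device, dtype=torch.bool), diagonal=1
        )
        for block in self.blocks:
            x = block(x, mask)
        return self.head(self.ln_f(x))  # (B, T, vocab_size) logits
```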
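
## Training loop sketch

The original training script is likewise not included in this repository. The loop below is a sketch of how the settings listed above (5 epochs, AdamW at a learning rate of 0.0003, cross-entropy loss on next-token prediction) fit together; `train_loader` and the shape of its batches are assumptions.

```python
import torch
import torch.nn.functional as F


def train(model, train_loader, epochs=5, lr=3e-4, device="cpu"):
    # Hypothetical loop: train_loader is assumed to yield (32, 128+1)
    # tensors of token ids drawn from Tiny Shakespeare.
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for epoch in range(epochs):
        for batch in train_loader:
            batch = batch.to(device)
            # Shift by one token: predict position t+1 from positions <= t.
            inputs, targets = batch[:, :-1], batch[:, 1:]
            logits = model(inputs)  # (B, T, vocab_size)
            loss = F.cross_entropy(
                logits.reshape(-1, logits.size(-1)),  # (B*T, vocab_size)
                targets.reshape(-1),                  # (B*T,)
            )
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Report the loss of the last batch in the epoch.
        print(f"epoch {epoch + 1}: loss {loss.item():.4f}")
```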