facebook/incoder-1B

Text Generation

dpfried committed on May 31, 2022

Commit 01f4604 · 1 parent: 9a7562b

Update README.md

Files changed (1)

README.md +5 -3

@@ -39,14 +39,16 @@ See [https://github.com/dpfried/incoder](https://github.com/dpfried/incoder) for
 `model = AutoModelForCausalLM.from_pretrained("facebook/incoder-1B")`
 ### Tokenizer
-`tokenizer = AutoTokenizer.from_pretrained("facebook/incoder-1B")`.
-Note: the incoder-1B and incoder-6B tokenizers are identical, so 'facebook/incoder-6B' could also be used.
-When calling `tokenizer.decode`, it's important to pass `clean_up_tokenization_spaces=False` to avoid removing spaces after punctuation:
+`tokenizer = AutoTokenizer.from_pretrained("facebook/incoder-1B")`
+(Note: the incoder-1B and incoder-6B tokenizers are identical, so 'facebook/incoder-6B' could also be used.)
+When calling `tokenizer.decode`, it's important to pass `clean_up_tokenization_spaces=False` to avoid removing spaces after punctuation. For example:
 `tokenizer.decode(tokenizer.encode("from ."), clean_up_tokenization_spaces=False)`
+(Note: encoding prepends the `<|endoftext|>` token, as this marks the start of a document to our model. This token can be removed from the decoded output by passing `skip_special_tokens=True` to `tokenizer.decode`.)
 ## License
 CC-BY-NC 4.0
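
The two `tokenizer.decode` options described in the updated Tokenizer section can be combined in one call. The following is a minimal sketch and not part of the commit: the helper names `load_tokenizer` and `round_trip` are hypothetical, and loading the tokenizer assumes the `transformers` package is installed and the Hugging Face Hub is reachable.

```python
def load_tokenizer(name="facebook/incoder-1B"):
    """Load the InCoder tokenizer (hypothetical helper; needs `transformers` and network access)."""
    # Imported lazily so that round_trip() below has no hard dependency on transformers.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(name)


def round_trip(tokenizer, text):
    """Encode `text`, then decode it with both options from the README applied."""
    ids = tokenizer.encode(text)
    return tokenizer.decode(
        ids,
        clean_up_tokenization_spaces=False,  # keep whitespace around punctuation, e.g. in "from ."
        skip_special_tokens=True,            # drop the <|endoftext|> token that encoding prepends
    )
```

Calling `round_trip(load_tokenizer(), "from .")` should give back the input text, going by the README's description of the two flags. Leaving out `skip_special_tokens=True` would keep the leading `<|endoftext|>` in the decoded string.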