Cross-reference datasets
Listing the training datasets in the model card metadata improves discoverability and increases transparency.
To get started, I added the two largest datasets (by tokens contributed to training) mentioned in the model card. Feel free to be more exhaustive :)
README.md CHANGED
@@ -39,6 +39,9 @@ language:
 - sr
 - sv
 - uk
+datasets:
+- oscar-corpus/colossal-oscar-1.0
+- HuggingFaceFW/fineweb-edu
 ---
 
 
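For anyone who wants to extend the list, here is a minimal sketch of the same edit done programmatically. It is not how this PR was made (the change above was a manual edit to README.md), and the helper name `add_datasets_to_front_matter` and the shortened sample README are hypothetical. It assumes the front matter starts on the first line, is closed by a second `---`, and has no `datasets:` key yet.

```python
def add_datasets_to_front_matter(readme_text: str, dataset_ids: list[str]) -> str:
    """Insert a `datasets:` list just before the closing `---` of the YAML front matter.

    Hypothetical helper for illustration; it uses plain string handling, not a YAML parser.
    """
    lines = readme_text.split("\n")
    if not lines or lines[0].strip() != "---":
        raise ValueError("README has no YAML front matter")

    # Locate the closing delimiter of the front matter.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("front matter is not closed")

    # Refuse to add a second `datasets:` key rather than produce invalid metadata.
    if any(line.startswith("datasets:") for line in lines[1:end]):
        raise ValueError("front matter already declares datasets")

    new_block = ["datasets:"] + [f"- {dataset_id}" for dataset_id in dataset_ids]
    return "\n".join(lines[:end] + new_block + lines[end:])


# Shortened stand-in for the real README (assumption: only the tail of the language list).
readme = "---\nlanguage:\n- sr\n- sv\n- uk\n---\n\n# Model card\n"
updated = add_datasets_to_front_matter(
    readme,
    ["oscar-corpus/colossal-oscar-1.0", "HuggingFaceFW/fineweb-edu"],
)
print(updated)
```

The new key is placed right before the closing `---`, matching where this PR adds it, so the resulting diff stays limited to the added lines.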