Hugging Face

Vendor ABC

company

AI & ML interests

None defined yet.

Members: Rajiv Shah, Taylor Linton

rajistics posted an update 12 months ago
Having some fun with long-context benchmarks (watch the video!)

NoLiMa: Long-Context Evaluation Beyond Literal Matching (arXiv:2502.05167)
Fiction LiveBench: https://fiction.live/stories/Fiction-liveBench-Mar-25-2025/oQdzQvKHw8JyXbN87
Michelangelo: https://deepmind.google/research/publications/117639/
LongGenBench: Spinning the Golden Thread: Benchmarking Long-Form Generation in Language Models (arXiv:2409.02076)
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? (arXiv:2407.11963)
RULER: What's the Real Context Size of Your Long-Context Language Models? (arXiv:2404.06654)

For more: https://www.reddit.com/r/rajistics/comments/1jxwk29/long_context_llm_benchmarks_video/

Let me know if you like these posts.

rajistics updated 3 models over 3 years ago

vendorabc/tabular-playground

Tabular Classification • Updated Aug 30, 2022

vendorabc/modeltest

Tabular Classification • Updated Aug 30, 2022

vendorabc/modelhubexample

Tabular Classification • Updated Aug 30, 2022