🏗️ Building on HF

Tom Aarsen

tomaarsen

huggingface

https://linkedin.com/in/tomaarsen

AI & ML interests

NLP: text embeddings, information retrieval, named entity recognition, few-shot text classification

Recent Activity

liked a model about 5 hours ago

thinkingmachines/Inkling

liked a model about 9 hours ago

microsoft/colipri

liked a model 1 day ago

nvidia/Nemotron-3-Embed-8B-BF16

upvoted a paper 1 day ago

Skill Is Not Document: A Query-Conditional Benchmark and Two-Stage Retriever for LLM Agent Skill Routing

Paper • 2606.03565 • Published Jun 14 • 3

upvoted an article 7 days ago

Article

Native-speed vLLM transformers modeling backend

hmellor, lysandre • 8 days ago • 48

upvoted a paper 7 days ago

Quantifying and Expanding the Theoretical Capacity of Late-Interaction Retrieval Models

Paper • 2607.05803 • Published 9 days ago • 10

upvoted 3 articles 8 days ago

Article

From Hugging Face to Amazon SageMaker Studio in one click

amazon • 8 days ago • 12

Article

Hugging Face Models on Foundry Managed Compute

microsoft • 8 days ago • 11

Article

After the party comes the free lunch: regularizing ColBERT models to enhance pooling capabilities and reduce index footprint

lightonai • 9 days ago • 13

upvoted a paper 8 days ago

Gemma 4 Technical Report

Paper • 2607.02770 • Published 14 days ago • 63

upvoted a paper 9 days ago

Inference-Free Multimodal Learned Sparse Retrieval for Production-Scale Visual Document Search

Paper • 2605.30917 • Published May 29 • 8

upvoted an article 10 days ago

Article

LeRobot v0.6.0: Imagine, Evaluate, Improve

imstevenpmwork, pepijn223, CarolinePascal, lilkm, maximellerbach, nepyope, nikodembartnik, Nico-robot, thomwolf • 9 days ago • 62

upvoted a collection 17 days ago

Lychee-KaLM-Reranker

13 items • Updated 6 days ago • 3

upvoted an article 19 days ago

Article

SportsBERT Small: Domain model for Sports

NeuML • 19 days ago • 4

upvoted a changelog 20 days ago

Hugging Face Changelog

Share your feedback with us

20 days ago • 130

upvoted a collection 20 days ago

Ornith-1.0

Ornith-1.0 is a family of open-source LLMs specialized for agentic coding. • 8 items • Updated 19 days ago • 345

upvoted a collection 21 days ago

DoctoBERT-fr

French medical encoders pretrained from scratch on curated and LLM-rephrased medical web data. • 6 items • Updated 9 days ago • 9

upvoted an article 21 days ago

Article

Where Does the Signal Live? A Web Data Recipe for Medical Encoder Pretraining

bofenghuang • 25 days ago • 9

upvoted an article 22 days ago

Article

Building Moon Bot: A Slack-Native Coding Agent Backed by HuggingFace Buckets

huggingface • 22 days ago • 46

upvoted an article 23 days ago

Article

Shipping huggingface_hub every week with AI, open tools, and a human in the loop

Wauplin, celinah • 23 days ago • 20

upvoted an article 27 days ago

Article

Is it agentic enough? Benchmarking open models on your own tooling

lysandre, SaylorTwift, pcuenq • 28 days ago • 19

upvoted a collection 29 days ago

SkMTEB

16 items • Updated Apr 21 • 1

upvoted an article 29 days ago

Article

Party is over: regularizing ColBERT models to fix efficient ANN methods

lightonai • 29 days ago • 23