Loubna Ben Allal

loubnabenallalcontact@gmail.com

Welcome to my personal page! 🌸 I'm Loubna, a Research Engineer at Hugging Face. I work on many aspects of training open foundation models, from data curation to pretraining recipes and evaluation, across different domains.

I started with code generation on the core team behind BigCode (The Stack, StarCoder, StarCoder2), then moved to small language models, leading SmolLM, SmolLM2 & SmolLM3 and building pretraining datasets like FineWeb-Edu. I'm also the author of The Smol Training Playbook, a comprehensive guide to building world-class small LLMs.

More recently, I've been working on AI for biology. We released Carbon, an open generative DNA foundation model. See the paper.

I hold the MVA master's degree from ENS Paris Saclay and an engineering degree from École des Mines de Nancy. I'm based in Paris, but grew up in Morocco, in a small town called Midelt.

I recently talked about LLMs and the impact of synthetic data on the French tech talk show Underscore_. Watch the interview here.

Publications

Full list on Google Scholar.

Talks