Welcome to my personal page! 🌸 I'm Loubna, a Research Engineer at Hugging Face. I work on many aspects of training open foundation models, from data curation to pretraining recipes and evaluation, across different domains.
I started with code generation on the core team behind BigCode (The Stack, StarCoder, StarCoder2), then moved to small language models, leading SmolLM, SmolLM2 & SmolLM3 and building pretraining datasets like FineWeb-Edu. I'm also the author of The Smol Training Playbook, a comprehensive guide to building world-class small LLMs.
More recently, I've been working on AI for biology. We released Carbon, an open generative DNA foundation model. See the paper.
I hold the MVA master's degree from ENS Paris-Saclay and an engineering degree from École des Mines de Nancy. I'm based in Paris, but grew up in Morocco, in a small town called Midelt.
I recently talked about LLMs and the impact of synthetic data on the French tech talk show Underscore_. Watch the interview here.
Full list of publications on Google Scholar.