Clémentine Fourrier

clefourrier

http://clefourrier.github.io

AI & ML interests

None yet

Articles

Introducing the Open FinLLM Leaderboard

6 days ago

• 44

BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks

Jun 18

• 36

Falcon 2: An 11B parameter pretrained language model and VLM, trained on over 5000B tokens and 11 languages

May 24

• 24

CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models

May 24

• 21

The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare

Apr 19

• 102

Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs

Apr 16

• 13

Introducing the Chatbot Guardrails Arena

Mar 21

• 4

Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes?

Mar 5

• 4

TTS Arena: Benchmarking Text-to-Speech Models in the Wild

Feb 27

• 31

Introducing the Red-Teaming Resistance Leaderboard

Feb 23

• 12

Introducing the Open Ko-LLM Leaderboard: Leading the Korean LLM Evaluation Ecosystem

Feb 20

• 3

NPHardEval Leaderboard: Unveiling the Reasoning Abilities of Large Language Models through Complexity Classes and Dynamic Updates

Feb 2

• 2

Introducing the Enterprise Scenarios Leaderboard: a Leaderboard for Real World Use Cases

Jan 31

• 3

The Hallucinations Leaderboard, an Open Effort to Measure Hallucinations in Large Language Models

Jan 29

• 14

A guide to setting up your own Hugging Face leaderboard: an end-to-end example with Vectara's hallucination leaderboard

Jan 12

• 6

2023, year of open LLMs

Dec 18, 2023

• 5

Open LLM Leaderboard: DROP deep dive

Dec 1, 2023

• 3

Overview of natively supported quantization schemes in 🤗 Transformers

Sep 12, 2023

• 10

What's going on with the Open LLM Leaderboard?

Jun 23, 2023

• 18

Introduction to Graph Machine Learning

Jan 3, 2023

• 15

Organizations

clefourrier's activity

upvoted an article about 14 hours ago

Article

Democratization of AI, Open Source, and AI Auditing: Thoughts from the DisinfoCon Panel in Berlin

1 day ago

• 4

upvoted an article 8 days ago

Article

A Short Summary of Chinese AI Global Expansion

8 days ago

• 12

upvoted a collection 14 days ago

Molmo

Collection

Artifacts for open multimodal language models. • 5 items • Updated 14 days ago • 239

upvoted 2 articles 3 months ago

Article

SmolLM - blazingly fast and remarkably powerful

Jul 16

• 244

Article

Our Transformers Code Agent beats the GAIA benchmark!

Jul 1

• 45

upvoted a paper 3 months ago

MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures

Paper • 2406.06565 • Published Jun 3 • 6

upvoted a collection 3 months ago

🎭 Avatars

Collection

The latest AI-powered technologies usher in a new era of realistic avatars! 🚀 • 67 items • Updated 10 days ago • 73

upvoted a paper 4 months ago

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25 • 85

upvoted an article 4 months ago

Article

Space secrets security update

May 31

• 50

upvoted 3 articles 5 months ago

Article

Evaling llm-jp-eval (evals are hard)

May 18

• 4

Article

Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent

Apr 22

• 78

Article

LLM Comparison/Test: Llama 3 Instruct 70B + 8B HF/GGUF/EXL2 (20 versions tested and compared!)

Apr 24

• 56

upvoted a collection 5 months ago

Granite Code Models

Collection

A series of code models trained by IBM licensed under Apache 2.0 license. We release both the base pretrained and instruct models. • 23 items • Updated Aug 30 • 165

upvoted a paper 5 months ago

What matters when building vision-language models?

Paper • 2405.02246 • Published May 3 • 98

upvoted 2 articles 5 months ago

Article

Bringing the Artificial Analysis LLM Performance Leaderboard to Hugging Face

May 3

• 13

Article

Improving Prompt Consistency with Structured Generations

Apr 30

• 53

upvoted 10 articles 6 months ago

Article

A guide to setting up your own Hugging Face leaderboard: an end-to-end example with Vectara's hallucination leaderboard

Jan 12

• 6

Article

An Introduction to AI Secure LLM Safety Leaderboard

Jan 26

• 5

Article

The Hallucinations Leaderboard, an Open Effort to Measure Hallucinations in Large Language Models

Jan 29

• 14

Article

Introducing the Enterprise Scenarios Leaderboard: a Leaderboard for Real World Use Cases

Jan 31

• 3

Article

Introducing the Open Ko-LLM Leaderboard: Leading the Korean LLM Evaluation Ecosystem

Feb 20

• 3

Article

Introducing the Red-Teaming Resistance Leaderboard

Feb 23

• 12

Article

TTS Arena: Benchmarking Text-to-Speech Models in the Wild

Feb 27

• 31

Article

Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes?

Mar 5

• 4

Article

Introducing the Chatbot Guardrails Arena

Mar 21

• 4

Article

NPHardEval Leaderboard: Unveiling the Reasoning Abilities of Large Language Models through Complexity Classes and Dynamic Updates

Feb 2

• 2

upvoted 3 papers 7 months ago

Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

Paper • 2403.09029 • Published Mar 14 • 54

Stealing Part of a Production Language Model

Paper • 2403.06634 • Published Mar 11 • 90

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 593

upvoted a paper 9 months ago

Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels

Paper • 2312.17090 • Published Dec 28, 2023 • 4

upvoted a paper 10 months ago

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

Paper • 2312.15166 • Published Dec 23, 2023 • 56

upvoted a collection 10 months ago

Model Merging

Collection

Model Merging is a very popular technique nowadays in LLM. Here is a chronological list of papers on the space that will help you get started with it! • 30 items • Updated Jun 12 • 213

upvoted 2 papers 10 months ago

NLEBench+NorGLM: A Comprehensive Empirical Analysis and Benchmark Dataset for Generative Language Models in Norwegian

Paper • 2312.01314 • Published Dec 3, 2023 • 2

The Falcon Series of Open Language Models

Paper • 2311.16867 • Published Nov 28, 2023 • 12

upvoted a paper 11 months ago

GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 182

upvoted a collection 11 months ago

LLM Leaderboard best models ❤️‍🔥

Collection

A daily uploaded list of models with best evaluations on the LLM leaderboard: • 264 items • Updated Jun 22 • 401

upvoted a paper 12 months ago

Zephyr: Direct Distillation of LM Alignment

Paper • 2310.16944 • Published Oct 25, 2023 • 120

Clémentine Fourrier

AI & ML interests

Articles

Introducing the Open FinLLM Leaderboard

BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks

Falcon 2: An 11B parameter pretrained language model and VLM, trained on over 5000B tokens and 11 languages

CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models

Let's talk about LLM evaluation

Introducing the Open Arabic LLM Leaderboard

Introducing the Open Leaderboard for Hebrew LLMs!

Bringing the Artificial Analysis LLM Performance Leaderboard to Hugging Face

Improving Prompt Consistency with Structured Generations

Introducing the Open Chain of Thought Leaderboard

The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare

Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs

Introducing the Chatbot Guardrails Arena

Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes?

TTS Arena: Benchmarking Text-to-Speech Models in the Wild

Introducing the Red-Teaming Resistance Leaderboard

Introducing the Open Ko-LLM Leaderboard: Leading the Korean LLM Evaluation Ecosystem

NPHardEval Leaderboard: Unveiling the Reasoning Abilities of Large Language Models through Complexity Classes and Dynamic Updates

Introducing the Enterprise Scenarios Leaderboard: a Leaderboard for Real World Use Cases

The Hallucinations Leaderboard, an Open Effort to Measure Hallucinations in Large Language Models

A guide to setting up your own Hugging Face leaderboard: an end-to-end example with Vectara's hallucination leaderboard

2023, year of open LLMs

Open LLM Leaderboard: DROP deep dive

Overview of natively supported quantization schemes in 🤗 Transformers

What's going on with the Open LLM Leaderboard?

Introduction to Graph Machine Learning
