mmhamdy (Mohammed Hamdy)

Posts 6

🔗 Evaluating Long Context #1: Long Range Arena (LRA)

Accurately evaluating how well language models handle long contexts is crucial, but it's also quite challenging to do well. In this series of posts, we're going to examine the various benchmarks that have been proposed to assess long-context understanding, starting with Long Range Arena (LRA).

Introduced in 2020, Long Range Arena (LRA) is one of the earliest benchmarks designed to tackle the challenge of long-context evaluation.

📌 Key Features of LRA

1️⃣ Diverse Tasks: The LRA benchmark consists of a suite of tasks designed to evaluate model performance on long sequences ranging from 1,000 to 16,000 tokens. These tasks encompass different data types and modalities: Text, Natural and Synthetic Images, and Mathematical Expressions.

2️⃣ Synthetic and Real-world Tasks: LRA comprises both synthetic probing tasks and real-world tasks.

3️⃣ Open-Source and Extensible: The LRA benchmark code is implemented in Python using JAX and Flax and is publicly available, which makes it easy to extend.
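To make the "different modalities, one long sequence" idea concrete, here is a minimal sketch of how text and images can both be turned into flat token sequences. The helper names (`text_to_byte_tokens`, `image_to_pixel_tokens`), the padding scheme, and the 32x32 image size are assumptions chosen for illustration; this is not the preprocessing code from the LRA repository.

```python
def text_to_byte_tokens(text: str, max_len: int, pad_id: int = 0) -> list[int]:
    """Encode text as UTF-8 bytes, one token per byte.

    Byte values are shifted by 1 so that 0 stays free for padding
    (an assumption of this sketch, not a documented LRA convention).
    """
    tokens = [b + 1 for b in text.encode("utf-8")][:max_len]
    return tokens + [pad_id] * (max_len - len(tokens))


def image_to_pixel_tokens(image: list[list[int]]) -> list[int]:
    """Flatten a 2D grayscale image row by row into a 1D pixel sequence."""
    return [pixel for row in image for pixel in row]


# A short string padded to a fixed length.
tokens = text_to_byte_tokens("Long Range Arena", max_len=32)

# A hypothetical 32x32 grayscale image becomes a sequence of 1,024 pixels.
image = [[(r * 32 + c) % 256 for c in range(32)] for r in range(32)]
pixels = image_to_pixel_tokens(image)

print(len(tokens), len(pixels))
```

Working at the byte or pixel level is what pushes sequence lengths into the thousands: even a small image or a medium-length document turns into a long input once every byte or pixel is its own token.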

📌 Tasks

1️⃣ Long ListOps

2️⃣ Byte-level Text Classification and Document Retrieval

3️⃣ Image Classification

4️⃣ Pathfinder and Pathfinder-X (Long-range spatial dependency)
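To give a feel for the first task, here is a minimal sketch of a ListOps-style evaluator. It assumes a simplified format with whitespace-separated tokens, single-digit operands, and four operators (`MAX`, `MIN`, `MED`, and `SM` for sum modulo 10); using the low median for even-length lists is also an assumption of this sketch. It is not the data generator or evaluation code from the LRA repository.

```python
import statistics

# Operator table for this sketch; each function takes a list of ints.
OPS = {
    "MAX": max,
    "MIN": min,
    "MED": statistics.median_low,
    "SM": lambda xs: sum(xs) % 10,  # sum modulo 10
}


def evaluate(expression: str) -> int:
    """Evaluate a bracketed prefix expression such as '[MAX 4 [MIN 2 3 ] ]'."""
    stack = []  # one (operator name, argument list) frame per open bracket
    for token in expression.split():
        if token.startswith("["):
            stack.append((token[1:], []))
        elif token == "]":
            op, args = stack.pop()
            value = OPS[op](args)
            if not stack:
                return value  # the outermost bracket just closed
            stack[-1][1].append(value)
        else:
            stack[-1][1].append(int(token))
    raise ValueError("unbalanced expression")


print(evaluate("[MAX 4 3 [MIN 2 3 ] 1 0 [MED 1 5 8 9 2 ] ]"))  # prints 5
```

The model only ever sees the flat token sequence and has to predict the final value, so it must keep track of the nesting structure across the whole input, which is what makes long expressions a useful probe.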

👨‍💻 Long Range Arena (LRA) Github Repository: https://github.com/google-research/long-range-arena

📄 Long Range Arena (LRA) paper: Long Range Arena: A Benchmark for Efficient Transformers (arXiv:2011.04006)

🚀 Introducing The Open Language Models List

This is a work-in-progress list of open language models released under permissive licenses such as MIT or Apache 2.0.

The list is not limited to autoregressive models, or even to transformer models: it also includes many SSMs and SSM-Transformer hybrids.

🤗 Contributions, corrections, and feedback are very welcome!

The Open Language Models List: https://github.com/mmhamdy/open-language-models

Papers 1

arxiv:2407.14933

Collections 3

facebook/esmfold_v1

ElnaggarLab/ankh-base

ElnaggarLab/ankh-large

RITA: a Study on Scaling Up Generative Protein Sequence Models

Chatbot Arena Leaderboard

Open LLM Leaderboard 2

AI2 WildBench Leaderboard (V2)

URIAL Bench (Eval Base LLMs on MT-Bench)

spaces 3

Speech To Speech Translation

Automatic Speech Recognition

Music Genre Classifier

models 17

mmhamdy/speecht5-finetuned-fleurs-it-it

mmhamdy/whisper-tiny-finetuned-minds14-en-us

mmhamdy/whisper-tiny-finetuned-gtzan

mmhamdy/poca-SoccerTwos

mmhamdy/rl_course_vizdoom_health_gathering_supreme

mmhamdy/ppo-LunarLander-v2-2

mmhamdy/a2c-PandaReachDense-v2

mmhamdy/a2c-AntBulletEnv-v0

mmhamdy/Reinforce-Pixelcopter-PLE-v0

mmhamdy/ppo-Pyramids

datasets 1

mmhamdy/Arabic-OpenHermes-Filtered
