clem (Clem 🤗)

posted an update 2 days ago

Post

3487

Open-source AI creates healthy competition in a field where natural tendencies lead to extreme concentration of power. Imagine a world where only one or two companies could build software. This is the biggest risk and ethical challenge of them all IMO. Let's fight this!

3 replies

·

posted an update 5 days ago

Post

1808

What are we thinking about MovieGen from Meta? Are the researchers on Hugging Face so that we can ask them questions?

The paper is here: https://ai.meta.com/static-resource/movie-gen-research-paper

posted an update 5 days ago

Post

3586

Very few people realize that most of the successful AI startups got successful because they were focused on open science and open-source for at least their first few years. To name but a few: OpenAI (GPT and GPT-2 were open-source), Runway & Stability (Stable Diffusion), Cohere, Mistral and of course Hugging Face!

The reasons are not just altruistic: sharing your science and your models also pushes you to build AI faster (which is key in a fast-moving domain like AI), attracts the best scientists & engineers and generates much more visibility, usage and community contributions than if you were 100% closed-source. The same applies to big tech companies as we're seeing with Meta and Google!

More startups and companies should release research & open-source AI. It's not just good for the world, it also increases their probability of success!

3 replies

·

replied to MoritzLaurer's post 6 days ago

congrats!

posted an update about 1 month ago

Post

1742

"LLM inference at scale with TGI". Cool blogpost: https://www.adyen.com/knowledge-hub/llm-inference-at-scale-with-tgi

Well done @martinigoyanes @rafa-hernandez @Vidusharma @frisokingma @hannahwright @jeanmarcs @antonioramos & the whole https://huggingface.co/adyen team. Could be useful to cross-post here: https://huggingface.co/blog/community

2 replies

·

replied to their post about 1 month ago

I guess https://huggingface.co/docs/huggingface_hub/v0.5.1/en/package_reference/hf_api? @Wauplin is the expert I think on the topic

replied to Taylor658's post about 1 month ago

great video!

replied to their post about 1 month ago

It depends what you want to do but you can embed gradio/spaces (https://huggingface.co/docs/hub/en/spaces-sdks-gradio#embed-gradio-spaces-on-other-webpages), enable sign in with hf (https://huggingface.co/docs/hub/en/oauth) or just redirect to your org page (or any HF page)

posted an update about 1 month ago

Post

1777

Very cool to see more and more amazing startups like https://huggingface.co/PrunaAI relying on Hugging Face to get more visibility, distribution and usage!

7 replies

·

posted an update about 1 month ago

Post

4114

Just crossed 200,000 free public AI datasets shared by the community on Hugging Face! Text, image, video, audio, time-series & many more... Thanks everyone!

http://hf.co/datasets

posted an update about 2 months ago

Post

1547

Shoutout to everyone who participated in BigScience! Doesn't get enough credit but IMO paved the way for open-source LLMs!

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (2211.05100)
bigscience/bloom
bigscience/bloomz

posted an update about 2 months ago

Post

3586

This isn’t a goal of ours because we have plenty of money in the bank but quite excited to see that @huggingface is profitable these days, with 220 team members and most of our platform being free (like model hosting) and open-source for the community!

Especially noteworthy at a time when most AI startups wouldn’t survive a year or two without VC money. Yay!

4 replies

·

replied to ybelkada's post about 2 months ago

nice!

posted an update about 2 months ago

Post

1617

@nb2375 welcome to HF!

replied to samjulien's post 2 months ago

very cool!

replied to 1aurent's post 3 months ago

very cool!

replied to fdaudens's post 3 months ago

Beautiful team work!

posted an update 3 months ago

Post

2492

This is the week of small AI language models!

4 replies

·

posted an update 3 months ago

Post

5762

5,000 new repos (models, datasets, spaces) are created EVERY DAY on HF now. The community is amazing!

replied to louisbrulenaudet's post 3 months ago

very cool! feel free to create the HF for Legal org and post about it so we can amplify!

posted an update 4 months ago

Post

3632

Who said you couldn't build a big business based on open-source AI? Congrats Mistral team: https://huggingface.co/mistralai

replied to thomwolf's post 4 months ago

Beautiful work!

replied to lunarflu's post 4 months ago

omg would be sick!

posted an update 4 months ago

Post

1542

I would pick @ylecun over @elonmuskceo every single day of the week.

Despite getting much less $$, recognition & visibility than entrepreneurs, the scientists who publish their groundbreaking research openly are the cornerstone of technological progress & massively contribute to making the world a better place!

1 reply

·

replied to singhsidhukuldeep's post 5 months ago

very cool! cc @clefourrier

replied to their post 5 months ago

https://huggingface.co/posts/merve/375349782904361

replied to their post 5 months ago

any info on when it's going to be released though?

replied to their post 5 months ago

do you have a link?

posted an update 5 months ago

Post

1565

What are you excited about from Google I/O?

9 replies

·

replied to Undi95's post 5 months ago

congrats!

replied to HeshamHaroon's post 5 months ago

very cool, thanks for sharing!

replied to singhsidhukuldeep's post 5 months ago

Interesting update! They can open-source GPT4 now haha

replied to danielhanchen's post 6 months ago

congratulations! well deserved!

replied to fdaudens's post 6 months ago

gotta catch them all!

replied to gsarti's post 6 months ago

you should create an org on HF for it

posted an update 6 months ago

Post

2521

Great in-depth Llama-3 tests from @wolfram, covering the models from Meta of course but also those from @MaziyarPanahi, @emozilla and @turboderp: https://huggingface.co/blog/wolfram/llm-comparison-test-llama-3

Spotted by @jack-kumar

2 replies

·

replied to their post 6 months ago

let's fix it: https://twitter.com/ClementDelangue/status/1782065141200073122

posted an update 6 months ago

Post

2905

Already almost 1,000 llama3 model variations have been shared publicly on HF (many more in private use at companies): https://huggingface.co/models?p=5&sort=trending&search=llama3

Everyone should fine-tune their own models for their use-cases, languages, industry, infra constraints,...

10,000 llama3 variants by the end of next week?
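
For anyone who wants to track that number themselves, here is a minimal sketch of counting the public search hits with `huggingface_hub`. The helper names `count_public_models` and `count_llama3_variants` are made up for illustration. It assumes the Hub's `search` filter matches on repo ids in roughly the same way as the search page linked above, so the total will not necessarily match what the website shows.

```python
from typing import Iterable, Protocol


class ModelLister(Protocol):
    """Anything exposing list_models(search=...) the way huggingface_hub.HfApi does."""

    def list_models(self, *, search: str) -> Iterable[object]: ...


def count_public_models(api: ModelLister, query: str) -> int:
    """Count the models returned by the client for `query` (hypothetical helper)."""
    return sum(1 for _ in api.list_models(search=query))


def count_llama3_variants() -> int:
    """Hits the network; requires `pip install huggingface_hub`."""
    # Imported lazily so count_public_models stays usable offline.
    from huggingface_hub import HfApi

    # Without a token, only publicly visible repos are listed.
    return count_public_models(HfApi(), "llama3")
```

Calling `count_llama3_variants()` walks every matching repo, so it can take a while once the list gets long.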

4 replies

·

replied to visheratin's post 6 months ago

Thank you! You should tweet it mentioning @elonmuskceo !

posted an update 6 months ago

Post

2675

We noticed that all the open-source models and datasets from https://huggingface.co/WizardLM in their personal Hugging Face account & in the Microsoft Hugging Face organization (https://huggingface.co/microsoft) have been made private by the author, which will cause some demos to fail (these models were collectively downloaded over a hundred thousand times a month).

This is the explanation that @WizardLM communicated a few hours ago: https://huggingface.co/posts/WizardLM/329547800484476#661e0d17bca1a6038b60503e

We apologize for the inconvenience & are trying to get in touch with the author & Microsoft in order to try to find a good resolution for community members. Let us know if you have any questions!

1 reply

·

posted an update 6 months ago

Post

2431

Fun dataset added last week by @esind from https://huggingface.co/Anthropic to compare persuasiveness between AI and human outputs:
Anthropic/persuasion

2 replies

·

posted an update 6 months ago

Post

2527

Introducing gretelai/synthetic_text_to_sql by https://huggingface.co/gretelai

It stands as the largest and most diverse synthetic Text-to-SQL dataset available to date.

The dataset includes:

- 105,851 records partitioned into 100,000 train and 5,851 test records
- ~23M total tokens, including ~12M SQL tokens
- Coverage across 100 distinct domains/verticals
- Comprehensive array of SQL tasks: data definition, retrieval, manipulation, analytics & reporting
- Wide range of SQL complexity levels, including subqueries, single joins, multiple joins, aggregations, window functions, set operations
- Database context, including table and view create statements
- Natural language explanations of what the SQL query is doing
- Contextual tags to optimize model training

Blogpost: https://gretel.ai/blog/synthetic-text-to-sql-dataset
Dataset: gretelai/synthetic_text_to_sql
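
A quick sanity check on the split sizes quoted above, plus a sketch of how the dataset could be pulled with the `datasets` library. The function names are hypothetical, and the loader assumes the repo exposes the `train` and `test` splits described in the post. It is not called here because it needs network access.

```python
# Figures quoted in the post.
TRAIN_RECORDS = 100_000
TEST_RECORDS = 5_851


def split_summary(train: int, test: int) -> dict:
    """Return the total record count and the share held out for testing."""
    total = train + test
    return {"total": total, "test_fraction": test / total}


def load_text_to_sql():
    """Download the dataset from the Hub; requires `pip install datasets`."""
    # Imported lazily so the arithmetic above runs without the dependency.
    from datasets import load_dataset

    return load_dataset("gretelai/synthetic_text_to_sql")


summary = split_summary(TRAIN_RECORDS, TEST_RECORDS)
print(summary["total"])  # 105851
print(f"{summary['test_fraction']:.1%} of records are held out for testing")
```

After `ds = load_text_to_sql()`, `ds["train"].num_rows` and `ds["test"].num_rows` should line up with the numbers in the post if the splits are unchanged.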

1 reply

·

replied to Smooke's post 6 months ago

Thanks for sharing!

replied to julien-c's post 7 months ago

Welcome @josefprusa !

posted an update 8 months ago

Post

Terribly excited about open-source + on-device AI these days! Great to see @qualcomm release 80+ models optimized and curated for their devices and chips on HF: https://huggingface.co/qualcomm

1 reply

·

replied to dvilasuero's post 8 months ago

Unpopular opinion: this is the most impactful release of the day (because open)!

replied to DmitryRyumin's post 8 months ago

would be cool to have some integration with the HF hub

replied to trisfromgoogle's post 8 months ago

This is awesome!

replied to stas's post 8 months ago

very cool!

replied to manu's post 8 months ago

🇫🇷🇫🇷🇫🇷

replied to dvilasuero's post 8 months ago

Very cool!

replied to clefourrier's post 8 months ago

very useful! This is the link to the leaderboard btw: https://huggingface.co/spaces/PatronusAI/enterprise_scenarios_leaderboard

replied to julien-c's post 8 months ago

very cool!

posted an update 8 months ago

Post

So impressed with the speed and accuracy of vikhyatk/moondream1 by @vikhyatk (especially the last answer 😝😝😝).

Open multi-modal models have come a long way!

Model: vikhyatk/moondream1

1 reply

·

posted an update 8 months ago

Post

With the Google announcement last week, I think we're now officially the only AI startup out there that has commercial collaborations with all the major cloud providers (AWS, GCP, Azure) and hardware providers (Nvidia, AMD, Intel, Qualcomm,...), making our vision of being the independent and agnostic platform for all AI builders truer than ever!

Let's go!

posted an update 9 months ago

Post

In 2024, we're expanding from open weights to open EVERYTHING (datasets, training scripts,...).

Excited to see this dataset release in French by @Pclanglais @carbonbasedLLM @anastasiastasenko :
PleIAs/French-PD-Newspapers

"To give you an idea of the size, the full French Wikipedia is about 2 billion words. This is 40 times larger."