Spaces-explorers


AI & ML interests

Contributors invited to beta-test our next big feature! Contact us if you want to join this team :-)

Recent Activity

abhishek authored a paper 3 months ago

AutoTrain: No-code training for state-of-the-art models

nguyenvulebinh authored a paper 5 months ago

Convoifilter: A case study of doing cocktail party speech recognition

vumichien authored a paper 5 months ago

Consent in Crisis: The Rapid Decline of the AI Data Commons


spaces-explorers's activity

Borchmann

authored 4 papers 2 months ago

kobiso

authored a paper 3 months ago

Intriguing Properties of Large Language and Vision Models

Paper • 2410.04751 • Published Oct 7, 2024 • 16

mariagrandury

authored a paper 3 months ago

Evaluating Large Language Models with Tests of Spanish as a Foreign Language: Pass or Fail?

Paper • 2409.15334 • Published Sep 8, 2024

raedle

authored a paper 5 months ago

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 110

mariagrandury

authored a paper 5 months ago

The #Somos600M Project: Generating NLP resources that represent the diversity of the languages from LATAM, the Caribbean, and Spain

Paper • 2407.17479 • Published Jul 1, 2024

kobiso

authored a paper 6 months ago

Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge

Paper • 2407.03958 • Published Jul 4, 2024 • 18

mariagrandury

authored a paper 6 months ago

Spanish and LLM Benchmarks: is MMLU Lost in Translation?

Paper • 2406.17789 • Published May 28, 2024

tolgacangoz

posted an update 8 months ago

Hi @TimothyAlexisVass, thanks for [this great blog post](https://huggingface.co/blog/TimothyAlexisVass/explaining-the-sdxl-latent-space#the-4-channels-of-the-sdxl-latents)! Shouldn't it be like this?
https://github.com/TimothyAlexisVass/hf-colorspace-blogpost/pull/1

efederici

posted an update 8 months ago

Finally, I can post! 🚀

I created a Capybara-inspired Italian dataset by translating the initial instruction and running it through a pipeline to generate conversations. I used Claude Sonnet for translation and instruction generation, and Opus for generating the answers.

I hope this dataset proves useful for people working on 🇮🇹 language models.

⛁ Open sourcing the dataset here: efederici/capybara-claude-15k-ita

nateraw

posted an update 8 months ago

I just shared a blogpost on https://nateraw.com explaining the motivation + process of training nateraw/musicgen-songstarter-v0.2 - including training details, WandB logs, hparams, and notes on previous experiments.

Check it out here ⤵️
https://nateraw.com/posts/training_musicgen_songstarter.html

:) still kinda a WIP so if there's anything else you want to see, let me know.

nateraw

posted an update 9 months ago

Turns out if you do a cute little hack, you can make nateraw/musicgen-songstarter-v0.2 work on vocal inputs. 👀

Now, you can hum an idea for a song and get a music sample generated with AI 🔥🔥

Give it a try: ➡️ nateraw/singing-songstarter ⬅️

It'll take your voice and try to autotune it (because let's be real, you're no Michael Jackson), then pass it along to the model to condition on the melody. It works surprisingly well!
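The post does not describe how the autotune step is implemented. One common ingredient of pitch correction is snapping each frame's estimated fundamental frequency to the nearest equal-tempered semitone. Below is a minimal sketch of that step alone, assuming a pitch tracker has already produced per-frame frequencies in Hz (None for unvoiced frames) and A4 = 440 Hz tuning. The function names are hypothetical; this is not the code behind nateraw/singing-songstarter.

```python
import math
from typing import List, Optional

A4_HZ = 440.0  # Assumed reference tuning (standard concert pitch).


def snap_to_semitone(freq_hz: float, ref_hz: float = A4_HZ) -> float:
    """Return the 12-tone equal-tempered pitch nearest to freq_hz."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Distance from the reference pitch, measured in fractional semitones.
    semitones = 12 * math.log2(freq_hz / ref_hz)
    # Round to the nearest whole semitone and convert back to Hz.
    return ref_hz * 2 ** (round(semitones) / 12)


def autotune_contour(f0_track: List[Optional[float]]) -> List[Optional[float]]:
    """Snap every voiced frame; unvoiced frames (None) pass through unchanged."""
    return [None if f is None else snap_to_semitone(f) for f in f0_track]


# A slightly sharp A4, an unvoiced frame, and a pitch a bit flat of A#4.
print(autotune_contour([445.0, None, 460.0]))
```

Estimating the pitch beforehand and resynthesizing the audio at the corrected pitch afterward are separate stages not shown here.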

manandey

authored a paper 9 months ago

The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Paper • 2303.03915 • Published Mar 7, 2023 • 6

manandey

authored a paper 10 months ago

StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29, 2024 • 136

zpn

posted an update 11 months ago

ICYMI! Nomic Embed v1.5: Resizable Production Embeddings with Matryoshka Representation Learning

- Variable embedding dimension from 64 to 768
- Outperforms text-embedding-ada-002 while achieving a 3x memory reduction
- Day 1 integrations with Langchain, LlamaIndex, MongoDB, and Sentence Transformers

Check out nomic-ai/nomic-embed-text-v1.5 for the model weights.

Technical report: https://static.nomic.ai/reports/2024_Nomic_Embed_Text_Technical_Report.pdf
Blog Post: https://blog.nomic.ai/posts/nomic-embed-matryoshka
Original Tweet Thread: https://x.com/nomic_ai/status/1757782157374734665?s=20
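The resizable dimension in the first bullet comes from Matryoshka Representation Learning, where training makes the leading coordinates of each embedding useful on their own, so a shorter vector is obtained by keeping a prefix and renormalizing. Below is a minimal, library-free sketch of that generic truncation step. The function names are hypothetical and not part of Nomic's or any library's API, and the model card for nomic-ai/nomic-embed-text-v1.5 should be checked for any model-specific preprocessing before truncation. The last line only illustrates storage arithmetic: a 256-dimensional float32 vector takes a third of the bytes of a 768-dimensional one. Whether that is the configuration behind the quoted 3x figure is an assumption.

```python
import math
from typing import List


def truncate_and_normalize(embedding: List[float], dim: int) -> List[float]:
    """Keep the first `dim` coordinates (the Matryoshka prefix) and rescale to unit L2 norm."""
    if not 0 < dim <= len(embedding):
        raise ValueError(f"dim must be in 1..{len(embedding)}, got {dim}")
    prefix = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    # An all-zero prefix cannot be normalized; return it unchanged.
    if norm == 0.0:
        return prefix
    return [x / norm for x in prefix]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dot product, which equals cosine similarity because both inputs are unit-normalized."""
    return sum(x * y for x, y in zip(a, b))


def storage_bytes(dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to store one vector of `dim` float32 values."""
    return dim * bytes_per_value


# Toy vectors for illustration; real embeddings would come from the model.
query = truncate_and_normalize([3.0, 4.0, 12.0, 0.5], dim=2)
doc = truncate_and_normalize([4.0, 3.0, -1.0, 2.0], dim=2)
print(query)  # [0.6, 0.8]
print(cosine_similarity(query, doc))
print(storage_bytes(768) / storage_bytes(256))  # 3.0
```

Renormalizing after truncation keeps dot products comparable as cosine similarities regardless of the chosen dimension.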

nav13n

authored a paper 11 months ago

MUSTAN: Multi-scale Temporal Context as Attention for Robust Video Foreground Segmentation

Paper • 2402.00918 • Published Feb 1, 2024

zpn

authored a paper 11 months ago

Nomic Embed: Training a Reproducible Long Context Text Embedder

Paper • 2402.01613 • Published Feb 2, 2024 • 14

zpn

posted an update 11 months ago

ICYMI! Nomic Embed, the first fully open long context text embedder to beat OpenAI

- Open source, open weights, open data
- Beats OpenAI text-embedding-3-small and Ada on short and long context benchmarks
- Day 1 integrations with Langchain, LlamaIndex, MongoDB, and Sentence Transformers

Check out nomic-ai/nomic-embed-text-v1 for the model weights.

Technical report: https://static.nomic.ai/reports/2024_Nomic_Embed_Text_Technical_Report.pdf
Blog Post: https://blog.nomic.ai/posts/nomic-embed-text-v1
Original Tweet Thread: https://x.com/nomic_ai/status/1753082063048040829?s=20

Team members 1364