I created a Capybara-inspired Italian dataset by translating the initial instruction and running it through a pipeline to generate conversations. I used Claude Sonnet for translation and instruction generation, and Opus for generating the answers.
I hope this dataset proves useful for people working on 🇮🇹 language models.
I just shared a blog post on https://nateraw.com explaining the motivation + process of training nateraw/musicgen-songstarter-v0.2, including training details, WandB logs, hparams, and notes on previous experiments.
It'll take your voice and try to autotune it (because let's be real, you're no Michael Jackson), then pass it along to the model to condition on the melody. It works surprisingly well!
ICYMI! Nomic Embed v1.5: Resizable Production Embeddings with Matryoshka Representation Learning
- Variable embedding dimension from 64 to 768
- Outperforms text-embedding-ada-002 while achieving a 3x memory reduction
- Day 1 integrations with LangChain, LlamaIndex, MongoDB, and Sentence Transformers
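
For anyone wondering what a "resizable" embedding looks like in practice, here is a rough sketch of the general Matryoshka idea: keep the first few components of a full-size vector and rescale to unit length. The function name `truncate_embedding` and the toy vector are made up for illustration, and this is not the official Nomic recipe; the model card may call for extra steps (for example a layer norm before truncation), so check it before relying on this.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components of `vec` and rescale to unit L2 norm.

    Hypothetical helper for illustration only; not part of any Nomic API.
    """
    if not 1 <= dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


# Toy 4-dimensional "embedding" standing in for a real 768-dimensional one.
full = [3.0, 4.0, 12.0, 0.5]
small = truncate_embedding(full, 2)
print(len(small), small)
```

Shorter vectors take proportionally less storage, which is where the memory savings come from; the trade-off is some loss in retrieval quality as the dimension shrinks.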
ICYMI! Nomic Embed, the first fully open long context text embedder to beat OpenAI
- Open source, open weights, open data
- Beats OpenAI text-embedding-3-small and Ada on short and long context benchmarks
- Day 1 integrations with LangChain, LlamaIndex, MongoDB, and Sentence Transformers