I asked 8 LLMs to "Tell me a bedtime story about bears and waffles."
Claude 3.5 Sonnet and GPT-4o gave me the worst stories: no conflict, no moral, zero creativity.
In contrast, smaller models were quite creative and wrote stories involving talking waffle trees and bears ostracized for their love of waffles.
Here you can see a comparison between Claude 3.5 Sonnet and NeuralDaredevil-8B-abliterated. They both start with a family of bears but quickly diverge in terms of personality, conflict, etc.
I mapped the stories to the hero's journey to have some kind of framework for comparison. Prompt engineering can definitely help here, but it's still disappointing that the larger models don't create better stories right off the bat.
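For reference, here's a rough sketch of the kind of hero's-journey scaffolding I mean; the stage list, wording, and the placeholder generate() call are purely illustrative, not the exact prompt or client used for the comparison above:

```python
# Illustrative only: scaffold the bedtime-story prompt with hero's-journey beats.
# The stage names and wording are shorthand, and generate() is a placeholder
# for whatever chat-completion client you prefer.
HERO_STAGES = [
    "ordinary world", "call to adventure", "refusal of the call",
    "crossing the threshold", "ordeal", "reward", "return home",
]

def build_prompt(topic: str = "bears and waffles") -> str:
    beats = "\n".join(f"- {s}" for s in HERO_STAGES)
    return (
        f"Tell me a bedtime story about {topic}.\n"
        "Give the protagonist a clear want, a real conflict, and a gentle moral.\n"
        f"Loosely follow these hero's-journey beats:\n{beats}"
    )

def generate(prompt: str, model: str) -> str:
    # Swap in your favorite API client here (OpenAI, Anthropic, a local model, ...).
    raise NotImplementedError

if __name__ == "__main__":
    print(build_prompt())
```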
Do you know why smaller models outperform the frontier models here?
Today's pick in Interpretability & Analysis of LMs: Do language models plan ahead for future tokens? by W. Wu, @jxm, @lionellevine
This work aims to evaluate whether language models exhibit implicit planning during generation.
Authors propose two hypotheses that could result in planning-like behavior:
- Pre-caching: the model engages in computation that is functional to future, but not current, predictions.
- Breadcrumbs: features contributing to the current prediction happen to also be the ones improving future ones.
To determine which behavior occurs in practice, authors note that the off-diagonal gradient terms of the weight matrices (those through which a position's computation is shaped by future positions' losses) are responsible for pre-caching, and craft a variant of gradient descent (myopic descent) that removes such terms from the optimization procedure.
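To make the myopic idea concrete, here is a minimal single-layer PyTorch sketch in which a position's loss cannot send gradients into earlier positions' computation (keys/values contributed by other positions are detached, the current-position path stays live). This is a toy reconstruction of the idea, not the authors' implementation; model, names, and hyperparameters are illustrative.

```python
# Toy illustration of "myopic" training: block gradient paths from a position's
# loss into computation performed at earlier positions, so there is no training
# incentive to pre-cache information for future tokens.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyAttnLM(nn.Module):
    def __init__(self, vocab_size=32, d_model=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.q = nn.Linear(d_model, d_model, bias=False)
        self.k = nn.Linear(d_model, d_model, bias=False)
        self.v = nn.Linear(d_model, d_model, bias=False)
        self.out = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, tokens, myopic=False):
        h = self.embed(tokens)                          # (B, T, D)
        B, T, D = h.shape
        q, k_live, v_live = self.q(h), self.k(h), self.v(h)
        causal = torch.tril(torch.ones(T, T, device=h.device, dtype=torch.bool))
        if myopic:
            # Keys/values produced at *other* positions are detached; only the
            # diagonal (current-position) contribution stays in the graph, so the
            # loss at position t cannot shape earlier positions' computation.
            eye = torch.eye(T, device=h.device).view(1, T, T, 1)
            k_pq = eye * k_live.unsqueeze(1) + (1 - eye) * k_live.detach().unsqueeze(1)
            v_pq = eye * v_live.unsqueeze(1) + (1 - eye) * v_live.detach().unsqueeze(1)
            scores = (q.unsqueeze(2) * k_pq).sum(-1) / D ** 0.5      # (B, T, T)
            attn = scores.masked_fill(~causal, float("-inf")).softmax(-1)
            ctx = (attn.unsqueeze(-1) * v_pq).sum(2)                 # (B, T, D)
        else:
            scores = q @ k_live.transpose(1, 2) / D ** 0.5
            attn = scores.masked_fill(~causal, float("-inf")).softmax(-1)
            ctx = attn @ v_live
        return self.out(ctx)                            # next-token logits

# Toy usage: one step on random sequences; flip myopic=True/False to compare.
torch.manual_seed(0)
model = TinyAttnLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
tokens = torch.randint(0, 32, (8, 16))
logits = model(tokens[:, :-1], myopic=True)
loss = F.cross_entropy(logits.reshape(-1, 32), tokens[:, 1:].reshape(-1))
loss.backward()
opt.step()
```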
Using a synthetic dataset, authors demonstrate that pre-caching does occur in Transformer language models. However, in natural language settings the LM is observed to leverage breadcrumbs from earlier positions' computations even under myopic training, making the breadcrumbs hypothesis the more plausible account of model behavior in practice.