Hugging Face
Tudor M
tudorizer
1 follower · 20 following
https://paragraph.xyz/@tudorizer
tudorizer
tudormunteanu
tudorm
tudorizer.bsky.social
AI & ML interests
Hardware, GPUs, renewable energy; enverge.ai
Recent Activity
upvoted an article
10 days ago
Energy Scores for AI Models
reacted to m-ric's post with 🔥
18 days ago
After 6 years, BERT, the workhorse of encoder models, finally gets a replacement: Welcome ModernBERT!
We talk a lot about ✨Generative AI✨, meaning "Decoder version of the Transformers architecture", but this is only one of the ways to build LLMs: encoder models, which turn a sentence into a vector, are maybe even more widely used in industry than generative models.
The workhorse for this category has been BERT since its release in 2018 (that's prehistory for LLMs). It's not a fancy 100B-parameter supermodel (just a few hundred million parameters), but it's an excellent workhorse, kind of a Honda Civic for LLMs.
Many applications use BERT-family models: the top models in this category accumulate millions of downloads on the Hub.
➡️ Now a collaboration between Answer.AI and LightOn just introduced BERT's replacement: ModernBERT.
TL;DR:
Architecture changes:
First, standard modernizations:
- Rotary positional embeddings (RoPE)
- Replace GeLU with GeGLU
- Use Flash Attention 2
The team also introduced innovative techniques like alternating attention instead of full attention, and sequence packing to get rid of padding overhead.
As a result, the model tops the game of encoder models: it beats the previous standard, DeBERTaV3, with 1/5th the memory footprint, and runs 4x faster!
Read the blog post: https://huggingface.co/blog/modernbert
updated a Space
about 2 months ago
tudorizer/tiny-coder
Organizations
tudorizer's activity
upvoted an article
10 days ago
Energy Scores for AI Models
By sasha • May 9, 2024 • 33