Ivan Fioravanti PRO

ivanfioravanti

AI & ML interests

None yet

Recent Activity

Organizations

CoreView, MLX Vision, MLX Community, Social Post Explorers, Cognitive Computations

ivanfioravanti's activity

reacted to nyuuzyou's post with 🔥 7 days ago
CS2 Highlights Video Dataset - nyuuzyou/cs2-highlights

A collection of 4,857 high-quality Counter-Strike 2 gameplay highlights featuring:

- Professional and competitive gameplay recordings at 1080p resolution
- Complete metadata including Steam IDs and clip titles
- Preview thumbnails for all videos
- Both 60 FPS (842 clips) and 120 FPS (4,015 clips) content
- Gameplay from Faceit and official competitive modes

This extensive highlights collection provides a valuable resource for developing and evaluating video-based AI applications, especially in esports and competitive gaming contexts. Released under the Creative Commons Zero (CC0) license.
reacted to onekq's post with 🚀 7 days ago
๐Ÿ‹ DeepSeek ๐Ÿ‹v3 achieves a solid 7 point jump than v2.5, surpassing GPT-4o, but is still behind ๐Ÿ“ o1 ๐Ÿ“and Claude 3.5.

onekq-ai/WebApp1K-models-leaderboard
reacted to Jaward's post with 👀 7 days ago
nanoBLT: Simplified lightweight implementation of a character-level Byte Latent Transformer model (under 500 lines of code). The model is 2x4x2 layers deep (n_layers_encoder, n_layers_latent, n_layers_decoder) and was trained on ~1M bytes of tiny Shakespeare with a patch size of 4.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/byte_latent_transformer.ipynb
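To make "patch size of 4" concrete, here is a minimal sketch of grouping raw bytes into fixed-size patches. It assumes plain fixed-size patching with zero padding; the helper name `to_byte_patches` is made up for illustration, and the linked notebook may patch differently.

```python
def to_byte_patches(text, patch_size=4, pad=0):
    """Split the UTF-8 bytes of `text` into fixed-size patches.

    Assumption: fixed-size patching with zero padding of the last patch.
    This is an illustration, not the notebook's actual code.
    """
    data = list(text.encode("utf-8"))
    remainder = len(data) % patch_size
    if remainder:
        # Pad so the byte sequence divides evenly into patches.
        data += [pad] * (patch_size - remainder)
    return [data[i:i + patch_size] for i in range(0, len(data), patch_size)]


patches = to_byte_patches("To be, or not")
for patch in patches:
    print(patch)
```

Each inner list is one patch of 4 byte values; in a Byte Latent Transformer, the latent layers operate on patch representations rather than on individual bytes.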
reacted to nyuuzyou's post with 🔥 7 days ago
🎨 KLING AI Dataset - nyuuzyou/klingai

A collection of 12,782 AI-generated media items featuring:
- High-quality image and video generations at various resolutions
- Complete metadata including user IDs, prompts, and generation parameters
- Content generated using text-to-image, text-to-video, and image-to-video modalities
- Full generation settings and technical parameters
posted an update 7 days ago
Most of you probably already know this trick, but just in case:
🤔 Unable to connect to Hugging Face Spaces Dev Mode through local Cursor? 💡 Don't worry, there is an easy trick!

- Right-click "Connect with VS Code"
- Copy the link and paste it into your browser
- vscode://vscode-remote/...
- Replace vscode with cursor and go
- cursor://vscode-remote/...
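The same swap can be scripted. This is a minimal sketch that only rewrites the URL scheme; the helper name `to_cursor_link` and the example link are made up for illustration, so paste the link you actually copied.

```python
def to_cursor_link(vscode_link):
    """Rewrite a vscode:// deep link so that Cursor handles it instead."""
    prefix = "vscode://"
    if not vscode_link.startswith(prefix):
        raise ValueError("expected a link starting with vscode://")
    # Only the scheme changes; the rest of the link is kept untouched.
    return "cursor://" + vscode_link[len(prefix):]


# Hypothetical link, for illustration only.
example = "vscode://vscode-remote/ssh-remote+example-host/home/user/app"
converted = to_cursor_link(example)
print(converted)
```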
reacted to AdinaY's post with 🔥 10 days ago
QvQ-72B-Preview 🎄, an open-weight model for visual reasoning, was just released by the Alibaba_Qwen team
Qwen/qvq-676448c820912236342b9888
✨ Combines visual understanding & language reasoning.
✨ Scores 70.3 on MMMU
✨ Outperforms Qwen2-VL-72B-Instruct in complex problem-solving
reacted to victor's post with 🔥 about 1 month ago
Qwen/QwQ-32B-Preview shows us the future (and it's going to be exciting)...

I tested it against some really challenging reasoning prompts and the results are amazing 🤯.

Check this dataset for the results: victor/qwq-misguided-attention
reacted to m-ric's post with ❤️ about 1 month ago
Single most important thing to do today: go try QwQ on Hugging Chat!

👉 https://huggingface.co/chat/models/Qwen/QwQ-32B-Preview
reacted to KnutJaegersberg's post with 👀 about 1 month ago
reacted to elliesleightholm's post with 🤗 about 1 month ago
reacted to cfahlgren1's post with ❤️ about 1 month ago
observers 🔭 - automatically log all OpenAI compatible requests to a dataset 💽

• supports any OpenAI compatible endpoint 💪
• supports DuckDB, Hugging Face Datasets, and Argilla as stores

> pip install observers

No complex framework. Just a few lines of code to start sending your traces somewhere. Let us know what you think! @davidberenstein1957 and I will continue iterating!

Here's an example dataset that was logged to Hugging Face from Ollama: cfahlgren1/llama-3.1-awesome-chatgpt-prompts
reacted to merve's post with ❤️ about 1 month ago
Apple released AIMv2 🍏 a family of state-of-the-art open-set vision encoders
apple/aimv2-6720fe1558d94c7805f7688c
> like CLIP, but add a decoder and train on autoregression 🤯
> 19 open models come in 300M, 600M, 1.2B, 2.7B with resolutions of 224, 336, 448
> Load and use with 🤗 transformers
replied to m-ric's post 9 months ago
reacted to m-ric's post with ❤️ 9 months ago
[New Paper] All tokens should not require the same effort to compute! ⇒ Mixture of Depths 🫧🐠

Google researchers were unhappy with the way current decoding generally works: all tokens go through the same layers, thus requiring exactly the same effort to compute.

Whereas in reality, completing the answer to a difficult math problem, for instance, should be more computationally intense than completing the text of the Declaration of Independence: not all tokens are created equal!

➡️ They had this genius idea: 💡 having a token go through a block should be optional. The token can go through the block (thus undergoing expensive self-attention computation) or avoid it through a skip connection.
The routing decision is taken at the block level: each block selects from the total sequence the top-k tokens that will go through it, and the other tokens skip it. This lets you choose the exact capacity of a block, i.e. the proportion of tokens that go through it, which directly influences the computational intensity of the forward pass.

This yields Mixture-of-Depths (MoD), with spectacular results.
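The top-k routing described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's implementation: tokens are plain floats, router scores are given instead of learned, and `mod_block` and `block_fn` are hypothetical names.

```python
def mod_block(tokens, router_scores, block_fn, capacity):
    """Send the top-k scored tokens through block_fn; the rest skip the block.

    capacity is the fraction of tokens the block may process.
    Skipped tokens are returned unchanged (the skip connection).
    """
    k = max(1, int(len(tokens) * capacity))
    # Indices of the k highest router scores.
    ranked = sorted(range(len(tokens)), key=lambda i: router_scores[i], reverse=True)
    selected = set(ranked[:k])
    output = [
        tokens[i] + block_fn(tokens[i]) if i in selected else tokens[i]
        for i in range(len(tokens))
    ]
    return output, selected


tokens = [1.0] * 8
scores = [0.1, 0.9, 0.3, 0.2, 0.5, 0.4, 0.7, 0.6]
# With 8 tokens and capacity 0.125, exactly one token is routed through the block.
output, selected = mod_block(tokens, scores, lambda x: 10.0 * x, capacity=0.125)
print(selected, output)
```

Because k is fixed by the capacity, the compute per block is known in advance, whatever the router decides.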

✨ Results:
🎚️ Capacity can be tuned all the way down to 12.5% for every second block: thus 87.5% of tokens just skip the block!
🚀 For the same training time and performance, >60% inference speed!
🤝 Can be combined with Mixture-of-Experts for further improvements.

📄 Paper here 👉 Mixture-of-Depths: Dynamically allocating compute in transformer-based language models (2404.02258)
📚 I added it to my paper collection 👉 m-ric/spinning-up-in-llms-659e698f9dd5a71bd3f579a7
reacted to Sentdex's post with ❤️ 11 months ago
Hi, welcome to my first post here!

I am slowly wrangling about 5 years of Reddit comments (2015-2020). It's a total of billions of samples that can be filtered as comment-reply pairs, chains of discussion, filtered by subreddit, up/down votes, controversy, sentiment, and more.

Any requests or ideas for curated datasets from here? I'll also tinker with uploading the entire dataset potentially in chunks or something, but it's quite a few terabytes in total, so I'll need to break it up still. I have some ideas for datasets I personally want too, but curious if anyone has something they'd really like to see that sounds interesting too.
reacted to osanseviero's post with 👍 11 months ago
Mixture of experts: beware 🛡️⚔️

New paper by DeepMind: Buffer Overflow in Mixture of Experts (2402.05526)

The paper shows an adversarial attack strategy in which a user sends malicious queries that can affect the output of other user queries from the same batch.

So if in the same batch we have
- User A benign query
- User B malicious query
The response for A might be altered! 😱

How is this possible?
One approach is to fill the token buffers with adversarial data, hence forcing the gating to use non-ideal experts or to entirely drop the benign tokens (when the buffers have a finite size limit).

This assumes that the adversary can use the model as a black-box but can observe the logit outputs + ensure that the data is always grouped in the same batch.
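Here is a toy sketch of the token-dropping case. It assumes each expert accepts tokens in batch order up to a fixed capacity and drops the overflow; `route_with_capacity` is a made-up helper, and real MoE routers are more involved.

```python
def route_with_capacity(batch, num_experts, capacity):
    """Assign tokens to their preferred expert in batch order.

    batch is a list of (owner, preferred_expert) pairs.
    Each expert buffer holds at most `capacity` tokens; overflow is dropped.
    """
    load = [0] * num_experts
    result = []
    for owner, expert in batch:
        if load[expert] < capacity:
            load[expert] += 1
            result.append((owner, expert, "processed"))
        else:
            result.append((owner, expert, "dropped"))
    return result


benign = [("A", 0), ("A", 0)]
malicious = [("B", 0), ("B", 0)]  # crafted to target the same expert

alone = route_with_capacity(benign, num_experts=2, capacity=2)
attacked = route_with_capacity(malicious + benign, num_experts=2, capacity=2)
print(alone)
print(attacked)
```

In the second call, user B's tokens fill expert 0 before user A's tokens arrive, so A's tokens are dropped even though A's query did not change.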

How to mitigate this?
- Randomize batch order (and even run twice if some queries are very sensitive)
- Use a large capacity slack
- Sample from gate weights instead of top-k (not great IMO, as that requires more memory for inference)

Very cool paper!!
posted an update 11 months ago
reacted to alielfilali01's post with 🤗 11 months ago
Hi friends, I'm happy to share with you all a tool that I built a week or so ago: the "LLM Training Cost Calculator", a handy tool now available on Hugging Face Spaces! This interactive Gradio app provides an easy-to-use interface for estimating the training costs of large language models (LLMs).

(I've been asked to provide a report about the cost of finetuning each model etc., so I decided to do the lazy job and build a tool for it; Prof can later choose whatever config he likes 😆)

🔍 But why is this important?
As LLMs continue to grow in size and complexity, understanding the computational and financial requirements is crucial for planning and managing AI projects. I believe this tool simplifies this process, giving you insights into potential expenses based on the number of parameters and tokens in your dataset.

🌟 Features:
- Input the number of parameters (in billions) and tokens (in trillions).
- Adjust for GPU utilization rates and overhead costs.
- Get an instant estimate of your training costs.
- Choose your GPU (A100 80GB PCIe, A100 80GB SXM, V100, H100 SXM, H100 PCIe)
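For a rough idea of how such an estimate can be computed, here is a back-of-the-envelope sketch. It assumes the common approximation of about 6 FLOPs per parameter per training token; the formula the Space actually uses is not stated in the post, and the function name, the price per GPU-hour and the utilization below are illustrative assumptions.

```python
def estimate_training_cost(params_billion, tokens_trillion, peak_tflops,
                           utilization, usd_per_gpu_hour, overhead_factor=1.0):
    """Estimate GPU-hours and cost for one training run.

    Assumption: total compute ~= 6 * parameters * tokens (in FLOPs).
    overhead_factor is a plain multiplier on the final cost.
    """
    total_flops = 6.0 * (params_billion * 1e9) * (tokens_trillion * 1e12)
    # Sustained throughput of one GPU at the given utilization.
    effective_flops_per_second = peak_tflops * 1e12 * utilization
    gpu_hours = total_flops / effective_flops_per_second / 3600.0
    cost_usd = gpu_hours * usd_per_gpu_hour * overhead_factor
    return gpu_hours, cost_usd


# Illustrative inputs: 7B parameters, 1T tokens, an A100-class GPU at
# 312 peak TFLOPS (BF16), 50% utilization, hypothetical $2 per GPU-hour.
gpu_hours, cost_usd = estimate_training_cost(7, 1, 312, 0.5, 2.0)
print(f"{gpu_hours:,.0f} GPU-hours, about ${cost_usd:,.0f}")
```

GPU-hours here are summed over all GPUs, so running on more GPUs shortens wall-clock time but leaves the estimate unchanged.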

📈 Coming Soon:
Plans are in place to expand the calculator's capabilities to include fine-tuning costs for models using LoRA or QLoRA. You'll be able to input a model ID from the Hugging Face Hub, select your fine-tuning strategy, and specify quantization details if using QLoRA.

I believe this tool will be a valuable asset to the AI community, helping to plan and allocate resources more effectively 🤗.

Should you have any suggestions or feedback, please don't hesitate to contribute your thoughts in the comments below. Together, we can refine and enhance this resource for all.

🔗 Try it here: https://huggingface.co/spaces/Ali-C137/LLM-Training-Cost-Calculator

PS: All thanks to Gradio, Hugging Face and the community ofc 🔥 😉
reacted to awni's post with 👍❤️ 11 months ago
First HF social post:

pip install -U mlx