Prithiv Sakthi's picture

Prithiv Sakthi

prithivMLmods

AI & ML interests

computer vision, multimodality, realism engine adapters @starngerzonehf

Recent Activity

updated a model about 2 hours ago
prithivMLmods/Triangulum-10B
updated a model about 4 hours ago
strangerzonehf/Flux-Ultimate-LoRA-Collection
View all activity

Articles

Organizations

Stanford AI's profile picture DataScienceEngineering's profile picture AI FILMS's profile picture Samsung Electronics's profile picture MISATO-dataset's profile picture GEM benchmark's profile picture OpenGVLab's profile picture MusicAI's profile picture BigScience Biomedical Datasets's profile picture OpenVINO Toolkit's profile picture LLMs's profile picture ONNXConfig for all's profile picture Gradio-Themes-Party's profile picture scikit-learn's profile picture Open-Source AI Meetup's profile picture AMD's profile picture lora concepts library's profile picture Platzi Community's profile picture Kornia AI's profile picture Tune a video concepts library's profile picture Université Dauphine-PSL's profile picture Keras Dreambooth Event's profile picture Stable Diffusion Dreambooth Concepts Library's profile picture The Waifu Research Department's profile picture Musika's profile picture Blog-explorers's profile picture OpenSky's profile picture AI Tamil Nadu's profile picture OpenLLM France's profile picture huggingPartyParis's profile picture Team Tonic's profile picture That Time I got Reincarnated as a Hugging Face Organization's profile picture LocalLLaMA's profile picture Major TOM's profile picture MLX Community's profile picture C4AI Community's profile picture M4-ai's profile picture Chinese LLMs on Hugging Face's profile picture Dataset Tools's profile picture Nerdy Face's profile picture Stranger Zone's profile picture open/ acc's profile picture Data Is Better Together Contributor's profile picture

prithivMLmods's activity

reacted to MonsterMMORPG's post with ❤️ about 8 hours ago
view post
Post
403
SANA: Ultra HD Fast Text to Image Model from NVIDIA Step by Step Tutorial on Windows, Cloud & Kaggle — Generate 2048x2048 Images

Below is YouTube link for step by step tutorial and a 1-Click to installer having very advanced Gradio APP to use newest Text-to-Image SANA Model on your Windows PC locally and also on cloud services such as Massed Compute, RunPod and free Kaggle.

https://youtu.be/KW-MHmoNcqo

This above tutorial covers the newest SANA 2K model and I predict SANA 4K model will be published as well. Sana 2K model is 4 MegaPixel so it can generate the following aspect ratio and resolutions very well:

“1:1”: (2048, 2048), “4:3”: (2304, 1792), “3:4”: (1792, 2304),
“3:2”: (2432, 1664), “2:3”: (1664, 2432), “16:9”: (2688, 1536),
“9:16”: (1536, 2688), “21:9”: (3072, 1280), “9:21”: (1280, 3072),
“4:5”: (1792, 2240), “5:4”: (2240, 1792)

I have developed an amazing Gradio app with so many new features :

VAE auto offloading to reduce VRAM usage significantly which is not exists on official pipeline

Gradio APP built upon official pipeline with improvements so works perfect

Batch size working perfect

Number of images working perfect

Multi-line prompting working perfect

Aspect ratios for both 1K and 2K models working perfect

Randomized seed working perfect

1-Click installers for Windows (using Python 3.10 and VENV — isolated), RunPod, Massed Compute and even a free Kaggle account notebook

With proper latest libraries working perfect speed on Windows too

Automatically properly saving every generated image into accurate folder

🔗 Full Instructions, Configs, Installers, Information and Links Shared Post (the one used in the tutorial) ⤵️
▶️ https://www.patreon.com/posts/click-to-open-post-used-in-tutorial-116474081

🔗 SECourses Official Discord 9500+ Members ⤵️
▶️ https://discord.com/servers/software-engineering-courses-secourses-772774097734074388

  • 2 replies
·
reacted to lamhieu's post with 🧠 about 13 hours ago
view post
Post
602
🚀 Unlock the power of a completely free, unlimited multilingual API!
🌐 The Lightweight Embeddings API offers state-of-the-art text and image embeddings, advanced reranking, and seamless support for over 100 languages — no limits, no restrictions.
🌟 Try it now: lamhieu/lightweight-embeddings
reacted to clem's post with 🤗 1 day ago
reacted to Xenova's post with 🔥 3 days ago
posted an update 4 days ago
view post
Post
3026
Triangulum Catalogued 🔥💫

🎯Triangulum is a collection of pretrained and instruction-tuned generative models, designed for multilingual applications. These models are trained using synthetic datasets based on long chains of thought, enabling them to perform complex reasoning tasks effectively.

+ Triangulum-10B : prithivMLmods/Triangulum-10B
+ Quants : prithivMLmods/Triangulum-10B-GGUF

+ Triangulum-5B : prithivMLmods/Triangulum-5B
+ Quants : prithivMLmods/Triangulum-5B-GGUF

+ Triangulum-1B : prithivMLmods/Triangulum-1B
+ Quants : prithivMLmods/Triangulum-1B-GGUF
  • 2 replies
·
reacted to merve's post with 🚀 4 days ago
view post
Post
3899
supercharge your LLM apps with smolagents 🔥

however cool your LLM is, without being agentic it can only go so far

enter smolagents: a new agent library by Hugging Face to make the LLM write code, do analysis and automate boring stuff!

Here's our blog for you to get started https://huggingface.co/blog/smolagents
reacted to davidberenstein1957's post with 🧠 5 days ago
reacted to nyuuzyou's post with 🚀 7 days ago
view post
Post
2137
🎨 KLING AI Dataset - nyuuzyou/klingai

A collection of 12,782 AI-generated media items featuring:
- High-quality image and video generations at various resolutions
- Complete metadata including user IDs, prompts, and generation parameters
- Content generated using text-to-image, text-to-video, and image-to-video modalities
- Full generation settings and technical parameters
reacted to nicolay-r's post with ❤️ 8 days ago
view post
Post
2073
📢 Deligted to share the most recent milestone on quick deployment of Named Entity Recognition (NER) in Gen-AI powered systems.

Releasing the bulk-ner 0.25.0 which represent a tiny framework that would save you time for deploing NER with any model.

💎 Why is this important? In the era of GenAI the handling out textual output might be challenging. Instead, recognizing named-entities via domain-oriented systems for your donwstream LLM would be preferable option.

📦: https://pypi.org/project/bulk-ner/0.25.0/
🌟: https://github.com/nicolay-r/bulk-ner

I noticed that the direct adaptaion of the LM for NER would result in spending signifcant amount of time on formatting your texts according to the NER-model needs.
In particular:
1. Processing CONLL format with B-I-O tags from model outputs
2. Input trimming: long input content might not be completely fitted

To cope with these problems, in version 0.25.0 I made a huge steps forward by providing:
✅ 🐍 Python API support: see screenshot below for a quick deployment (see screenshot below 📸)
✅ 🪶 No-string: dependencies are now clear, so it is purely Python implementation for API calls.
✅ 👌 Simplified output formatting: we use lists to represent texts with inner lists that refer to annotated objects (see screenshot below 📸)

📒 We have a colab for a quick start here (or screenshot for bash / Python API 📸)
https://colab.research.google.com/github/nicolay-r/ner-service/blob/main/NER_annotation_service.ipynb

👏 The code for pipeline deployment is taken from the AREkit project:
https://github.com/nicolay-r/AREkit
reacted to AdinaY's post with 🚀 9 days ago
reacted to AdinaY's post with 🚀 10 days ago
view post
Post
2899
QvQ-72B-Preview🎄 an open weight model for visual reasoning just released by Alibaba_Qwen team
Qwen/qvq-676448c820912236342b9888
✨ Combines visual understanding & language reasoning.
✨ Scores 70.3 on MMMU
✨ Outperforms Qwen2-VL-72B-Instruct in complex problem-solving
reacted to merve's post with 🔥 11 days ago
reacted to sayakpaul's post with 🔥 12 days ago
reacted to nicolay-r's post with 🧠 13 days ago
view post
Post
2253
📢 So far I noticed that 🧠 reasoning with llm 🤖 in English is tend to be more accurate than in other languages.
However, besides the GoogleTrans and other open transparent translators, I could not find one that could be easy to use solutions to avoid:
1.🔴 Third-party framework installation
2.🔴 Text chunking
3.🔴 support of meta-annotation like spans / objects / etc.

💎 To cope problem of IR from non-english texts, I am happy to share the bulk-translate 0.25.0. 🎊

https://github.com/nicolay-r/bulk-translate

bulk-translate is a tiny Python 🐍 no-string framework that allows translate series of texts with the pre-annotated fixed-spans that are invariant for translator.

It supports 👨‍💻 API for quick data translation with (optionaly) annotated objects in texts (see figure below) in Python 🐍
I make it accessible as much as possible for RAG and / or LLM-powered app downstreams:
📘 https://github.com/nicolay-r/bulk-translate/wiki

All you have to do is to provide iterator of texts, where each text:
1. ✅ String object
2. ✅ List of strings and nested lists that represent spans (value + any ID data).

🤖 By default I provide a wrapper over googletrans which you can override with your own 🔥
https://github.com/nicolay-r/bulk-translate/blob/master/models/googletrans_310a.py
posted an update 14 days ago
reacted to Jaward's post with 🔥 15 days ago
reacted to suayptalha's post with 👍 15 days ago
view post
Post
1621
🚀 FastLlama Series is Live!

🦾 Experience faster, lighter, and smarter language models! The new FastLlama makes Meta's LLaMA models work with smaller file sizes, lower system requirements, and higher performance. The model supports 8 languages, including English, German, and Spanish.

🤖 Built on the LLaMA 3.2-1B-Instruct model, fine-tuned with Hugging Face's SmolTalk and MetaMathQA-50k datasets, and powered by LoRA (Low-Rank Adaptation) for groundbreaking mathematical reasoning.

💻 Its compact size makes it versatile for a wide range of applications!
💬 Chat with the model:
🔗 Chat Link: suayptalha/Chat-with-FastLlama
🔗 Model Link: suayptalha/FastLlama-3.2-1B-Instruct
reacted to m-ric's post with 🔥 16 days ago
view post
Post
1935
After 6 years, BERT, the workhorse of encoder models, finally gets a replacement: 𝗪𝗲𝗹𝗰𝗼𝗺𝗲 𝗠𝗼𝗱𝗲𝗿𝗻𝗕𝗘𝗥𝗧! 🤗

We talk a lot about ✨Generative AI✨, meaning "Decoder version of the Transformers architecture", but this is only one of the ways to build LLMs: encoder models, that turn a sentence in a vector, are maybe even more widely used in industry than generative models.

The workhorse for this category has been BERT since its release in 2018 (that's prehistory for LLMs).

It's not a fancy 100B parameters supermodel (just a few hundred millions), but it's an excellent workhorse, kind of a Honda Civic for LLMs.

Many applications use BERT-family models - the top models in this category cumulate millions of downloads on the Hub.

➡️ Now a collaboration between Answer.AI and LightOn just introduced BERT's replacement: ModernBERT.

𝗧𝗟;𝗗𝗥:
🏛️ Architecture changes:
⇒ First, standard modernizations:
- Rotary positional embeddings (RoPE)
- Replace GeLU with GeGLU,
- Use Flash Attention 2
✨ The team also introduced innovative techniques like alternating attention instead of full attention, and sequence packing to get rid of padding overhead.

🥇 As a result, the model tops the game of encoder models:
It beats previous standard DeBERTaV3 for 1/5th the memory footprint, and runs 4x faster!

Read the blog post 👉 https://huggingface.co/blog/modernbert
  • 1 reply
·
reacted to anton-l's post with 🚀 16 days ago
view post
Post
2114
Introducing 📐𝐅𝐢𝐧𝐞𝐌𝐚𝐭𝐡: the best public math pre-training dataset with 50B+ tokens!
HuggingFaceTB/finemath

Math remains challenging for LLMs and by training on FineMath we see considerable gains over other math datasets, especially on GSM8K and MATH.

We build the dataset by:
🛠️ carefully extracting math data from Common Crawl;
🔎 iteratively filtering and recalling high quality math pages using a classifier trained on synthetic annotations to identify math reasoning and deduction.

We conducted a series of ablations comparing the performance of Llama-3.2-3B-Base after continued pre-training on FineMath and observe notable gains compared to the baseline model and other public math datasets.

We hope this helps advance the performance of LLMs on math and reasoning! 🚀
We’re also releasing all the ablation models as well as the evaluation code.

HuggingFaceTB/finemath-6763fb8f71b6439b653482c2
posted an update 16 days ago
view post
Post
2469
Qwen2VL Models: Vision and Language Processing 🍉

📍FT; [ Latex OCR, Math Parsing, Text Analogy OCRTest ]

Colab Demo: prithivMLmods/Qwen2-VL-OCR-2B-Instruct

❄️Demo : prithivMLmods/Qwen2-VL-2B . The demo includes the Qwen2VL 2B Base Model.

🎯The space handles documenting content from the input image along with standardized plain text. It includes adjustment tools with over 30 font styles, file formatting support for PDF and DOCX, textual alignments, font size adjustments, and line spacing modifications.

📄PDFs are rendered using the ReportLab software library toolkit.

🧵Models :
+ prithivMLmods/Qwen2-VL-OCR-2B-Instruct
+ prithivMLmods/Qwen2-VL-Ocrtest-2B-Instruct
+ prithivMLmods/Qwen2-VL-Math-Prase-2B-Instruct

🚀Sample Document :
+ https://drive.google.com/file/d/1Hfqqzq4Xc-3eTjbz-jcQY84V5E1YM71E/view?usp=sharing

📦Collection :
+ prithivMLmods/vision-language-models-67639f790e806e1f9799979f

.
.
.
@prithivMLmods 🤗
  • 1 reply
·