Dhruv's picture
2 1

Dhruv PRO

dhruv3006
Ā·

AI & ML interests

None yet

Recent Activity

reacted to singhsidhukuldeep's post with šŸ‘ about 11 hours ago
Groundbreaking Research Alert: Rethinking RAG with Cache-Augmented Generation (CAG) Researchers from National Chengchi University and Academia Sinica have introduced a paradigm-shifting approach that challenges the conventional wisdom of Retrieval-Augmented Generation (RAG). Instead of the traditional retrieve-then-generate pipeline, their innovative Cache-Augmented Generation (CAG) framework preloads documents and precomputes key-value caches, eliminating the need for real-time retrieval during inference. Technical Deep Dive: - CAG preloads external knowledge and precomputes KV caches, storing them for future use - The system processes documents only once, regardless of subsequent query volume - During inference, it loads the precomputed cache alongside user queries, enabling rapid response generation - The cache reset mechanism allows efficient handling of multiple inference sessions through strategic token truncation Performance Highlights: - Achieved superior BERTScore metrics compared to both sparse and dense retrieval RAG systems - Demonstrated up to 40x faster generation times compared to traditional approaches - Particularly effective with both SQuAD and HotPotQA datasets, showing robust performance across different knowledge tasks Why This Matters: The approach significantly reduces system complexity, eliminates retrieval latency, and mitigates common RAG pipeline errors. As LLMs continue evolving with expanded context windows, this methodology becomes increasingly relevant for knowledge-intensive applications.
liked a Space 5 days ago
black-forest-labs/FLUX.1-Fill-dev
View all activity

Organizations

MLX Community's profile picture Hugging Face Discord Community's profile picture

dhruv3006's activity

reacted to singhsidhukuldeep's post with šŸ‘ about 11 hours ago
view post
Post
2418
Groundbreaking Research Alert: Rethinking RAG with Cache-Augmented Generation (CAG)

Researchers from National Chengchi University and Academia Sinica have introduced a paradigm-shifting approach that challenges the conventional wisdom of Retrieval-Augmented Generation (RAG).

Instead of the traditional retrieve-then-generate pipeline, their innovative Cache-Augmented Generation (CAG) framework preloads documents and precomputes key-value caches, eliminating the need for real-time retrieval during inference.

Technical Deep Dive:
- CAG preloads external knowledge and precomputes KV caches, storing them for future use
- The system processes documents only once, regardless of subsequent query volume
- During inference, it loads the precomputed cache alongside user queries, enabling rapid response generation
- The cache reset mechanism allows efficient handling of multiple inference sessions through strategic token truncation

Performance Highlights:
- Achieved superior BERTScore metrics compared to both sparse and dense retrieval RAG systems
- Demonstrated up to 40x faster generation times compared to traditional approaches
- Particularly effective with both SQuAD and HotPotQA datasets, showing robust performance across different knowledge tasks

Why This Matters:
The approach significantly reduces system complexity, eliminates retrieval latency, and mitigates common RAG pipeline errors. As LLMs continue evolving with expanded context windows, this methodology becomes increasingly relevant for knowledge-intensive applications.
reacted to reach-vb's post with šŸ”„ 11 days ago
view post
Post
3613
VLMs are going through quite an open revolution AND on-device friendly sizes:

1. Google DeepMind w/ PaliGemma2 - 3B, 10B & 28B: google/paligemma-2-release-67500e1e1dbfdd4dee27ba48

2. OpenGVLabs w/ InternVL 2.5 - 1B, 2B, 4B, 8B, 26B, 38B & 78B: https://huggingface.co/collections/OpenGVLab/internvl-25-673e1019b66e2218f68d7c1c

3. Qwen w/ Qwen 2 VL - 2B, 7B & 72B: Qwen/qwen2-vl-66cee7455501d7126940800d

4. Microsoft w/ FlorenceVL - 3B & 8B: https://huggingface.co/jiuhai

5. Moondream2 w/ 0.5B: https://huggingface.co/vikhyatk/

What a time to be alive! šŸ”„
reacted to singhsidhukuldeep's post with šŸ”„ 13 days ago
view post
Post
2173
Exciting News in AI: JinaAI Releases JINA-CLIP-v2!

The team at Jina AI has just released a groundbreaking multilingual multimodal embedding model that's pushing the boundaries of text-image understanding. Here's why this is a big deal:

šŸš€ Technical Highlights:
- Dual encoder architecture combining a 561M parameter Jina XLM-RoBERTa text encoder and a 304M parameter EVA02-L14 vision encoder
- Supports 89 languages with 8,192 token context length
- Processes images up to 512Ɨ512 pixels with 14Ɨ14 patch size
- Implements FlashAttention2 for text and xFormers for vision processing
- Uses Matryoshka Representation Learning for efficient vector storage

āš”ļø Under The Hood:
- Multi-stage training process with progressive resolution scaling (224ā†’384ā†’512)
- Contrastive learning using InfoNCE loss in both directions
- Trained on massive multilingual dataset including 400M English and 400M multilingual image-caption pairs
- Incorporates specialized datasets for document understanding, scientific graphs, and infographics
- Uses hard negative mining with 7 negatives per positive sample

šŸ“Š Performance:
- Outperforms previous models on visual document retrieval (52.65% nDCG@5)
- Achieves 89.73% image-to-text and 79.09% text-to-image retrieval on CLIP benchmark
- Strong multilingual performance across 30 languages
- Maintains performance even with 75% dimension reduction (256D vs 1024D)

šŸŽÆ Key Innovation:
The model solves the long-standing challenge of unifying text-only and multi-modal retrieval systems while adding robust multilingual support. Perfect for building cross-lingual visual search systems!

Kudos to the research team at Jina AI for this impressive advancement in multimodal AI!