Exciting breakthrough in e-commerce recommendation systems! Walmart Global Tech researchers have developed a novel Triple Modality Fusion (TMF) framework that revolutionizes how we make product recommendations.
>> Key Innovation
The framework ingeniously combines three distinct data types:
- Visual data to capture product aesthetics and context
- Textual information for detailed product features
- Graph data to understand complex user-item relationships
>> Technical Architecture
The system leverages a Large Language Model (Llama2-7B) as its backbone and introduces several sophisticated components:
Modality Fusion Module
- All-Modality Self-Attention (AMSA) for unified representation
- Cross-Modality Attention (CMA) mechanism for deep feature integration
- Custom FFN adapters to align different modality embeddings
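The post doesn't spell out how CMA works internally, but the general idea of cross-modality attention is standard: queries from one modality attend over keys/values from another. Here is a minimal single-head toy sketch (my own illustration in pure Python, not the TMF authors' code):

```python
import math

def cross_modality_attention(queries, keys, values):
    """Minimal single-head cross-attention: each query vector (e.g. a text
    embedding) attends over key/value vectors from another modality
    (e.g. image-region embeddings). Vectors are plain lists of floats."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Scaled dot-product scores between this query and every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        # Softmax over the other modality's positions.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Toy example: one text query attends over two visual tokens.
text_q = [[1.0, 0.0]]
visual_kv = [[1.0, 0.0], [0.0, 1.0]]
fused = cross_modality_attention(text_q, visual_kv, visual_kv)
```

The fused output leans toward the visual token most similar to the text query, which is the behavior a fusion module relies on.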
Advanced Training Strategy
- Curriculum learning approach with three complexity levels
- Parameter-Efficient Fine-Tuning using LoRA
- Special token system for behavior and item representation
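To see why LoRA makes fine-tuning a 7B backbone practical, compare parameter counts: LoRA freezes the full weight matrix and trains only two low-rank factors. The dimensions and rank below are illustrative, not taken from the paper:

```python
def lora_trainable_params(d_in, d_out, rank):
    """LoRA freezes the original d_out x d_in weight matrix and trains two
    low-rank factors instead: B (d_out x rank) and A (rank x d_in)."""
    full = d_out * d_in            # parameters in the frozen dense weight
    lora = rank * (d_in + d_out)   # trainable parameters in B and A
    return full, lora

# Hypothetical 4096x4096 projection (Llama2-7B-scale) with rank 8.
full, lora = lora_trainable_params(4096, 4096, 8)
reduction = full / lora  # 256x fewer trainable parameters for this layer
```

With rank 8, this single layer goes from ~16.8M trainable parameters to ~65K, a 256x reduction, which is what "parameter-efficient" buys you.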
>> Real-World Impact
The results are remarkable:
- 38.25% improvement in Electronics recommendations
- 43.09% boost in Sports category accuracy
- Significantly higher human evaluation scores compared to traditional methods
Currently deployed in Walmart's production environment, this research demonstrates how combining multiple data modalities with advanced LLM architectures can dramatically improve recommendation accuracy and user satisfaction.
Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4,500 years on a single GPU.
If they had actually needed all this time, we would have GPU stories from the time of the Pharaohs: "Alas, Lord of the Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates; this shall delay the building of your computing temple by many moons."
But instead, they parallelized the training across 24k H100s, which brought it down to just a few months. This required parallelizing across 4 dimensions: data, tensor, context, and pipeline. It is infamously hard to do, and makes for bloated code repos that hold together only by magic.
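The core accounting behind 4D parallelism is simple: the total GPU count is the product of the four parallel degrees. The layout below is one hypothetical way to fill 24,576 GPUs; Meta's actual configuration may differ:

```python
def mesh_size(dp, tp, cp, pp):
    """Total GPUs in a 4D mesh = product of the four parallel degrees:
    data (dp), tensor (tp), context (cp), and pipeline (pp)."""
    return dp * tp * cp * pp

# One illustrative layout of 24,576 H100s (degrees are assumptions,
# not Meta's published numbers).
assert mesh_size(dp=192, tp=8, cp=1, pp=16) == 24576
```

Every GPU then sits at one coordinate in this 4D grid, and each dimension needs its own communication pattern, which is exactly why naive implementations balloon in complexity.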
**But now we don't need huge repos anymore!** Instead of building mega-training codebases, Hugging Face colleagues cooked in the other direction: tiny 4D-parallelism libs. One team built Nanotron, already widely used in industry. And now another team releases Picotron, a radical approach that implements 4D parallelism in just a few hundred lines of code, a real feat of engineering that makes it much easier to understand what's actually happening!
**It's tiny, yet powerful:** measured in MFU (Model FLOPs Utilization, i.e. how much of the hardware's compute potential the model actually uses), this lib reaches ~50% on a SmolLM-1.7B model with 8 H100 GPUs, which is really close to what the huge libs reach. (Caution: the team is running further benchmarks to verify this.)
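For readers new to MFU, the back-of-envelope formula is achieved model FLOP/s divided by peak hardware FLOP/s, with the standard ~6 FLOPs per parameter per token estimate for a forward+backward pass. The throughput number below is reverse-engineered to illustrate ~50%, not a measured figure:

```python
def mfu(tokens_per_sec, n_params, n_gpus, peak_flops_per_gpu):
    """Model FLOPs Utilization: achieved model FLOP/s over aggregate peak
    hardware FLOP/s, using the common 6 * params FLOPs-per-token estimate."""
    achieved = tokens_per_sec * 6 * n_params
    peak = n_gpus * peak_flops_per_gpu
    return achieved / peak

# Illustrative numbers: SmolLM-1.7B on 8 H100s (989 TFLOP/s peak dense BF16).
# A throughput of ~388k tokens/s would correspond to ~50% MFU.
u = mfu(tokens_per_sec=388_000, n_params=1.7e9, n_gpus=8, peak_flops_per_gpu=989e12)
```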
Here's the Space for our new article, which leverages LLMs with reinforcement learning to design high-quality small molecules. Check it out at alimotahharynia/GPT-2-Drug-Generator. You can also access the article here: https://arxiv.org/abs/2411.14157. I would be happy to receive your feedback.
RAGOndevice: High-Performance Local AI Document Analysis Assistant

Core Value
RAGOndevice is a high-performance AI system running locally without cloud dependency. Using CohereForAI's optimized 7B model, it enables professional-grade document analysis on standard PCs.

On-Device AI Advantages
1. Efficient Resource Utilization
- Optimized 7B Model: runs on standard PCs
- Local Processing: instant response without the cloud
- Low-Spec Compatible: performs well on regular GPUs
- Optimized Memory: ensures stable operation
2. Data Security & Cost Efficiency
- Complete Privacy: no external data transmission
- Offline Operation: no internet required
- No Subscription: one-time installation
- Resource Optimization: uses existing hardware
Key Features
1. Powerful Document Analysis
- Multi-Format Support: TXT, CSV, PDF, Parquet
- Intelligent Analysis: automatic structure recognition
- OCR Support: advanced PDF text extraction
- Real-Time Chat: natural language interaction
2. Local RAG System
- Efficient Search: TF-IDF based local search
- Context Understanding: accurate information retrieval
- Wikipedia Integration: rich background knowledge
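TF-IDF search is a good fit for on-device RAG because it needs no GPU or embedding model. As a rough sketch of the idea (a toy stand-in, not RAGOndevice's actual implementation), ranking documents against a query looks like:

```python
import math
from collections import Counter

def tfidf_search(query, docs):
    """Rank documents against a query with minimal TF-IDF vectors and
    cosine similarity. Returns document indices, best match first."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    # Document frequency: in how many docs each term appears.
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}

    def vectorize(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf.get(t, 0.0) for t in tf}

    def cosine(a, b):
        dot = sum(a[t] * b.get(t, 0.0) for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    qv = vectorize(query.lower().split())
    scores = [(cosine(qv, vectorize(doc)), i) for i, doc in enumerate(tokenized)]
    return [i for _, i in sorted(scores, reverse=True)]

docs = ["local document analysis", "cloud api pricing", "analysis of local files"]
ranking = tfidf_search("local analysis", docs)  # best match first
```

The top-ranked documents are then fed as context to the local 7B model, which is the whole retrieval-augmented loop in miniature.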
Use Cases
- Enterprise: secure confidential document processing
- Personal Research: private data analysis
- Education: personal learning material analysis
- Development: local codebase analysis
Differentiators
- Independent Operation: zero cloud dependency
- Instant Response: no network latency
- Complete Security: full data control
- Cost Efficiency: no ongoing costs
Future Plans
- Enhanced model optimization
- Local knowledge base expansion
- Hardware optimization
- Extended file support
RAGOndevice democratizes high-performance AI, providing an optimal local AI solution for security-sensitive environments.

The Power of Local AI: experience enterprise-grade AI capabilities right on your device!
After some heated discussion, we'd like to clarify our intent regarding storage limits on the Hub.
TL;DR:
- Public storage is free and (barring blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible.
- Private storage is paid above a significant free tier (1 TB if you have a paid account, 100 GB otherwise).
We continuously optimize our infrastructure to scale our storage for the coming years of growth in machine learning, to the benefit of the community.