Dokyoon

leeloolee

Eruly

Recent Activity

reacted to singhsidhukuldeep's post with 👀 about 22 hours ago

Exciting breakthrough in e-commerce recommendation systems! Walmart Global Tech researchers have developed a novel Triple Modality Fusion (TMF) framework that revolutionizes how we make product recommendations.

>> Key Innovation
The framework ingeniously combines three distinct data types:
- Visual data to capture product aesthetics and context
- Textual information for detailed product features
- Graph data to understand complex user-item relationships

>> Technical Architecture
The system leverages a Large Language Model (Llama2-7B) as its backbone and introduces several sophisticated components:

Modality Fusion Module
- All-Modality Self-Attention (AMSA) for unified representation
- Cross-Modality Attention (CMA) mechanism for deep feature integration
- Custom FFN adapters to align different modality embeddings

Advanced Training Strategy
- Curriculum learning approach with three complexity levels
- Parameter-Efficient Fine-Tuning using LoRA
- Special token system for behavior and item representation

>> Real-World Impact
The results are remarkable:
- 38.25% improvement in Electronics recommendations
- 43.09% boost in Sports category accuracy
- Significantly higher human evaluation scores compared to traditional methods

Currently deployed in Walmart's production environment, this research demonstrates how combining multiple data modalities with advanced LLM architectures can dramatically improve recommendation accuracy and user satisfaction.

leeloolee's activity

upvoted a paper about 1 hour ago

PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models

Paper • 2501.03124 • Published 1 day ago • 6

upvoted a paper 14 days ago

GUI Agents: A Survey

Paper • 2412.13501 • Published 21 days ago • 23

upvoted a paper 20 days ago

Multimodal Latent Language Modeling with Next-Token Diffusion

Paper • 2412.08635 • Published 27 days ago • 42

upvoted a paper 22 days ago

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models

Paper • 2411.14982 • Published Nov 22, 2024 • 16

upvoted a collection 22 days ago

Multimodal-SAE

Collection

A collection of SAEs hooked onto LLaVA • 4 items • Updated 2 days ago • 5

upvoted a collection 23 days ago

GUI agents

Collection

A collection of papers on GUI agents • 3 items • Updated 25 days ago • 5

upvoted a paper 28 days ago

Granite Guardian

Paper • 2412.07724 • Published 28 days ago • 18

upvoted a paper 30 days ago

PaliGemma 2: A Family of Versatile VLMs for Transfer

Paper • 2412.03555 • Published Dec 4, 2024 • 121

upvoted an article about 1 month ago

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Article • Nov 19, 2024 • 11

upvoted a paper about 1 month ago

Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions

Paper • 2411.14405 • Published Nov 21, 2024 • 58

upvoted an article about 2 months ago

Introducing Observers: AI Observability with Hugging Face datasets through a lightweight SDK

Article • Nov 21, 2024 • 35

upvoted 3 papers about 2 months ago

M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework

Paper • 2411.06176 • Published Nov 9, 2024 • 45

BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions

Paper • 2411.07461 • Published Nov 12, 2024 • 22

Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models

Paper • 2411.05005 • Published Nov 7, 2024 • 13

upvoted 2 papers 2 months ago

GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation

Paper • 2410.20474 • Published Oct 27, 2024 • 14

Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering

Paper • 2410.15999 • Published Oct 21, 2024 • 19

upvoted 4 papers 3 months ago

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

Paper • 2410.17247 • Published Oct 22, 2024 • 45