Revolutionary Multi-Agent Framework Transforms Personalized Recommendations

The future of recommendation systems just got a major upgrade. Researchers from Walmart Global Tech have introduced ARAG (Agentic Retrieval Augmented Generation), a groundbreaking framework that leverages multi-agent collaboration to deliver highly personalized recommendations.

>> The Technical Innovation

Traditional RAG systems rely on simple cosine similarity and embedding matching, which often miss the nuanced preferences that drive user behavior. ARAG addresses this limitation through a sophisticated four-agent architecture that works in concert:

- User Understanding Agent: Synthesizes natural language summaries of user preferences from both long-term historical interactions and current session behaviors, creating a comprehensive view of user intent.
- Natural Language Inference (NLI) Agent: Evaluates semantic alignment between candidate items and inferred user preferences by analyzing textual metadata including titles, descriptions, and reviews. This goes beyond surface-level matching to understand contextual relevance.
- Context Summary Agent: Processes and condenses findings from the NLI agent, focusing only on items that meet alignment thresholds to create targeted contextual summaries.
- Item Ranker Agent: Integrates all signals to produce final ranked recommendations, explicitly considering previous session behaviors, relevant history, and purchase likelihood.

>> Under the Hood: Architecture

The system operates through a blackboard-style multi-agent collaboration where agents read from and write to shared structured memory. The workflow begins with standard RAG retrieval to generate an initial recall set, then applies parallel inference where the User Understanding and NLI agents work simultaneously. Cross-agent attention allows the Context Summary Agent to use both user summaries and NLI scores as relevance signals, while the final Item Ranker consumes all agent outputs to generate contextually aware rankings.

>> Impressive Performance Results

Testing on Amazon Review datasets across the Clothing, Electronics, and Home categories showed remarkable improvements:
- 42.1% improvement in NDCG@5 for Clothing
- 35.5% improvement in Hit@5 metrics
- Consistent outperformance across all product categories compared to recency-based and vanilla RAG approaches

The ablation study revealed that each agent contributes incremental value, with the complete system achieving up to 14% additional gains when all components work together.

>> Why This Matters

ARAG transforms recommendation from a simple retrieval task into a coordinated reasoning process. By separating the concerns of user understanding, semantic alignment, context synthesis, and ranking, the framework delivers both accuracy improvements and transparent rationales that enhance user trust.
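To make the workflow concrete, here is a heavily simplified sketch of a blackboard-style pipeline in the spirit of ARAG. The agent prompts, the `call_llm` helper, and the stand-in NLI scores are hypothetical placeholders for illustration, not the paper's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for a call to any chat-completion API."""
    return f"[LLM output for: {prompt[:40]}...]"

def arag_recommend(user_history, session, candidates, threshold=0.5):
    blackboard = {"candidates": candidates}  # shared structured memory

    # Stage 1: User Understanding and NLI agents run in parallel.
    with ThreadPoolExecutor() as pool:
        summary_f = pool.submit(
            call_llm, f"Summarize preferences from history={user_history} and session={session}")
        nli_f = pool.submit(
            call_llm, f"Score how well each candidate in {candidates} aligns with the user intent")
    blackboard["user_summary"] = summary_f.result()
    _ = nli_f.result()
    blackboard["nli_scores"] = {c: 0.7 for c in candidates}  # stand-in for parsed NLI scores

    # Stage 2: Context Summary agent keeps only items above the alignment threshold.
    kept = [c for c, s in blackboard["nli_scores"].items() if s >= threshold]
    blackboard["context_summary"] = call_llm(
        f"Condense evidence for items {kept} given {blackboard['user_summary']}")

    # Stage 3: Item Ranker consumes all blackboard signals to produce the final ranking.
    blackboard["ranking"] = call_llm(
        f"Rank {kept} using {blackboard['context_summary']} and session={session}")
    return blackboard["ranking"]
```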
Recommendation System Optimization
Explore top LinkedIn content from expert professionals.
Summary
Recommendation system optimization refers to improving how digital platforms suggest products, services, or content to users by making these suggestions more personalized, diverse, and memory-efficient. Recent advancements include multi-agent frameworks, graph-based memory reduction, personalized messaging, empathetic feedback integration, and ways to handle bias in recommendations.
- Personalize recommendations: Use systems that analyze both user behavior and emotional feedback to deliver suggestions tailored to individual needs and contexts.
- Reduce memory use: Implement graph clustering or similar approaches to group users and items, which helps lower memory requirements without sacrificing accuracy.
- Increase diversity: Apply constraints or penalties during recommendation selection so that popular and expensive items do not dominate the list, ensuring users get a wider variety of choices.
-
Recsys models have a major memory problem. This graph-based solution cuts it by 75%.

Deep recommender systems face a critical challenge: their embedding tables consume massive amounts of memory due to the sheer number of users and items they need to represent. For example, a user embedding table for a recommendation model at Meta can take up hundreds of GB of memory.

The traditional solution is the "hashing trick" - mapping multiple IDs to share the same embedding. But random hashing leads to collisions between dissimilar entities, hurting model performance.

Enter GraphHash - a novel approach from researchers at Snap Inc. that leverages graph clustering to group similar users and items based on their interaction patterns. The magic happens through modularity-based clustering, which acts as a computationally efficient proxy for message passing - achieving similar smoothing effects as graph neural networks but at a fraction of the computational cost.

The results are impressive: embedding table size reduced by over 75%, while achieving a 101.52% improvement in recall compared to traditional hashing baselines. It works seamlessly with various industry-scale recommendation models as a plug-and-play solution.

Looking ahead, GraphHash opens up exciting possibilities for hybrid approaches blending frequency data with graph clusters - and extending this to cold-start scenarios. The success of this approach suggests similar techniques could benefit other domains facing memory constraints.

#recommendersystems #deeplearning #artificialintelligence #ai
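A minimal sketch of the core idea, using a generic community-detection routine rather than Snap's actual implementation: instead of hashing raw IDs into buckets at random, IDs that land in the same interaction-graph cluster share one embedding row. The function names and bucket counts below are assumptions for illustration.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def random_hash_buckets(ids, num_buckets):
    # Classic hashing trick: unrelated IDs can collide in the same bucket.
    return {i: hash(i) % num_buckets for i in ids}

def graph_cluster_buckets(interactions, num_buckets):
    """Illustrative stand-in for the GraphHash idea: cluster the user-item
    interaction graph with modularity-based communities and let all nodes in
    a community share one embedding row. `interactions` is a list of
    (user_id, item_id) pairs with IDs from disjoint namespaces, e.g. "u1", "i7".
    """
    g = nx.Graph()
    g.add_edges_from(interactions)
    communities = greedy_modularity_communities(g)
    bucket_of = {}
    for bucket, members in enumerate(communities):
        for node in members:
            bucket_of[node] = bucket % num_buckets  # fold if there are more clusters than buckets
    return bucket_of

# Shared embedding table indexed by bucket instead of by raw ID.
num_buckets, dim = 1000, 32
embeddings = np.random.normal(size=(num_buckets, dim)).astype(np.float32)
```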
-
Below is a diagram of our agentic architecture (well, part of it). See the top-right box: "recommender service"? Let’s talk about that.

At Aampe, we split copy personalization into two distinct decisions:
➡️ Which item to recommend
➡️ How to compose the message that delivers it

Each calls for a different approach.

For item recommendations, we use classical recommender systems: collaborative filtering, content-based ranking, etc. These are built to handle high-cardinality action spaces — often tens or hundreds of thousands of items — by leveraging global similarity structures among users and items.

For message personalization, we take a different route. Each user has a dedicated semantic-associative agent that composes messages modularly — choosing tone, value proposition, incentive type, product category, and call to action. These decisions use a variant of Thompson sampling, with beta distributions derived from each user’s response history.

Why split the system this way? Sometimes you want to send content without recommending an item — having two separate processes makes that easier. But there are deeper reasons why recommender systems suit item selection and reinforcement learning suits copy composition:

1️⃣ Cardinality. The item space is vast — trial-and-error is inefficient. Recommenders generalize across users/items. Copy has a smaller, more personal space where direct exploration works well.
2️⃣ Objectives. Item recommendations aim at discovery — surfacing new or long-tail content. Copy is about resonance — hitting the right tone based on past response.
3️⃣ Decision structure. Item selection is often a single decision. Copy is modular — interdependent parts that must cohere. Perfect for RL over structured actions.
4️⃣ Hidden dimensions. Item preferences stem from stable traits like taste or relevance. Copy preferences shift quickly and depend on context — ideal for RL’s recency-weighted learning.
5️⃣ Reward density. Item responses are sparse. Every content delivery yields feedback — dense enough to train RL agents, if interpreted correctly.

In short: recommenders find cross-user/item patterns in large spaces. RL adapts to each user in real time over structured choices. Aampe uses both — each matched to the decision it’s best for.
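A toy sketch of the Thompson-sampling idea described above: per-user Beta posteriors over each copy choice, sampled and maximized at send time. The component names and response counts are invented for illustration, not Aampe's actual parameters.

```python
import random

# Hypothetical per-user response history: (successes, failures) for each tone option.
tone_history = {
    "playful": (12, 30),
    "urgent": (4, 25),
    "informative": (20, 22),
}

def thompson_pick(history):
    """Sample Beta(successes + 1, failures + 1) for each option and pick the max.
    This is standard Thompson sampling; the post describes using a variant of it."""
    draws = {
        option: random.betavariate(s + 1, f + 1)
        for option, (s, f) in history.items()
    }
    return max(draws, key=draws.get)

print(thompson_pick(tone_history))  # e.g. "informative"
```

The same pattern would be repeated independently for value proposition, incentive type, product category, and call to action, with each user's own counts updated after every response.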
-
𝗘𝗺𝗽𝗮𝘁𝗵𝗲𝘁𝗶𝗰 𝗰𝗼𝗻𝘃𝗲𝗿𝘀𝗮𝘁𝗶𝗼𝗻𝗮𝗹 𝗿𝗲𝗰𝗼𝗺𝗺𝗲𝗻𝗱𝗲𝗿 𝘀𝘆𝘀𝘁𝗲𝗺𝘀 enhance traditional recommendation algorithms by integrating users’ emotions. While typical systems rely on user ratings to gauge satisfaction, they often miss the reasons behind these feelings. Emotions, which include a variety of feelings like excitement or frustration, offer deeper insights into user experiences. By combining ratings and emotions, recommender systems can develop richer user profiles and provide more personalized, context-aware recommendations based on both past ratings and current emotional states.

𝗜𝗻 𝗲-𝗰𝗼𝗺𝗺𝗲𝗿𝗰𝗲, if a user rates a drama movie 4/5 but feels emotionally drained, future recommendations might include uplifting dramas to balance their experience. Similarly, if a book is rated 3/5 for being too intense, the system might suggest less intense thrillers based on both the rating and the emotional feedback.

𝗜𝗻 𝗵𝗲𝗮𝗹𝘁𝗵𝗰𝗮𝗿𝗲, a patient rating a physical therapy session 4/5 but expressing frustration about slow progress might receive motivational messages and suggestions for additional supportive therapies. If a high-intensity workout is rated 5/5 but leaves the user exhausted, the system could recommend a mix of high-intensity and recovery workouts to balance effectiveness.

At the recent 𝗔𝗖𝗠 𝗥𝗲𝗰𝗼𝗺𝗺𝗲𝗻𝗱𝗲𝗿 𝗦𝘆𝘀𝘁𝗲𝗺𝘀 𝗖𝗼𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 (𝗥𝗲𝗰𝗦𝘆𝘀 𝟮𝟬𝟮𝟰), the paper 𝗧𝗼𝘄𝗮𝗿𝗱𝘀 𝗘𝗺𝗽𝗮𝘁𝗵𝗲𝘁𝗶𝗰 𝗖𝗼𝗻𝘃𝗲𝗿𝘀𝗮𝘁𝗶𝗼𝗻𝗮𝗹 𝗥𝗲𝗰𝗼𝗺𝗺𝗲𝗻𝗱𝗲𝗿 𝗦𝘆𝘀𝘁𝗲𝗺𝘀 (which won the best paper award) described an innovative framework called the 𝗘𝗺𝗽𝗮𝘁𝗵𝗲𝘁𝗶𝗰 𝗖𝗼𝗻𝘃𝗲𝗿𝘀𝗮𝘁𝗶𝗼𝗻𝗮𝗹 𝗥𝗲𝗰𝗼𝗺𝗺𝗲𝗻𝗱𝗲𝗿 (𝗘𝗖𝗥), which enhances traditional conversational recommender systems by incorporating empathy. The approach augments the ReDial recommendation dialogue dataset by leveraging GPT-3.5-Turbo to annotate user emotions. It also uses reviews from external resources to create a set of responses.

The two key components of ECR are:
𝗘𝗺𝗼𝘁𝗶𝗼𝗻-𝗮𝘄𝗮𝗿𝗲 𝗶𝘁𝗲𝗺 𝗿𝗲𝗰𝗼𝗺𝗺𝗲𝗻𝗱𝗮𝘁𝗶𝗼𝗻: the system maps emotions to entities (e.g., books, movies). Multi-task learning is used to learn user preferences and emotional contexts to create a more holistic user profile.
𝗘𝗺𝗼𝘁𝗶𝗼𝗻-𝗮𝗹𝗶𝗴𝗻𝗲𝗱 𝗿𝗲𝘀𝗽𝗼𝗻𝘀𝗲 𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻: the system uses retrieval-augmented prompts to fine-tune pretrained models such as DialoGPT and Llama-2-Chat, retrieving relevant emotional content from the responses database for response generation. It also integrates user feedback, prompting users for explicit feedback when emotions are unclear.

ECR also introduces several novel metrics, such as the Emotion Matching Score (EMS) and the Emotion Transition Score (ETS), which measure how well the system’s responses align with the user’s emotions and how well the system can positively influence the user’s emotional state through its recommendations.

Paper: https://lnkd.in/ePEbppvY
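A schematic sketch (not the paper's code) of how a multi-task objective can combine a recommendation loss with an emotion-prediction loss over a shared user representation, in the spirit of the emotion-aware item recommendation component above. The layer sizes, number of emotion classes, and the weighting factor are assumptions.

```python
import torch
import torch.nn as nn

class EmotionAwareRecModel(nn.Module):
    """Toy two-head model: one head scores items, one predicts the user's
    current emotion, both from a shared encoding of the dialogue context."""
    def __init__(self, input_dim, num_items, num_emotions):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU())
        self.item_head = nn.Linear(128, num_items)
        self.emotion_head = nn.Linear(128, num_emotions)

    def forward(self, x):
        h = self.encoder(x)
        return self.item_head(h), self.emotion_head(h)

model = EmotionAwareRecModel(input_dim=64, num_items=500, num_emotions=7)
rec_loss_fn, emo_loss_fn = nn.CrossEntropyLoss(), nn.CrossEntropyLoss()

x = torch.randn(8, 64)                     # fake batch of context features
item_labels = torch.randint(0, 500, (8,))  # fake target items
emo_labels = torch.randint(0, 7, (8,))     # fake annotated emotions

item_logits, emo_logits = model(x)
lam = 0.5                                  # assumed task-balancing weight
loss = rec_loss_fn(item_logits, item_labels) + lam * emo_loss_fn(emo_logits, emo_labels)
loss.backward()
```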
-
𝐃𝐚𝐭𝐚 𝐒𝐜𝐢𝐞𝐧𝐜𝐞 𝐈𝐧𝐭𝐞𝐫𝐯𝐢𝐞𝐰 𝐐𝐮𝐞𝐬𝐭𝐢𝐨𝐧: Your model disproportionately recommends popular/expensive items, reducing diversity. How would you quantify and mitigate popularity and price bias in a recommender system?

𝐏𝐨𝐩𝐮𝐥𝐚𝐫𝐢𝐭𝐲 𝐁𝐢𝐚𝐬
Popular items are shown more frequently than less popular ones. To quantify this, we can calculate metrics like:
✅ Skewness of Item Frequency Distribution: Check the distribution of item exposure (how many times items are recommended to users). A high skew indicates over-representation of popular items.
✅ Popularity Bias Index: The ratio of the proportion of recommendations drawn from the top-N most popular items versus the rest.

𝐏𝐫𝐢𝐜𝐞 𝐁𝐢𝐚𝐬
Expensive items are disproportionately recommended. To quantify price bias, we can use:
✅ Price Distribution of Recommended Items: Calculate the average price of recommended items and compare it with the average price of all items in the catalog.
✅ Price-to-CTR Correlation: Measure how price correlates with the click-through rate (CTR). A strong correlation suggests price bias.

To address these, we can use:
𝐖𝐞𝐢𝐠𝐡𝐭𝐞𝐝 𝐋𝐨𝐬𝐬 𝐅𝐮𝐧𝐜𝐭𝐢𝐨𝐧: Modify the loss function during training to penalize over-prediction of popular items. This can be done by giving inverse-popularity weights to items during model training.
𝐃𝐢𝐯𝐞𝐫𝐬𝐢𝐭𝐲 𝐂𝐨𝐧𝐬𝐭𝐫𝐚𝐢𝐧𝐭𝐬: After the model has made its recommendations, we can enforce a diversity constraint by using a diversity penalty. The system can re-rank the top-N recommendations by penalizing duplicate categories, genres, or other similar features.
𝐏𝐫𝐢𝐜𝐞 𝐍𝐨𝐫𝐦𝐚𝐥𝐢𝐳𝐚𝐭𝐢𝐨𝐧: Normalize the predicted score for expensive items so that they don’t dominate the ranking. This can be achieved by applying a log transformation to the price before feeding it into the model, or by using a price-decay factor that lowers the predicted score as the price increases.
𝐏𝐫𝐢𝐜𝐞 𝐑𝐞𝐛𝐚𝐥𝐚𝐧𝐜𝐢𝐧𝐠: Adjust the recommendation list by introducing a price constraint after inference, ensuring that the final list has a mix of low-, medium-, and high-priced items. This can be done through a cost-based penalty that discourages recommending items above a certain price threshold.
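A small illustrative snippet (toy data, assumed variable names) for two of the popularity-bias diagnostics above, plus the inverse-popularity weights that would feed a weighted loss:

```python
import numpy as np
from scipy.stats import skew

# Toy recommendation log: each entry is the item_id shown in a recommendation slot.
recommended_items = np.random.zipf(a=2.0, size=10_000)

# 1) Skewness of the item-exposure distribution (high skew => popularity bias).
item_ids, exposure_counts = np.unique(recommended_items, return_counts=True)
print("exposure skewness:", skew(exposure_counts))

# 2) Popularity Bias Index: share of impressions taken by the top-N most popular items.
top_n = 10
top_share = np.sort(exposure_counts)[-top_n:].sum() / exposure_counts.sum()
print(f"share of impressions from top-{top_n} items:", top_share)

# 3) Inverse-popularity weights for a weighted training loss (softened with a sqrt).
weights = 1.0 / np.sqrt(exposure_counts)
weights /= weights.mean()  # normalize so the average weight is 1
item_weight = dict(zip(item_ids.tolist(), weights.tolist()))
```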
-
When developing a recommendation system for industrial settings, it's crucial to account for the constraints posed by extensive item catalogs. A common approach to address the challenges of large item corpora involves employing a two-stage methodology. This approach divides the recommendation process into two distinct phases: candidate retrieval and ranking.

This blog, written by machine learning scientists at Expedia Group, shares their insights on harnessing the power of a Two Tower Neural Network architecture to enhance candidate retrieval modeling. The "two tower" architecture has proven successful in various recommendation scenarios across diverse domains, demonstrating particular efficacy in handling vast industrial product catalogs. At its core, the architecture comprises a "query encoder" and an "item encoder." The output of each encoder undergoes interaction via a dot product before being fed into an activation function such as softmax or sigmoid. The query encoder learns a representation from features such as search queries, reference items, user historical interactions, or any context relevant to the user and their search. Conversely, the item encoder processes the candidate item, typically representing it through content features such as property location, popularity-based attributes, and property amenities in the case of Expedia lodging.

The authors provide a comprehensive, step-by-step guide on implementing the two-tower neural network in TensorFlow, accompanied by additional techniques to enhance its performance. This resource serves as a valuable reference for those working on recommendation systems.

#machinelearning #algorithms #recommendations #retrieval #ranking

– – –

Check out the "Snacks Weekly on Data Science" podcast and subscribe, where I explain in more detail the concepts discussed in this and future posts:
-- Apple Podcast: https://lnkd.in/gj6aPBBY
-- Spotify: https://lnkd.in/gKgaMvbh
https://lnkd.in/gjuG3xhJ
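A bare-bones two-tower sketch in TensorFlow/Keras in the spirit of the architecture described above, not Expedia's implementation. The feature dimensions, layer widths, and the sigmoid-on-dot-product interaction are assumptions for illustration.

```python
import tensorflow as tf

# Assumed feature dimensions for query-side and item-side inputs.
query_in = tf.keras.Input(shape=(64,), name="query_features")  # user/search context
item_in = tf.keras.Input(shape=(48,), name="item_features")    # item content features

def tower(x, name):
    # Each tower maps its raw features into a shared 32-dimensional embedding space.
    x = tf.keras.layers.Dense(128, activation="relu")(x)
    return tf.keras.layers.Dense(32, name=f"{name}_embedding")(x)

query_emb = tower(query_in, "query")
item_emb = tower(item_in, "item")

# Dot-product interaction between the two towers, squashed to a match probability.
score = tf.keras.layers.Dot(axes=1)([query_emb, item_emb])
prob = tf.keras.layers.Activation("sigmoid")(score)

model = tf.keras.Model(inputs=[query_in, item_in], outputs=prob)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```

At serving time the item tower is typically run offline over the whole catalog and the resulting embeddings are stored in an approximate-nearest-neighbor index, so candidate retrieval only needs one pass through the query tower per request.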
-
In our journey of building personalized recommendations, we often debate when models should run in real time vs. batch processing. It completely depends on the use case, scalability, and acceptable latency. Let me try to simplify it so that you can explain it better to your management.

1) Real-Time Models – When Instant Personalization is Key
This flow is used when recommendations must be generated instantly based on a user’s current actions.
Example Use Cases:
"You May Also Like" – A user clicks on a product, and recommendations are generated dynamically.
Personalized Home Page – When a user logs in, their recommendations are fetched in real time.
Dynamic Offers – Based on recent user behavior, a discount or coupon is displayed immediately.
This is how it can be implemented if using Amazon Web Services (AWS):
🔹 User Action → A user visits a webpage or clicks on a product.
🔹 API Gateway + Lambda → Triggers an API call to fetch recommendations.
🔹 DynamoDB / Redis Cache → First checks for recent recommendations to reduce latency.
🔹 Model Prediction (SageMaker Endpoint) → If no cached results exist, the model generates new recommendations.
🔹 Response to Frontend → Results are returned and displayed instantly.

2) Batch Processing – Precomputed Recommendations
This approach is used when personalization can be precomputed, reducing the need for real-time execution.
Example Use Cases:
"Your Favorites" (Rule-Based Personalization) – If a user buys from X retailers frequently, precompute recommendations daily.
Periodic Email / Push Notifications – Personalized product suggestions for email marketing campaigns.
Homepage Personalization (Static User Preferences) – Daily updates to improve page load speed.
This is how it can be implemented:
🔹 Daily / Weekly Training Jobs (Glue, SageMaker, EMR) → or you can use dedicated EC2 & Jenkins to process large amounts of data and update recommendations.
🔹 Updated Recommendations Stored (DynamoDB, Redis)
🔹 Precomputed Recommendations Served via API / CloudFront

So, if recommendations change dynamically based on the user session, use real time. For predictable updates, use batch. In fact, you can also use a hybrid approach: cache precomputed results and fall back on real-time inference when needed.

#recommendation #n=1personalisation #datascience #data
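A minimal sketch of the real-time path described above: check a cache first, fall back to model inference on a miss, and cache the result. A plain dictionary stands in for Redis/DynamoDB and a stub function stands in for the deployed model endpoint; both are placeholders, not a real AWS integration.

```python
import time

CACHE_TTL_SECONDS = 3600
_cache = {}  # stand-in for Redis/DynamoDB: user_id -> (timestamp, recommendations)

def model_predict(user_id, context):
    """Placeholder for a call to a deployed model endpoint (e.g. a SageMaker endpoint)."""
    return [f"item_{user_id}_{i}" for i in range(5)]

def get_recommendations(user_id, context=None):
    """Real-time path: serve cached results if still fresh, otherwise run inference
    and cache the response, mirroring the flow in the post."""
    cached = _cache.get(user_id)
    if cached and time.time() - cached[0] < CACHE_TTL_SECONDS:
        return cached[1]                    # cache hit: low-latency response
    recs = model_predict(user_id, context)  # cache miss: real-time inference
    _cache[user_id] = (time.time(), recs)
    return recs

print(get_recommendations("u42"))
```

The batch path replaces `model_predict` with a scheduled job that writes precomputed lists into the same store, so the serving code only ever reads the cache.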
-
Interested in the next-generation retrieval paradigm, or in making your recommendation, search, or RAG/LLM applications 20-30% better? Check out our paper, Retrieval with Learned Similarities, a collaboration with Microsoft Research, accepted as an oral presentation (155 out of 2062 submissions) at WWW 2025!

Retrieval plays a fundamental role in recommendation systems, search, and natural language processing tasks by efficiently finding relevant items from billions given a query. Our paper introduces Mixture-of-Logits (MoL) as a universal approximator of similarity functions, unifying recent sparse/dense retrieval, multi-embeddings, advanced neural networks, as well as generative approaches. Beyond its theoretical properties, MoL sets new SotA across heterogeneous scenarios, from sequential retrieval on top of Transformer/HSTU backbones to fine-tuning language models for RAG/QA.

Importantly, this new paradigm efficiently utilizes modern accelerators to achieve dense-retrieval-level speed on GPUs, while delivering 20-30% better Hit Rate@50-400 on databases with 100M+ items and making content distribution more democratic (arxiv.org/abs/2306.04039 KDD'23, arxiv.org/abs/2407.13218 CIKM'24). These gains make a compelling case for migrating web-scale vector databases to Retrieval with Learned Similarities (RAILS), and open up exciting research opportunities to support learned similarities.

Learn more:
📄 Paper: https://lnkd.in/gUDeRZUp
🔗 GitHub: https://lnkd.in/gvDz4SG8

We have more exciting work ahead; if you are interested in pushing the boundary of foundational technologies together, please talk to Bailu Ding or me at The Web Conference in Sydney or reach out via email!

#TheWebConf25 #WWW2025 #TheWebConf #RecSys #RAG #LLMs #NeuralRetrieval
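A heavily simplified numpy illustration of the mixture-of-logits idea as summarized above: the similarity between a query and an item is a gated sum of dot products between several per-component embeddings. The dimensions, the number of components, and the self-normalizing gating used here are placeholders; the paper learns the gating weights with a network, and this sketch is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
P, d, N = 4, 16, 1000  # number of mixture components, per-component dim, catalog size

# Placeholder per-component embeddings for one query and a batch of items.
q = rng.normal(size=(P, d))        # one embedding per component for the query
items = rng.normal(size=(N, P, d)) # one embedding per component for each item

# Component-level dot products: logits[i, p] = <q_p, item_i_p>.
logits = np.einsum("pd,npd->np", q, items)

# Gating weights over components (here a simple softmax of the logits themselves;
# in the paper the gates are learned conditioned on the query/item).
gates = np.exp(logits - logits.max(axis=1, keepdims=True))
gates /= gates.sum(axis=1, keepdims=True)

# Mixture-of-logits similarity per item, then top-k retrieval.
scores = (gates * logits).sum(axis=1)
top_k = np.argsort(-scores)[:50]
```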
-
Exploring the next frontier in RecSys: How are LLMs redefining recommendation products?

Zooming out a bit from my previous post on fine-tuning LLMs for Recommender Systems (RecSys) tasks, I want to give a brief overview of the most popular ways of leveraging LLMs for this today. There are mainly three categories.

🔸 𝟭. 𝗟𝗟𝗠-𝗣𝗼𝘄𝗲𝗿𝗲𝗱 𝗥𝗲𝗰𝗦𝘆𝘀: A fascinating shift from traditional methods, this approach employs LLMs directly for recommendation tasks without retraining new models. Instead, it leverages prompts (Liu et al. 2023a; Gao et al. 2023; Dai et al. 2023; Chen 2023) or minor fine-tuning (Zhang et al. 2023; Kang et al. 2023; Bao et al. 2023) to translate RecSys challenges into natural language tasks. By designing prompts for scenarios like rating prediction and sequential recommendation, among others, this method uses few-shot prompting to inject user interaction insights into the model, helping LLMs capture user preferences and needs more accurately.

🔸 𝟮. 𝗟𝗟𝗠𝘀 𝗮𝘀 𝗥𝗶𝗰𝗵 𝗙𝗲𝗮𝘁𝘂𝗿𝗲 𝗦𝗼𝘂𝗿𝗰𝗲𝘀: Viewed as a sophisticated feature extractor (Wu et al. 2021; Qiu et al. 2021; Yao et al. 2022; Muhamed et al. 2021; Xiao et al. 2022), this paradigm enriches traditional RecSys with LLM-derived embeddings. By feeding user and item features into LLMs, it obtains nuanced embeddings that traditional models then use for enhanced recommendation accuracy. Moreover, certain methods (Liu et al. 2023b; Wang et al. 2022, 2023) innovate by generating tokens from these features, mining for semantic clues to user preferences that can significantly inform the recommendation process.

🔸 𝟯. 𝗟𝗟𝗠𝘀 𝗮𝘀 𝗚𝘂𝗶𝗱𝗶𝗻𝗴 𝗔𝗴𝗲𝗻𝘁𝘀: Stepping into an agent's role, LLMs orchestrate the recommendation pipeline, overseeing everything from data collection to feature engineering and the scoring/ranking mechanism. This model (Andreas 2022; Bao et al. 2023; Hou et al. 2023; Lin et al. 2023; Gao et al. 2023; Friedman et al. 2023) enables LLMs to adapt seamlessly to the recommendation context, managing the intricacies of user interaction and system response to deliver precise recommendations.

Have I left anything out?

#AI #MachineLearning #Personalization #RecSys #3MinPapers
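To make category 1 concrete, here is a minimal few-shot prompt for sequential recommendation built from a user's interaction history. The item names and the prompt template are invented for the example; no specific paper's prompt format is reproduced, and no LLM is called.

```python
few_shot_example = (
    "User history: Dune, Foundation, Hyperion\n"
    "Next item the user is likely to enjoy: The Left Hand of Darkness\n\n"
)

user_history = ["Project Hail Mary", "The Martian", "Children of Time"]

prompt = (
    "You are a book recommender. Given a user's reading history, "
    "suggest the next book they are most likely to enjoy.\n\n"
    + few_shot_example
    + "User history: " + ", ".join(user_history) + "\n"
    + "Next item the user is likely to enjoy:"
)

# `prompt` would then be sent to an LLM API of choice and the completion
# parsed as the recommendation; this sketch only builds the text.
print(prompt)
```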
-
Another very good paper from DeepMind (+ Google + University of Illinois Chicago). They address precisely the gap between LLMs and recommendation systems we discussed here. Rather than proposing a generic LLM framework, they focus on the particular tasks of Masked Item Modeling (MIM) and Bayesian Personalized Ranking (BPR) and simulate them through LLMs.

The authors introduce an innovative approach for adapting Large Language Models (LLMs) to new recommendation settings. This involves enhancing the fine-tuning process of LLMs with auxiliary-task data samples that simulate the traditional training operations of classic recommendation systems using natural language prompts. They introduce highly informative recommendation-task data samples, improving upon existing efforts by simplifying the input/output dynamics — notably, by removing user IDs and enriching user item sequences with item titles for clarity.

Using this methodology, they fine-tune the publicly available FLAN-T5-XL (3 billion parameters) and FLAN-T5-Base (223 million parameters) models. This process employs a straightforward multi-task learning framework that integrates their advanced recommendation-task and auxiliary-task data samples.

Through rigorous testing across a variety of recommendation scenarios — including retrieval, ranking, and rating prediction — in three distinct areas (Amazon Toys & Games, Beauty, and Sports & Outdoors; the datasets are available at https://lnkd.in/gijTg5xJ, though it would be more interesting to see results on production data), the efficacy of their method and its individual components is clearly demonstrated. Notably, in retrieval tasks, their model significantly outperforms both traditional recommendation systems and current LLM-based solutions, including the latest state-of-the-art (SOTA) models, by considerable margins.

They briefly mention (Part 6) limits due to LLMs' computational costs that make it hard to use LLMs as 'backbones' for recommendation systems. I believe this is a very important and open topic. Certainly, there is a trend toward solving the same tasks with smaller LLMs. Potentially, LLMs could be used only for the queries where they provide the biggest benefits, rather than for all queries.

https://lnkd.in/g7ziJEx4
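To make the "simulate classic training operations in natural language" idea concrete, here are two invented examples of what a BPR-style and a masked-item-modeling-style auxiliary sample might look like as prompt/target pairs. The wording and item names are illustrative, not the paper's actual templates.

```python
# Hypothetical auxiliary-task samples in the style described above.
bpr_sample = {
    "input": (
        "A user recently interacted with: LEGO Classic Bricks, Hot Wheels Track Set, "
        "UNO Card Game. Which of the following two items is this user more likely "
        "to prefer?\nA) Monopoly Board Game\nB) Stainless Steel Water Bottle"
    ),
    "target": "A) Monopoly Board Game",
}

mim_sample = {
    "input": (
        "Complete the user's purchase sequence: LEGO Classic Bricks, [MASK], "
        "UNO Card Game, Jenga."
    ),
    "target": "Hot Wheels Track Set",
}

# Such text pairs would be mixed with the main recommendation-task samples in a
# multi-task seq2seq fine-tuning setup (e.g. for FLAN-T5).
```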