Building Bechef's Smart Shopping List Classifier: Machine Learning That Saves You Steps

The Problem: Zigzagging Through Grocery Stores Costs You Time
Have you ever found yourself walking back and forth across a grocery store because your shopping list wasn't organized by aisle? Our data shows the average shopper backtracks through the same aisles 4-6 times during a typical shopping trip, adding 15-20 minutes of frustration to every grocery run.
At Bechef, we wanted to solve this all-too-common "zigzag dance" once and for all. Not just for convenience, but to give you back precious time.
The problem sounds simple: classify grocery items into store categories (Milk → Dairy, Tomatoes → Fresh Produce, etc.). But creating a solution that works across different stores, maintains consistent categorization, and handles thousands of possible ingredients proved to be a fascinating engineering challenge.
Why Not Just Use LLMs? A Technical Breakdown
It's 2025, so naturally, large language models were our first thought. We ran a two-week proof-of-concept using GPT-4.5 Turbo. While the results were impressive in isolation, we quickly identified several critical limitations:
- Non-deterministic outputs: ask an LLM to categorize "peanut butter" five times and you might get "spreads," "breakfast items," "nut products," "condiments," and "baking ingredients." This inconsistency creates a confusing user experience.
- Latency & cost inefficiency: making API calls for every item added to a shopping list would introduce 200-500ms delays and cost approximately $0.
- Cross-store consistency challenges: different store layouts require a standardized categorization system that can be mapped to various retail environments.
Finding a Better Solution: The Engineering Approach
After consulting with Tom Strange, creator of the ingredient-parser library, we decided to explore fastText, a lightweight and efficient text-classification tool. This would allow us to:
- Train once and deploy everywhere
- Achieve <1ms inference time per ingredient
- Provide consistent categorization
Building the Dataset: The Foundation of Accuracy
To build a robust classifier, we needed comprehensive training data representing ingredients across different cuisines, cultural contexts, and naming conventions. We found a rich dataset of recipe ingredients that included items from diverse backgrounds.
FastText's training format is refreshingly straightforward, requiring labeled examples in this structure:
__label__category text of the item to classify
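Producing that format is a one-liner per example. A minimal sketch (the category names here are illustrative, not our real taxonomy):

```python
# Write labeled examples in fastText's supervised-training format:
# each line is "__label__<category> <text>".
def to_fasttext_line(category: str, text: str) -> str:
    # fastText splits on whitespace, so multi-word labels
    # conventionally use hyphens instead of spaces.
    label = category.lower().replace(" ", "-")
    return f"__label__{label} {text.lower()}"

examples = [
    ("Dairy", "Milk"),
    ("Fresh Produce", "Tomatoes"),
]

lines = [to_fasttext_line(cat, text) for cat, text in examples]
print(lines[0])  # __label__dairy milk
print(lines[1])  # __label__fresh-produce tomatoes
```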
Our engineering challenge was transforming messy ingredient lists into this clean, structured format. We built a custom preprocessing pipeline that:
- Normalized ingredient text (removing quantities, units, etc.)
- Deduplicated similar entries
- Standardized formatting
- Applied consistent labeling
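The first three steps can be sketched in a few lines of regex-based Python. This is a simplified illustration, not our production pipeline, and the unit vocabulary shown is deliberately tiny:

```python
import re

# Illustrative unit vocabulary; the real pipeline covers far more.
UNITS = r"(?:cups?|tbsp|tsp|oz|grams?|g|kg|lbs?|ml|l)"
QTY = r"\d+(?:[./]\d+)?"  # matches 2, 1/2, 0.5, ...

def normalize(raw: str) -> str:
    """Strip quantities and units, lowercase, collapse whitespace."""
    text = raw.lower()
    text = re.sub(rf"\b{QTY}\s*{UNITS}\b", " ", text)  # "2 cups", "500g"
    text = re.sub(rf"\b{QTY}\b", " ", text)            # bare numbers
    text = re.sub(r"[^a-z\s-]", " ", text)             # punctuation
    return " ".join(text.split())

def preprocess(raw_items):
    # Normalize, then deduplicate while preserving input order.
    seen, out = set(), []
    for item in raw_items:
        cleaned = normalize(item)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            out.append(cleaned)
    return out

print(preprocess(["2 cups Flour, sifted", "Flour, sifted", "1/2 tsp salt"]))
# ['flour sifted', 'salt']
```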
The Labeling Challenge: Where AI Meets Human Expertise
Manually labeling thousands of ingredients would have been prohibitively tedious, so we leveraged AI to streamline the process. We extracted unique ingredients from our dataset, processed them in batches, and used large language models (LLMs) to suggest preliminary category labels.
This approach significantly accelerated our workflow, but it wasn’t without challenges — the AI occasionally produced inconsistent or inaccurate classifications. To address this, we conducted multiple verification passes and incorporated manual corrections. After a few rounds of refinement and human review, we were confident in the quality of our dataset. It was time to start building.
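One simple way to reconcile inconsistent suggestions across multiple LLM passes is a majority vote, with ties escalated to a human. The sketch below assumes the per-pass labels have already been collected; the sample outputs are invented for illustration:

```python
from collections import Counter

def majority_label(suggestions):
    """Pick the most common label across several labeling passes;
    return None on a tie so the item is routed to manual review."""
    counts = Counter(suggestions).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # ambiguous: needs a human
    return counts[0][0]

# Three passes over the same ingredient (illustrative outputs):
print(majority_label(["dairy", "dairy", "refrigerated"]))  # dairy
print(majority_label(["spreads", "condiments"]))           # None
```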
Training the Model: Technical Implementation Details
After several iterations of data cleaning, we created a final dataset of 6,885 labeled ingredients split into:
- 4,590 training examples (66.7%)
- 2,295 validation samples (33.3%)
Training the model was straightforward, with FastText's parameters requiring minimal tuning.
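A 2:1 split like the one above can be produced with a seeded shuffle. A minimal sketch (the synthetic labels and file names are placeholders; the fastText call is shown only as a comment since it needs the fasttext package and files on disk):

```python
import random

def split_dataset(lines, train_frac=2/3, seed=42):
    """Shuffle reproducibly, then split into train/validation sets."""
    shuffled = lines[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

labeled = [f"__label__cat{i % 5} ingredient {i}" for i in range(6885)]
train, valid = split_dataset(labeled)
print(len(train), len(valid))  # 4590 2295

# Written to disk, the two files feed straight into fastText, e.g.:
#   model = fasttext.train_supervised(input="train.txt")
#   model.test("valid.txt")
```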
Results and Performance Analysis
Testing on our validation dataset yielded promising results:
Number of samples: 2,295
Precision@1: 0.705
Recall@1: 0.705
This means our model correctly categorizes ingredients about 70.5% of the time on the first attempt, which is not perfect but highly practical for real-world use. We also discovered one amusing quirk: given an empty string, the model confidently predicts "wines-and-alcohol".
An interesting technical note: precision and recall are identical because our model always makes exactly one prediction per ingredient in this implementation.
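To see why the two numbers coincide: with exactly one prediction per item, both metrics reduce to correct / total, because the retrieved set and the relevant set each contain one entry per item. A minimal sketch:

```python
def precision_recall_at_1(predictions, gold):
    """With exactly one prediction per item, precision@1 and
    recall@1 both reduce to correct / total."""
    assert len(predictions) == len(gold)
    correct = sum(p == g for p, g in zip(predictions, gold))
    # retrieved = len(predictions) and relevant = len(gold); both
    # equal the number of items, so the two ratios are identical.
    precision = correct / len(predictions)
    recall = correct / len(gold)
    return precision, recall

p, r = precision_recall_at_1(["dairy", "produce", "bakery"],
                             ["dairy", "produce", "frozen"])
print(p, r)  # both 0.666...
```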
Try It Yourself: The Bechef Difference
This classifier is now fully integrated into the Bechef app and works seamlessly with Bechef's existing features:
- Recipe collection: Gather recipes from blogs, TikTok, Instagram, and more
- Clean formatting: Convert any recipe format into structured, easy-to-follow steps
- Creator attribution: Every recipe maintains a link to its original creator
- Smart organization: Arrange recipes into themed cookbooks for meal planning
- Social sharing: Collaborate on cookbook collections with friends and family
- Calendar planning: Schedule meals across your week or month
In a world where grocery store layouts change frequently and product categories can be ambiguous, our lightweight classifier provides consistent organization that works across different stores and shopping styles.
The next time you're planning meals in Bechef, appreciate that little bit of machine learning magic quietly organizing your shopping list—saving you time, steps, and frustration in the grocery aisles.
Download Bechef today and experience the difference a smart shopping list can make. Your future self, standing confidently in aisle 6 without needing to backtrack to aisle 2, will thank you.