Agentic Commerce Pt. 1: What AI Shopping Agents Can Buy Right Now

AI shopping agents can now buy groceries, laptops, and more. Here's a look at what works today, why the demos impress, and where execution still breaks.

TLDR: Agentic commerce is here, but mostly in narrow, assisted forms. Agents can already help users discover products, compare options, build baskets, and complete some purchases, but reliable autonomous buying is still limited to low-risk, bounded flows.

In a January 2025 livestream, an OpenAI researcher held up a photo of a handwritten grocery list and handed it to the company’s new agent, Operator.

The agent read the list, opened Instacart, built the basket, and booked a delivery slot. A person gave the task, and software went shopping.

That clip did the rounds because it showed the big promise of an agent acting on your behalf. Shopping is one of the obvious tests for agents because it turns intent into action.

This is the first piece in a series on the seven layers of agentic commerce, starting with execution. From a retail perspective, an agent reads what you want, opens a store, fills a basket, and pays. The question for this layer is: what can these agents buy today, and how well do they do it?

The answer has two halves. 

Demand signals have climbed fast. AI-driven traffic to US retail sites rose 693.4% year over year during the 2025 holiday season, and AI referrals converted 31% better than other sources. Salesforce estimated that AI and agents influenced 20% of global online holiday sales. 

That doesn’t mean agents completed all those purchases, but it shows that shoppers are already bringing AI into the buying journey.

In the UK, the share of shoppers using AI assistants more than doubled from 12% to 28% in a year, with 44% now saying they’d let an agent handle the whole process once they’ve set a budget and brand.

But execution itself is still brittle. On a careful shopping benchmark, the strongest model scored 17.76% against human experts’ 30.02%, and passed safety checks only 35.42% of the time. So there’s still a wide gap between the demo and the daily experience.

What can retail shopping agents buy today?

Execution within agentic commerce has moved from research preview to live product in under a year. Here’s where it stands.

| Tool | What it does now | Agentic depth | Current limits |
| --- | --- | --- | --- |
| ChatGPT Instant Checkout | Lets users buy from eligible Etsy sellers inside ChatGPT, with Shopify support planned | Checkout-native | Single-item purchases, US only |
| Amazon Buy for Me | Lets Amazon’s agent buy selected products from outside brand sites inside the Amazon app | App-mediated checkout | Select US customers, selected brands and products, no promo codes, beta |
| Perplexity Instant Buy | Lets users search for products and buy from merchants directly on Perplexity | Checkout-native | US users only, eligible products only |
| Perplexity Comet | Uses an agentic browser to research products and help with shopping tasks across the web | Browser-driven | Site blocking, checkout friction, safety and reliability issues |
| Operator / ChatGPT agent | Drives a browser to shop across websites | Browser-driven | Slow, gets stuck, sometimes blocked by sites |
| Instacart in ChatGPT | Lets users browse groceries, build a cart, and check out inside ChatGPT | Checkout-native | Grocery only, available through supported retailers and user accounts |
| DoorDash in ChatGPT | Turns recipe ideas into grocery lists and sends users to DoorDash checkout | App-mediated checkout | Grocery only, select users at launch |

In the checkout-native flows, the agent handles discovery, basket, and payment, while the merchant keeps fulfilment and returns. For now, most flows cap at one item per order.

The fridge that orders its own milk

The fridge that notices low milk and reorders it has been the stock demo of automated shopping for a decade. The shipped version is more modest.

Amazon’s Auto Buy places an order when a price drops below a threshold you set, and Subscribe & Save runs on a fixed schedule. Neither one ‘reasons’ per se; both simply follow a rule you set in advance.
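To make the distinction concrete, here is a minimal Python sketch of the two rule types. The class names, fields, and values are hypothetical illustrations, not Amazon's implementation; the point is only that each rule is a fixed comparison, with no reasoning involved.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PriceRule:
    """Order when the price drops below a threshold the user set (hypothetical)."""
    max_price: float

    def should_order(self, current_price: float) -> bool:
        # Strictly below the threshold, mirroring "drops below".
        return current_price < self.max_price


@dataclass
class ScheduleRule:
    """Reorder on a fixed interval, regardless of price (hypothetical)."""
    interval_days: int
    last_order: date

    def should_order(self, today: date) -> bool:
        # Due once the full interval has elapsed since the last order.
        return today - self.last_order >= timedelta(days=self.interval_days)
```

For example, `PriceRule(max_price=20.0).should_order(19.99)` fires, while a 30-day `ScheduleRule` last run on 1 January stays quiet until 31 January. Neither rule knows whether you still want the item.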

Groceries are the natural first home for execution because they’re repeat purchases. You buy the same milk, eggs, and coffee every week, so the agent has little to get wrong and a clear record to copy.

That’s why Instacart and DoorDash were among the first to wire recipe-to-cart flows into ChatGPT. Repeat purchases give an agent a safe place to start.

UK shoppers have noticed. Nearly a quarter (23%) expect at least 10% of their purchases to be AI-driven within a year, and 46% would let an agent switch brands for a better-value option. The appetite has arrived ahead of the plumbing.

Broken execution

Agent demos pull attention because they compress effort. Ask an agent to “find a work laptop under £800 with 16GB of RAM and good battery life,” and it scans more listings than a person would sift through by hand. The promise is search without the slog, and a basket that fills itself.

But it’s early days, so these agents aren’t perfect. Tasks sometimes run slowly, and the agent often gets stuck part-way through. Hand it your card and you’re trusting it not to buy 1,000 pairs of socks instead of 10.
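One plain safeguard against the sock scenario is to check every order the agent proposes against limits the user set up front. The sketch below is a hypothetical illustration, not any vendor's API: `Mandate`, `ProposedOrder`, and `check_order` are invented names, and real systems would need far more than two checks.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Mandate:
    """Limits the user sets before handing over payment (hypothetical)."""
    max_quantity: int
    max_total: float


@dataclass(frozen=True)
class ProposedOrder:
    """What the agent wants to buy (hypothetical)."""
    item: str
    quantity: int
    unit_price: float


def check_order(order: ProposedOrder, mandate: Mandate) -> List[str]:
    """Return reasons to block the order; an empty list means it may proceed."""
    problems = []
    if order.quantity > mandate.max_quantity:
        problems.append(
            f"quantity {order.quantity} exceeds limit {mandate.max_quantity}"
        )
    total = order.quantity * order.unit_price
    if total > mandate.max_total:
        problems.append(
            f"total {total:.2f} exceeds budget {mandate.max_total:.2f}"
        )
    return problems
```

With a mandate of 10 items and a 50.00 budget, an order for 1,000 pairs of socks at 2.50 each fails both checks, while an order for 10 pairs passes. The check sits outside the agent, so it holds even when the agent misreads the request.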

On WebMall, a four-shop comparison test, the strongest agent handled add-to-cart and checkout tasks without trouble but completed under 65% of the harder jobs, like finding the cheapest offer across shops or reading vague requirements. On DeepShop, the top system reached only 20% on hard queries. 

The benchmarks don’t all measure the same thing, but they point in the same direction: agents do better with bounded tasks and worse when the purchase requires judgment, substitution, compatibility checks, or safety awareness.

Reliability is climbing: the length of task an agent can finish with a 50% success rate has been doubling roughly every seven months (perhaps a new Moore’s Law for AI?).
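A seven-month doubling time compounds quickly. As a rough sketch, assuming the trend simply continues (which nothing guarantees), the projection is plain exponential growth. The function name and the starting value in the example are hypothetical.

```python
def task_horizon(months_elapsed: float, start_minutes: float,
                 doubling_months: float = 7.0) -> float:
    """Project the task length an agent can finish at even odds.

    Assumes steady doubling every `doubling_months`; this is an
    extrapolation, not a measurement.
    """
    return start_minutes * 2 ** (months_elapsed / doubling_months)
```

Under that assumption, an agent that can manage a 60-minute task today would manage an 8-hour one in 21 months: `task_horizon(21, 60)` gives 480 minutes. Even odds is still a long way from the reliability a payment flow needs.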

But for now, a model that succeeds nine times in ten and fails unpredictably on the tenth makes a useful assistant and a poor autonomous buyer, because that tenth time could involve a $10,000 purchase mistake.

What agentic commerce execution needs next

A demo runs well on a controlled stage with one cooperative store. But production agents need to run on millions of different storefronts, each with its own buttons, login walls, and checkout quirks.

For an agent to buy reliably across all of them, the stores themselves have to become readable by machines. The agent can’t carry the whole burden alone.

That’s the next layer: infrastructure.
