
Which AI engine actually knows your product features? A study of 270k+ business queries

Using XFunnel's Hallucination Score monitoring across 437 companies, we tracked which AI engines accurately represent verified business capabilities like SOC2 compliance and service areas - the results reveal critical gaps in AI business intelligence.

Which AI Search Engine Actually Knows Your Product Features?

We tested 8 major AI engines with 270,000+ questions about real company features to see which ones can correctly confirm what businesses actually offer


The Bottom Line Up Front

Using data from XFunnel's Hallucination Score monitoring system, we tracked 270,907 queries about specific business features across 437 companies. These weren't random questions - they were targeted queries about verified company capabilities like "Is Company X SOC2 compliant?" or "Does Company Y deliver to the UAE?" when we know the answer is definitively YES.

The results? AI engines only correctly confirm these verified business facts 69% of the time. Microsoft Copilot performed best at 84%, Grok was close at 83%, and Perplexity showed impressive improvement at 78%. Meanwhile, ChatGPT (w/o Browsing) dropped to 67%, and Claude struggled at just 38%.

The business impact is huge: there's a 46-percentage-point difference between the best and worst performing engines when it comes to accurately representing what your company actually offers.


How We Actually Tested This

Our Hallucination Score KPI

Here's how our Hallucination Score monitoring works: We continuously ask AI engines specific questions about our clients' verified business capabilities (like "Is Company X SOC2 compliant?" or "Does Company Y offer API access?" when we know for certain they do). We then track how often each AI correctly confirms these known facts as part of each company's ongoing Hallucination Score.

The three possible responses:

  • YES → Got it right! ✅
  • "I don't know" → Playing it safe, but not helpful 🤷‍♀️
  • NO → Wrong answer (this is the hallucination we're worried about) ❌
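
To make the scoring concrete, here's a deliberately simplified Python sketch of the counting step. The record format, label names, and example queries are illustrative only - a sketch of the idea, not our production pipeline, which first has to classify free-text answers into these three buckets.

```python
# Illustrative only: a toy version of the counting logic described above.
# The labels and record format are assumptions for this sketch, not
# XFunnel's actual code.
from collections import Counter

# Each record pairs a query about a feature we know is TRUE with the
# engine's (already classified) answer.
results = [
    ("Is Company X SOC2 compliant?", "YES"),
    ("Does Company Y deliver to the UAE?", "I_DONT_KNOW"),
    ("Does Company Y offer API access?", "NO"),   # the hallucination case
    ("Is Company X SOC2 compliant?", "YES"),
]

counts = Counter(answer for _, answer in results)
total = len(results)

confirmation_rate = counts["YES"] / total          # got it right
uncertainty_rate = counts["I_DONT_KNOW"] / total   # safe but unhelpful
hallucination_rate = counts["NO"] / total          # wrong answer

print(f"Confirmed: {confirmation_rate:.0%} | "
      f"Don't know: {uncertainty_rate:.0%} | "
      f"Hallucinated: {hallucination_rate:.0%}")
```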

Why this matters for businesses: When potential customers ask AI engines about your company's capabilities - whether you're SOC2 compliant, which countries you serve, what integrations you offer - you need to know which engines are giving accurate answers. Our Hallucination Score helps companies understand where they might be losing customers due to AI misinformation or AI uncertainty about their actual capabilities.


Leaderboard (August 2025)

Hallucination Score Results - Which AI Engines Accurately Represent Your Business?

[Figure: AI engine accuracy leaderboard, August 2025]

🥇 The Accuracy Champions (80%+)

1. Microsoft Copilot - 84%

  • Super consistent performance month after month
  • Slightly improving over time (+0.4%)
  • What makes it good: Copilot does a great job finding reliable sources and double-checking information before giving you an answer.

2. Grok - 83%

  • What makes it good: Grok gets frequent updates and has really good real-time search capabilities, which helps it stay accurate with current information.

🥈 The Strong Performers (70-80%)

3. Perplexity - 78% (Most Improved)

  • Started at 60% in April, now up to 78%!
  • Had a particularly good August (+5%)
  • What makes it good: Perplexity is really focused on finding current information from the web and showing you exactly where it got each fact.

4. ChatGPT+Browsing - 76%

  • Bounced back strong after some early hiccups when it first launched
  • What makes it good: When ChatGPT can browse the web, it gets much more accurate and up-to-date information.

5. Google AI Mode - 74%

  • Steady, small improvements each month
  • What makes it good: Google keeps tweaking how their AI search results work, and it's gradually getting better.

6. Google Gemini - 73%

  • Had a rough August (dropped 7% that month)
  • The challenge: Gemini seems a bit inconsistent month to month.

🥉 The Struggling Ones (60-70%)

7. ChatGPT (w/o Browsing) - 67% ⚠️ Biggest Concern

  • Dropped from 83% in April to 67% in August - that's a big slide!
  • The problem: Without being able to browse the web, ChatGPT is working with outdated information and can't fact-check itself.

🚨 Needs Major Improvement (<60%)

8. Claude - 38%

  • Says "I don't know" 59% of the time (even when it should know the answer)
  • The challenge: Claude is extremely cautious and would rather not answer than risk being wrong. While this prevents bad information, it's not very helpful when you need actual answers.

What These Numbers Really Mean

The overall average: AI engines only confirm known company features 69% of the time

Here's how each AI typically responds when asked about features that companies definitely offer:

  • Copilot: Confirms the feature 84% of the time, says "I don't know" 10%, gives wrong info 6% → Confident and well-informed about businesses
  • Perplexity: Confirms the feature 78% of the time, says "I don't know" 10%, gives wrong info 12% → Pretty good and getting better at business intelligence
  • Claude: Confirms the feature only 38% of the time, says "I don't know" 59%, gives wrong info 3% → Overly cautious about stating business facts

Each AI has a different approach to business information - Copilot is confident about company data, Perplexity digs for current sources, and Claude avoids making any definitive statements about businesses.
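
If you want to play with these mixes yourself, here's a tiny sketch hard-coding the August percentages above. The dictionary layout and the "missed" framing are ours for illustration, not output from the monitoring system.

```python
# A toy view of the response-mix numbers above, hard-coding the August 2025
# percentages from this post.
response_mix = {
    "Copilot":    {"confirmed": 84, "dont_know": 10, "wrong": 6},
    "Perplexity": {"confirmed": 78, "dont_know": 10, "wrong": 12},
    "Claude":     {"confirmed": 38, "dont_know": 59, "wrong": 3},
}

for engine, mix in response_mix.items():
    assert sum(mix.values()) == 100, f"{engine} mix should sum to 100%"
    # From a business standpoint, every non-confirmation of a real feature
    # is a potentially lost customer, whether from caution or from error.
    missed = mix["dont_know"] + mix["wrong"]
    print(f"{engine}: confirms {mix['confirmed']}%, misses {missed}% "
          f"(of which {mix['wrong']}% is actively wrong)")
```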


Who's Most Reliable Month to Month?

  • Most consistent: Copilot - not only the best accuracy, but you can count on it staying that way
  • Moderately consistent: ChatGPT and Gemini have some ups and downs but nothing too crazy
  • All over the place (but improving): Perplexity and ChatGPT+Browsing are still finding their groove, but the trend is upward

Why consistency matters: If customers are researching your business through AI, you want that AI to have reliable, up-to-date information about your company month after month.
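
One simple way to put a number on consistency is the standard deviation of an engine's monthly scores. The sketch below does exactly that - but note that apart from the endpoints cited in this post, the monthly values are assumed placeholders, not our published data.

```python
# Illustrative consistency check: a smaller standard deviation means a
# steadier engine. Only a few monthly data points are cited in this post,
# so most values below are assumed placeholders for the sake of the example.
from statistics import mean, stdev

monthly_scores = {  # April through August 2025
    "Copilot":    [83.6, 83.7, 83.8, 83.9, 84.0],  # assumed: flat and high
    "Gemini":     [74.0, 76.0, 78.0, 80.0, 73.0],  # 80% July peak, August dip (cited)
    "Perplexity": [60.0, 66.0, 70.0, 73.0, 78.0],  # 60% April -> 78% August (cited)
}

for engine, scores in monthly_scores.items():
    swing = stdev(scores)  # month-to-month variability in percentage points
    print(f"{engine}: mean {mean(scores):.1f}%, monthly swing +/-{swing:.1f} pts")
```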


What Happened in 2025

Perplexity: The amazing comeback story. Started at 60% in April and climbed all the way to 78% by August. They really focused on getting better at finding current information and showing you where it came from.

ChatGPT (w/o Browsing): The worrying decline in business knowledge. Started strong at 83% in April but dropped to 67% by August, which shows how quickly an AI can become outdated about company offerings without access to current business information online.

ChatGPT+Browsing: The recovery. Had some early problems when it launched, but bounced back to a solid 76% as OpenAI worked out the kinks.

Gemini: The roller coaster. Actually hit 80% in July but then dropped 7% in August. Google is still working to get their AI search performing consistently.

Google AI Mode: The steady improver. Nothing dramatic, just small improvements month after month.

Grok: The consistent performer. Stayed strong at 83% while constantly releasing new features and updates.


The Bottom Line

Here's where things stand for businesses in 2025:

  • Best for product knowledge: Microsoft Copilot and Grok actually know what companies offer
  • Getting much better: Perplexity and ChatGPT+Browsing are improving their business intelligence
  • Missing the mark: ChatGPT (w/o Browsing) is increasingly out of touch with current company offerings
  • Playing it too safe: Claude says "I don't know" way too often, even about basic company features

As more customers turn to AI to research products and services, monitoring your Hallucination Score across different engines becomes critical for business growth - especially given that 77% of businesses are concerned about AI hallucinations. The engines that accurately represent your capabilities will drive discovery and sales, while those with poor scores may be costing you customers who can't get accurate information about what you actually offer.

This is why 437 companies now use XFunnel's Hallucination Score monitoring - to track exactly how AI engines represent their business and where they might be losing potential customers to misinformation or AI uncertainty.


About this analysis

This analysis is based on real data from XFunnel's Hallucination Score monitoring service, which tracks how accurately AI engines represent our clients' verified business capabilities. We continuously monitor 437 companies across thousands of specific queries about their confirmed features, compliance status, service areas, and product capabilities.

This isn't academic research - it's real business intelligence from companies actively monitoring how AI engines represent their businesses to potential customers. If you're wondering whether AI engines accurately represent your company's actual capabilities, this data reveals exactly where the gaps are and which engines your customers can trust.


About the Author

Neri Bluman is the co-founder of xfunnel.ai.
You can follow him on LinkedIn.