Perplexity AI: Stealth Crawling and Privacy, What’s Happening?

Article Highlights:
  • Perplexity AI uses stealth crawling techniques to bypass website directives
  • Non-transparent behavior risks exposing private content
  • Site owners lose control over data accessed by bots
  • Experiments on new domains confirmed restriction bypass
  • New rules have been implemented to block Perplexity’s stealth crawling
Introduction

In recent months, Perplexity AI, an AI-powered answer engine, has sparked debate due to its non-transparent crawling behavior. Several web operators have reported suspicious activity, raising concerns about privacy and trust on the web.

What is stealth crawling?

Stealth crawling refers to techniques in which a bot disguises its identity to access website content while ignoring the directives that owners set through robots.txt files or firewall rules. In Perplexity AI’s case, the crawler has been observed frequently changing its user agent and its ASN (autonomous system number, which identifies the network its requests come from) to evade blocks.
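To make the mechanism concrete, here is a minimal sketch of how a robots.txt rule is evaluated, using Python's standard `urllib.robotparser`. The bot name `ExampleBot`, the rules, and the URL are hypothetical and chosen only for illustration; this is not a reconstruction of any site's actual configuration. It shows that a rule is matched against the user agent a crawler declares, so a crawler that presents a different identity no longer falls under the rule written for it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots.txt permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


URL = "https://example.com/private/page"

# The crawler identifies itself honestly: the rule applies and access is denied.
declared = is_allowed("ExampleBot/1.0", URL)

# The same crawler presents a generic browser user agent: the named rule
# no longer matches, so robots.txt offers no protection.
disguised = is_allowed("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)", URL)

print(declared, disguised)
```

Because robots.txt is advisory and relies on honest self-identification, a block aimed at a named crawler only works as long as the crawler keeps using that name.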

Privacy and trust implications

The web has always relied on transparency and rule-following between crawlers and sites. When a bot ignores directives, it risks exposing private content and undermines trust.

  • Site owners lose control over who accesses their data.
  • Bot transparency is lost, making it hard to distinguish legitimate from illegitimate activity.
  • Trust in the web as a safe and regulated space is weakened.

How Perplexity AI’s behavior was discovered

Some customers reported that, despite blocking Perplexity via robots.txt and WAF rules, the bot still accessed their content. Targeted tests on new, unindexed domains confirmed that Perplexity provided detailed information about protected content, bypassing restrictions.

Experiments and findings

The experiments involved creating brand-new domains with directives that prohibited all bot access. Even so, querying Perplexity AI returned detailed responses about the protected content, indicating that the content had been retrieved despite the directives.
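A test of this kind can be sketched as follows. The blanket-deny robots.txt, the domain `test-domain.example`, the bot name, and the sample request list are all hypothetical and are not data from the experiments described above. The helper checks a list of observed requests against the directives and flags any fetch that the rules forbid.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a test domain: no crawler may fetch anything.
BLANKET_DENY = """\
User-agent: *
Disallow: /
"""


def violations(requests, robots_txt=BLANKET_DENY,
               origin="https://test-domain.example"):
    """Return the (user_agent, path) pairs that robots.txt forbids."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    flagged = []
    for user_agent, path in requests:
        # Fetching robots.txt itself is expected of any crawler.
        if path == "/robots.txt":
            continue
        if not parser.can_fetch(user_agent, origin + path):
            flagged.append((user_agent, path))
    return flagged


# Hypothetical sample of requests seen by the test server.
observed = [
    ("ExampleBot/1.0", "/robots.txt"),
    ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)", "/secret-article"),
]

flagged = violations(observed)
print(flagged)
```

On a domain that is new, unindexed, and closed to all bots, any flagged request for a content page is a fetch that the directives did not permit, whatever user agent it carried.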

Reactions and countermeasures

Given this evidence, Perplexity was removed from the verified bot list and new rules were implemented to block stealth crawling. The debate on bot transparency and data protection is more relevant than ever.

"Trust is the foundation of the web, and crawler transparency is essential to maintain it."

Original article on Perplexity

Evol Magazine