Mira: The One API for Trust
Can Collective Wisdom reduce hallucinations & bias in AI?
TL;DR
Verification of outputs is critical to ensuring AI performs reliably.
Mira is building a layer-1 network that delivers trustless, scalable, and accurate verification of AI outputs.
Reducing hallucinations and bias at the same time is a delicate balancing act. Mira does this by harnessing the collective wisdom of AI models.
Mira’s verification system is built on two fundamental design principles: (1) Break AI outputs into smaller, easily verifiable pieces, and (2) Use an ensemble of models to verify each piece.
Mira’s initial market size is tied to LLMOps, but its total addressable market could expand to all of AI because every AI application will need more reliable outputs.
Mira is already powering AI verification for several AI apps with 200K+ users.
Mira’s ultimate goal is to become a synthetic foundation model, seamlessly plugging into every major provider to deliver pre-verified outputs through a single API.
Hallucination: an experience involving the apparent perception of something not present.
Andrej Karpathy calls AI “dream machines.” He believes that hallucinations—those moments when AI confidently generates things that aren’t real—are a feature, not a bug. It’s futile to try to eliminate them entirely. And honestly, there’s something poetic about that.
# On the "hallucination problem"
I always struggle a bit with I'm asked about the "hallucination problem" in LLMs. Because, in some sense, hallucination is all LLMs do. They are dream machines.
We direct their dreams with prompts. The prompts start the dream, and based on the… x.com/i/web/status/1…
— Andrej Karpathy (@karpathy)
1:35 AM • Dec 9, 2023
A large language model (LLM) is an artist, a creator. It dreams in code, generates ideas out of thin air, and spins meaning from data. But for AI to move from beautiful daydreams to practical, everyday applications, we must rein in those hallucinations.
Error rates for LLMs remain high across many tasks—often hovering around 30%. At that level, LLMs still require a human-in-the-loop to reach a usable standard of accuracy.
But when we hit that elusive 99.x% accuracy—where outputs are reliable without human oversight—magic happens. That’s the threshold where AI achieves human-level reliability, unlocking an endless universe of use cases previously out of reach.
Reaching that level of precision, however, is no small feat. It demands relentless engineering effort and innovation.
The story of Mira starts here. But before we dive in, let’s take a moment to talk about LLM development—and why verifications are shaping up to be the next big thing in AI.
How an LLM is Born
LLM development is the latest iteration in the deep learning journey—distinct from the traditional software development practices we’ve honed over the past 50+ years. LLMs, which have only been around for about three years, flip the script completely, moving from deterministic thinking (if X, then Y) to probabilistic reasoning (if X, then… maybe Y?).
This means the infrastructure for an AI-driven world demands an entirely new set of tools and workflows. Yet many of these tools are still locked inside the research labs that created the LLMs.
The good news is that these tools are starting to trickle out into the public domain, opening up a world of possibilities for developers everywhere.
At the tail end of this new workflow lies a critical piece of the puzzle: evaluations & verifications. Today, our spotlight lands on these. They answer a fundamental question: Is the AI working well?
Verification = Trust
Trust is the foundation of any great AI product.
As AI becomes an increasingly integral part of our lives, the technology itself remains fragile. Mistakes happen, and when they do, trust erodes quickly. Users expect AI to be accurate, unbiased, and genuinely helpful, but without reliable systems in place to ensure that, frustration mounts—and frustration leads to churn.
This is where verifications come into play.
Verifications act as a safeguard. They are the quality assurance layer developers rely on to refine outputs and build systems that users can trust.
Mira is tackling a core Web2 problem with the trustless transparency of crypto. By leveraging a decentralized network of verifier nodes, Mira ensures that AI outputs are accurately and independently verified.
Enter Mira
Let’s say you have a paragraph of output from an LLM about the city of Paris. How do you verify that it is accurate? It is hard to do so because there is so much nuance around everything from claims to the structure of the content to the writing style.
This is where Mira steps in.
Mira’s vision is bold: to create a layer-1 network that delivers trustless, scalable, and accurate verification of AI outputs. By harnessing collective wisdom, Mira reduces biases and hallucinations, solving core problems like fairness and cost while proving how blockchain can truly enhance AI.
Source: Mira
Early results are promising. In a recent study published on arXiv, Mira demonstrated that using multiple models to generate outputs and requiring consensus significantly boosts accuracy. Precision reached 95.6% with three models, compared to 73.1% for a single model's output.
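For intuition on why aggregation helps, here is a back-of-envelope sketch (mine, not the paper's) that assumes each verifier judges a claim independently and with equal accuracy. Under simple majority voting, three such verifiers already beat any single one:

```python
from math import comb

def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n independent verifiers, each correct
    with probability p, reaches the right verdict (n is odd, so no ties)."""
    k_min = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# Illustrative only: real models are correlated, so gains will be smaller.
single = 0.731  # single-model precision reported in the study
print(f"1 verifier : {single:.1%}")                             # 73.1%
print(f"3 verifiers: {majority_vote_accuracy(single, 3):.1%}")  # ~82%
print(f"5 verifiers: {majority_vote_accuracy(single, 5):.1%}")  # ~88%
```

The study's 95.6% figure comes from a different setup (sharded claims and a consensus requirement rather than a bare majority), so the numbers won't match; the point is simply that independent verifiers compound.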
Two key design elements power Mira’s approach:
Sharding & Binarization of content: Breaking complex AI outputs into smaller, independently verifiable pieces.
Model Diversity: Leveraging multiple models to enhance reliability and minimise bias.
#1: Content Transformation via Binarization & Sharding
AI-generated outputs range from simple statements to sprawling essays, thanks to the near-zero cost of content generation. But this abundance of complexity creates a challenge: how do you ensure the accuracy of such diverse outputs?
Mira’s solution is simple: break it down.
In a process called sharding, Mira transforms complex AI-generated content into smaller, digestible pieces that AI models can objectively review.
By standardising outputs and breaking them into discrete, verifiable claims, Mira ensures every piece can be evaluated consistently, eliminating the ambiguity that often plagues evaluations.
For example, consider this compound statement:
“Photosynthesis occurs in plants to convert sunlight into energy, and bees play a critical role in pollination by transferring pollen between flowers.”
On the surface, it seems simple to verify. But when handed to multiple models, interpretation quirks might lead to different answers. Mira’s content transformation via sharding solves this by splitting the statement into two independent claims:
“Photosynthesis occurs in plants to convert sunlight into energy.”
“Bees play a critical role in pollination by transferring pollen between flowers.”
Once sharded, each claim undergoes binarization, where it’s converted into a multiple-choice question. These questions are distributed to a network of nodes running AI models. Using Mira’s ensemble verification method, the models collaborate to evaluate and confirm the validity of each claim.
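To make that concrete, here is a minimal sketch of what sharding and binarization could look like in code. The function names and the naive claim splitter are stand-ins of mine, not Mira's implementation; in practice a model, not a string split, would extract the claims:

```python
from dataclasses import dataclass

@dataclass
class BinarizedClaim:
    claim: str
    question: str
    choices: tuple[str, ...]

def shard(output: str) -> list[str]:
    """Hypothetical sharding step: split a compound output into independent,
    atomic claims. A real system would use an LLM, not a naive split."""
    return [part.strip().rstrip(".") + "." for part in output.split(", and ")]

def binarize(claim: str) -> BinarizedClaim:
    """Hypothetical binarization step: turn each claim into a
    multiple-choice question that verifier models can answer."""
    return BinarizedClaim(
        claim=claim,
        question=f'Is the following claim factually accurate? "{claim}"',
        choices=("Yes", "No"),
    )

output = ("Photosynthesis occurs in plants to convert sunlight into energy, "
          "and bees play a critical role in pollination by transferring "
          "pollen between flowers.")

tasks = [binarize(c) for c in shard(output)]
for t in tasks:
    print(t.question, t.choices)
```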
Currently, Mira's content sharding and binarization capabilities are focused on text inputs. By early 2025, these processes will expand to support multimodal inputs, such as images and videos.
#2: An Ensemble, not An Individual
Mira has developed an advanced verification system that combines the strengths of multiple AI models to assess the quality of AI outputs.
Let’s unpack that.
Traditional automated evaluations often rely on a single large language model (LLM), like GPT-4, as the ultimate arbiter of quality. While functional, this approach has significant flaws: it’s costly, prone to bias, and limited by the quirks and “personality” inherent in models.
Mira’s breakthrough is a shift from reliance on a single massive model to leveraging an ensemble of diverse LLMs. This ensemble excels in tasks where factual accuracy is more important than creative flair, reducing error rates and delivering more reliable, consistent verifications.
Ensemble techniques have been well-studied in machine learning tasks like classification, and Mira is now bringing this to verification.
At the heart of Mira’s system is the Panel of LLM verifiers (PoLL)—a collaborative network of models that work together to verify outputs. Think of it as a diverse panel of experts weighing in on a decision rather than leaving it to a single, potentially biased judge.
And this is not just wishful thinking—it’s grounded in research. Take a look at the chart below:
Accuracy changes of different evaluation judges compared to human judges. PoLL (group of models, far right) showed the smallest spread in scores compared to human judges.
A Cohere study published in April 2024 demonstrated that a panel of three smaller models—GPT-3.5, Claude-3 Haiku, and Command R—aligned more closely with human judgments than GPT-4 alone. Remarkably, this ensemble method was also 7x cheaper.
Mira is now putting this research into action, deploying its ensemble verification method at scale. The internal results they have shared so far are compelling:
• Error rates reduced from 80% to 5% for complex reasoning tasks.
• 5x improvements in speed and cost compared to human verification.
This is no small feat. By employing consensus mechanisms, Mira’s diverse ensemble of models effectively filters out hallucinations and balances individual model biases. Together, they deliver something greater than the sum of their parts: verifications that are faster, cheaper, and more aligned with our needs.
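As a rough illustration of the ensemble idea, the sketch below aggregates per-claim verdicts from a panel of verifiers and only accepts a claim when a supermajority agrees. The verifier names and threshold are placeholders, not Mira's actual configuration:

```python
from collections import Counter

def panel_verdict(verdicts: dict[str, str], threshold: float = 2 / 3) -> str:
    """Aggregate independent verifier verdicts on a single claim.
    Accept only if a supermajority agrees; otherwise flag for review."""
    counts = Counter(verdicts.values())
    label, votes = counts.most_common(1)[0]
    if votes / len(verdicts) >= threshold:
        return label
    return "UNRESOLVED"

# Placeholder verifier names; any diverse set of models could sit on the panel.
claim_verdicts = {
    "verifier-a": "Yes",
    "verifier-b": "Yes",
    "verifier-c": "No",
}
print(panel_verdict(claim_verdicts))  # "Yes" (2 of 3 agree)
```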
How It Works — Architectural Design
To recap, Mira’s verification system is built on two fundamental design principles:
Break AI outputs into smaller, easily verifiable pieces.
Verify every piece using an ensemble of diverse AI models.
Maintaining a diverse set of models is essential for high-quality outputs, making Mira’s design ideal for a decentralised architecture. Eliminating single points of failure is crucial for any verification product.
Mira uses a blockchain-based approach to ensure no single entity can manipulate outcomes. The premise is simple: AI-generated outputs should be verified just like blockchain state changes.
Verification happens through a network of independent nodes, with operators economically incentivised to perform accurate verifications. By aligning rewards with honesty, Mira’s system discourages bad actors and ensures reliable results.
Here's how it works (a simplified code sketch follows the steps):
An AI developer creates a dataset of outputs from their model and submits it to Mira via an API.
Mira transforms the dataset into multiple-choice questions (binarization) and splits it into smaller, manageable pieces (sharding).
These shards are distributed to Mira’s network of verifier nodes. Each node receives a different shard to verify.
Each node independently reviews the questions in its assigned shard and submits its results back to the network.
Nodes assigned to the same shard reach consensus on the verification results, which are then aggregated into the final assessment.
Final verification results are returned to the AI developer, along with a verification certificate: a cryptographic proof of the assessment. This certificate is stored on the blockchain, creating a verifiable, tamper-proof verification record.
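Here is the simplified code sketch referenced above: the flow condensed into a few lines, with every function name and the certificate format invented for illustration. Mira's actual API and on-chain record will differ:

```python
import hashlib
import json

def aggregate_shard(results: list[str]) -> str:
    """Consensus within one shard: take the majority verdict of the nodes
    assigned to it (placeholder for Mira's actual consensus mechanism)."""
    return max(set(results), key=results.count)

def make_certificate(assessment: dict) -> str:
    """Illustrative 'verification certificate': a hash of the final
    assessment that could be anchored on-chain as a tamper-proof record."""
    payload = json.dumps(assessment, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

# Hypothetical per-shard node results returned by the network.
shard_results = {
    "shard-1": ["Yes", "Yes", "Yes"],
    "shard-2": ["Yes", "No", "Yes"],
}
assessment = {sid: aggregate_shard(r) for sid, r in shard_results.items()}
print(assessment)                    # {'shard-1': 'Yes', 'shard-2': 'Yes'}
print(make_certificate(assessment))  # digest to store on-chain
```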
Mira ensures data confidentiality by breaking input data into smaller pieces, ensuring no single node has access to the complete dataset.
For additional security, Mira supports dynamic privacy levels, allowing users to adjust the number of shards based on data sensitivity. While higher privacy levels require more sharding (and thus higher costs), they provide added confidentiality for users handling sensitive information.
Every verification a node performs is recorded on the blockchain, creating a transparent and auditable record of the verification process. This immutable ledger ensures trust and accountability that traditional, non-blockchain-based approaches cannot achieve.
This sets a new standard for secure and unbiased AI verification.
Ensuring that Nodes do their Work
In Mira’s decentralised network, honest work is rewarded.
Experts can deploy specialised AI models via node software and earn tokens for accurate verifications. AI developers, in turn, pay fees per verification, creating a self-sustaining economic loop between demand and supply.
This approach bridges real value from Web2 workflows into the Web3 ecosystem, directly rewarding participants such as inference providers and model creators.
But incentives come with challenges. In any decentralised system, bad actors will try to exploit the network, submitting fake results to earn rewards without doing the work.
So, how do we make sure nodes are actually performing their tasks accurately and honestly?
To maintain integrity, Mira employs Proof-of-Verification—a mechanism inspired by Bitcoin’s proof-of-work but designed for AI. Instead of mining blocks, nodes must prove they’ve completed verification tasks to participate in the consensus process.
Here’s how it works:
Staking Requirements: Every node must stake tokens as an economic commitment. If a node repeatedly submits incorrect results, a portion of its stake is slashed as a penalty. This ensures nodes have skin in the game and a reason to act honestly.
Penalties for Fake Work: Nodes that submit fake results—like skipping computations or generating random outputs—face penalties. Fraud is detected when their results consistently deviate significantly from the consensus (assuming most nodes are honest).
Proof-of-Verification creates a balanced system in which nodes are economically motivated to perform high-quality verifications. This mechanism ensures that the network remains secure and reliable over time.
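A stylised sketch of how staking and slashing could fit together is below; the slash fraction and strike threshold are invented for illustration and are not Mira's parameters:

```python
from dataclasses import dataclass

@dataclass
class Node:
    node_id: str
    stake: float
    strikes: int = 0

SLASH_FRACTION = 0.10   # illustrative penalty, not Mira's actual parameter
MAX_STRIKES = 3         # illustrative tolerance before slashing

def settle_shard(nodes: list[Node], verdicts: dict[str, str]) -> str:
    """After nodes submit verdicts for a shard, keep honest nodes whole and
    slash those that repeatedly deviate from the consensus verdict."""
    consensus = max(set(verdicts.values()), key=list(verdicts.values()).count)
    for node in nodes:
        if verdicts[node.node_id] == consensus:
            node.strikes = max(0, node.strikes - 1)  # honest work, no penalty
        else:
            node.strikes += 1
            if node.strikes >= MAX_STRIKES:
                node.stake *= (1 - SLASH_FRACTION)   # slash persistent deviation
                node.strikes = 0
    return consensus

nodes = [Node("n1", 1000.0), Node("n2", 1000.0), Node("n3", 1000.0)]
settle_shard(nodes, {"n1": "Yes", "n2": "Yes", "n3": "No"})  # consensus: "Yes"
```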
Challenges & Trade-offs
Here’s the question: If Mira’s approach is so effective, why isn’t everyone doing it?
The answer lies in the trade-offs and complexities of implementing such a system in the real world. Achieving the perfect balance between fast, accurate evaluations and managing the intricacies of multiple models is no small feat.
One of Mira’s biggest hurdles is latency. While using ensembles of models allows verifications to run in parallel, synchronising results and reaching consensus introduces delays. The process is only as fast as the slowest node.
Currently, this makes Mira ideal for batch processing of AI outputs—use cases where real-time results aren’t required. As the network grows with more nodes and compute availability, the long-term goal is to achieve real-time verifications, expanding Mira’s applicability to a wider range of scenarios.
Beyond latency, other challenges include:
Engineering Complexity: Orchestrating evaluations across multiple models and ensuring the consensus mechanism operates smoothly demands significant engineering effort.
Higher Compute Requirements: Even when using smaller models, running them together in ensembles increases computational demands.
Good Consensus Mechanism Design: The way consensus is achieved—through majority voting, weighted scoring, or other methods—plays a critical role in the system’s reliability. In ambiguous cases, ensembles may struggle to align, leading to inconsistent results.
Applications & Use Cases for Mira
Source: Mira
Mira's API integrates easily with any application, much like OpenAI's GPT-4o API. It serves both consumer and B2B applications, making it a versatile solution for a wide range of use cases. Today, over a dozen applications use Mira's infrastructure.
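As an illustration of how lightweight such an integration could be, here is a hypothetical client call. The endpoint, payload fields, and response shape are invented for this sketch and are not Mira's published API:

```python
import requests

# Hypothetical endpoint and schema, for illustration only.
MIRA_VERIFY_URL = "https://api.example-mira.network/v1/verify"

def verify_output(text: str, api_key: str) -> dict:
    """Submit an AI-generated output for verification and return the
    (hypothetical) assessment: per-claim verdicts plus a certificate ID."""
    resp = requests.post(
        MIRA_VERIFY_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json={"content": text, "privacy_level": "standard"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```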
Consumer Integrations
On the consumer side, Mira is already powering AI verification for several early-stage AI apps:
Creato: A discovery and sharing app for personalised daily quotes and status messages, serving 120k+ users.
Astro247: A platform where users chat with an AI astrologer for personalised horoscopes and predictions.
Amor: An AI companion app allowing users to engage with fantasy AI characters for immersive conversations.
Klok: A crypto-focused ChatGPT by Mira that answers crypto queries using APIs like CoinMarketCap and web-scraped data from crypto sites and news outlets.
Delphi Oracle is the latest and perhaps most advanced integration. This AI-powered research assistant allows Delphi Digital members to engage directly with research content, ask questions, clarify points, integrate price feeds, and adjust the content to various levels of complexity.
Delphi Oracle leverages Mira Network’s verification technology to deliver reliable and accurate responses. By verifying responses across multiple models, Mira reduces hallucination rates from ~30% to under 5%, ensuring a strong foundation of trust.
At the core of Delphi Oracle is a high-performance query router:
Price Queries: Routed directly to market data endpoints for near-instant responses.
Basic Questions: Handled by a cached response system, balancing speed and cost-effectiveness.
Complex Inquiries: Directed to a specialised LLM processing pipeline capable of synthesising information from multiple sources.
This smart routing system, combined with intelligent caching, ensures optimal performance by balancing latency, cost, and quality.
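A toy version of that routing logic might look like the sketch below; the classification heuristics and route names are placeholders rather than Delphi Oracle's actual implementation:

```python
from enum import Enum, auto

class Route(Enum):
    PRICE = auto()    # market data endpoint
    CACHED = auto()   # cached response system
    COMPLEX = auto()  # LLM pipeline with multi-source synthesis

def route_query(query: str, cache: dict[str, str]) -> Route:
    """Toy router: price questions go to market data, known questions hit
    the cache, everything else goes to the heavier LLM pipeline."""
    q = query.lower().strip()
    if any(word in q for word in ("price", "market cap", "fdv")):
        return Route.PRICE
    if q in cache:
        return Route.CACHED
    return Route.COMPLEX

cache = {"what is proof of stake?": "..."}
print(route_query("What is the price of ETH?", cache))       # Route.PRICE
print(route_query("What is proof of stake?", cache))         # Route.CACHED
print(route_query("Compare L2 fee models in depth", cache))  # Route.COMPLEX
```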
Mira’s testing revealed that smaller, cost-effective models could handle most queries almost as well as larger models. This has resulted in a 90% reduction in operational costs, all while maintaining the high-quality responses users expect.
Though many of these consumer apps are still early, they highlight Mira’s ability to integrate seamlessly and support large, active user bases. It’s not hard to imagine thousands of applications plugging into Mira’s ecosystem—so long as the developer experience remains simple and the value proposition stays clear.
B2B Applications
On the B2B front, Mira is zeroing in on specialised integrations in industries where trust and precision are paramount, with an initial focus on healthcare and education.
Key applications include:
Healthcare: AI assistants providing reliable second opinions and supporting doctors in critical decision-making.
Education: Personalised learning assistants that adapt to individual students’ needs while maintaining factual accuracy and alignment with curricula.
Legal Services: Systems capable of accurately summarising case law and predicting legal outcomes to streamline legal workflows.
Mira’s Endgame
Mira’s ultimate goal is to offer natively verified generations—where users simply connect via an API, just like OpenAI or Anthropic, and receive pre-verified outputs before they’re returned.
They aim to replace existing model APIs by providing highly reliable versions of existing models (e.g., Mira-Claude-3.5-Sonnet or Mira-OpenAI-GPT-4o), enhanced with built-in, consensus-based reliability.
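If those pre-verified models end up exposed behind an OpenAI-compatible surface (my assumption, not something Mira has confirmed), adoption could be a one-line change. The base URL and model name below are purely illustrative:

```python
from openai import OpenAI

# Hypothetical: point an OpenAI-compatible client at a Mira endpoint and
# request a pre-verified model variant. URL and model name are invented.
client = OpenAI(
    base_url="https://api.example-mira.network/v1",
    api_key="YOUR_KEY",
)

response = client.chat.completions.create(
    model="mira-claude-3.5-sonnet",  # illustrative pre-verified variant
    messages=[{"role": "user", "content": "Summarise the history of Paris."}],
)
print(response.choices[0].message.content)
```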
Market Size
Generative AI is on a rocket ship. According to Bloomberg, the market is projected to grow at a jaw-dropping 42% CAGR, with revenue surpassing $1 trillion by 2030. Within this massive wave, tools that improve the speed, accuracy, and reliability of AI workflows will capture a meaningful slice.
As more enterprises integrate LLMs into their workflows—ranging from customer support chatbots to complex research assistants—the need for robust model verifications becomes more pressing.
Organisations will be seeking tools that can (1) measure model accuracy and reliability, (2) diagnose prompt and parameter inefficiencies, (3) continuously monitor performance and drift, and (4) ensure compliance with emerging regulatory frameworks around AI safety.
Sound familiar? It’s a playbook we’ve seen before with MLOps (short for “Machine Learning Operations”). As machine learning scaled in the 2010s, tools for deploying, tracking, and maintaining models became essential, creating a multi-billion-dollar market. With the rise of generative AI, LLMOps is following the same trajectory.
Capturing even a small slice of the trillion-dollar market could push this sub-sector to $100B+ by 2030.
Several Web2 startups are already positioning themselves, offering tools to annotate data, fine-tune models, and evaluate performance:
• Braintrust ($36M raised)
• Vellum AI ($5M raised)
• Humanloop ($2.8M raised)
These early movers are laying the groundwork, but the space is fluid. In 2025, we are likely to see a proliferation of startups in this sector. Some may specialise in niche evaluation metrics (e.g., bias detection and robustness testing), while others broaden their offerings to cover the entire AI development lifecycle.
Larger tech incumbents—such as major cloud providers and AI platforms—will likely bundle evaluation features into their offerings. Last month, OpenAI introduced evaluations directly on its platform. To stay competitive, startups must differentiate through specialization, ease of use, and advanced analytics.
Mira isn’t a direct competitor to these startups or incumbents. Instead, it’s an infrastructure provider that seamlessly integrates with both via APIs. The key? It just has to work.
Mira’s initial market size is tied to LLMOps, but its total addressable market will expand to all of AI because every AI application will need more reliable outputs.
From a game theory perspective, Mira is in a unique situation. Unlike other model providers like OpenAI, who are locked into supporting their own systems, Mira can integrate across models. This positions Mira as the trust layer for AI, offering reliability that no single provider can match.
2025 Roadmap
Mira’s 2025 roadmap aims to balance integrity, scalability, and community participation on its path to full decentralisation:
Phase 1: Bootstrapping Trust (Where we are today)
In the early stage, vetted node operators ensure network reliability. Well-known GPU compute providers serve as the first wave of operators, handling initial operations and laying a strong foundation for growth.
Phase 2: Progressive Decentralisation
Mira introduces designed duplication, where multiple instances of the same verifier model process each request. While this increases verification costs, it’s essential for identifying and removing malicious operators. By comparing outputs across nodes, bad actors are caught early.
In its mature form, Mira will implement random sharding to distribute verification tasks. This makes collusion economically unviable and strengthens the network’s resilience and security as it scales.
Phase 3: Synthetic foundation model
Here Mira will offer natively verified generations. Users will connect via API, similar to OpenAI or Anthropic, and receive pre-verified outputs—reliable, ready-to-use results without additional validation.
In the coming months, Mira is gearing up for several major milestones:
Launch of Mira Flows, its AI workflow product that allows developers to build API-driven AI apps quickly.
Public testnet in January.
A token launch is also on the horizon, targeted for Q1 2025.
🌈 Research Alpha: Node Delegator Program
Mira is expanding opportunities for community involvement through its Node Delegator Program. This initiative makes supporting the network accessible to everyone—no technical expertise is required.
The process is simple: You can rent compute resources and delegate them to a curated group of node operators. Contributions can range from $35 to $750, and rewards are offered for supporting the network. Mira manages all the complex infrastructure, so node delegators can sit back, watch the network grow, and capture some upside.
You can use the following code exclusive to Chain of Thought readers (300 invites only, fastest fingers first) to whitelist yourself for the delegator program: COTR0
Team
Today, Mira has a small but tight team that is largely engineering-focused.
There are 3 co-founders:
Karan Sirdesai (CEO) previously worked on the crypto and AI investment team at Accel and in consulting at BCG.
Sid Doddipalli (CTO) is an alumnus of IIT Madras and previously co-founded Stader Labs, a liquid staking platform on Ethereum with $400M+ TVL.
Ninad Naik (Chief Product Officer) has held leadership roles as Director of Product Management at Uber and as General Manager at Amazon’s Smart Home division.
Together, they bring investment acumen, technical expertise, and product leadership to Mira's vision for decentralised AI verification. Mira raised a $9M seed round in July 2024, led by BITKRAFT and Framework Ventures.
Our Thoughts
It’s refreshing to see a Crypto AI team tackling a fundamental Web2 AI problem—making AI better—rather than playing speculative games in crypto’s bubble.
Verifications will be 2025’s AI buzzword
The industry is waking up to the importance of verifications. Relying on “vibes” is no longer enough. Every AI application and workflow will soon need a proper verification process—and it’s not a stretch to imagine future regulations mandating these processes to ensure safety.
Mira’s approach leverages multiple models to independently verify outputs, avoiding reliance on a single centralised model. This decentralised framework enhances trust and reduces the risks of bias and manipulation.
And let’s consider what happens if we get to AGI in the next few years (a real possibility).
As Anand Iyer from Canonical points out, if AI can subtly manipulate decisions and code, how can we trust the systems testing for these behaviours? Smart people are thinking ahead. Anthropic’s research underscores the urgency, highlighting evaluations as a critical tool to identify potentially dangerous AI capabilities before they escalate into problems.
By enabling radical transparency, blockchains add a powerful layer of protection against rogue AI systems. Trustless consensus mechanisms ensure that safety evaluations are verified by thousands of independent nodes (as on Mira), drastically reducing the risk of Sybil attacks.
Ambitious Vision with Execution Risk
Mira is chasing a huge market with clear demand for a solution that works. But the challenges are real. Improving latency, precision, and cost efficiency will require relentless engineering effort and time. The team will need to consistently demonstrate that their approach is measurably better than existing alternatives.
The core innovation lies in Mira’s binarization and sharding process. This “secret sauce” promises to address scalability and trust challenges. For Mira to succeed, this technology needs to deliver on its promise.
Token design & Mira’s secret sauce
In any decentralised network, token and incentive design are make-or-break factors. Mira’s success will depend on how well these mechanisms align participant interests while maintaining network integrity.
While the details of Mira’s tokenomics remain under wraps, I expect the team to reveal more as the token launch approaches in early 2025.
A Bright Future
“We’ve found that engineering teams who implement great evaluations move significantly faster – up to 10 times faster – than those who are just watching what happens in production and trying to fix them ad-hoc.”
In an AI-driven world, trust is everything.
As models become more complex, reliable verifications will underpin every great AI product. They help us tackle hallucinations, eliminate biases, and ensure AI outputs align with users' actual needs.
Mira automates verifications, cutting costs and reliance on human intervention. This unlocks faster iterations, real-time adjustments, and scalable solutions without bottlenecks.
Ultimately, Mira aims to be the API for trust—a decentralised verification framework that every AI developer and application can depend on for verified answers.
It’s bold, ambitious, and exactly what the AI world needs.
Thanks for reading,
Teng Yan
This research deep dive was sponsored by Mira, with Chain of Thought receiving funding for this initiative. All insights and analysis are our own. We uphold strict standards of objectivity in all our viewpoints.
To learn more about our approach to sponsored Deep Dives, please see our note here.
This report is intended solely for educational purposes and does not constitute financial advice. It is not an endorsement to buy or sell assets or make financial decisions. Always conduct your own research and exercise caution when making investment choices.