TEEs for AI agents combine a hardware-isolated execution environment with remote attestation to produce verifiable AI compute — outputs from a language model or agent that a smart contract or end user can trust without trusting the operator running the model. The hardware enclave keeps prompts, model weights, and intermediate activations confidential while signing the final output with a key bound to the loaded code's measurement. The signed output, plus the attestation chain rooted in the chip vendor's PKI, lets a verifier check that a specific model ran on a specific input even if the operator is unknown or untrusted.
This pattern has moved from concept to production over 2023–2025. Marlin's Oyster, Phala Network's worker mesh, Atoma Network, Automata's verifiable AI service, and Flashbots' work on attested inference all ship in some form. The Confidential Computing Consortium's 2023 outlook report projected the broader confidential computing market reaching roughly $54 billion by 2026, with AI inference listed as one of the fastest-growing segments. Hardware vendor support — Nvidia Confidential Computing on Hopper-generation GPUs, Intel TDX on 4th and 5th Gen Xeons, and AWS Nitro on most modern EC2 — has made attested AI inference operationally feasible.
This article walks through why agents need verifiable compute, how a TEE-based attested inference flow works, the production projects building it, and the limitations teams should weigh before depending on it.
Why AI Agents Need Verifiable Compute
An autonomous AI agent — a piece of software that perceives an environment, decides an action, and acts on behalf of a user — has a verifiability problem when it controls anything of value. The user wants to know that the agent followed its specified policy. A smart contract that pays the agent wants to know the agent's output came from the right model on the right input. Other agents it transacts with want the same assurance. Without a verification primitive, every interaction collapses to "trust the operator running the model."
Three concrete scenarios drive demand for verifiable agent compute:
Agent-controlled trading and treasury management. Agents that hold funds, place trades, or move stablecoins between chains need to prove they followed user-specified rules. A signed attestation linking each output to the loaded model and prompt closes the audit gap.
Agent-to-agent commerce. When agents transact with each other — paying for inference, ordering data, fulfilling intents — each side wants assurance about the other's behavior. EIP-8004 and similar trust-layer designs explicitly contemplate attested execution as one of the inputs.
Regulated agent workflows. Compliance-bound institutions running agents on sensitive data need an audit trail showing what the model saw and what it produced. The attestation gives that trail without requiring the data to be replicated to an auditor.
The alternative paths are weaker. Replicating inference across multiple operators (commit-and-reveal, threshold consensus) is expensive and reveals the inputs. ZK-proving an LLM is technically possible but currently runs 10,000–100,000× slower than native inference; RISC Zero and Succinct have published proving benchmarks confirming this gap. TEE-based attested inference sits between these extremes: hardware-rooted trust at near-native inference speed.
How a TEE-Attested Inference Flow Works
The mechanics of attested inference follow a four-step pattern that recurs across production deployments.
Step 1: Enclave provisioning. The operator launches an enclave or trust domain — Intel SGX, Intel TDX, AWS Nitro Enclave, or AMD SEV-SNP. The CPU computes a measurement of the loaded code. The deployer publishes the expected measurement (hash) so verifiers know what to look for.
Step 2: Model and prompt loading. Model weights are loaded into the enclave through an attested channel — typically a TLS connection terminated inside the enclave whose endpoint key is bound to the measurement. The prompt and any user-supplied context arrive through the same channel. The host operator never sees plaintext weights or prompts.
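A minimal sketch of that binding check, in the style of RA-TLS: the attestation's user-data field carries a hash of the enclave's TLS public key, so a matching certificate proves the TLS session terminates inside the attested enclave. The `report_data` field name is an illustrative stand-in; each vendor's attestation format differs.

```python
import hashlib

def channel_is_bound(attestation: dict, tls_pubkey_der: bytes) -> bool:
    # The enclave embedded a hash of its TLS public key in the
    # attestation's user-data field at provisioning time; the client
    # compares it against the key presented in the TLS handshake.
    return attestation["report_data"] == hashlib.sha256(tls_pubkey_der).hexdigest()
```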
Step 3: Inference and signing. The model runs inside the enclave. The output is computed and signed with an enclave-held key. The signature, along with a hash of the input and a hash of the model weights, becomes the attested output payload.
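A minimal sketch of the enclave-side signing step, using Ed25519 from the `cryptography` library. The payload field names are illustrative, not a standardized format:

```python
import hashlib
import json
import os

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Generated inside the enclave at launch; the public half is bound to
# the attestation so verifiers can link it to the code measurement.
enclave_key = Ed25519PrivateKey.generate()

def sign_inference(prompt: bytes, weights_hash: bytes, output: str) -> dict:
    """Build and sign the attested output payload."""
    payload = {
        "input_hash": hashlib.sha256(prompt).hexdigest(),
        "model_hash": weights_hash.hex(),      # committed at deployment
        "output_hash": hashlib.sha256(output.encode()).hexdigest(),
        "nonce": os.urandom(16).hex(),         # blocks replay of this output
    }
    message = json.dumps(payload, sort_keys=True).encode()
    payload["signature"] = enclave_key.sign(message).hex()
    return payload
```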
Step 4: Verification. A verifier — a smart contract, a user-side library, another agent — checks the signature against the attestation chain rooted in the chip vendor's PKI (Intel PCS, AMD KDS, or AWS Nitro). The verifier confirms the measurement matches the expected value, the chip TCB is current, and the signature is valid. The output is then accepted as having come from the specified model on the specified input.
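A matching sketch of the verifier side. It assumes a vendor-specific library has already walked the certificate chain and TCB checks and distilled the result into the `attestation` dict below (an illustrative stand-in, not a real vendor format):

```python
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_attested_output(payload: dict, attestation: dict,
                           expected_measurement: str) -> bool:
    # Vendor chain and TCB validation (Intel PCS / AMD KDS / AWS Nitro
    # root) are assumed done upstream; `attestation` carries the result.
    if attestation["measurement"] != expected_measurement:
        return False
    pubkey = Ed25519PublicKey.from_public_bytes(
        bytes.fromhex(attestation["public_key"]))
    body = dict(payload)
    signature = bytes.fromhex(body.pop("signature"))
    try:
        pubkey.verify(signature, json.dumps(body, sort_keys=True).encode())
    except InvalidSignature:
        return False
    return True
```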
Onchain verification adds a fifth step: a Solidity contract implements the chip-vendor verification logic. Automata's DCAP attestation contracts verify Intel SGX/TDX attestations directly in EVM, with a SNARK-compressed path to reduce gas. Marlin's Oyster integrates similar onchain verifiers for both Nitro and SGX paths.
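On the client side, submitting a quote to such a verifier looks roughly like the web3.py sketch below. The function name and single-function ABI are hypothetical stand-ins, not Automata's actual interface; consult the automata-dcap-attestation repository for the real one:

```python
from web3 import Web3

# Hypothetical minimal ABI for an onchain attestation verifier.
VERIFIER_ABI = [{
    "name": "verifyQuote",
    "type": "function",
    "stateMutability": "view",
    "inputs": [{"name": "rawQuote", "type": "bytes"}],
    "outputs": [{"name": "ok", "type": "bool"}],
}]

w3 = Web3(Web3.HTTPProvider("https://rpc.example.org"))  # placeholder RPC
verifier = w3.eth.contract(
    address="0x0000000000000000000000000000000000000000",  # placeholder
    abi=VERIFIER_ABI,
)

def quote_is_valid(raw_quote: bytes) -> bool:
    # The contract re-implements the chip vendor's verification logic,
    # so a True return means the quote chains to the vendor root.
    return verifier.functions.verifyQuote(raw_quote).call()
```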
Several variations on the pattern appear in production; the sketch after this list ties them together:
Persistent signing keys. The enclave generates a key pair once at launch and registers the public key onchain alongside the measurement. Subsequent outputs are signed with that key — the verifier checks attestation once, then accepts enclave-signed outputs continuously without re-verifying the chip-vendor chain on every call.
Bound-output signatures. The enclave includes a hash of the input and a session ID in the signed payload, preventing an output from being replayed against a different input.
Attested model commitments. The model weight hash is committed onchain at deployment time so verifiers can detect substitution attacks where an operator swaps in a different model.
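A sketch combining the three variations: the verifier checks the vendor attestation once, pins the enclave's persistent public key and the committed model hash, then accepts bound outputs continuously while rejecting replays. The payload shape matches the earlier signing sketch:

```python
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

class OutputVerifier:
    def __init__(self, enclave_pubkey: Ed25519PublicKey,
                 committed_model_hash: str):
        # The caller verified the chip-vendor attestation chain before
        # constructing this object: the one-time check.
        self.pubkey = enclave_pubkey
        self.model_hash = committed_model_hash
        self.seen_nonces: set[str] = set()

    def accept(self, payload: dict) -> bool:
        body = dict(payload)
        signature = bytes.fromhex(body.pop("signature"))
        if body["model_hash"] != self.model_hash:
            return False  # substitution: operator swapped the weights
        if body["nonce"] in self.seen_nonces:
            return False  # replayed output
        try:
            self.pubkey.verify(signature,
                               json.dumps(body, sort_keys=True).encode())
        except InvalidSignature:
            return False
        self.seen_nonces.add(body["nonce"])
        return True
```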
Hardware Stack: GPU TEEs and CPU TEEs
LLM inference at production scale requires GPUs, which historically sat outside the CPU TEE trust boundary. The hardware stack has changed substantially in 2023–2025.
Nvidia Hopper (H100) Confidential Computing. Nvidia's Hopper architecture includes Confidential Compute mode, which encrypts data on the PCIe link between CPU and GPU and runs the GPU's compute kernels inside a hardware-isolated context. Combined with Intel TDX or AMD SEV-SNP on the host, this lets a model run inside a confidential VM with the GPU compute also attested. Nvidia documentation covers the protocol at the NVIDIA Confidential Computing developer page.
Intel TDX with GPU passthrough. Microsoft Azure and Google Cloud offer TDX confidential VMs with attached H100 GPUs in confidential mode. The full inference path — host CPU, memory, GPU — runs inside an attested boundary.
AWS Nitro Enclaves with Inferentia and GPU. AWS Nitro Enclaves originally lacked GPU support, restricting them to CPU-only inference. The 2024 introduction of GPU passthrough into Nitro Enclaves on G4/G5/P5 instances closed the gap, though Nitro's GPU integration is more limited than Nvidia Confidential Computing on TDX.
CPU-only inference. Smaller agent workloads (sub-7B-parameter models, RAG retrievers, classification heads, policy engines) run entirely on CPU TEEs. Intel SGX with Ice Lake's terabyte-scale EPC handles these comfortably. Many production agent deployments today are CPU-only because the policy and orchestration logic dominates the compute, and the heavy LLM is called through an external attested service.
The hardware stack matters because it determines the model size that can run inside the trust boundary. A 70B-parameter LLM needs GPU-class compute; a small policy model fits in a CPU SGX enclave. Production deployments tend to split: a CPU TEE runs the orchestration logic and calls a GPU-confidential inference endpoint for the heavy lifting.
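A minimal sketch of that split, with a hypothetical endpoint URL and response shape; real services expose their own attestation and API formats:

```python
import requests

INFERENCE_URL = "https://inference.example.org"  # hypothetical endpoint

def attested_completion(prompt: str, expected_measurement: str) -> str:
    # Check the endpoint's attestation before sending a sensitive prompt.
    att = requests.get(f"{INFERENCE_URL}/attestation", timeout=10).json()
    if att["measurement"] != expected_measurement:
        raise RuntimeError("endpoint measurement does not match expected value")
    resp = requests.post(f"{INFERENCE_URL}/v1/completions",
                         json={"prompt": prompt}, timeout=60)
    resp.raise_for_status()
    return resp.json()["output"]
```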
Production Projects Building TEE-Based AI
Several projects have shipped TEE-attested AI infrastructure beyond proof-of-concept stage.
Marlin Oyster. Marlin's Oyster is a marketplace for confidential compute that supports AWS Nitro Enclaves and Intel SGX, with TDX support added in 2024. Oyster's product page covers AI inference, ZK proving, and oracle workloads. Several agent projects use Oyster to run LLM agents that produce onchain-verifiable outputs.
Phala Network. Phala has positioned its SGX-based worker mesh as a host for AI agents — the project markets "Agent Wars" and Phat Contract AI. The agent's prompt and model run inside an attested SGX enclave; the output is signed by the worker's measurement-bound key. Phala's public dashboards show the worker network processed roughly 30,000 contract calls per day in early 2025, with thousands of registered workers.
Atoma Network. Atoma is a more recent entrant focused exclusively on confidential AI inference. The network runs on a mix of Intel TDX, AMD SEV-SNP, and Nvidia Confidential Computing GPUs. The pitch is verifiable AI compute settled in stablecoins with onchain-verifiable inference receipts.
Automata Multi-Prover. Automata's DCAP attestation contracts let any EVM smart contract verify Intel SGX/TDX attestations directly. Automata's verifiable AI service uses TEE-attested LLM inference for agent-controlled smart-contract interactions. The automata-dcap-attestation repository is one of the most-used onchain attestation verifiers.
Flashbots and SUAVE. Flashbots' SUAVE coprocessor architecture, while not exclusively focused on AI, provides the attested execution primitive that AI agent flows can build on. Flashbots' published writing on SUAVE and TEEs describes the architecture in implementation-level detail.
Tinfoil and Confidential Inference. Smaller projects like Tinfoil offer confidential LLM inference services where the model runs inside an attested enclave and the API consumer can verify the attestation before sending sensitive prompts. The pattern has expanded in 2024–2025 alongside concerns about AI providers training on user inputs.
Linking Agent Outputs to Onchain Actions
The point of attested inference for agents is not just confidentiality — it is binding the agent's decision to an onchain consequence. Several integration patterns have emerged.
Agent as transaction signer. The enclave holds a private key that signs Ethereum transactions. The smart contract verifies the signing key was generated inside an enclave with a known measurement. Every transaction the agent submits is therefore traceable to the loaded model.
Agent as oracle. The agent's output is consumed by a smart contract as an oracle input. The contract verifies the attestation before accepting the value. Use cases include conditional payments, claim adjudication, content moderation decisions, and policy-based access control.
Agent as intent solver. An attested agent receives user intents (e.g., "swap $100 USDC for ETH at the best rate"), computes the optimal execution plan, and signs the resulting transaction batch. The user can verify before submission that the plan came from a known model attesting to a public policy.
Agent as compliance gate. An attested agent runs a policy model that decides whether a regulated transaction should proceed. The compliance attestation accompanies the transaction onchain, giving regulators an audit trail without requiring the underlying data to be public.
Each pattern depends on the same primitive: the enclave's signed output binds the agent's behavior to the loaded code, and onchain verification of the measurement closes the loop. The sketch below shows the transaction-signer variant.
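The sketch uses eth-account with illustrative values; in a real deployment the key is generated inside the enclave and covered by the attestation rather than created ad hoc:

```python
from eth_account import Account

# Inside the enclave, at launch; the address is registered onchain
# alongside the enclave measurement.
agent_account = Account.create()

def sign_agent_transfer(to: str, value_wei: int, nonce: int) -> bytes:
    tx = {
        "to": to,
        "value": value_wei,
        "nonce": nonce,
        "gas": 21_000,                          # plain transfer
        "maxFeePerGas": 30_000_000_000,         # illustrative fee values
        "maxPriorityFeePerGas": 1_000_000_000,
        "chainId": 1,
    }
    signed = agent_account.sign_transaction(tx)
    return signed.raw_transaction  # .rawTransaction on eth-account < 0.13
```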
Limitations and Open Problems
TEE-attested AI inference is the most practical path to verifiable agent compute in 2026, but it is not perfect. Five concerns recur in production deployments.
Side-channel attacks on inference. Cache timing, branch prediction, and power-analysis side channels have all been demonstrated against SGX. ML inference workloads have additional side-channel surface — memory-access patterns can leak information about the prompt or model weights. Production deployments use constant-time inference paths and reject vulnerable chip generations through attestation policy.
Vendor concentration. Almost all production attested AI inference depends on Intel, AMD, AWS, or Nvidia signing infrastructure. A vendor PKI compromise would be catastrophic. Multi-vendor architectures — running the same inference across SGX and TDX, or combining a TEE attestation with a ZK proof — reduce single-vendor exposure.
Model commitment integrity. The attestation proves the loaded code matches a measurement. It does not by itself prove the model weights are the audited version unless the weights are committed onchain or hashed into the measurement. Production systems either bake the model hash into the binary or commit the weight hash separately.
Output replay. Without an input hash and session binding in the signed output, a valid attestation can be reused. Production systems include input hashes, nonces, and session IDs in the signed payload to prevent replay.
Bounded compute. Even with TB-scale EPC and GPU confidential computing, attested inference fits a specific operating envelope — typically inference rather than training, single-chip rather than multi-host, sub-trillion-parameter models. Larger workloads currently exceed what attested compute can run, though hardware capacity grows each generation.
How Eco's Stack Connects to Agent Compute
Eco's stablecoin orchestration network spans 15 chains, executing transfers and intents across cross-chain messaging protocols and intent-based routing. AI agents that move stablecoins on behalf of users — managing treasury balances, paying invoices, fulfilling intents — sit naturally on top of this stack. Verifiable compute lets those agents prove they followed a stated policy: an attested LLM signs each transfer decision, the smart contract verifies the attestation, and the routing layer executes the transfer. For teams building agentic stablecoin flows, attested inference is one of the primitives that turns "the agent should follow this policy" into a verifiable claim.
FAQ
What is verifiable compute for AI?
Verifiable compute is computation whose result can be checked by an outside party without re-running the computation. For AI, this typically means proving that a specific model ran on a specific input, producing a specific output. TEE-based verifiable compute uses a hardware enclave that signs the output with a key bound to the loaded code's measurement, producing an attestation a smart contract or user can verify.
Can a TEE actually run an LLM?
Yes, with appropriate hardware. Smaller models (sub-7B parameters) run on CPU TEEs like Intel SGX with Ice Lake's terabyte-scale EPC. Larger models require GPU support — Nvidia Hopper Confidential Computing combined with Intel TDX or AMD SEV-SNP on the host. AWS Nitro Enclaves added GPU passthrough in 2024. Most production deployments today either run smaller models entirely in CPU TEEs or delegate the heavy model to a GPU-confidential inference endpoint.
Why not use ZK proofs instead?
ZK proving an LLM remains 10,000–100,000× slower than native inference as of 2026. RISC Zero and Succinct have published proving benchmarks confirming the overhead. For workloads where the prompt and the latency matter, TEEs are the practical choice. ZK is more appropriate for short-circuit verification — proving a model card, a policy compliance check, or a final settlement state — than for live inference.
How does a smart contract verify an AI agent's TEE attestation?
The smart contract calls a verifier contract that implements the chip vendor's attestation logic. Automata's DCAP attestation contracts verify Intel SGX/TDX attestations directly in EVM, with a SNARK-compressed path that reduces gas costs. The verifier checks the signature chain, confirms the measurement matches an expected value, and validates the chip TCB. Once verified, the contract accepts subsequent enclave-signed outputs without re-verifying the chain.
What projects offer attested AI inference today?
Marlin Oyster, Phala Network, Atoma Network, Automata, and Tinfoil all ship attested AI inference in production. Marlin and Automata are the most blockchain-native — both expose onchain attestation verifiers. Phala focuses on Phat Contract AI agents inside its SGX worker mesh. Atoma targets confidential GPU inference settled in stablecoins. Tinfoil and similar projects offer confidential inference as a developer-facing API.