The first MCP-native eval and observability tool. Log every trace, evaluate output quality, track costs across all your agents. Open-source core. One command to start.
npx @iris-eval/mcp-server
Works with any MCP-compatible agent
Traditional APM sees HTTP status codes and latency. It has no idea your agent just leaked a credit card number, hallucinated an answer, or burned $0.47 on a single query.
Iris registers as an MCP server. Your agent discovers it and invokes its tools automatically. No SDK. No code changes.
log_trace captures full agent runs with hierarchical spans, per-tool-call latency, token usage, and cost in USD. Stored in SQLite, queryable instantly.
evaluate_output scores quality across completeness, relevance, safety, and cost. Returns per-rule pass/fail with actionable suggestions. Add custom rules via Zod schemas.
Aggregate cost across all agents over any time window. Not just per-trace cost — total spend visibility. Set budget thresholds and get flagged when agents overspend.
Add Iris to your Claude Desktop MCP config. Works with Claude Desktop, Cursor, any MCP-compatible agent.
{
"mcpServers": {
"iris-eval": {
"command": "npx",
"args": ["@iris-eval/mcp-server"]
}
}
}
$ npm install -g @iris-eval/mcp-server
$ iris-mcp --dashboard
Original research on MCP agent observability, evaluation methodology, and the evolving landscape of AI agent infrastructure.
The gap between deploying AI agents and understanding what they're doing. Covers protocol-native observability, heuristic vs. semantic eval, cost visibility, and EU AI Act implications.
Read reportAI agents fail silently. Traditional monitoring can't see the difference between a correct response and a hallucinated one. Why protocol-native observability changes the equation.
Read postWe're collecting data on how teams evaluate, monitor, and track costs for AI agents in production. Results will be published as a downloadable report.
Get notifiedI kept running into the same problem building AI agents: once they're running, you have no visibility into what they're actually doing. Traditional monitoring tells you the request succeeded. It can't tell you the agent leaked PII, hallucinated an answer, or burned through your budget on a single query.
So I built Iris — an MCP server that any agent discovers and uses automatically. No SDK. No code changes. Just add it to your config and start seeing everything.
3 tools, 12 eval rules, SQLite storage, web dashboard, production security
PostgreSQL, multi-tenancy, team dashboards, API key management
Alert rules, webhooks, email notifications, retention policies
Semantic evaluation, OpenTelemetry export, drift detection, A/B testing
SSO/SAML, RBAC, audit logs, SOC 2 compliance
As your team grows, Iris grows with you. Get early access to the cloud tier.