Overview
The LLM-IQ Agent API is a plug-and-play evaluation platform designed for enterprises that want to benchmark and compare large language models (LLMs) such as GPT-4, Claude 3, Gemini, Mistral, and Cohere without the overhead of prompt engineering, dataset curation, or framework configuration.
Using natural-language queries, teams can instantly access comprehensive benchmarking results across 25+ enterprise-grade evaluation domains, including reasoning, summarization, extraction, and query generation. The API supports questions such as “What is the best model for financial document summarization?” or “Compare Claude 3 and GPT-4 on reasoning tasks.” Behind the scenes, it runs precision-tuned tests that use multiple prompt variations and decoding strategies to simulate realistic workflows.
With actionable insights delivered through a professional-grade API, the LLM-IQ Agent API enables intelligent decision-making at every stage of the GenAI lifecycle. Development teams can embed the API directly into inference workflows to power real-time model selection and dynamic prompt routing, automatically choosing the best-fit model for each user query. Procurement and vendor management teams gain standardized metrics for evaluating LLM providers, while engineering teams avoid building and maintaining their own evaluation frameworks. For regulated industries, the API offers audit-ready evaluations aligned to compliance standards and domain-specific requirements. With LLM-IQ, enterprises gain a trusted evaluation and transparency layer to support retrieval-augmented generation (RAG), multi-agent orchestration, and large-scale model deployment strategies.
Highlights
- Natural-language-driven LLM evaluation API: benchmark GPT-4, Claude 3, Gemini, and more with no setup required
- Covers 25+ enterprise use cases such as reasoning, summarization, extraction, query generation, and more
- Objective, real-time model benchmarking powered by proprietary prompt engineering and decoding strategies
Details
Pricing
| Dimension | Description | Cost per request |
|---|---|---|
| Successful API Requests | Number of successful API requests completed | $0.06 |
Vendor refund policy
Articul8 charges only for successful API requests; failed or incomplete requests are excluded from billing. Refunds or credits may be issued if a failed request was misclassified or usage was misattributed. Refund requests must be submitted within 15 days and include relevant logs. Refunds are typically issued as credits; monetary refunds are provided only in cases of billing errors.
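Because only successful requests are billed, a usage estimate reduces to counting successful calls and multiplying by $0.06. The minimal Python sketch below assumes that "successful" means an HTTP 2xx response; the function name `estimate_charge` is hypothetical and not part of any Articul8 tooling.

```python
from decimal import Decimal

# Price per successful request, from the pricing table above.
PRICE_PER_SUCCESSFUL_REQUEST = Decimal("0.06")


def estimate_charge(status_codes: list[int]) -> Decimal:
    """Estimate the bill for a batch of calls, counting only 2xx responses.

    Assumption: a "successful" request is one that returned an HTTP 2xx
    status; failed requests (e.g. 401, 422, 500) are not billed.
    """
    successful = sum(1 for code in status_codes if 200 <= code < 300)
    return successful * PRICE_PER_SUCCESSFUL_REQUEST


# 10,000 successful calls plus 50 failed ones.
calls = [200] * 10_000 + [500] * 50
print(estimate_charge(calls))  # 600.00
```

For example, 10,000 successful calls cost 10,000 × $0.06 = $600.00 regardless of how many additional calls failed. Using `Decimal` avoids floating-point rounding in currency arithmetic.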
Delivery details
- Amazon Bedrock AgentCore
API-Based Agents & Tools
API-Based Agents and Tools integrate through standard web protocols. Your applications can make API calls to access agent capabilities and receive responses.
Additional details
Usage instructions
API
LLM-IQ Agent
The LLM-IQ Agent is an intelligent assistant that helps you select the optimal AI model for your use case. It analyzes comprehensive benchmark datasets and uses advanced querying capabilities to deliver personalized, data-driven model recommendations from a simple natural-language question.
Whether you're comparing model accuracy, identifying the best model for a specific task, or exploring the landscape of open- and closed-source models, the LLM-IQ Agent provides actionable insights without manual research.
How It Works
The agent operates over a structured dataset of model evaluations, parameters, metrics, and task-level results. It uses specialized internal tools, including:
- Data extraction for retrieving model attributes
- Filtering and ranking mechanisms for narrowing results
- Aggregation for summarizing performance
- Dataset metadata tools to provide context and coverage
When you submit a query, the agent parses your intent, selects the right tools, processes benchmark data, and returns clear recommendations with supporting evidence.
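The agent's internal tools are not publicly specified, so the following is only a hypothetical Python illustration of how filtering, aggregation, and ranking could combine over benchmark rows. The record fields (`model`, `task`, `score`), the model names, and the scores are invented for the example, and higher scores are assumed to be better.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class BenchmarkResult:
    model: str
    task: str
    score: float  # assumed: higher is better


def filter_by_task(rows: list[BenchmarkResult], task: str) -> list[BenchmarkResult]:
    """Filtering step: keep only results for the requested task."""
    return [r for r in rows if r.task == task]


def aggregate_by_model(rows: list[BenchmarkResult]) -> dict[str, float]:
    """Aggregation step: average each model's scores."""
    scores: dict[str, list[float]] = {}
    for r in rows:
        scores.setdefault(r.model, []).append(r.score)
    return {model: mean(vals) for model, vals in scores.items()}


def rank(avg_scores: dict[str, float], top_k: int = 3) -> list[tuple[str, float]]:
    """Ranking step: sort models by average score, best first."""
    return sorted(avg_scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Invented sample data.
rows = [
    BenchmarkResult("model-a", "summarization", 0.82),
    BenchmarkResult("model-a", "summarization", 0.78),
    BenchmarkResult("model-b", "summarization", 0.85),
    BenchmarkResult("model-b", "reasoning", 0.60),
    BenchmarkResult("model-c", "summarization", 0.70),
]

ranked = rank(aggregate_by_model(filter_by_task(rows, "summarization")))
print([name for name, _ in ranked])  # ['model-b', 'model-a', 'model-c']
```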
Key Benefits
- Natural-language model selection
- Data-driven recommendations grounded in benchmark results
- Easy comparison of models across tasks and domains
- Significant time savings through automated research
Quick Start
- Select the desired endpoint (link placeholder).
- Send a POST request to <https://agents-api.articul8.ai/v1/llm-iq-agent/recommend>. Include your API key and your question as the "query" field.
- Receive model recommendations with supporting benchmark data.
Example Queries
- “Which models are available right now?”
- “Which model performs best for grounded chat?”
- “How do GPT-5 and Claude 4 compare for financial documents?”
- “What evaluation categories are included?”
- “What model excels at generating queries from context?”
- “I want to summarize insights on Morocco’s economy. Which model should I use?”
API Usage
Making a Request
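The listing does not show the request itself, so this Python sketch uses only the standard library to send a question to the documented endpoint. The JSON body with a `query` field follows the Quick Start; the `Authorization: Bearer` header, the 60-second timeout, and the function names `build_request` and `recommend` are assumptions, so confirm the authentication scheme with Articul8 before relying on it.

```python
import json
import urllib.request

# Endpoint from the Quick Start instructions.
ENDPOINT = "https://agents-api.articul8.ai/v1/llm-iq-agent/recommend"


def build_request(query: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the question as "query".

    Assumption: the API key travels as a Bearer token in the Authorization
    header; the service may expect a different header.
    """
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


def recommend(query: str, api_key: str, timeout: float = 60.0) -> dict:
    """Send the query and return the decoded JSON response body.

    urlopen raises urllib.error.HTTPError for 401, 422, 500, and other errors.
    """
    req = build_request(query, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `recommend("Which model performs best for grounded chat?", "YOUR_API_KEY")` performs a live network request; `build_request` alone can be inspected without contacting the service.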
Response Structure
| Key | Description |
|---|---|
| answer | The agent’s model recommendation |
| tools_used | Tools selected to process your query |
| model_data | Benchmark evidence supporting the result |
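The response is described only at the level of its three keys. The Python sketch below unpacks it, assuming the body is a JSON object with those keys, that `tools_used` is a list, and that `model_data` can be any JSON value, since its exact shape is not documented here. The `Recommendation` class and the sample payload are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Union


@dataclass
class Recommendation:
    """Hypothetical container for the three documented response keys."""
    answer: str  # the agent's model recommendation
    tools_used: list = field(default_factory=list)  # tools selected for the query
    model_data: Any = None  # supporting benchmark evidence (shape assumed open)


def parse_response(raw: Union[str, bytes]) -> Recommendation:
    """Decode a response body; raises KeyError if "answer" is missing."""
    payload = json.loads(raw)
    return Recommendation(
        answer=payload["answer"],
        tools_used=payload.get("tools_used", []),
        model_data=payload.get("model_data"),
    )


# Invented sample body, for illustration only.
sample = '{"answer": "model-x", "tools_used": ["filter", "rank"], "model_data": {"model-x": 0.9}}'
rec = parse_response(sample)
print(rec.answer, rec.tools_used)  # model-x ['filter', 'rank']
```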
Error Handling
| Code | Meaning | How to Fix |
|---|---|---|
| 422 | Validation error | Check syntax and request formatting |
| 401 | Unauthorized | Verify LLM-IQ access and API key |
| 500 | Internal server error | Retry; contact support if persistent |
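The table implies a simple client policy: 401 and 422 errors need a change on the caller's side, while a 500 may succeed on retry. This Python sketch retries only HTTP 500 with exponential backoff; the function name `call_with_retry`, the attempt limit, and the delays are illustrative choices, not values recommended by Articul8.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(send: Callable[[], T], max_attempts: int = 3, base_delay: float = 1.0) -> T:
    """Call `send`, retrying only on HTTP 500 with exponential backoff.

    `send` is any zero-argument function that performs the request and
    raises urllib.error.HTTPError for error statuses, as urlopen does.
    Any status other than 500, including 401 and 422, is re-raised at once.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except urllib.error.HTTPError as err:
            # Give up on non-500 errors or once the attempt budget is spent.
            if err.code != 500 or attempt == max_attempts:
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

For example, `call_with_retry(lambda: urllib.request.urlopen(req, timeout=60))` retries a prepared request object `req` up to three times. If a 500 persists after the last attempt, the error propagates so it can be reported to support.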
Need Help?
Contact Articul8 Support at:
Support
Vendor support
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel staffed 24x7x365 by experienced technical support engineers. The service helps customers of all sizes and technical abilities successfully use the products and features provided by Amazon Web Services.