LLMTest

MCP Tools Reference

These tools are available when you connect the LLMTest MCP server to your IDE. Your AI assistant calls them automatically based on your requests.
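Under the hood, your assistant invokes each tool through a standard MCP tool call, so you can also script them directly. Below is a minimal sketch using the MCP TypeScript SDK; the llmtest-mcp package name and launch command are assumptions, so substitute whatever your LLMTest install documents. The later examples reuse this client.

    // Connect to the LLMTest MCP server over stdio and call the "status" tool.
    // NOTE: "llmtest-mcp" is a hypothetical package name used for illustration.
    import { Client } from "@modelcontextprotocol/sdk/client/index.js";
    import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

    const transport = new StdioClientTransport({
      command: "npx",
      args: ["-y", "llmtest-mcp"],
    });

    const client = new Client({ name: "example-client", version: "1.0.0" });
    await client.connect(transport);

    // Parameterless tools (status, list_flows, get_account, ...) take empty arguments.
    const status = await client.callTool({ name: "status", arguments: {} });
    console.log(status.content); // flow count, total calls, spend, pending suggestions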

status

Show proxy status and activity summary. Returns flow count, total calls, spend, and pending suggestions.

"Check my LLMTest status"

list_flows

List all AI flows with model, call count, latency, and cost per flow.

"What AI models am I using and how much do they cost?"

get_suggestions

Get pending model-switch recommendations. Shows cost savings, latency differences, and quality comparisons.

"Are there any cheaper models I should switch to?"

update_suggestion

Accept or dismiss a model suggestion.

id (number, required): Suggestion ID.
action (string, required): "accept" or "dismiss".

run_benchmark

Benchmark a flow against alternative models. The system selects the most relevant challengers based on your optimization goal, runs pairwise comparisons using an AI judge, and returns win/loss/tie records with cost and latency data.

flow (string, required): Flow name to benchmark.
currentModel (string, optional): Baseline model; auto-detected from traffic if omitted. Required for pre-launch flows.
optimize_for ("cost" | "quality" | "speed" | "balanced", optional): What to optimize for; affects which challengers are selected and how results are ranked. Default: "balanced".
challengers (string[], optional): Specific models to test (e.g. ["anthropic/claude-sonnet-4"]). If omitted, challengers are auto-selected.
"Benchmark my support-bot flow, optimize for cost"
"Test if Claude Sonnet is better than GPT-4o for my code-reviewer flow"

seed_samples

Register test samples for a flow. Needed for pre-launch benchmarking when you don't have real traffic yet. The AI assistant generates realistic test prompts based on your description.

flow (string, required): Flow name.
samples (array, required): Array of message arrays (system + user messages).
"Create 10 test samples for my support-bot flow with diverse customer issues"

list_samples

Show how many test samples are stored per flow and whether each flow is ready for benchmarking (a flow needs at least 3 samples).

list_new_models

Show new and trending models worth testing. Includes pricing, context length, and priority score.

"What new AI models have been released recently?"

get_account

Check your credit balance, total spend, and account info.

leave_feedback

Submit feedback about the tool experience.

rating (number, required): Rating from 1 to 5.
comment (string, optional): Free-text feedback.
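A sketch of the corresponding tool call, with illustrative values:

    // Submit a 5-star rating with an optional free-text comment.
    await client.callTool({
      name: "leave_feedback",
      arguments: { rating: 5, comment: "The benchmarking flow was easy to follow." },
    });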