
Models & APIs

Configure external providers and local models via the LiteLLM gateway.


Unified Model Layer

EvalForge uses LiteLLM as its routing gateway, giving you a single, unified interface to 100+ LLM providers, including OpenAI, Anthropic, Mistral, Google Gemini, and local open-source models (via Ollama or vLLM).
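For reference, the call shape LiteLLM exposes is the same regardless of provider; only the model string changes. The snippet below is a minimal sketch with illustrative model identifiers, not EvalForge's internal code:

```python
import litellm

messages = [{"role": "user", "content": "Summarize this ticket."}]

# Same call shape for every provider; only the model identifier changes.
openai_resp = litellm.completion(model="gpt-4o-mini", messages=messages)
mistral_resp = litellm.completion(model="mistral/mistral-small-latest", messages=messages)

# Local open-source model served by Ollama (api_base points at the local server).
ollama_resp = litellm.completion(
    model="ollama/llama3",
    api_base="http://localhost:11434",
    messages=messages,
)
```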

Model Configurations

A Model Configuration stores the parameters needed to execute a prompt against an LLM. You configure these once and reuse them across various experiments.

| Parameter   | Description                            | Example                    |
|-------------|----------------------------------------|----------------------------|
| provider    | The LiteLLM provider mapping           | openai, mistral            |
| model_name  | The specific model identifier          | gpt-4o-mini, claude-3-opus |
| temperature | Sampling temperature (0.0 to 2.0)      | 0.7                        |
| base_url    | Custom API endpoint (for local/proxy)  | http://localhost:11434     |
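A configuration that fills in these fields might look like the following. The field names match the table above; the surrounding structure is illustrative, as EvalForge's exact API schema may differ:

```python
# Illustrative Model Configuration for a local Ollama model.
# Field names mirror the table above; the dict shape itself is an assumption.
model_config = {
    "provider": "ollama",
    "model_name": "llama3",
    "temperature": 0.7,
    "base_url": "http://localhost:11434",  # only needed for local/proxy setups
}
```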

API Keys & Security

By default, API keys are managed at the environment level in your backend .env file (e.g., OPENAI_API_KEY). This ensures sensitive credentials never leak to the browser frontend or database.
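As a rough sketch of what environment-level key handling looks like (this is not EvalForge's backend code), the key is read from the server's environment and passed to the gateway, so it never travels through the frontend or the database:

```python
import os
import litellm

# Key is resolved from the backend environment (.env), never from the request payload.
response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}],
    api_key=os.environ["OPENAI_API_KEY"],
)
```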

Important Note on LiteLLM Configuration

The provider and model_name fields map directly to LiteLLM standard identifiers. For example, to use Mistral, configure provider: mistral and model_name: mistral-small-latest.

LiteLLM Proxy: If you use the litellm provider together with a base_url, EvalForge automatically configures proxy mode and sends LITELLM_MASTER_KEY in the Authorization header.
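In proxy mode the request goes to the proxy's OpenAI-compatible endpoint, with the master key as the bearer credential. A hedged sketch follows; the port, model alias, and litellm_proxy/ prefix are assumptions about your proxy setup, not EvalForge internals:

```python
import os
import litellm

# Hypothetical proxy-mode call: api_base is the configured base_url, and the
# LITELLM_MASTER_KEY is sent in the Authorization header.
response = litellm.completion(
    model="litellm_proxy/mistral-small-latest",  # model alias as exposed by the proxy
    api_base="http://localhost:4000",
    api_key=os.environ["LITELLM_MASTER_KEY"],
    messages=[{"role": "user", "content": "ping"}],
)
```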

Testing Connections

You can test a model configuration before running an experiment. EvalForge provides a dedicated endpoint that verifies connectivity, bypassing fallback logic to ensure the specific configuration is reachable.

POST /api/v1/models/{id}/test

{
  "status": "success",
  "message": "Successfully connected to mistral/mistral-small-latest",
  "latency_ms": 333.67
}
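Exercising the endpoint from a script could look like this; the host, port, and lack of auth headers are assumptions about your deployment:

```python
import requests

# Hypothetical call to the connection-test endpoint; adjust host and auth to your deployment.
model_id = "123"
resp = requests.post(f"http://localhost:8000/api/v1/models/{model_id}/test", timeout=30)
resp.raise_for_status()

body = resp.json()
print(body["message"], body["latency_ms"])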

Cost & Latency Tracking

EvalForge automatically intercepts responses from LiteLLM to track exact input and output token usage. It estimates costs based on the specific model's pricing tier and records the time-to-first-token (TTFT) and total response latency.
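The raw material for this tracking comes from the LiteLLM response itself. As a rough sketch (not EvalForge's interceptor code), token counts and an estimated cost can be read like this; note that time-to-first-token is only observable on streaming calls, which this sketch omits:

```python
import litellm

response = litellm.completion(
    model="mistral/mistral-small-latest",
    messages=[{"role": "user", "content": "ping"}],
)

# Token usage reported by the provider, surfaced on the LiteLLM response object.
prompt_tokens = response.usage.prompt_tokens
completion_tokens = response.usage.completion_tokens

# LiteLLM's built-in cost estimate, based on its per-model pricing map.
estimated_cost = litellm.completion_cost(completion_response=response)
```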