EvalForge Docs

Security & Production

Important considerations for deploying the open-source platform to production.


API Authentication & Proxies

By default, the EvalForge FastAPI backend does not enforce any authentication middleware on its endpoints. It is designed to be run in trusted, isolated environments (like a local machine or a private VPC) where the frontend is the only consumer.

Public Deployment Warning

If you expose the FastAPI backend directly to the public internet, you must place it behind an authentication proxy (e.g., Cloudflare Access, OAuth2-Proxy, or NGINX with Basic Auth). Furthermore, the default CORS policy allows all credentials and methods, which can be dangerous if the API is exposed on a public domain without an auth gateway.

Heuristic Moderation ("Safe Ingest")

When importing datasets or receiving manual traces, EvalForge offers scrub_pii and filter_toxicity flags to prevent sensitive data from entering the database.

Important: The current built-in implementation uses basic regular expressions for PII detection and a hardcoded list of keywords for toxicity.

  • Toxicity: This is a simple heuristic. It may flag benign phrases like "this was the worst weather" while missing subtle or contextual toxicity.
  • PII: The regex patterns for SSNs, phone numbers, and emails are standard baselines; they may produce false positives, e.g. redacting unrelated 7-digit numeric strings as phone numbers.
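To illustrate how such baselines misfire, here is a minimal regex scrubber in the same spirit (the patterns and the function shape are illustrative, not the exact built-in implementation):

```python
import re

# Illustrative baseline patterns; the actual EvalForge patterns may differ.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{4}\b"),  # naive 7-digit match
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def scrub_pii(text: str) -> str:
    """Replace each pattern match with a redaction tag."""
    for name, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{name.upper()}]", text)
    return text

# The naive phone pattern also swallows order numbers, ticket IDs, etc.:
# scrub_pii("see ticket 555-1234") redacts the ticket ID as a phone number.
```

Any 7-digit string matching the phone pattern is redacted regardless of context, which is exactly the false-positive class warned about above.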

For enterprise deployments with strict compliance requirements, we recommend replacing the moderate_content function in the backend with a call to a dedicated ML moderation service (such as Azure Content Safety or the OpenAI Moderation API).

Execution Concurrency

When running bulk experiments via ARQ workers, EvalForge issues LLM evaluation requests sequentially within each batch job. The max_concurrent_requests setting exists in the configuration, but the primary execution loop does not enforce it and instead relies on the LLM provider's own queue limits. Make sure your provider accounts (OpenAI/Anthropic) have sufficient tier/rate limits before running datasets of 1,000+ samples.
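If you need client-side throttling, one way to honor a max_concurrent_requests value is an asyncio.Semaphore around each call. This is a sketch of the pattern, not the EvalForge execution loop; run_batch and evaluate are hypothetical names:

```python
import asyncio

async def run_batch(samples, evaluate, max_concurrent_requests=5):
    """Run evaluate() over all samples, capping in-flight calls
    at max_concurrent_requests via a semaphore."""
    sem = asyncio.Semaphore(max_concurrent_requests)

    async def _one(sample):
        async with sem:  # blocks while the cap is reached
            return await evaluate(sample)

    # gather preserves input order in its results
    return await asyncio.gather(*(_one(s) for s in samples))
```

A cap below your provider tier's requests-per-minute limit avoids 429 responses without serializing the whole batch.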