# The Latency / Cost / Quality Tradeoff in Face-Swap APIs (2026 Benchmark)
We benchmarked four face-model API providers on a 1,000-request workload built from a sample of the WIDER FACE validation set. Here are the raw numbers, the methodology, and what they mean for production workloads.
## Overview
We ran 1,000 requests per provider against the embedding and swap endpoints of four providers, using a 64-image subset randomly sampled from the WIDER FACE validation set. Measurements were taken on 2026-03-18 from an AWS EC2 t3.medium instance in us-east-1. Each provider was called with the same input images at the same quality settings. Cold-start calls were excluded; all numbers below are warm-path steady-state.
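Cold-start exclusion can be as simple as discarding the first few calls per provider. A minimal sketch; the cutoff of 10 is an illustrative assumption, not the exact number used in this benchmark:

```python
def warm_path(latencies_ms: list[float], warmup: int = 10) -> list[float]:
    """Drop the first `warmup` calls so cold-start outliers don't skew p95.

    The cutoff of 10 is illustrative; in practice, pick it by inspecting
    where the latency series flattens out for each provider.
    """
    return latencies_ms[warmup:]
```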
## Methodology
- Dataset: 64 faces randomly sampled from the WIDER FACE validation set
- Requests: 1,000 per provider (embedding + swap, interleaved 50/50)
- Client: Python 3.13, httpx 0.27, keep-alive connection pool, concurrency = 4
- Metric: wall-clock time from request send to response body fully received
- Quality: SSIM of swap output vs. ground-truth reference (swap task only)
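The wall-clock metric and the percentile columns can be sketched as follows. This is not the benchmark's actual client; the stub call and the nearest-rank percentile choice are assumptions for illustration:

```python
import time
from typing import Callable, Iterable

def timed_call(send: Callable[[], object]) -> float:
    """Wall-clock latency in ms: request send to response body fully received.

    `send` must consume the entire response body before returning, otherwise
    the measurement stops at the first byte rather than the last.
    """
    t0 = time.perf_counter()
    send()
    return (time.perf_counter() - t0) * 1000.0

def percentile(samples: Iterable[float], q: float) -> float:
    """Nearest-rank percentile, one common choice for p50/p95 columns."""
    s = sorted(samples)
    idx = max(0, min(len(s) - 1, round(q / 100 * len(s)) - 1))
    return s[idx]

# Stub "request" that sleeps ~5 ms, standing in for an HTTP round trip.
lat = [timed_call(lambda: time.sleep(0.005)) for _ in range(20)]
p50, p95 = percentile(lat, 50), percentile(lat, 95)
```

In the real harness, `send` would be an `httpx.Client` call reusing a keep-alive pool, run from four worker threads to match the stated concurrency.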
## Results
| Provider | embed p50 | embed p95 | swap p50 | swap p95 | Cost/call (USD) | Swap SSIM |
|---|---|---|---|---|---|---|
| Latentface | 62 ms | 94 ms | 1.8 s | 2.3 s | $0.005 | 0.847 |
| Hugging Face | 110 ms | 182 ms | 3.2 s | 4.1 s | $0.008 | 0.831 |
| Replicate | 88 ms | 138 ms | 2.6 s | 3.2 s | $0.007 | 0.839 |
| Face++ | 150 ms | 246 ms | 2.4 s | 2.9 s | $0.010 | 0.822 |
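Per-call price differences compound at volume. A quick sketch using the cost column above; the 1M-calls-per-month workload is an assumed figure, not from the benchmark:

```python
# Cost/call (USD) from the results table; 1M calls/month is an assumed workload.
PRICES_USD = {
    "Latentface": 0.005,
    "Hugging Face": 0.008,
    "Replicate": 0.007,
    "Face++": 0.010,
}
CALLS_PER_MONTH = 1_000_000

for provider, price in PRICES_USD.items():
    print(f"{provider}: ${price * CALLS_PER_MONTH:,.0f}/month")
```

At this volume the spread between the cheapest and most expensive provider is roughly $5,000/month, which can outweigh the latency differences for batch workloads.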
## Notes
Latentface leads on embedding latency (ArcFace R100 from the InsightFace buffalo_l pack, served via ONNX with INT8 quantization). Swap latency is competitive; our p95 is best in class because we pre-warm model weights on the GPU. Face++ has the most consistent swap latency but the highest per-call cost and the lowest SSIM.
Quality differences are within one standard deviation of measurement noise, except for Face++, which consistently produced more artefacts around the chin boundary on the WIDER FACE high-angle shots.
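The SSIM column was most likely computed with a windowed implementation such as scikit-image's `structural_similarity`; for intuition, here is the single-window form of the SSIM formula in plain NumPy. This is a sketch of the metric, not the benchmark's actual scoring code:

```python
import numpy as np

def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 255.0) -> float:
    """Single-window SSIM: luminance, contrast, and structure over the whole image."""
    c1 = (0.01 * data_range) ** 2  # stabilisers from the original SSIM paper
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(
        ((2 * mx * my + c1) * (2 * cov + c2))
        / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
    )

img = np.random.default_rng(0).integers(0, 256, (64, 64)).astype(np.float64)
assert abs(global_ssim(img, img) - 1.0) < 1e-9  # identical images score 1.0
```

A windowed implementation averages this score over small sliding patches, which is why localized artefacts (like the chin-boundary ones noted above) drag the overall number down.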
## Reproducing the benchmark
```shell
pip install latentface httpx
python -c "from latentface import bench; bench(tasks=['embedding','swap'])"
```
The `bench()` helper runs the same 1,000-request sweep against the live API with your key and prints a result table. The WIDER FACE validation set is not bundled; you'll need to download it separately and pass `dataset_dir="./widerface-val"`.
## Raw data
The full request log (timestamp, provider, task, latency_ms, status_code) is available as a CSV linked below. We've redacted our API keys but left everything else intact so the numbers are independently reproducible.
Download raw data (CSV, 112 KB)
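The published schema can be sliced with nothing beyond the standard library. The rows below are illustrative placeholders in that schema, not values from the real log:

```python
import csv
import io
import statistics
from collections import defaultdict

# Illustrative rows in the published schema (not real benchmark data).
LOG = """timestamp,provider,task,latency_ms,status_code
2026-03-18T14:00:01Z,latentface,embedding,60,200
2026-03-18T14:00:02Z,latentface,embedding,64,200
2026-03-18T14:00:03Z,latentface,embedding,62,200
2026-03-18T14:00:04Z,replicate,embedding,88,200
2026-03-18T14:00:05Z,replicate,embedding,90,200
2026-03-18T14:00:06Z,replicate,embedding,5000,500
"""

latencies = defaultdict(list)
for row in csv.DictReader(io.StringIO(LOG)):
    if row["status_code"] == "200":  # drop failed calls before aggregating
        latencies[(row["provider"], row["task"])].append(float(row["latency_ms"]))

for key, vals in sorted(latencies.items()):
    print(key, "p50 =", statistics.median(vals), "ms")
```

Filtering on `status_code` first matters: a single failed call with a timeout-length latency would otherwise distort the tail percentiles.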
Benchmark conducted by the Latentface team. All providers were tested through their publicly available APIs at standard-tier pricing. Latentface is our own product; we have no financial relationship with the other three providers listed.