# The Latency / Cost / Quality Tradeoff in Face-Swap APIs (2026 Benchmark)
We benchmarked four face-model API providers on a 1,000-request workload built from a sample of the WIDER FACE validation set. Here are the raw numbers, the methodology, and what they mean for production workloads.
## Overview
We ran 1,000 requests per provider against the embedding and swap endpoints of four providers, using a 64-image subset randomly sampled from the WIDER FACE validation set. Measurements were taken on 2026-03-18 from an AWS EC2 t3.medium instance in us-east-1. Each provider was called with the same input images at the same quality settings. Cold-start calls were excluded; all numbers below are warm-path steady-state.
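Cold-start exclusion can be as simple as discarding the first few calls per provider. A minimal sketch; the cutoff of 10 is an illustrative assumption, not the exact number used in this benchmark:

```python
def warm_path(latencies_ms: list[float], warmup: int = 10) -> list[float]:
    """Drop the first `warmup` calls so cold-start outliers don't skew p95.

    The cutoff of 10 is illustrative; in practice, pick it by inspecting
    where the latency series flattens out for each provider.
    """
    return latencies_ms[warmup:]
```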
## Methodology
- Dataset: 64 faces randomly sampled from the WIDER FACE validation set
- Requests: 1,000 per provider (embedding + swap, interleaved 50/50)
- Client: Python 3.13, httpx 0.27, keep-alive connection pool, concurrency = 4
- Metric: wall-clock time from request send to response body fully received
- Quality: SSIM of swap output vs. ground-truth reference (swap task only)
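The wall-clock metric and the percentile columns can be sketched as follows. This is not the benchmark's actual client; the stub call and the nearest-rank percentile choice are assumptions for illustration:

```python
import time
from typing import Callable, Iterable

def timed_call(send: Callable[[], object]) -> float:
    """Wall-clock latency in ms: request send to response body fully received.

    `send` must consume the entire response body before returning, otherwise
    the measurement stops at the first byte rather than the last.
    """
    t0 = time.perf_counter()
    send()
    return (time.perf_counter() - t0) * 1000.0

def percentile(samples: Iterable[float], q: float) -> float:
    """Nearest-rank percentile, one common choice for p50/p95 columns."""
    s = sorted(samples)
    idx = max(0, min(len(s) - 1, round(q / 100 * len(s)) - 1))
    return s[idx]

# Stub "request" that sleeps ~5 ms, standing in for an HTTP round trip.
lat = [timed_call(lambda: time.sleep(0.005)) for _ in range(20)]
p50, p95 = percentile(lat, 50), percentile(lat, 95)
```

In the real harness, `send` would be an `httpx.Client` call reusing a keep-alive pool, run from four worker threads to match the stated concurrency.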
## Results
| Provider | embed p50 | embed p95 | swap p50 | swap p95 | Cost/call (USD) | Swap SSIM |
|---|---|---|---|---|---|---|
| Latentface | 62 ms | 94 ms | 1.8 s | 2.3 s | $0.005 | 0.847 |
| Hugging Face | 110 ms | 182 ms | 3.2 s | 4.1 s | $0.008 | 0.831 |
| Replicate | 88 ms | 138 ms | 2.6 s | 3.2 s | $0.007 | 0.839 |
| Face++ | 150 ms | 246 ms | 2.4 s | 2.9 s | $0.010 | 0.822 |
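Per-call price differences compound at volume. A quick sketch using the cost column above; the 1M-calls-per-month workload is an assumed figure, not from the benchmark:

```python
# Cost/call (USD) from the results table; 1M calls/month is an assumed workload.
PRICES_USD = {
    "Latentface": 0.005,
    "Hugging Face": 0.008,
    "Replicate": 0.007,
    "Face++": 0.010,
}
CALLS_PER_MONTH = 1_000_000

for provider, price in PRICES_USD.items():
    print(f"{provider}: ${price * CALLS_PER_MONTH:,.0f}/month")
```

At this volume the spread between the cheapest and most expensive provider is roughly $5,000/month, which can outweigh the latency differences for batch workloads.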
## Notes
Latentface leads on embedding latency (ArcFace R100 from the InsightFace buffalo_l pack, served via ONNX with INT8 quantization). Swap latency is competitive; our p95 is best in class because we pre-warm model weights on the GPU. Face++ has the most consistent swap latency but the highest per-call cost and the lowest SSIM.
Quality differences are within one standard deviation of measurement noise, except for Face++, which consistently produced more artefacts around the chin boundary on the WIDER FACE high-angle shots.
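The SSIM column was most likely computed with a windowed implementation such as scikit-image's `structural_similarity`; for intuition, here is the single-window form of the SSIM formula in plain NumPy. This is a sketch of the metric, not the benchmark's actual scoring code:

```python
import numpy as np

def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 255.0) -> float:
    """Single-window SSIM: luminance, contrast, and structure over the whole image."""
    c1 = (0.01 * data_range) ** 2  # stabilisers from the original SSIM paper
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(
        ((2 * mx * my + c1) * (2 * cov + c2))
        / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
    )

img = np.random.default_rng(0).integers(0, 256, (64, 64)).astype(np.float64)
assert abs(global_ssim(img, img) - 1.0) < 1e-9  # identical images score 1.0
```

A windowed implementation averages this score over small sliding patches, which is why localized artefacts (like the chin-boundary ones noted above) drag the overall number down.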
## Reproducing the benchmark
```shell
pip install latentface httpx
python -c "from latentface import bench; bench(tasks=['embedding','swap'])"
```
The `bench()` helper runs the same 1,000-request sweep against the live API with your key and prints a result table. The WIDER FACE validation set is not bundled; you'll need to download it separately and pass `dataset_dir="./widerface-val"`.
## Raw data
The full request log (timestamp, provider, task, latency_ms, status_code) is available as a CSV linked below. We've redacted our API keys but left everything else intact so the numbers are independently reproducible.
Download raw data (CSV, 112 KB)
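The published schema can be sliced with nothing beyond the standard library. The rows below are illustrative placeholders in that schema, not values from the real log:

```python
import csv
import io
import statistics
from collections import defaultdict

# Illustrative rows in the published schema (not real benchmark data).
LOG = """timestamp,provider,task,latency_ms,status_code
2026-03-18T14:00:01Z,latentface,embedding,60,200
2026-03-18T14:00:02Z,latentface,embedding,64,200
2026-03-18T14:00:03Z,latentface,embedding,62,200
2026-03-18T14:00:04Z,replicate,embedding,88,200
2026-03-18T14:00:05Z,replicate,embedding,90,200
2026-03-18T14:00:06Z,replicate,embedding,5000,500
"""

latencies = defaultdict(list)
for row in csv.DictReader(io.StringIO(LOG)):
    if row["status_code"] == "200":  # drop failed calls before aggregating
        latencies[(row["provider"], row["task"])].append(float(row["latency_ms"]))

for key, vals in sorted(latencies.items()):
    print(key, "p50 =", statistics.median(vals), "ms")
```

Filtering on `status_code` first matters: a single failed call with a timeout-length latency would otherwise distort the tail percentiles.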
Benchmark conducted by the Latentface team. All providers were tested through their publicly available APIs at standard-tier pricing. Latentface is our own product; we have no financial relationship with the other three providers listed.