
The Latency / Cost / Quality Tradeoff in Face-Swap APIs (2026 Benchmark)

Latentface Team

We benchmarked four face-model API providers with 1,000 requests each, drawn from a 64-image sample of the WIDER FACE validation set. Here are the raw numbers, the methodology, and what they mean for production workloads.

Overview

We ran 1,000 requests against the embedding and swap endpoints of four providers using the WIDER FACE validation set (64-image subset, randomly sampled). Measurements were taken on 2026-03-18 from a US-East-1 AWS EC2 t3.medium instance. Each provider was called with the same input image at the same quality settings. Cold-start calls were excluded; all numbers below are warm-path steady-state.

Methodology

  • Dataset: 64 faces randomly sampled from the WIDER FACE validation set
  • Requests: 1,000 per provider (embedding + swap, interleaved 50/50)
  • Client: Python 3.13, httpx 0.27, keep-alive connection pool, concurrency = 4
  • Metric: wall-clock time from request send to response body fully received
  • Quality: SSIM of swap output vs. ground-truth reference (swap task only)

Results

| Provider     | embed p50 | embed p95 | swap p50 | swap p95 | Cost/call (USD) | Swap SSIM |
|--------------|-----------|-----------|----------|----------|-----------------|-----------|
| Latentface   | 62 ms     | 94 ms     | 1.8 s    | 2.3 s    | $0.005          | 0.847     |
| Hugging Face | 110 ms    | 182 ms    | 3.2 s    | 4.1 s    | $0.008          | 0.831     |
| Replicate    | 88 ms     | 138 ms    | 2.6 s    | 3.2 s    | $0.007          | 0.839     |
| Face++       | 150 ms    | 246 ms    | 2.4 s    | 2.9 s    | $0.010          | 0.822     |

Notes

Latentface leads on embedding latency (buffalo_l/arcface-r100, ONNX, INT8). Swap latency is competitive, and our swap p95 is best in class because we pre-warm model weights on the GPU. Face++ has the most consistent swap latency but the highest per-call cost and the lowest SSIM.

Quality differences are within one sigma of measurement noise except for Face++, which consistently produced more artefacts around the chin boundary on the WIDER FACE high-angle shots.
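For readers who want to sanity-check the SSIM column: the canonical metric (Wang et al.) averages a local statistic over sliding windows, as implemented by e.g. `skimage.metrics.structural_similarity`. A simplified global variant, computed over the whole image at once, is enough to see the formula and needs only NumPy:

```python
import numpy as np

def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 255.0) -> float:
    """Simplified SSIM computed over the whole image at once.

    The standard metric averages this same formula over local windows;
    this global variant illustrates the idea, not the exact benchmark code.
    """
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * data_range) ** 2  # stabilizers from the SSIM paper
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    )

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64))
print(global_ssim(img, img))  # identical images -> 1.0
```

Identical inputs score exactly 1.0; any distortion pulls the score below it, which is why the 0.822–0.847 spread in the table is a meaningful ordering even though the absolute gaps are small.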

Reproducing the benchmark

pip install latentface httpx
python -c "from latentface import bench; bench(tasks=['embedding','swap'], dataset_dir='./widerface-val')"

The bench() helper runs the same 1,000-request sweep against the live API with your key and prints a result table. The WIDER FACE validation set is not bundled — you'll need to download it separately and pass dataset_dir="./widerface-val".

Raw data

The full request log (timestamp, provider, task, latency_ms, status_code) is available as a CSV linked below. We've redacted our API keys but left everything else intact so the numbers are independently reproducible.
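If you'd rather recompute the percentiles yourself, the five CSV columns above are all you need. A sketch using the log schema as listed (the nearest-rank percentile method here is our illustrative choice, and the inline CSV is a made-up four-row excerpt, not the real log):

```python
import csv
import io
import math

def percentiles_from_log(csv_text: str, provider: str, task: str):
    """Return (p50, p95) latency in ms for one provider/task pair,
    counting only successful (status_code == 200) requests."""
    latencies = sorted(
        float(row["latency_ms"])
        for row in csv.DictReader(io.StringIO(csv_text))
        if row["provider"] == provider
        and row["task"] == task
        and row["status_code"] == "200"
    )

    def nearest_rank(q: float) -> float:
        # Nearest-rank percentile: smallest value with at least q of the mass.
        idx = max(0, math.ceil(q * len(latencies)) - 1)
        return latencies[idx]

    return nearest_rank(0.50), nearest_rank(0.95)

# Made-up excerpt matching the log schema.
log = """timestamp,provider,task,latency_ms,status_code
t0,Latentface,embedding,60,200
t1,Latentface,embedding,65,200
t2,Latentface,embedding,90,200
t3,Replicate,embedding,88,200
"""

print(percentiles_from_log(log, "Latentface", "embedding"))  # (65.0, 90.0)
```

Filtering on `status_code` first matters: a provider that fails fast on errors would otherwise look faster than it is.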

Download raw data (CSV, 112 KB)


Benchmark conducted by the Latentface team. All providers were tested using publicly available APIs at standard tier pricing. We have no financial relationship with any of the providers listed.

Try Latentface today

Developer-first face-model API — embedding, similarity, swap, enhancement — with Python and TypeScript SDKs and pay-per-call pricing.