Face-Model API for Developers
Developer-first face-model API — embedding, similarity, swap, enhancement — with Python and TypeScript SDKs and pay-per-call pricing.
Indie ML devs and research teams ship with Latentface every day.
Features
A face-model API for every job
Six REST endpoints. Python and TypeScript SDKs with matching surfaces. Public pay-per-call pricing.
POST /v1/embed
Extract a 512-dim face embedding from an image. Use for similarity search, deduplication, and verification pipelines.
POST /v1/swap
Swap the face from a source image onto a target image. Returns either a PNG stream or an S3-hosted URL, whichever you choose.
POST /v1/match
Cosine-similarity match between two embeddings or two images. Returns a 0–1 score with a decision threshold.
POST /v1/enhance
Upscale and restore low-resolution or compressed face crops. Preserves identity without over-smoothing skin.
POST /v1/blend
Blend N source faces into a single synthetic identity with tunable weights. Great for look-alike generators.
GET /v1/usage
Query per-key call counts, error rates, p50/p95 latency, and the current billing-cycle spend.
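To make the /v1/embed and /v1/match descriptions concrete, here is a minimal local sketch of a cosine-similarity comparison between two embedding vectors. The rescaling from [-1, 1] to a 0–1 score and the 0.8 threshold are assumptions for illustration only; nothing above states how Latentface computes its score or which threshold it applies.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def match_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Map cosine similarity from [-1, 1] onto [0, 1] (assumed rescaling)."""
    cos = max(-1.0, min(1.0, cosine_similarity(a, b)))  # guard against float drift
    return (cos + 1.0) / 2.0


def is_match(a: Sequence[float], b: Sequence[float], threshold: float = 0.8) -> bool:
    """Hypothetical decision rule; the threshold value is illustrative."""
    return match_score(a, b) >= threshold
```

With real 512-dim embeddings the same functions apply unchanged; only the vector length differs.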
Use Cases
Built for every use case
From first-call prototypes to production pipelines — Latentface plugs into your stack in minutes.
Developer ergonomics
- Typed SDKs
Python and TypeScript packages with matching call surfaces and strict types.
- curl-first docs
Every endpoint documented with a runnable curl example before any SDK boilerplate.
- Hugging Face Spaces
Demo models published on HF Spaces — inspect, fork, and benchmark before you integrate.
Honest pay-per-call
- Per-call billing
From $0.005 per call on the Starter tier. No seat fees, no annual minimums.
- Free tier
100 calls/day free with watermarked output — build a demo before you swipe a card.
- Usage dashboard
Per-key spend, call counts, and latency breakdowns in your account — export to CSV any time.
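As a rough way to budget against per-call billing, the sketch below multiplies a call count by a flat per-call rate. It assumes every call is billed at the quoted $0.005 floor price; actual rates may differ by endpoint or tier, which this page does not spell out.

```python
from decimal import Decimal

# The "from" price quoted above, in USD per call. Treating it as a flat
# rate for every call is an assumption made for this estimate.
STARTER_FLOOR_RATE = Decimal("0.005")


def estimate_monthly_cost(calls: int, rate: Decimal = STARTER_FLOOR_RATE) -> Decimal:
    """Return calls * rate, rounded to cents."""
    if calls < 0:
        raise ValueError("calls must be non-negative")
    return (rate * calls).quantize(Decimal("0.01"))


print(estimate_monthly_cost(10_000))  # 50.00
```

Decimal is used instead of float so that sub-cent rates do not pick up binary rounding error.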
Production-ready
- p95 under 1 s
Sub-second p95 latency on the Scale tier's priority queue. The standard queue targets p95 under 2 s.
- Global GPU pool
A100 / H100 workers across us-west / us-east / eu-central with automatic failover.
- 99.9% uptime
Public status page. Every incident is documented with its root cause, no spin.
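If you want to check a latency target against your own measurements, a p95 can be computed from recorded round-trip times. This is a minimal sketch using the nearest-rank percentile definition; how Latentface itself computes the p50/p95 figures it reports is not stated here, and the 1000 ms default target is just the headline number above.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_p95_target(latencies_ms: Sequence[float], target_ms: float = 1000.0) -> bool:
    """True when the measured p95 is strictly below the target."""
    return percentile(latencies_ms, 95) < target_ms
```

For example, with 20 samples the p95 is the 19th-smallest value, so a single slow outlier does not fail the check but two do.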
Testimonials
Trusted by ML teams and indie developers
Engineers shipping face features in production pick Latentface for pricing transparency and docs quality.
“Swapped out a homegrown face-embedding service for Latentface in an afternoon. p95 is half what we used to see and the Python SDK is actually typed.”
“The HF Spaces demos sold me before I even read pricing. Fork the Space, benchmark against our data, sign up — 30 minutes to first production call.”
“We benchmark every face-swap provider quarterly. Latentface is the one we actually recommend to clients — honest latency numbers, no SLA theater.”
“curl-to-production in under 10 minutes. That is the bar for dev tools in 2026 and almost no one clears it. Latentface does.”
“Rate limits are generous, error messages actually tell you what is wrong, and the TypeScript types are not 'any'. Bar cleared.”
“Our face-verification pipeline processes 40k identities a day. Zero pages this quarter. The usage dashboard told us we overprovisioned by 3x.”
Pricing
Public pay-per-call pricing
Start free, pay for what you use, scale without calling sales.
Free
No credit card. Watermarked output. Ship a proof of concept.
- 100 calls / day
- Watermarked responses
- All 6 endpoints
- Community support
- HF Spaces demos
Starter
For side projects, prototypes, and pre-PMF products.
- 10,000 calls / month
- No watermark
- p95 < 2 s standard queue
- Email support
- Python + TypeScript SDKs
Scale
Production traffic with priority GPU queue and per-key metrics.
- 100,000 calls / month
- p95 < 1 s priority queue
- Per-key usage dashboard
- Slack support channel
- Webhook notifications
- Bring-your-own S3 output
FAQ
Frequently asked questions
How do I get an API key?
Create a free account, confirm your email, and copy your key from the dashboard. Free-tier keys work immediately — no credit card, no wait list.
Which languages and runtimes do you support?
First-party SDKs for Python (`pip install latentface`) and TypeScript (`npm install @latentface/sdk`). Any language can call the REST API directly — every endpoint has a curl example in the docs.
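For callers who skip the SDKs, a direct REST call can be assembled with the Python standard library alone. The sketch below only builds the request and does not send it. The base URL, the Bearer auth scheme, and the `image_url` field name are all assumptions made for illustration; check the endpoint docs for the real host, auth header, and payload shape.

```python
import json
import urllib.request

# Hypothetical base URL: the real API host is not given on this page.
BASE_URL = "https://api.latentface.example"


def build_embed_request(image_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/embed request.

    Assumed details: Bearer-token auth and a JSON body with an
    "image_url" field. Neither is confirmed by the text above.
    """
    body = json.dumps({"image_url": image_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/embed",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request would then be a matter of passing it to `urllib.request.urlopen(req, timeout=10)` and decoding the response body.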
How is Latentface different from Hugging Face or Replicate?
We focus exclusively on face models — embedding, swap, match, blend, enhance — with a matched Python/TypeScript surface and honest p95 latency numbers. No SLA theater and no demo-request gates. Our models are also published on HF Spaces so you can benchmark against your own data before integrating.
What are the rate limits?
Free: 100 calls/day, 10 rpm. Starter: 10,000 calls/month, 60 rpm. Scale: 100,000 calls/month, 300 rpm with priority queue. Enterprise plans remove the per-minute cap and are set up through the self-serve annual-contract flow.
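To stay under a per-minute cap from the client side, one option is a rolling-window limiter in front of your API calls. This is a sketch under assumptions: the rpm values come from the answer above, but whether the server enforces a rolling 60-second window or fixed clock minutes is not stated, and the class name is illustrative.

```python
import time
from collections import deque
from typing import Callable, Deque

# Requests-per-minute caps quoted in the answer above.
RPM_LIMITS = {"free": 10, "starter": 60, "scale": 300}


class SlidingWindowLimiter:
    """Allow at most `rpm` calls in any rolling 60-second window."""

    def __init__(self, rpm: int, clock: Callable[[], float] = time.monotonic) -> None:
        if rpm <= 0:
            raise ValueError("rpm must be positive")
        self.rpm = rpm
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._sent: Deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the window, else False."""
        now = self._clock()
        # Drop timestamps that have aged out of the 60-second window.
        while self._sent and now - self._sent[0] >= 60.0:
            self._sent.popleft()
        if len(self._sent) < self.rpm:
            self._sent.append(now)
            return True
        return False
```

A caller would check `limiter.try_acquire()` before each request and back off briefly when it returns False.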
How do you handle privacy for uploaded images?
Images are processed and discarded within 60 seconds unless you explicitly enable recording to your own S3 bucket. We never train on your inputs. The privacy policy is binding and we publish every sub-processor.
Do you have a free tier that works without a credit card?
Yes. 100 calls per day forever, all six endpoints, watermarked output. Build a prototype, demo it, ship it — only upgrade when you're ready to remove the watermark.