On-device and edge inference are trendy, and sometimes wrong for your LCP and INP budgets. Here is how we choose where models run.
TL;DR
Browser inference wins on privacy and repeat-visit latency after warm-up; an edge API wins for heavier models; core cloud wins when you need large context windows or strict compliance enclaves.
When is client-side AI worth the download cost?
Small classification or embedding models (< 50 MB quantized) can be downloaded and initialized inside a requestIdleCallback so first paint and LCP stay clean. Always ship a server-side fallback for low-end devices that should not pay the download or compute cost.
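A minimal sketch of that gating logic. The 4 GB device-memory floor, the loadModel/serverClassify helpers, and the /api/classify endpoint are assumptions for illustration, not a prescribed API:

```typescript
// ~50 MB quantized ceiling from the guidance above.
const ON_DEVICE_LIMIT_BYTES = 50 * 1024 * 1024;

// Pure decision helper: run locally only for small models on capable devices.
// The 4 GB memory floor is an assumed threshold; tune it from field data.
export function shouldRunOnDevice(modelBytes: number, deviceMemoryGB: number): boolean {
  return modelBytes <= ON_DEVICE_LIMIT_BYTES && deviceMemoryGB >= 4;
}

// Defer the model download until the main thread is idle; fall back to
// setTimeout where requestIdleCallback is unavailable (e.g. Safari, workers).
export function whenIdle(cb: () => void): void {
  const ric = (globalThis as any).requestIdleCallback;
  if (typeof ric === "function") {
    ric(cb, { timeout: 2000 }); // don't starve the download forever
  } else {
    setTimeout(cb, 200);
  }
}

// Usage (hypothetical loadModel helper and /api/classify fallback):
// if (shouldRunOnDevice(modelBytes, (navigator as any).deviceMemory ?? 4)) {
//   whenIdle(() => loadModel(modelUrl));
// } else {
//   const res = await fetch("/api/classify", { method: "POST", body: input });
// }
```

The point of the pure `shouldRunOnDevice` split is that the routing decision can be unit-tested without a browser.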
What breaks Core Web Vitals if we are careless?
Synchronous WASM init, tensor work blocking the main thread, and huge hero videos pretending to be “AI previews” will tank INP. Instrument long tasks (main-thread work over 50 ms) and cap concurrent inference sessions per tab.
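A sketch of both safeguards, assuming a cap of two concurrent sessions per tab (an illustrative budget, not a standard):

```typescript
// Assumed per-tab budget; tune against your INP data.
const MAX_SESSIONS = 2;
let activeSessions = 0;

// Gate a new inference session against the cap. Returns false when the tab
// is already running MAX_SESSIONS sessions, so the caller can queue or bail.
export function tryAcquireSession(): boolean {
  if (activeSessions >= MAX_SESSIONS) return false;
  activeSessions++;
  return true;
}

export function releaseSession(): void {
  activeSessions = Math.max(0, activeSessions - 1);
}

// Report main-thread tasks over 50 ms via the Long Tasks API.
// No-ops where PerformanceObserver is unavailable.
export function observeLongTasks(report: (durationMs: number) => void): void {
  const PO = (globalThis as any).PerformanceObserver;
  if (typeof PO !== "function") return;
  new PO((list: any) => {
    for (const entry of list.getEntries()) report(entry.duration);
  }).observe({ type: "longtask", buffered: true });
}
```

Wiring `report` to your RUM beacon lets you correlate long tasks with INP regressions before users complain.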
Need a performance pass on an AI feature? Talk to us about a Core Web Vitals review.