Engineering article

Edge AI Inference That Keeps Pages Fast

Marcus Thorne

Lead Systems Architect

Published: Mar 18, 2026

On-device and edge inference are trendy, and sometimes wrong for your LCP and INP budgets. Here is how we choose where models run.

TL;DR
Browser inference wins on privacy and on repeat visits after warm-up; an edge API wins for heavy models; core cloud wins when you need large context windows or strict compliance enclaves.

When is client-side AI worth the download cost?

Small classification or embedding models (< 50 MB quantized) can run after requestIdleCallback so the first paint stays clean. Always ship a server fallback for low-end devices.

What breaks Core Web Vitals if we are careless?

Synchronous WASM init, blocking main-thread tensor work, and huge hero videos pretending to be “AI previews” will tank INP. Instrument long tasks and cap concurrent sessions per tab.

Need a performance pass on an AI feature? Talk to us about a Core Web Vitals review.

#edge computing #AI #performance #Web APIs