Real-time Architecture
Infrastructure for low-latency, high-availability recommendations
1–2 sec
Target Latency
99.9%
Availability
> 60%
Cache Hit Rate
> 85%
Model Accuracy
System Architecture
Client Request → API Gateway → Feature Store → Model Service → Cache Layer → Carrier APIs → Response
API Gateway
Entry point for recommendation requests
- Rate limiting
- Authentication
- Request validation
- Load balancing
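Rate limiting at the gateway is often implemented as a token bucket. The sketch below is a minimal, hypothetical illustration (class and parameter names are ours, not from this system): each request consumes one token, and tokens refill at a steady rate up to a burst capacity.

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter (illustrative sketch)."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # tokens replenished per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice the gateway would key one bucket per client or API key, so one noisy caller cannot starve the rest.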
Feature Store
Pre-computed features for real-time access
- Redis cache layer
- < 10ms lookup
- Feature versioning
- Batch updates
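Feature versioning is commonly handled by embedding a version label in the cache key, so a batch job can publish a fresh feature set without overwriting the one readers are using. A minimal sketch, with a plain dict standing in for Redis and made-up key and feature names:

```python
# A plain dict stands in for the Redis cache layer in this sketch.
# Keys carry a version label so a nightly batch job can publish a new
# feature set side by side with the old one.
FEATURE_VERSION = "v42"  # hypothetical version label

store = {
    f"features:{FEATURE_VERSION}:cust-123": {"tenure_days": 512, "prior_claims": 1},
}

def get_features(entity_id, version=FEATURE_VERSION):
    """Look up precomputed features for an entity; None on a miss."""
    return store.get(f"features:{version}:{entity_id}")
```

With real Redis the lookup is a single `GET` (or `HGETALL`), which is what keeps the < 10 ms budget realistic.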
Model Service
ML model inference endpoint
- Containerized models
- A/B test routing
- Model versioning
- GPU inference
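A/B test routing at the model service is typically deterministic, so the same request (or user) always lands in the same arm. One common approach, sketched here with hypothetical names, hashes the request id into 100 buckets:

```python
import hashlib

def ab_bucket(request_id: str, treatment_pct: int = 10) -> str:
    """Deterministically assign a request to 'treatment' or 'control'.

    Hashing (rather than random choice) keeps assignment stable across
    retries and sessions; treatment_pct controls the rollout percentage.
    """
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return "treatment" if bucket < treatment_pct else "control"
```

The gateway or model service can then route "treatment" traffic to the candidate model version and "control" to the incumbent.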
Cache Layer
Response caching for repeated queries
- TTL-based expiry
- Cache warming
- Invalidation rules
- Hit rate tracking
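TTL-based expiry can be sketched with a small wrapper that stamps each entry with a deadline and evicts lazily on read. This is an in-memory illustration (names are ours); a production cache layer would typically lean on Redis's native `EXPIRE` semantics instead:

```python
import time

class TTLCache:
    """In-memory cache with TTL-based expiry and explicit invalidation."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._data = {}

    def set(self, key, value):
        self._data[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self._data[key]  # lazy eviction on read
            return None
        return value

    def invalidate(self, key):
        """Invalidation rule hook: drop an entry before its TTL elapses."""
        self._data.pop(key, None)
```

Hit-rate tracking would add counters around `get`; cache warming is just calling `set` ahead of expected traffic.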
Carrier APIs
Real-time carrier integrations
- Async calls
- Circuit breakers
- Timeout handling
- Fallback logic
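The async-call and timeout-handling pattern can be sketched with `asyncio`: fan out to all carriers concurrently, bound each call with a timeout, and drop (rather than wait on) the slow ones. The carrier stub and its latency parameter here are hypothetical:

```python
import asyncio

async def quote(carrier: str, delay: float) -> dict:
    """Stub for a carrier API call; `delay` simulates network latency."""
    await asyncio.sleep(delay)
    return {"carrier": carrier, "premium": 100.0}

async def fan_out(carriers, timeout: float = 0.05):
    """Call all carriers concurrently; omit any that exceed `timeout`."""
    async def guarded(name, delay):
        try:
            return await asyncio.wait_for(quote(name, delay), timeout)
        except asyncio.TimeoutError:
            return None  # slow carrier excluded from the initial response

    results = await asyncio.gather(*(guarded(n, d) for n, d in carriers.items()))
    return [r for r in results if r is not None]
```

A circuit breaker adds one more layer: after repeated timeouts from a carrier, skip calling it at all for a cool-down period.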
Failover & Fallback Strategy
Model Unavailable
If the ML model service times out or errors:
- Fall back to rules-based ranking
- Use cached predictions if available
- Alert ops team for investigation
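The first two steps above form a fallback chain: model, then cached predictions, then rules-based ranking. A minimal sketch of that control flow, with the ranking functions passed in as stand-ins (all names here are illustrative):

```python
def rank_with_fallback(request, model_predict, cached_predictions, rules_rank):
    """Return (ranking, source): model first, then cache, then rules."""
    try:
        return model_predict(request), "model"
    except Exception:
        pass  # model service timed out or errored; fall through
    cached = cached_predictions.get(request["id"])
    if cached is not None:
        return cached, "cache"
    return rules_rank(request), "rules"
```

Returning the source alongside the ranking makes it easy to both tag the alert for the ops team and track how often each fallback fires.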
Feature Store Down
If feature lookup fails:
- Compute features on-the-fly
- Use default feature values
- Degrade gracefully with less precision
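That degradation ladder (store lookup, then on-the-fly computation, then defaults) can be sketched as below; the default values and helper names are hypothetical:

```python
# Hypothetical per-feature defaults used as the last resort.
DEFAULTS = {"tenure_days": 0, "prior_claims": 0}

def resolve_features(entity_id, store_lookup, compute_live):
    """Feature store first; on failure compute live; last resort, defaults."""
    try:
        feats = store_lookup(entity_id)
        if feats is not None:
            return feats
    except ConnectionError:
        pass  # feature store unreachable; degrade
    try:
        return compute_live(entity_id)
    except Exception:
        return dict(DEFAULTS)  # least precise, but the request still completes
```

Each rung trades precision for availability, which is exactly the graceful degradation the strategy calls for.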
Carrier API Timeout
If carrier APIs don't respond:
- Async retry with exponential backoff
- Show available carriers immediately
- Update UI when slow carriers respond
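The retry-with-exponential-backoff step can be sketched as a small wrapper that doubles the delay after each timeout, up to a cap (the function and parameter names are illustrative, not this system's API):

```python
import time

def retry_with_backoff(call, attempts=4, base=0.01, cap=1.0):
    """Retry `call` on TimeoutError, sleeping base * 2**i (capped) between tries."""
    for i in range(attempts):
        try:
            return call()
        except TimeoutError:
            if i == attempts - 1:
                raise  # retries exhausted; surface the failure
            time.sleep(min(cap, base * 2 ** i))
```

Production versions usually add jitter to the delay so retries from many clients don't synchronize against an already-struggling carrier.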
Async Update Patterns
Batch Feature Updates
Nightly recomputation of features
Model Retraining
Retrain with new bind/renewal data
Cache Invalidation
Clear stale predictions
Carrier Metadata Sync
Update appetite rules
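The batch-update pattern pairs naturally with versioned feature keys: the nightly job writes the entire new feature set under a fresh version label, then flips a single "current version" pointer so readers switch over atomically. A sketch under those assumptions, with a dict standing in for Redis:

```python
def publish_feature_batch(store, new_version, features):
    """Write a full feature set under a new version key, then flip the
    'current version' pointer so readers cut over in one step."""
    for entity_id, feats in features.items():
        store[f"features:{new_version}:{entity_id}"] = feats
    # Single-key update: readers resolving the pointer see either the old
    # complete set or the new complete set, never a half-written mix.
    store["features:current_version"] = new_version
```

The same write-then-flip shape works for carrier metadata sync (appetite rules) and lets cache invalidation be as simple as retiring keys under the old version label.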