Real-time Architecture
Infrastructure for low-latency, high-availability recommendations
1–2 sec
Target Latency
99.9%
Availability
> 60%
Cache Hit Rate
> 85%
Model Accuracy
System Architecture
Client Request → API Gateway → Feature Store → Model Service → Cache Layer → Carrier APIs → Response
API Gateway
Entry point for recommendation requests
- Rate limiting
- Authentication
- Request validation
- Load balancing
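Rate limiting at the gateway is often implemented as a token bucket. The sketch below is a minimal, hypothetical illustration (class and parameter names are ours, not from this system): each request consumes one token, and tokens refill at a steady rate up to a burst capacity.

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter (illustrative sketch)."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # tokens replenished per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice the gateway would key one bucket per client or API key, so one noisy caller cannot starve the rest.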
Feature Store
Pre-computed features for real-time access
- Redis cache layer
- < 10ms lookup
- Feature versioning
- Batch updates
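Feature versioning is commonly handled by embedding a version label in the cache key, so a batch job can publish a fresh feature set without overwriting the one readers are using. A minimal sketch, with a plain dict standing in for Redis and made-up key and feature names:

```python
# A plain dict stands in for the Redis cache layer in this sketch.
# Keys carry a version label so a nightly batch job can publish a new
# feature set side by side with the old one.
FEATURE_VERSION = "v42"  # hypothetical version label

store = {
    f"features:{FEATURE_VERSION}:cust-123": {"tenure_days": 512, "prior_claims": 1},
}

def get_features(entity_id, version=FEATURE_VERSION):
    """Look up precomputed features for an entity; None on a miss."""
    return store.get(f"features:{version}:{entity_id}")
```

With real Redis the lookup is a single `GET` (or `HGETALL`), which is what keeps the < 10 ms budget realistic.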
Model Service
ML model inference endpoint
- Containerized models
- A/B test routing
- Model versioning
- GPU inference
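A/B test routing at the model service is typically deterministic, so the same request (or user) always lands in the same arm. One common approach, sketched here with hypothetical names, hashes the request id into 100 buckets:

```python
import hashlib

def ab_bucket(request_id: str, treatment_pct: int = 10) -> str:
    """Deterministically assign a request to 'treatment' or 'control'.

    Hashing (rather than random choice) keeps assignment stable across
    retries and sessions; treatment_pct controls the rollout percentage.
    """
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return "treatment" if bucket < treatment_pct else "control"
```

The gateway or model service can then route "treatment" traffic to the candidate model version and "control" to the incumbent.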
Cache Layer
Response caching for repeated queries
- TTL-based expiry
- Cache warming
- Invalidation rules
- Hit rate tracking
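TTL-based expiry can be sketched with a small wrapper that stamps each entry with a deadline and evicts lazily on read. This is an in-memory illustration (names are ours); a production cache layer would typically lean on Redis's native `EXPIRE` semantics instead:

```python
import time

class TTLCache:
    """In-memory cache with TTL-based expiry and explicit invalidation."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._data = {}

    def set(self, key, value):
        self._data[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self._data[key]  # lazy eviction on read
            return None
        return value

    def invalidate(self, key):
        """Invalidation rule hook: drop an entry before its TTL elapses."""
        self._data.pop(key, None)
```

Hit-rate tracking would add counters around `get`; cache warming is just calling `set` ahead of expected traffic.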
Carrier APIs
Real-time carrier integrations
- Async calls
- Circuit breakers
- Timeout handling
- Fallback logic
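The async-call and timeout-handling pattern can be sketched with `asyncio`: fan out to all carriers concurrently, bound each call with a timeout, and drop (rather than wait on) the slow ones. The carrier stub and its latency parameter here are hypothetical:

```python
import asyncio

async def quote(carrier: str, delay: float) -> dict:
    """Stub for a carrier API call; `delay` simulates network latency."""
    await asyncio.sleep(delay)
    return {"carrier": carrier, "premium": 100.0}

async def fan_out(carriers, timeout: float = 0.05):
    """Call all carriers concurrently; omit any that exceed `timeout`."""
    async def guarded(name, delay):
        try:
            return await asyncio.wait_for(quote(name, delay), timeout)
        except asyncio.TimeoutError:
            return None  # slow carrier excluded from the initial response

    results = await asyncio.gather(*(guarded(n, d) for n, d in carriers.items()))
    return [r for r in results if r is not None]
```

A circuit breaker adds one more layer: after repeated timeouts from a carrier, skip calling it at all for a cool-down period.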
Failover & Fallback Strategy
Model Unavailable
If the ML model service times out or errors:
- Fall back to rules-based ranking
- Use cached predictions if available
- Alert ops team for investigation
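The first two steps above form a fallback chain: model, then cached predictions, then rules-based ranking. A minimal sketch of that control flow, with the ranking functions passed in as stand-ins (all names here are illustrative):

```python
def rank_with_fallback(request, model_predict, cached_predictions, rules_rank):
    """Return (ranking, source): model first, then cache, then rules."""
    try:
        return model_predict(request), "model"
    except Exception:
        pass  # model service timed out or errored; fall through
    cached = cached_predictions.get(request["id"])
    if cached is not None:
        return cached, "cache"
    return rules_rank(request), "rules"
```

Returning the source alongside the ranking makes it easy to both tag the alert for the ops team and track how often each fallback fires.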
Feature Store Down
If feature lookup fails:
- Compute features on-the-fly
- Use default feature values
- Degrade gracefully with less precision
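That degradation ladder (store lookup, then on-the-fly computation, then defaults) can be sketched as below; the default values and helper names are hypothetical:

```python
# Hypothetical per-feature defaults used as the last resort.
DEFAULTS = {"tenure_days": 0, "prior_claims": 0}

def resolve_features(entity_id, store_lookup, compute_live):
    """Feature store first; on failure compute live; last resort, defaults."""
    try:
        feats = store_lookup(entity_id)
        if feats is not None:
            return feats
    except ConnectionError:
        pass  # feature store unreachable; degrade
    try:
        return compute_live(entity_id)
    except Exception:
        return dict(DEFAULTS)  # least precise, but the request still completes
```

Each rung trades precision for availability, which is exactly the graceful degradation the strategy calls for.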
Carrier API Timeout
If carrier APIs don't respond:
- Async retry with exponential backoff
- Show available carriers immediately
- Update UI when slow carriers respond
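The retry-with-exponential-backoff step can be sketched as a small wrapper that doubles the delay after each timeout, up to a cap (the function and parameter names are illustrative, not this system's API):

```python
import time

def retry_with_backoff(call, attempts=4, base=0.01, cap=1.0):
    """Retry `call` on TimeoutError, sleeping base * 2**i (capped) between tries."""
    for i in range(attempts):
        try:
            return call()
        except TimeoutError:
            if i == attempts - 1:
                raise  # retries exhausted; surface the failure
            time.sleep(min(cap, base * 2 ** i))
```

Production versions usually add jitter to the delay so retries from many clients don't synchronize against an already-struggling carrier.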
Async Update Patterns
Batch Feature Updates
Nightly recomputation of features
Model Retraining
Retrain with new bind/renewal data
Cache Invalidation
Clear stale predictions
Carrier Metadata Sync
Update appetite rules
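The batch-update pattern pairs naturally with versioned feature keys: the nightly job writes the entire new feature set under a fresh version label, then flips a single "current version" pointer so readers switch over atomically. A sketch under those assumptions, with a dict standing in for Redis:

```python
def publish_feature_batch(store, new_version, features):
    """Write a full feature set under a new version key, then flip the
    'current version' pointer so readers cut over in one step."""
    for entity_id, feats in features.items():
        store[f"features:{new_version}:{entity_id}"] = feats
    # Single-key update: readers resolving the pointer see either the old
    # complete set or the new complete set, never a half-written mix.
    store["features:current_version"] = new_version
```

The same write-then-flip shape works for carrier metadata sync (appetite rules) and lets cache invalidation be as simple as retiring keys under the old version label.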