InsureFlow

Executive Overview

Real-time Architecture

Infrastructure for low-latency, high-availability recommendations


  • Target Latency: < 1-2 sec
  • Availability: 99.9%
  • Cache Hit Rate: > 60%
  • Model Accuracy: > 85%

System Architecture

Client Request → API Gateway → Feature Store → Model Service → Cache Layer → Carrier APIs → Response

API Gateway

Entry point for recommendation requests

  • Rate limiting
  • Authentication
  • Request validation
  • Load balancing
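As an illustration of the rate-limiting responsibility, a token bucket is a common choice; the class name and parameters below are illustrative assumptions, not the gateway's actual implementation:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter (illustrative sketch)."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate            # tokens replenished per second
        self.capacity = capacity    # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A real gateway would keep one bucket per client key and back it with shared storage so limits hold across gateway instances.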

Feature Store

Pre-computed features for real-time access

  • Redis cache layer
  • < 10ms lookup
  • Feature versioning
  • Batch updates
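To make the lookup path concrete, here is a hedged sketch of a versioned feature read; the `features:{version}:{entity_id}` key scheme is an assumption, and a plain dict stands in for the Redis client:

```python
import json

class FeatureStore:
    """Versioned feature lookup (sketch; a dict stands in for Redis)."""

    def __init__(self, backend):
        self.backend = backend  # e.g. a redis.Redis client; any .get() works

    def get_features(self, entity_id: str, version: str = "v1") -> dict:
        # Hypothetical key scheme: features:<version>:<entity_id>
        key = f"features:{version}:{entity_id}"
        raw = self.backend.get(key)
        return json.loads(raw) if raw else {}
```

Bumping `version` on a batch update lets readers and writers roll forward without clobbering in-flight lookups.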

Model Service

ML model inference endpoint

  • Containerized models
  • A/B test routing
  • Model versioning
  • GPU inference
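A/B test routing is often done with a deterministic hash so the same request always reaches the same variant; the variant names and split below are illustrative:

```python
import hashlib

def route_model(request_id: str, treatment_pct: int = 10) -> str:
    """Deterministically assign a request to a model variant (sketch)."""
    # Hash the request id into a stable bucket in [0, 100).
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return "model-v2-canary" if bucket < treatment_pct else "model-v1-stable"
```

Hashing on a stable id (rather than random assignment) keeps a user's experience consistent across requests and makes experiments reproducible.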

Cache Layer

Response caching for repeated queries

  • TTL-based expiry
  • Cache warming
  • Invalidation rules
  • Hit rate tracking
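TTL-based expiry and hit-rate tracking can be sketched together; this in-memory version is illustrative, since a production cache layer would typically live in Redis or similar:

```python
import time

class TTLCache:
    """Response cache with TTL expiry and hit-rate tracking (sketch)."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.store = {}   # key -> (value, stored_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self.store.get(key)
        if entry and time.monotonic() - entry[1] < self.ttl:
            self.hits += 1
            return entry[0]
        self.misses += 1
        return None

    def set(self, key, value):
        self.store[key] = (value, time.monotonic())

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Tracking hits and misses in the cache itself is what feeds the > 60% hit-rate target above.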

Carrier APIs

Real-time carrier integrations

  • Async calls
  • Circuit breakers
  • Timeout handling
  • Fallback logic
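The circuit-breaker bullet can be sketched as a small wrapper that stops calling a failing carrier for a cool-down window; the thresholds and names are illustrative:

```python
import time

class CircuitBreaker:
    """Open the circuit after N consecutive failures (sketch)."""

    def __init__(self, max_failures: int = 3, reset_seconds: float = 30.0):
        self.max_failures = max_failures
        self.reset_seconds = reset_seconds
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_seconds:
                return fallback()       # circuit open: skip the carrier
            self.opened_at = None       # half-open: allow one retry
            self.failures = 0
        try:
            result = fn()
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return fallback()
```

While the circuit is open, requests go straight to the fallback path instead of waiting on a carrier that is known to be down.
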

Failover & Fallback Strategy

Model Unavailable

If ML model service times out or errors:

  • Fall back to rules-based ranking
  • Use cached predictions if available
  • Alert ops team for investigation
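The fallback order above can be sketched as a chain that also tags which path produced the result; the function names are hypothetical:

```python
def recommend(request, model_predict, cached_prediction, rules_rank):
    """Fallback chain: live model -> cached prediction -> rules (sketch)."""
    try:
        return model_predict(request), "model"
    except Exception:
        # Model service timed out or errored; try the cached prediction.
        cached = cached_prediction(request)
        if cached is not None:
            return cached, "cache"
        # Last resort: deterministic rules-based ranking.
        return rules_rank(request), "rules"
```

Returning the source tag alongside the result makes it easy to emit the ops alert and to measure how often each degraded path is taken.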

Feature Store Down

If feature lookup fails:

  • Compute features on-the-fly
  • Use default feature values
  • Degrade gracefully with less precision
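A sketch of the same degradation order for features: store lookup first, then on-the-fly computation, then defaults. The default values here are illustrative assumptions:

```python
# Illustrative defaults; a real system would define these per feature.
DEFAULT_FEATURES = {"tenure_years": 0, "prior_claims": 0}

def load_features(entity_id, lookup, compute):
    """Feature fallback: store lookup -> on-the-fly compute -> defaults (sketch)."""
    try:
        features = lookup(entity_id)
        if features:
            return features
    except Exception:
        pass  # feature store down; fall through
    try:
        return compute(entity_id)
    except Exception:
        # Graceful degradation: defaults trade precision for availability.
        return dict(DEFAULT_FEATURES)
```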

Carrier API Timeout

If carrier APIs don't respond:

  • Async retry with exponential backoff
  • Show available carriers immediately
  • Update UI when slow carriers respond
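The retry bullet can be sketched as exponential backoff with jitter; the injectable `sleep` parameter stands in for an async scheduler and makes the sketch testable:

```python
import random
import time

def retry_with_backoff(fn, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry with exponential backoff and jitter (sketch)."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            # Double the delay each attempt; jitter avoids thundering herds.
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            sleep(delay)
```

In the async pattern above, the first response renders the available carriers immediately while these retries run in the background for the slow ones.
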

Async Update Patterns

  • Batch Feature Updates (Daily): nightly recomputation of features
  • Model Retraining (Weekly): retrain with new bind/renewal data
  • Cache Invalidation (Hourly): clear stale predictions
  • Carrier Metadata Sync (Real-time): update appetite rules