Cluster Topology and Deployment Patterns

Clusters are where the physical reality of data logistics meets your architecture. Understanding cluster topology, replication, and deployment patterns is essential for designing systems that balance performance, cost, and compliance.

Node-Based Clusters

A cluster consists of one or more nodes. Nodes provide compute and storage for entities and pipeline execution. Multi-node clusters enable high availability and load distribution.

defineCluster("production-us-east", {
  provider: "aws",
  region: "us-east-1",
  nodes: ["node-1", "node-2", "node-3"],
  replicationFactor: 2  // Each entity stored on 2 nodes
});

defineCluster("on-prem-factory", {
  location: "on-premise",
  nodes: ["edge-node-1", "edge-node-2"],
  replicateTo: ["production-us-east"],  // Sync to cloud
  replicationMode: "async"  // Don't block on cloud sync
});
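
To make replicationFactor concrete, here is a minimal TypeScript sketch of one way a cluster could decide which nodes hold a given entity. The placement algorithm Data Estuary actually uses is not described here, so hashId and pickReplicaNodes are hypothetical helpers and the ring-style placement is an assumption chosen only for illustration.

```typescript
// Hypothetical helpers for illustration; not part of the Data Estuary API.

// Deterministic string hash (djb2-style), kept as an unsigned 32-bit integer.
function hashId(id: string): number {
  let h = 5381;
  for (let i = 0; i < id.length; i++) {
    h = ((h * 33) ^ id.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Pick `replicationFactor` nodes, starting at a hash-derived offset and
// walking the node list as a ring. Assumes node names are unique.
function pickReplicaNodes(
  entityId: string,
  nodes: string[],
  replicationFactor: number
): string[] {
  const isWholeNumber = Math.floor(replicationFactor) === replicationFactor;
  if (!isWholeNumber || replicationFactor < 1 || replicationFactor > nodes.length) {
    throw new Error("replicationFactor must be a whole number between 1 and the node count");
  }
  const start = hashId(entityId) % nodes.length;
  const replicas: string[] = [];
  for (let i = 0; i < replicationFactor; i++) {
    replicas.push(nodes[(start + i) % nodes.length]);
  }
  return replicas;
}
```

With the three nodes and replicationFactor: 2 from the first example, every entity lands on two distinct nodes, and the same entity ID always maps to the same pair.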

Built-in vs. BYOC

Built-in AWS Clusters

✓ Fully managed by Data Estuary
✓ Automatic scaling and maintenance
✓ Zero infrastructure management
✓ Fast to get started

Bring Your Own Cluster

✓ Deploy on-premise, edge, any cloud
✓ Full control over infrastructure
✓ Reduce bandwidth costs
✓ Meet data sovereignty requirements

Replication Strategies

Define how entities are replicated across clusters. Data Estuary handles conflict resolution and supports both strong and eventual consistency.

// Synchronous replication (strong consistency)
defineCluster("critical-cluster", {
  replicationMode: "sync",
  replicateTo: ["backup-cluster"],
  consistency: "strong"  // Writes wait for replication
});

// Asynchronous replication (eventual consistency)
defineCluster("edge-cluster", {
  replicationMode: "async",
  replicateTo: ["central-cloud"],
  consistency: "eventual"  // Low-latency writes
});
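
The difference between the two modes above comes down to when a write is acknowledged. The following in-memory model is a rough sketch of that trade-off, not Data Estuary's implementation: ReplicatedStore, write, and flush are hypothetical names, and flush stands in for whatever background process ships queued writes in async mode.

```typescript
// Hypothetical in-memory model; names are illustrative, not Data Estuary APIs.
type ReplicationMode = "sync" | "async";

interface PendingWrite {
  key: string;
  value: string;
}

class ReplicatedStore {
  primary: { [key: string]: string } = {};
  replica: { [key: string]: string } = {};
  private pending: PendingWrite[] = [];
  private mode: ReplicationMode;

  constructor(mode: ReplicationMode) {
    this.mode = mode;
  }

  // Returns when the write is acknowledged to the caller.
  write(key: string, value: string): void {
    this.primary[key] = value;
    if (this.mode === "sync") {
      // Strong consistency: the replica is updated before we acknowledge.
      this.replica[key] = value;
    } else {
      // Eventual consistency: acknowledge now, replicate later.
      this.pending.push({ key: key, value: value });
    }
  }

  // Stand-in for the background replication step in async mode.
  // Returns the number of writes shipped to the replica.
  flush(): number {
    const shipped = this.pending.length;
    for (const w of this.pending) {
      this.replica[w.key] = w.value;
    }
    this.pending = [];
    return shipped;
  }
}
```

In sync mode a reader of the replica sees every acknowledged write; in async mode the replica lags until the queued writes are flushed, which is the price of the lower write latency.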

// Selective replication (by entity type)
defineCluster("regional-cluster", {
  replicateEntityTypes: ["Order", "Customer"],
  excludeEntityTypes: ["InternalLog"],
  replicateTo: ["global-cluster"]
});
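
The selective replication example above combines an include list with an exclude list. How the two interact is not spelled out, so the sketch below makes two assumptions that should be checked against the product's behavior: an exclusion always wins over an inclusion, and a missing replicateEntityTypes list means "replicate every type". shouldReplicate is a hypothetical helper.

```typescript
// Hypothetical filter; the precedence rules here are assumptions.
interface SelectiveReplicationRule {
  replicateEntityTypes?: string[];
  excludeEntityTypes?: string[];
}

function shouldReplicate(entityType: string, rule: SelectiveReplicationRule): boolean {
  const excluded = rule.excludeEntityTypes ?? [];
  // Assumption: an explicit exclusion always wins.
  if (excluded.indexOf(entityType) !== -1) {
    return false;
  }
  // Assumption: no include list means every remaining type is replicated.
  if (rule.replicateEntityTypes === undefined) {
    return true;
  }
  return rule.replicateEntityTypes.indexOf(entityType) !== -1;
}

// The rule from the regional-cluster example.
const regionalRule: SelectiveReplicationRule = {
  replicateEntityTypes: ["Order", "Customer"],
  excludeEntityTypes: ["InternalLog"]
};

console.log(shouldReplicate("Order", regionalRule));       // true
console.log(shouldReplicate("InternalLog", regionalRule)); // false
```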

Common Deployment Patterns

Fully Cloud

Use built-in AWS clusters across multiple regions for a fully managed, globally distributed setup. Best for startups, SaaS products, and teams without infrastructure expertise.

aws-us-east-1 ↔ aws-eu-west-1 ↔ aws-ap-south-1

Hybrid (On-Prem + Cloud)

Run pipelines and store sensitive data on-premise while replicating summaries to the cloud for analytics. Best for manufacturing, IoT, and enterprises with data sovereignty requirements.

on-prem-factory → aggregates locally → aws-eu-central-1

Raw data stays on-prem; only processed insights sync to the cloud.
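
As an illustration of the "aggregates locally" step, the sketch below reduces raw readings to per-sensor summaries, which would be the only records replicated to the cloud. The SensorReading and SensorSummary shapes and the summarizeReadings function are hypothetical; a real pipeline would define its own entities and aggregation.

```typescript
// Hypothetical entity shapes for illustration.
interface SensorReading {
  sensorId: string;
  value: number;
}

interface SensorSummary {
  sensorId: string;
  count: number;
  min: number;
  max: number;
  mean: number;
}

// Collapse raw readings into one summary per sensor, sorted by sensor ID.
function summarizeReadings(readings: SensorReading[]): SensorSummary[] {
  const bySensor: {
    [sensorId: string]: { count: number; min: number; max: number; sum: number };
  } = {};

  for (const r of readings) {
    const acc = bySensor[r.sensorId];
    if (acc === undefined) {
      bySensor[r.sensorId] = { count: 1, min: r.value, max: r.value, sum: r.value };
    } else {
      acc.count += 1;
      acc.min = Math.min(acc.min, r.value);
      acc.max = Math.max(acc.max, r.value);
      acc.sum += r.value;
    }
  }

  return Object.keys(bySensor).sort().map((sensorId) => {
    const a = bySensor[sensorId];
    return { sensorId: sensorId, count: a.count, min: a.min, max: a.max, mean: a.sum / a.count };
  });
}
```

However many raw readings arrive, the cloud side receives one small record per sensor, which is where the bandwidth and data sovereignty benefits of the hybrid pattern come from.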

Multi-Region with Edge

Use edge clusters for low-latency writes, regional clusters for compliance, and a global cluster for analytics. Best for global e-commerce, retail chains, and distributed enterprises.
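
A three-tier topology of this kind might be declared roughly as follows. The option names are the ones used in the earlier examples; the cluster names, regions, and entity types are made up, and defineCluster here is a local stand-in that only records its arguments so the sketch runs on its own. replicationPath is a hypothetical helper for inspecting the result.

```typescript
// Option names taken from the examples in this section.
interface ClusterOptions {
  provider?: string;
  region?: string;
  location?: string;
  nodes?: string[];
  replicationFactor?: number;
  replicateTo?: string[];
  replicationMode?: "sync" | "async";
  consistency?: "strong" | "eventual";
  replicateEntityTypes?: string[];
  excludeEntityTypes?: string[];
}

// Local stand-in for the real defineCluster: it only records the definition.
const definedClusters: { [clusterName: string]: ClusterOptions } = {};
function defineCluster(clusterName: string, options: ClusterOptions): void {
  definedClusters[clusterName] = options;
}

// Edge tier: low-latency local writes, synced upstream in the background.
defineCluster("edge-store-berlin", {
  location: "on-premise",
  nodes: ["edge-node-1"],
  replicateTo: ["regional-eu"],
  replicationMode: "async",
  consistency: "eventual"
});

// Regional tier: personal data stays in-region, only orders move on.
defineCluster("regional-eu", {
  provider: "aws",
  region: "eu-central-1",
  replicateEntityTypes: ["Order"],
  excludeEntityTypes: ["Customer", "InternalLog"],
  replicateTo: ["global-analytics"],
  replicationMode: "async"
});

// Global tier: analytics over whatever the regions choose to share.
defineCluster("global-analytics", { provider: "aws", region: "us-east-1" });

// Follow the first replicateTo target from each cluster until the chain ends.
function replicationPath(start: string): string[] {
  const path: string[] = [];
  let current: string | undefined = start;
  while (current !== undefined && path.indexOf(current) === -1) {
    path.push(current);
    const options: ClusterOptions | undefined = definedClusters[current];
    const targets = options !== undefined ? options.replicateTo : undefined;
    current = targets !== undefined && targets.length > 0 ? targets[0] : undefined;
  }
  return path;
}

console.log(replicationPath("edge-store-berlin").join(" -> "));
```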

Best Practices

Start Simple

Begin with a single cloud cluster and add complexity as needed. Don't over-architect from day one.

Consider Data Gravity

Place clusters where data is generated or consumed most frequently. This reduces latency and bandwidth costs.
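
One simple way to reason about data gravity is to weight the latency from each data source to a candidate region by how much traffic that source generates, then pick the region with the lowest weighted average. The sketch below assumes you already have such measurements; TrafficSource, weightedLatencyMs, and pickClusterRegion are hypothetical names, and the numbers in any usage are made up.

```typescript
// Hypothetical measurement record for one data source.
interface TrafficSource {
  site: string;
  requestsPerDay: number;
  latencyMsTo: { [region: string]: number };
}

// Traffic-weighted average latency from all sources to one region.
function weightedLatencyMs(sources: TrafficSource[], region: string): number {
  let weighted = 0;
  let total = 0;
  for (const s of sources) {
    const latency = s.latencyMsTo[region];
    if (latency === undefined) {
      return Infinity; // unmeasured route: treat the region as unusable
    }
    weighted += s.requestsPerDay * latency;
    total += s.requestsPerDay;
  }
  return total === 0 ? Infinity : weighted / total;
}

// Pick the candidate region with the lowest weighted latency.
function pickClusterRegion(sources: TrafficSource[], candidateRegions: string[]): string {
  if (candidateRegions.length === 0) {
    throw new Error("no candidate regions");
  }
  let best = candidateRegions[0];
  let bestCost = weightedLatencyMs(sources, best);
  for (const region of candidateRegions.slice(1)) {
    const cost = weightedLatencyMs(sources, region);
    if (cost < bestCost) {
      best = region;
      bestCost = cost;
    }
  }
  return best;
}
```

Latency is only one input. Bandwidth pricing and residency rules can override the result, so treat the output as a starting point rather than a decision.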

Plan for Compliance

Use regional clusters and selective replication to meet data residency requirements. Design your topology with compliance in mind from the start.
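
A topology can be checked against residency rules before it is deployed. The sketch below is a hypothetical validator, not a Data Estuary feature: ClusterSpec reuses option names from the earlier examples, ResidencyRules maps an entity type to the regions where it may be stored, and the check covers direct replication hops only. It also assumes that exclusions take precedence over inclusions.

```typescript
// Hypothetical validator; option names follow the examples in this section.
interface ClusterSpec {
  region: string;
  replicateTo?: string[];
  replicateEntityTypes?: string[];
  excludeEntityTypes?: string[];
}

// Entity type -> regions where that type is allowed to be stored.
type ResidencyRules = { [entityType: string]: string[] };

// Assumption: exclusions win, and no include list means "replicate everything".
function replicatesType(spec: ClusterSpec, entityType: string): boolean {
  if ((spec.excludeEntityTypes ?? []).indexOf(entityType) !== -1) {
    return false;
  }
  if (spec.replicateEntityTypes === undefined) {
    return true;
  }
  return spec.replicateEntityTypes.indexOf(entityType) !== -1;
}

// Report every direct replication hop that would move a restricted entity
// type into a region its rule does not allow.
function findResidencyViolations(
  clusters: { [clusterName: string]: ClusterSpec },
  rules: ResidencyRules
): string[] {
  const violations: string[] = [];
  for (const source of Object.keys(clusters)) {
    const spec = clusters[source];
    for (const target of spec.replicateTo ?? []) {
      const targetSpec: ClusterSpec | undefined = clusters[target];
      if (targetSpec === undefined) {
        continue; // target not described in this topology; out of scope here
      }
      for (const entityType of Object.keys(rules)) {
        if (!replicatesType(spec, entityType)) {
          continue;
        }
        if (rules[entityType].indexOf(targetSpec.region) === -1) {
          violations.push(`${entityType}: ${source} -> ${target} (${targetSpec.region})`);
        }
      }
    }
  }
  return violations;
}
```

Running a check like this in CI makes a residency violation a failed build instead of an audit finding. A fuller version would also follow multi-hop replication chains.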

Migration Scenarios

Clusters enable gradual migration between environments. Set up bidirectional sync between your current infrastructure and Data Estuary, then migrate workloads incrementally without downtime.

This approach transforms migration from a risky big-bang cutover into a controlled, reversible process.
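
The incremental, reversible nature of this approach can be modeled as a per-workload switch. The sketch below is only a bookkeeping model under the assumption that bidirectional sync keeps both sides' data current, so a workload can run on either side at any time; MigrationPlan and its methods are hypothetical names.

```typescript
// Hypothetical bookkeeping model of an incremental migration.
type Side = "current" | "data-estuary";

class MigrationPlan {
  private placement: { [workload: string]: Side } = {};

  // Every workload starts on the current infrastructure.
  constructor(workloads: string[]) {
    for (const w of workloads) {
      this.placement[w] = "current";
    }
  }

  // Move one workload to either side. Moving it back is the rollback path.
  move(workload: string, target: Side): void {
    if (this.placement[workload] === undefined) {
      throw new Error(`unknown workload: ${workload}`);
    }
    this.placement[workload] = target;
  }

  // Fraction of workloads currently served by Data Estuary (0 to 1).
  progress(): number {
    const names = Object.keys(this.placement);
    if (names.length === 0) {
      return 0;
    }
    const migrated = names.filter((n) => this.placement[n] === "data-estuary").length;
    return migrated / names.length;
  }
}
```

Moving one workload at a time and watching progress() rise gives a natural checkpoint after each step; if a workload misbehaves, moving it back to "current" undoes only that step.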