Clusters are where the physical reality of data logistics meets your architecture. Understanding cluster topology, replication, and deployment patterns is essential for designing systems that balance performance, cost, and compliance.
Node-Based Clusters
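A cluster consists of one or more nodes. Nodes provide the compute and storage for entities and for pipeline execution. Multi-node clusters enable high availability and load distribution. With a replication factor above 1, each entity lives on more than one node, so a copy survives the loss of a single node.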
```typescript
defineCluster("production-us-east", {
  provider: "aws",
  region: "us-east-1",
  nodes: ["node-1", "node-2", "node-3"],
  replicationFactor: 2 // Each entity stored on 2 nodes
});
```
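This section does not specify how Data Estuary decides which nodes hold each entity. The sketch below assumes rendezvous (highest-random-weight) hashing, a common placement technique used here only for illustration. It maps each entity ID to `replicationFactor` distinct nodes. The `fnv1a` and `placeReplicas` helpers are hypothetical.

```typescript
// Hypothetical sketch: rendezvous (highest-random-weight) hashing picks
// `replicationFactor` distinct nodes per entity. Not Data Estuary's API.

// FNV-1a 32-bit hash: deterministic and dependency-free.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // reinterpret as unsigned 32-bit
}

// Score each node against the entity ID and keep the highest-scoring nodes.
function placeReplicas(entityId: string, nodes: string[], replicationFactor: number): string[] {
  if (replicationFactor < 1 || replicationFactor > nodes.length) {
    throw new Error(`replicationFactor must be between 1 and ${nodes.length}`);
  }
  return nodes
    .map((node) => ({ node, score: fnv1a(`${node}:${entityId}`) }))
    .sort((a, b) => b.score - a.score || a.node.localeCompare(b.node))
    .slice(0, replicationFactor)
    .map((entry) => entry.node);
}

const nodes = ["node-1", "node-2", "node-3"];
for (const id of ["order-1001", "order-1002", "customer-42"]) {
  console.log(id, "->", placeReplicas(id, nodes, 2));
}
```

With three nodes and a replication factor of 2, every entity keeps at least one copy if any single node is lost. Rendezvous hashing also limits churn: removing a node only moves the entities that had a replica on that node.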
```typescript
defineCluster("on-prem-factory", {
  location: "on-premise",
  nodes: ["edge-node-1", "edge-node-2"],
  replicateTo: ["production-us-east"], // Sync to cloud
  replicationMode: "async" // Don't block on cloud sync
});
```
Built-in vs. BYOC
Built-in AWS Clusters
- ✓ Fully managed by Data Estuary
- ✓ Automatic scaling and maintenance
- ✓ Zero infrastructure management
- ✓ Fast to get started
Bring Your Own Cluster
- ✓ Deploy on-premise, at the edge, or in any cloud
- ✓ Full control over infrastructure
- ✓ Reduce bandwidth costs by keeping data close to where it is generated
- ✓ Meet data sovereignty requirements
Replication Strategies
Define how entities are replicated across clusters. Data Estuary handles consistency (strong or eventual, depending on the replication mode) and conflict resolution.
```typescript
// Synchronous replication (strong consistency)
defineCluster("critical-cluster", {
  replicationMode: "sync",
  replicateTo: ["backup-cluster"],
  consistency: "strong" // Writes wait for replication
});
```
```typescript
// Asynchronous replication (eventual consistency)
defineCluster("edge-cluster", {
  replicationMode: "async",
  replicateTo: ["central-cloud"],
  consistency: "eventual" // Low-latency writes
});
```
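The sync and async modes differ mainly in when a write is acknowledged. The in-memory model below is a hypothetical illustration; the `Replica` and `PrimaryCluster` classes are not Data Estuary APIs. In `sync` mode a write returns only after every replica has applied it. In `async` mode it returns immediately, and replicas catch up when the queued changes are flushed.

```typescript
// Hypothetical in-memory model of sync vs. async replication.
// `Replica` and `PrimaryCluster` are illustrative names, not Data Estuary APIs.
type ReplicationMode = "sync" | "async";

class Replica {
  private readonly store = new Map<string, string>();

  apply(key: string, value: string): void {
    this.store.set(key, value);
  }

  read(key: string): string | undefined {
    return this.store.get(key);
  }
}

class PrimaryCluster {
  private readonly mode: ReplicationMode;
  private readonly replicas: Replica[];
  private readonly store = new Map<string, string>();
  private pending: Array<[string, string]> = []; // async backlog, in write order

  constructor(mode: ReplicationMode, replicas: Replica[]) {
    this.mode = mode;
    this.replicas = replicas;
  }

  write(key: string, value: string): void {
    this.store.set(key, value);
    if (this.mode === "sync") {
      // Strong consistency: return only after every replica has applied the write.
      for (const replica of this.replicas) replica.apply(key, value);
    } else {
      // Eventual consistency: acknowledge now, replicate on the next flush.
      this.pending.push([key, value]);
    }
  }

  read(key: string): string | undefined {
    return this.store.get(key);
  }

  get pendingCount(): number {
    return this.pending.length;
  }

  // A real system would do this in the background; here it is an explicit call.
  flush(): void {
    for (const [key, value] of this.pending) {
      for (const replica of this.replicas) replica.apply(key, value);
    }
    this.pending = [];
  }
}

const backup = new Replica();
const critical = new PrimaryCluster("sync", [backup]);
critical.write("order-1", "paid");
console.log(backup.read("order-1")); // paid

const centralCloud = new Replica();
const edge = new PrimaryCluster("async", [centralCloud]);
edge.write("order-2", "shipped");
console.log(centralCloud.read("order-2")); // undefined (not yet replicated)
edge.flush();
console.log(centralCloud.read("order-2")); // shipped
```

The sync path makes every write wait on the slowest replica in exchange for a backup that never lags. The async path keeps writes local and fast, which suits an edge cluster whose link to the cloud should not block local work.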
```typescript
// Selective replication (by entity type)
defineCluster("regional-cluster", {
  replicateEntityTypes: ["Order", "Customer"],
  excludeEntityTypes: ["InternalLog"],
  replicateTo: ["global-cluster"]
});
```
Common Deployment Patterns
Fully Cloud
Use built-in AWS clusters across multiple regions for a fully managed, globally distributed setup. Best for startups, SaaS products, and teams without infrastructure expertise.
Hybrid (On-Prem + Cloud)
Run pipelines and store sensitive data on-premise while replicating summaries to the cloud for analytics. Best for manufacturing, IoT, and enterprises with data sovereignty requirements.
Raw data stays on-prem; only processed insights sync to the cloud.
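The hybrid pattern can be expressed in code as follows. Raw readings are aggregated on-prem, and only the aggregates would be handed to replication. This is a minimal sketch with hypothetical types (`SensorReading`, `MachineSummary`) and an assumed per-machine min/max/mean summary. The summaries you actually produce depend on your pipelines.

```typescript
// Hypothetical sketch of the hybrid pattern: raw readings stay on-prem,
// and only per-machine summaries are prepared for replication to the cloud.
interface SensorReading {
  machineId: string;
  temperatureC: number;
}

interface MachineSummary {
  machineId: string;
  count: number;
  minC: number;
  maxC: number;
  meanC: number;
}

function summarize(readings: SensorReading[]): MachineSummary[] {
  // Group raw values by machine.
  const byMachine = new Map<string, number[]>();
  for (const r of readings) {
    const values = byMachine.get(r.machineId) ?? [];
    values.push(r.temperatureC);
    byMachine.set(r.machineId, values);
  }
  // Reduce each group to a fixed-size summary record.
  const summaries: MachineSummary[] = [];
  for (const [machineId, values] of byMachine) {
    const total = values.reduce((sum, v) => sum + v, 0);
    summaries.push({
      machineId,
      count: values.length,
      minC: Math.min(...values),
      maxC: Math.max(...values),
      meanC: total / values.length,
    });
  }
  return summaries;
}

const raw: SensorReading[] = [
  { machineId: "press-1", temperatureC: 61 },
  { machineId: "press-1", temperatureC: 65 },
  { machineId: "press-2", temperatureC: 48 },
];

// Only this array (one record per machine) would leave the on-prem cluster.
const outbound = summarize(raw);
console.log(outbound);
```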
Multi-Region with Edge
Use edge clusters for low-latency writes, regional clusters for compliance, and a global cluster for analytics. Best for global e-commerce, retail chains, and distributed enterprises.
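In a tiered topology, a write at an edge cluster can reach the global cluster through a regional one. The sketch below assumes that replication follows `replicateTo` links transitively; the source does not say whether Data Estuary forwards replicated data onward. It lists every cluster a write eventually reaches. All cluster names are hypothetical.

```typescript
// Hypothetical tiered topology: each cluster lists the clusters it replicates to,
// mirroring the `replicateTo` option. Cluster names are illustrative.
const replicateTo: Record<string, string[]> = {
  "edge-store-berlin": ["regional-eu"],
  "edge-store-paris": ["regional-eu"],
  "edge-store-austin": ["regional-us"],
  "regional-eu": ["global-analytics"],
  "regional-us": ["global-analytics"],
  "global-analytics": [],
};

// Breadth-first walk: every cluster that a write at `origin` eventually reaches,
// assuming replicated data is forwarded along each cluster's own links.
function downstreamOf(origin: string): string[] {
  const seen = new Set<string>([origin]);
  const queue: string[] = [origin];
  const reached: string[] = [];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const next of replicateTo[current] ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        reached.push(next);
        queue.push(next);
      }
    }
  }
  return reached;
}

console.log(downstreamOf("edge-store-berlin")); // [ 'regional-eu', 'global-analytics' ]
```

The visited set keeps the walk finite even when a link is bidirectional, as with the bidirectional sync used during a migration.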
Best Practices
Start Simple
Begin with a single cloud cluster and add complexity as needed. Don't over-architect from day one.
Consider Data Gravity
Place clusters where data is generated or consumed most frequently. This reduces latency and bandwidth costs.
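Data gravity can be turned into a rough placement heuristic: pick the region where the least data has to cross region boundaries. The sketch assumes that all reads and writes go through a single cluster, and it uses made-up daily volumes. `RegionTraffic` and `bestRegion` are hypothetical names.

```typescript
// Hypothetical data-gravity heuristic: place a single cluster in the region that
// minimizes the volume of data crossing region boundaries. Numbers are illustrative.
interface RegionTraffic {
  region: string;
  producedGbPerDay: number; // data written by sources in this region
  consumedGbPerDay: number; // data read by consumers in this region
}

// Every GB produced or consumed outside the cluster's region crosses regions.
function crossRegionGb(clusterRegion: string, traffic: RegionTraffic[]): number {
  return traffic
    .filter((t) => t.region !== clusterRegion)
    .reduce((sum, t) => sum + t.producedGbPerDay + t.consumedGbPerDay, 0);
}

function bestRegion(traffic: RegionTraffic[]): { region: string; crossRegionGb: number } {
  if (traffic.length === 0) throw new Error("no candidate regions");
  let best = { region: traffic[0].region, crossRegionGb: crossRegionGb(traffic[0].region, traffic) };
  for (const t of traffic.slice(1)) {
    const cost = crossRegionGb(t.region, traffic);
    if (cost < best.crossRegionGb) best = { region: t.region, crossRegionGb: cost };
  }
  return best;
}

const traffic: RegionTraffic[] = [
  { region: "us-east-1", producedGbPerDay: 40, consumedGbPerDay: 10 },
  { region: "eu-west-1", producedGbPerDay: 200, consumedGbPerDay: 5 },
  { region: "ap-southeast-1", producedGbPerDay: 15, consumedGbPerDay: 30 },
];

console.log(bestRegion(traffic)); // { region: 'eu-west-1', crossRegionGb: 95 }
```

Here eu-west-1 wins because most of the data is produced there, even though ap-southeast-1 has the most consumers.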
Plan for Compliance
Use regional clusters and selective replication to meet data residency requirements. Design your topology with compliance in mind from the start.
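You can check residency requirements against a topology before deploying it. This sketch walks every `replicateTo` link and flags entity types that would land in a region their policy does not allow. The policy format, the region assignments, and the include/exclude semantics are assumptions for illustration. The cluster names are borrowed from the selective replication example.

```typescript
// Hypothetical residency check over a replication topology. The config shape mirrors
// the options above, but the include/exclude semantics are assumptions.
interface ClusterConfig {
  region: string;
  replicateTo: string[];
  replicateEntityTypes?: string[]; // assumed: if set, only these types leave the cluster
  excludeEntityTypes?: string[]; // assumed: these types never leave the cluster
}

// Entity type -> regions where that type may be stored.
type ResidencyPolicy = Record<string, string[]>;

function outboundTypes(config: ClusterConfig, allTypes: string[]): string[] {
  const included = config.replicateEntityTypes ?? allTypes;
  const excluded = new Set(config.excludeEntityTypes ?? []);
  return included.filter((type) => !excluded.has(type));
}

// Conservative: assumes any cluster may hold any entity type in the policy.
function findViolations(clusters: Record<string, ClusterConfig>, policy: ResidencyPolicy): string[] {
  const violations: string[] = [];
  for (const [source, config] of Object.entries(clusters)) {
    for (const targetName of config.replicateTo) {
      const target = clusters[targetName];
      if (!target) {
        violations.push(`${source} replicates to unknown cluster ${targetName}`);
        continue;
      }
      for (const type of outboundTypes(config, Object.keys(policy))) {
        const allowed = policy[type]; // types without a policy entry are unrestricted
        if (allowed && !allowed.includes(target.region)) {
          violations.push(`${type}: ${source} -> ${targetName} leaves allowed regions (${target.region})`);
        }
      }
    }
  }
  return violations;
}

const policy: ResidencyPolicy = {
  Customer: ["eu-central-1"],
  Order: ["eu-central-1", "us-east-1"],
  InternalLog: ["eu-central-1"],
};

const clusters: Record<string, ClusterConfig> = {
  "regional-cluster": {
    region: "eu-central-1",
    replicateTo: ["global-cluster"],
    replicateEntityTypes: ["Order", "Customer"],
    excludeEntityTypes: ["InternalLog"],
  },
  "global-cluster": { region: "us-east-1", replicateTo: [] },
};

console.log(findViolations(clusters, policy)); // one violation, for Customer
```

Under this assumed policy, replicating Customer data from an EU regional cluster to a US global cluster is flagged. Narrowing `replicateEntityTypes` to `["Order"]` clears the violation.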
Migration Scenarios
Clusters enable gradual migration between environments. Set up bidirectional sync between your current infrastructure and Data Estuary, then migrate workloads incrementally without downtime.
This approach transforms migration from a risky big-bang cutover into a controlled, reversible process.
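Bidirectional sync needs a rule for records that change on both sides. The sketch below uses last-writer-wins (LWW), one common strategy chosen here for illustration; this section does not describe Data Estuary's own conflict resolution. `VersionedRecord`, `newer`, and `syncBothWays` are hypothetical.

```typescript
// Hypothetical last-writer-wins (LWW) merge for bidirectional sync during a migration.
// Data Estuary's actual conflict-resolution strategy may differ.
interface VersionedRecord {
  value: string;
  updatedAt: number; // e.g. milliseconds since the Unix epoch
  origin: string; // which environment wrote it; breaks timestamp ties deterministically
}

type Store = Map<string, VersionedRecord>;

function newer(a: VersionedRecord, b: VersionedRecord): VersionedRecord {
  if (a.updatedAt !== b.updatedAt) return a.updatedAt > b.updatedAt ? a : b;
  return a.origin > b.origin ? a : b;
}

// After one pass, both stores hold the same winning record for every key.
function syncBothWays(left: Store, right: Store): void {
  const keys = new Set([...left.keys(), ...right.keys()]);
  for (const key of keys) {
    const l = left.get(key);
    const r = right.get(key);
    const winner = l && r ? newer(l, r) : (l ?? r)!; // the key exists in at least one store
    left.set(key, winner);
    right.set(key, winner);
  }
}

const legacy: Store = new Map([
  ["customer-1", { value: "alice@old.example", updatedAt: 100, origin: "legacy" }],
  ["order-7", { value: "pending", updatedAt: 200, origin: "legacy" }],
]);
const estuary: Store = new Map([
  ["customer-1", { value: "alice@new.example", updatedAt: 150, origin: "estuary" }],
  ["invoice-3", { value: "open", updatedAt: 120, origin: "estuary" }],
]);

syncBothWays(legacy, estuary);
console.log(legacy.get("customer-1")?.value); // alice@new.example
```

LWW is simple and always converges, but it trusts timestamps. With clock skew between environments, a genuinely newer write can lose. Because both sides end up identical, workloads can move back to the old environment, which supports the reversibility described above.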