Deep Dive: The Entity System

The entity system is the foundation of Data Estuary's approach to data logistics. Understanding how entities work—their identity model, type system, and state management—is crucial to architecting effective distributed systems with Data Estuary.

Identity Model

Every entity in Data Estuary is uniquely identified by the combination of its ID and Type. This combination is globally unique within the estuary, enabling consistent replication and references across clusters.

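// Entity identity: the (id, type) pair.
// Variable names here are illustrative only.
const userRef = {
  id: "usr_abc123",
  type: "User"
};

// Two entities with the same ID but different types are distinct
const salesReportRef = {
  id: "2024-01-15",
  type: "DailySalesReport"  // Different entity
};
const logSummaryRef = {
  id: "2024-01-15",
  type: "DailyLogSummary"   // Different entity
};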

This dual-key approach provides flexibility while maintaining global uniqueness. You can use meaningful IDs (like dates, SKUs, or usernames) without worrying about collisions across different entity types.
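
One way to picture the dual key is as a composite lookup key. The sketch below is not Data Estuary's API: entityKey, the in-memory Map, and the payload values are hypothetical, chosen only to show that the same ID under two types resolves to two separate entries.

```javascript
// Hypothetical helper (not part of Data Estuary): builds a composite key
// from an entity's type and id. Serializing the pair as a JSON array avoids
// the ambiguity a plain separator would cause if an id contained it.
function entityKey(ref) {
  return JSON.stringify([ref.type, ref.id]);
}

// In-memory stand-in for an entity store, keyed by (type, id).
const store = new Map();

// Illustrative payloads; the field names and values are made up.
store.set(entityKey({ id: "2024-01-15", type: "DailySalesReport" }), { totalSales: 1200 });
store.set(entityKey({ id: "2024-01-15", type: "DailyLogSummary" }), { errorCount: 3 });

// Same id, different types: two separate entries.
console.log(store.size); // 2
```

Keying on the pair rather than on the ID alone is what lets a date or SKU serve as an ID for several entity types at once.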

Entity Types

An entity type defines the structure of entities through typed fields. Think of it as a schema that gets enforced across all clusters where the entity is stored.

defineEntityType("Order", {
  fields: {
    orderId: "string",
    customerId: "string",
    items: "json",
    totalAmount: "number",
    currency: "string",
    status: "string",
    createdAt: "date",
    updatedAt: "date"
  }
});

defineEntityType("InventoryItem", {
  fields: {
    sku: "string",
    name: "string",
    quantityAvailable: "number",
    warehouseLocation: "string",
    lastRestocked: "date",
    reorderThreshold: "number"
  }
});

Supported field types include string, number, boolean, date, and json, among others. The platform validates field values against their declared types on every operation.
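
To make the idea of type validation concrete, here is a minimal sketch of a runtime check against a fields map shaped like the ones above. It is an assumption about what such validation could look like, not the platform's implementation: validators and validateFields are hypothetical names, and treating every declared field as required is a simplification.

```javascript
// Hypothetical validators, one per field type named in the text.
// How Data Estuary validates internally is not described here.
const validators = new Map([
  ["string", (v) => typeof v === "string"],
  ["number", (v) => typeof v === "number" && Number.isFinite(v)],
  ["boolean", (v) => typeof v === "boolean"],
  ["date", (v) => v instanceof Date && !Number.isNaN(v.getTime())],
  // "json" is treated here as any value JSON.stringify can serialize.
  ["json", (v) => {
    try {
      return JSON.stringify(v) !== undefined;
    } catch {
      return false;
    }
  }],
]);

// Returns a list of error strings; an empty list means the entity is valid.
// Assumption: every declared field is required.
function validateFields(fields, entity) {
  const errors = [];
  for (const [name, type] of Object.entries(fields)) {
    const check = validators.get(type);
    if (!check) {
      errors.push(`${name}: unknown field type "${type}"`);
    } else if (entity[name] === undefined) {
      errors.push(`${name}: missing`);
    } else if (!check(entity[name])) {
      errors.push(`${name}: expected ${type}`);
    }
  }
  return errors;
}

// A subset of the InventoryItem fields, with one deliberately wrong value.
const inventoryFields = {
  sku: "string",
  quantityAvailable: "number",
  lastRestocked: "date",
};

const errors = validateFields(inventoryFields, {
  sku: "SKU-001",
  quantityAvailable: "12", // wrong: a string where a number is declared
  lastRestocked: new Date("2024-01-15"),
});
console.log(errors); // ["quantityAvailable: expected number"]
```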

Entity States

Entity states allow you to track lifecycle transitions with validation rules. States are represented by one or more fields on the entity, and each state can declare the transitions it allows.

defineEntityState("Order", {
  stateField: "status",
  states: {
    "pending": {
      allowedTransitions: ["confirmed", "cancelled"]
    },
    "confirmed": {
      allowedTransitions: ["shipped", "cancelled"]
    },
    "shipped": {
      allowedTransitions: ["delivered", "returned"]
    },
    "delivered": {
      allowedTransitions: ["returned"]
    },
    "cancelled": {
      allowedTransitions: []  // Terminal state
    },
    "returned": {
      allowedTransitions: []  // Terminal state
    }
  }
});

The platform enforces state transitions across all clusters, ensuring consistency even in distributed scenarios. Invalid transitions are rejected automatically, preventing data integrity issues.
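
The rejection rule can be sketched in a few lines. This is an illustration of the check under stated assumptions, not the platform's enforcement code: canTransition and applyTransition are hypothetical helpers, the order object is made up, and the transition table is a flattened copy of the Order state definition above.

```javascript
// Transition table copied from the Order state definition above.
const orderStates = {
  pending: ["confirmed", "cancelled"],
  confirmed: ["shipped", "cancelled"],
  shipped: ["delivered", "returned"],
  delivered: ["returned"],
  cancelled: [], // terminal
  returned: [],  // terminal
};

// True only if `from` is a known state that lists `to` as allowed.
function canTransition(states, from, to) {
  if (!Object.prototype.hasOwnProperty.call(states, from)) return false;
  return states[from].includes(to);
}

// Returns a copy of the entity in the new state, or throws if the
// transition is not allowed. The input entity is left unchanged.
function applyTransition(states, entity, stateField, next) {
  const current = entity[stateField];
  if (!canTransition(states, current, next)) {
    throw new Error(`Invalid transition: ${current} -> ${next}`);
  }
  return { ...entity, [stateField]: next };
}

const order = { orderId: "ord_1", status: "pending" };
const confirmed = applyTransition(orderStates, order, "status", "confirmed");
console.log(confirmed.status); // "confirmed"

try {
  // confirmed -> delivered skips "shipped", so it is rejected.
  applyTransition(orderStates, confirmed, "status", "delivered");
} catch (err) {
  console.log(err.message); // "Invalid transition: confirmed -> delivered"
}
```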

Why This Matters

This entity model enables several powerful capabilities:

• Global consistency: The same entity structure is enforced across all clusters
• Type safety: Field types are validated at runtime, preventing data corruption
• State guarantees: Invalid state transitions are rejected, even in distributed environments
• Flexible identity: Use meaningful IDs without collision concerns

Best Practices

Choosing Entity Types

Entity types should represent your domain concepts, not your database tables. Think in terms of business objects that need to move between systems.

Designing State Machines

Keep state machines simple and focused. Complex state machines can become difficult to reason about in distributed systems. Consider splitting complex workflows into multiple entity types.
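
A small inspection helper can show when a state machine is getting hard to reason about. The sketch below is hypothetical and not part of Data Estuary: inspectStateMachine is an invented name, it takes a states object shaped like the one passed to defineEntityState, and the choice of "pending" as the initial state is an assumption, since the definition above does not name one.

```javascript
// Returns the allowed transitions for a state, or [] if it is not defined.
function transitionsOf(states, name) {
  return Object.prototype.hasOwnProperty.call(states, name)
    ? states[name].allowedTransitions
    : [];
}

// Hypothetical lint for a state definition. Reports:
//  - transitions that point at undefined states
//  - terminal states (no outgoing transitions)
//  - states that cannot be reached from the initial state
function inspectStateMachine(states, initialState) {
  const names = Object.keys(states);

  const undefinedTargets = [];
  for (const name of names) {
    for (const target of transitionsOf(states, name)) {
      if (!names.includes(target)) undefinedTargets.push(`${name} -> ${target}`);
    }
  }

  const terminalStates = names.filter((n) => transitionsOf(states, n).length === 0);

  // Breadth-first walk from the initial state to find reachable states.
  const reachable = new Set([initialState]);
  const queue = [initialState];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const target of transitionsOf(states, current)) {
      if (!reachable.has(target)) {
        reachable.add(target);
        queue.push(target);
      }
    }
  }
  const unreachableStates = names.filter((n) => !reachable.has(n));

  return { undefinedTargets, terminalStates, unreachableStates };
}

const report = inspectStateMachine(
  {
    pending: { allowedTransitions: ["confirmed", "cancelled"] },
    confirmed: { allowedTransitions: ["shipped", "cancelled"] },
    shipped: { allowedTransitions: ["delivered", "returned"] },
    delivered: { allowedTransitions: ["returned"] },
    cancelled: { allowedTransitions: [] },
    returned: { allowedTransitions: [] },
  },
  "pending" // assumed initial state
);
console.log(report.terminalStates); // ["cancelled", "returned"]
```

A report with undefined targets or unreachable states, or one whose state list keeps growing, may be a hint that the workflow is better split across several entity types.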

Field Type Selection

Use the most specific type possible: date instead of string for timestamps, and number for quantities. The platform can optimize storage and querying based on types.
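
Beyond platform-side optimization, specific types also avoid subtle bugs in application code. The snippet below shows plain JavaScript behavior, not anything specific to Data Estuary, and the values are made up for illustration.

```javascript
// Quantities stored as strings compare character by character,
// which is rarely what is intended.
const stringCompare = "10" < "9"; // true: "1" sorts before "9"
const numberCompare = 10 < 9;     // false: numeric comparison
console.log(stringCompare, numberCompare);

// Timestamps held as Date values support arithmetic directly.
const lastRestocked = new Date("2024-01-15T00:00:00Z");
const checkedAt = new Date("2024-01-20T00:00:00Z");
const msPerDay = 24 * 60 * 60 * 1000;
const daysSinceRestock = (checkedAt.getTime() - lastRestocked.getTime()) / msPerDay;
console.log(daysSinceRestock); // 5
```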

Next Steps

Understanding entities is just the first step. Next, explore how pipelines operate on entities and how clusters determine where your entities live. Together, these three concepts form the complete picture of Data Estuary's architecture.