Workflow Orchestration Patterns
Master workflow orchestration architecture with Temporal, covering fundamental design decisions, resilience patterns, and best practices for building reliable distributed systems.
Use this skill when
- Working on tasks that involve workflow orchestration patterns
- Needing guidance, best practices, or checklists for workflow orchestration patterns
Do not use this skill when
- The task is unrelated to workflow orchestration patterns
- You need a different domain or tool outside this scope
Instructions
- Clarify goals, constraints, and required inputs.
- Apply relevant best practices and validate outcomes.
- Provide actionable steps and verification.
- If detailed examples are required, open resources/implementation-playbook.md.
When to Use Workflow Orchestration
Ideal Use Cases (Source: docs.temporal.io)
- Multi-step processes spanning machines/services/databases
- Distributed transactions requiring all-or-nothing semantics
- Long-running workflows (hours to years) with automatic state persistence
- Failure recovery that must resume from last successful step
- Business processes: bookings, orders, campaigns, approvals
- Entity lifecycle management: inventory tracking, account management, cart workflows
- Infrastructure automation: CI/CD pipelines, provisioning, deployments
- Human-in-the-loop systems requiring timeouts and escalations
When NOT to Use
- Simple CRUD operations (use direct API calls)
- Pure data processing pipelines (use Airflow, batch processing)
- Stateless request/response (use standard APIs)
- Real-time streaming (use Kafka, event processors)
Critical Design Decision: Workflows vs Activities
The Fundamental Rule (Source: temporal.io/blog/workflow-engine-principles):
- Workflows = Orchestration logic and decision-making
- Activities = External interactions (APIs, databases, network calls)
Workflows (Orchestration)
Characteristics:
- Contain business logic and coordination
- MUST be deterministic (same inputs → same outputs)
- Cannot perform direct external calls
- State automatically preserved across failures
- Can run for years despite infrastructure failures
Example workflow tasks:
- Decide which steps to execute
- Handle compensation logic
- Manage timeouts and retries
- Coordinate child workflows
Activities (External Interactions)
Characteristics:
- Handle all external system interactions
- Can be non-deterministic (API calls, DB writes)
- Include built-in timeouts and retry logic
- Must be idempotent (calling N times = calling once)
- Short-lived (seconds to minutes typically)
Example activity tasks:
- Call payment gateway API
- Write to database
- Send emails or notifications
- Query external services
Design Decision Framework
Does it touch external systems? → Activity
Is it orchestration/decision logic? → Workflow
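The split above can be illustrated without any SDK. The following is a minimal plain-Python sketch, not Temporal code: choose_steps, run_order_workflow, and the activity names are hypothetical and exist only to show that the workflow side decides while the activity side performs the external interaction.

```python
from typing import Callable, Dict, List


def choose_steps(order: Dict[str, object]) -> List[str]:
    """Workflow-side logic: pure decision-making with no I/O."""
    steps = ["charge_payment"]
    if order.get("physical"):
        steps.append("ship_package")
    if order.get("gift"):
        steps.append("send_gift_note")
    return steps


def run_order_workflow(
    order: Dict[str, object],
    activities: Dict[str, Callable[[Dict[str, object]], str]],
) -> List[str]:
    """Orchestrate: decide which steps run, then delegate each to an activity.

    The activities mapping is a hypothetical stand-in for activity stubs;
    every external interaction happens inside those callables.
    """
    return [activities[name](order) for name in choose_steps(order)]
```

Because choose_steps touches nothing external, it returns the same plan for the same order every time, which is the property the workflow side needs.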
Core Workflow Patterns
1. Saga Pattern with Compensation
Purpose: Implement distributed transactions with rollback capability
Pattern (Source: temporal.io/blog/compensating-actions-part-of-a-complete-breakfast-with-sagas):
For each step:
1. Register compensation BEFORE executing
2. Execute the step (via activity)
3. On failure, run all compensations in reverse order (LIFO)
Example: Payment Workflow
- Reserve inventory (compensation: release inventory)
- Charge payment (compensation: refund payment)
- Fulfill order (compensation: cancel fulfillment)
Critical Requirements:
- Compensations must be idempotent
- Register compensation BEFORE executing step
- Run compensations in reverse order
- Handle partial failures gracefully
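The register-then-execute ordering and the LIFO rollback can be sketched in plain Python. This is an illustration under assumptions, not Temporal SDK code: the Saga class, run_order, and the fail_at parameter are hypothetical, and each step is simulated by appending its name to a log.

```python
from typing import Callable, List


class Saga:
    """Collects compensations and runs them in reverse (LIFO) order."""

    def __init__(self) -> None:
        self._compensations: List[Callable[[], None]] = []

    def add_compensation(self, fn: Callable[[], None]) -> None:
        self._compensations.append(fn)

    def compensate(self) -> None:
        # Undo the most recently registered step first.
        while self._compensations:
            self._compensations.pop()()


def run_order(log: List[str], fail_at: str = "") -> bool:
    """Run the three steps; return True on success, False after rollback."""
    steps = [
        ("reserve_inventory", "release_inventory"),
        ("charge_payment", "refund_payment"),
        ("fulfill_order", "cancel_fulfillment"),
    ]
    saga = Saga()
    try:
        for step, undo in steps:
            # Register the compensation BEFORE executing the step, so a step
            # that fails after partially applying its effect is still undone.
            saga.add_compensation(lambda undo=undo: log.append(undo))
            if step == fail_at:
                raise RuntimeError(f"{step} failed")
            log.append(step)
        return True
    except RuntimeError:
        saga.compensate()
        return False
```

Note that the compensation for the failed step also runs, even though that step may never have taken effect. This is why compensations must be idempotent and must tolerate undoing something that was not done.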
2. Entity Workflows (Actor Model)
Purpose: Long-lived workflow representing single entity instance
Pattern (Source: docs.temporal.io/evaluate/use-cases-design-patterns):
- One workflow execution = one entity (cart, account, inventory item)
- Workflow persists for entity lifetime
- Receives signals for state changes
- Supports queries for current state
Example Use Cases:
- Shopping cart (add items, checkout, expiration)
- Bank account (deposits, withdrawals, balance checks)
- Product inventory (stock updates, reservations)
Benefits:
- Encapsulates entity behavior
- Guarantees consistency per entity
- Natural event sourcing
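As a rough model of the entity pattern, the sketch below treats one object as one long-lived execution that applies signals in arrival order and answers read-only queries. It is plain Python, not SDK code; CartEntity, the signal names, and run_pending are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class CartEntity:
    """Stand-in for one long-lived workflow execution per shopping cart."""

    def __init__(self, cart_id: str) -> None:
        self.cart_id = cart_id
        self.items: Dict[str, int] = {}
        self.checked_out = False
        self._inbox: Deque[Tuple[str, dict]] = deque()

    def signal(self, name: str, **payload) -> None:
        # Signals are queued and later applied one at a time, in order.
        self._inbox.append((name, payload))

    def query(self) -> dict:
        # Queries read state and must not modify it.
        return {"items": dict(self.items), "checked_out": self.checked_out}

    def run_pending(self) -> None:
        # Apply queued signals until the inbox is empty or the cart closes.
        while self._inbox and not self.checked_out:
            name, payload = self._inbox.popleft()
            if name == "add_item":
                sku, qty = payload["sku"], payload["qty"]
                self.items[sku] = self.items.get(sku, 0) + qty
            elif name == "remove_item":
                self.items.pop(payload["sku"], None)
            elif name == "checkout":
                self.checked_out = True
```

Processing signals sequentially inside a single execution is what gives the per-entity consistency listed under Benefits: no two state changes for the same cart are applied concurrently.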
3. Fan-Out/Fan-In (Parallel Execution)
Purpose: Execute multiple tasks in parallel, aggregate results
Pattern:
- Spawn child workflows or parallel activities
- Wait for all to complete
- Aggregate results
- Handle partial failures
Scaling Rule (Source: temporal.io/blog/workflow-engine-principles):
- Don't scale individual workflows
- For 1M tasks: spawn 1K child workflows × 1K tasks each
- Keep each workflow bounded
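The bounded fan-out can be sketched with asyncio standing in for child workflows and activities. This is an assumption-laden illustration rather than SDK code: chunk, process_task, child, and parent are hypothetical names, and the "activity" simply doubles its input.

```python
import asyncio
from typing import List, Sequence


def chunk(items: Sequence[int], size: int) -> List[Sequence[int]]:
    """Split the work into bounded batches, one per child."""
    return [items[i:i + size] for i in range(0, len(items), size)]


async def process_task(task: int) -> int:
    # Stand-in for an activity; here it just doubles the input.
    await asyncio.sleep(0)
    return task * 2


async def child(batch: Sequence[int]) -> int:
    # Each child fans out over its own bounded batch and aggregates.
    results = await asyncio.gather(*(process_task(t) for t in batch))
    return sum(results)


async def parent(tasks: Sequence[int], batch_size: int) -> int:
    batches = chunk(tasks, batch_size)
    # return_exceptions=True keeps one failed batch from discarding the rest.
    outcomes = await asyncio.gather(
        *(child(b) for b in batches), return_exceptions=True
    )
    failed = [o for o in outcomes if isinstance(o, BaseException)]
    if failed:
        raise RuntimeError(f"{len(failed)} of {len(batches)} batches failed")
    return sum(outcomes)
```

With a batch size of 1,000, one million tasks become 1,000 batches, so neither the parent nor any child coordinates more than about 1,000 units of work.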
4. Async Callback Pattern
Purpose: Wait for external event or human approval
Pattern:
- Workflow sends request and waits for signal
- External system processes asynchronously
- Sends signal to resume workflow
- Workflow continues with response
Use Cases:
- Human approval workflows
- Webhook callbacks
- Long-running external processes
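The wait-for-signal step, including a timeout that leads to escalation, can be modeled with an asyncio.Event. The sketch below is plain Python under assumptions: ApprovalWorkflow, signal_decision, and demo are hypothetical, and a real workflow would persist its wait across process restarts, which this model does not.

```python
import asyncio
from typing import Optional


class ApprovalWorkflow:
    """Waits for an external decision, or escalates when the timeout expires."""

    def __init__(self) -> None:
        self._decision: Optional[str] = None
        self._received = asyncio.Event()

    def signal_decision(self, decision: str) -> None:
        # Called by the external system (or a human) once it has an answer.
        self._decision = decision
        self._received.set()

    async def run(self, timeout_seconds: float) -> str:
        try:
            await asyncio.wait_for(self._received.wait(), timeout_seconds)
        except asyncio.TimeoutError:
            return "escalated"
        return self._decision or "unknown"


async def demo(send_after: Optional[float], timeout_seconds: float) -> str:
    workflow = ApprovalWorkflow()

    async def approver() -> None:
        await asyncio.sleep(send_after)
        workflow.signal_decision("approved")

    # Keep a reference to the task so it is not garbage-collected early.
    task = asyncio.create_task(approver()) if send_after is not None else None
    return await workflow.run(timeout_seconds)
```

Calling demo with a send_after shorter than the timeout resolves to the signaled decision; passing None for send_after simulates an approver who never responds.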
State Management and Determinism
Automatic State Preservation
How Temporal Works (Source: docs.temporal.io/workflows):
- Complete program state preserved automatically
- Event History records every command and event
- Seamless recovery from crashes
- Applications restore pre-failure state
Determinism Constraints
Workflows Execute as State Machines:
- Replay behavior must be consistent
- Same inputs → identical outputs every time
Prohibited in Workflows (Source: docs.temporal.io/workflows):
- ❌ Threading, locks, synchronization primitives
- ❌ Random number generation (random())
- ❌ Global state or static variables
- ❌ System time (datetime.now())
- ❌ Direct file I/O or network calls
- ❌ Non-deterministic libraries
Allowed in Workflows:
- ✅ workflow.now() (deterministic time)
- ✅ workflow.random() (deterministic random)
- ✅ Pure functions and calculations
- ✅ Calling activities (non-deterministic operations)
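Why replay needs deterministic time and randomness can be shown with a small conceptual model. The sketch below is not how any SDK is implemented; WorkflowContext and discount_workflow are hypothetical. It records each non-deterministic value on first execution and feeds the recorded values back on replay, so the workflow takes the same branch both times.

```python
import random
import time
from typing import Any, Callable, List, Optional


class WorkflowContext:
    """Records non-deterministic values on first run; replays them afterwards."""

    def __init__(self, history: Optional[List[Any]] = None) -> None:
        self.replaying = history is not None
        self.history: List[Any] = list(history) if history is not None else []
        self._cursor = 0

    def _record_or_replay(self, produce: Callable[[], Any]) -> Any:
        if self.replaying:
            # Replay: return the value recorded at this position.
            value = self.history[self._cursor]
        else:
            # First execution: produce a fresh value and record it.
            value = produce()
            self.history.append(value)
        self._cursor += 1
        return value

    def now(self) -> float:
        return self._record_or_replay(time.time)

    def random(self) -> float:
        return self._record_or_replay(random.random)


def discount_workflow(ctx: WorkflowContext) -> dict:
    started = ctx.now()
    # Branching on a random value is safe only because the value is recorded.
    variant = "A" if ctx.random() < 0.5 else "B"
    return {"started": started, "variant": variant}
```

Had the workflow called time.time() or random.random() directly, a replay after a crash could pick a different variant than the original run and diverge from the recorded history.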
Versioning Strategies
Challenge: Changing workflow code while old executions are still running
Solutions:
- Versioning API: Use the SDK's versioning call for safe changes (GetVersion/getVersion in the Go and Java SDKs; workflow.patched() in the Python SDK)
- New Workflow Type: Create new workflow, route new executions to it
- Backward Compatibility: Ensure old events replay correctly
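The idea behind a version marker can be modeled in a few lines. This is a conceptual sketch only: get_version here is a hypothetical helper, not an SDK function, and the markers dictionary stands in for the version markers stored in a workflow's event history.

```python
from typing import Dict, List

DEFAULT_VERSION = 0  # Meaning: history written before the change existed.


def get_version(
    markers: Dict[str, int], change_id: str, max_supported: int, replaying: bool
) -> int:
    """Return the version this execution must follow for change_id."""
    if change_id in markers:
        return markers[change_id]
    if replaying:
        # The history predates this change, so keep the old code path.
        return DEFAULT_VERSION
    # New execution: adopt the newest behavior and record that choice.
    markers[change_id] = max_supported
    return max_supported


def notify_workflow(markers: Dict[str, int], replaying: bool) -> List[str]:
    version = get_version(markers, "add-sms", 1, replaying)
    steps = ["send_email"]
    if version >= 1:
        # Step added in version 1; skipped when replaying older histories.
        steps.append("send_sms")
    return steps
```

An execution started before the change replays without the SMS step, while a new execution records version 1 and keeps following it on every later replay.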
Resilience and Error Handling
Retry Policies
Default Behavior: Unless a retry policy or timeout limits it, Temporal retries a failed activity indefinitely with exponential backoff
Configure Retry:
- Initial retry interval
- Backoff coefficient (exponential backoff)
- Maximum interval (cap retry delay)
- Maximum attempts (eventually fail)
Non-Retryable Errors:
- Invalid input (validation failures)
- Business rule violations
- Permanent failures (resource not found)
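How the four settings interact, and how a non-retryable error short-circuits them, can be sketched as follows. This is plain Python, not SDK code: RetryPolicy, NonRetryableError, and run_with_retries are hypothetical, and the default values are chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


class NonRetryableError(Exception):
    """Raised for failures that retrying cannot fix (e.g. invalid input)."""


@dataclass
class RetryPolicy:
    initial_interval: float = 1.0
    backoff_coefficient: float = 2.0
    maximum_interval: float = 10.0
    maximum_attempts: int = 5

    def delay_before_attempt(self, attempt: int) -> float:
        """Delay before the given attempt (attempt 2 is the first retry)."""
        raw = self.initial_interval * self.backoff_coefficient ** (attempt - 2)
        return min(raw, self.maximum_interval)


def run_with_retries(
    fn: Callable[[], T], policy: RetryPolicy, sleep: Callable[[float], None]
) -> T:
    attempt = 1
    while True:
        try:
            return fn()
        except NonRetryableError:
            # Validation and business-rule failures are surfaced immediately.
            raise
        except Exception:
            if attempt >= policy.maximum_attempts:
                raise
            attempt += 1
            sleep(policy.delay_before_attempt(attempt))
```

Passing sleep in as a parameter keeps the sketch testable; with these illustrative defaults the delays grow 1, 2, 4, 8 seconds and are then capped at 10.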
Idempotency Requirements
Why Critical (Source: docs.temporal.io/activities):
- Activities may execute multiple times
- Network failures trigger retries
- Duplicate execution must be safe
Implementation Strategies:
- Idempotency keys (deduplication)
- Check-then-act with unique constraints
- Upsert operations instead of insert
- Track processed request IDs
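The idempotency-key strategy can be sketched with an in-memory stand-in for an external service. Everything here is hypothetical (PaymentGateway, charge_activity, the key format); the point is only that a retried call with the same key returns the stored result instead of charging again.

```python
from typing import Dict


class PaymentGateway:
    """Hypothetical in-memory gateway that deduplicates by idempotency key."""

    def __init__(self) -> None:
        self._results: Dict[str, dict] = {}
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # Return the stored result if this key was already processed.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {
            "charge_id": f"ch_{self.charges_made}",
            "amount_cents": amount_cents,
        }
        self._results[idempotency_key] = result
        return result


def charge_activity(gateway: PaymentGateway, order_id: str, amount_cents: int) -> dict:
    # Derive the key from stable business data, not from the attempt number,
    # so every retry of the same logical operation reuses the same key.
    return gateway.charge(f"charge:{order_id}", amount_cents)
```

A key that changed between attempts (for example, one that included a timestamp) would defeat the deduplication and allow a double charge.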
Activity Heartbeats
Purpose: Detect stalled long-running activities
Pattern:
- Activity sends periodic heartbeat
- Includes progress information
- Timeout if no heartbeat received
- Enables progress-based retry
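Progress-based retry can be modeled by having the activity report the next index to process and letting the retry resume from the last reported value. The sketch below is plain Python with hypothetical names (process_rows, run_with_heartbeat_resume, is_stalled, ActivityCrashed); a crash is simulated by raising at a chosen index.

```python
from typing import Callable, List, Optional


class ActivityCrashed(Exception):
    """Simulates a worker dying partway through an activity."""


def is_stalled(last_heartbeat_at: float, now: float, heartbeat_timeout: float) -> bool:
    # The activity counts as stalled once the gap exceeds the timeout.
    return now - last_heartbeat_at > heartbeat_timeout


def process_rows(
    rows: List[str],
    handle: Callable[[str], None],
    heartbeat: Callable[[int], None],
    resume_from: int,
    crash_at: Optional[int] = None,
) -> None:
    for index in range(resume_from, len(rows)):
        if crash_at is not None and index == crash_at:
            raise ActivityCrashed(f"crashed before row {index}")
        handle(rows[index])
        # Report the NEXT index to process as the progress detail.
        heartbeat(index + 1)


def run_with_heartbeat_resume(rows: List[str], crash_at: Optional[int]) -> List[str]:
    processed: List[str] = []
    progress = {"next_index": 0}

    def heartbeat(next_index: int) -> None:
        progress["next_index"] = next_index

    try:
        process_rows(rows, processed.append, heartbeat, 0, crash_at)
    except ActivityCrashed:
        # The retry resumes from the last heartbeat instead of from zero.
        process_rows(rows, processed.append, heartbeat, progress["next_index"])
    return processed
```

In this model each row is handled exactly once because the heartbeat follows every row. With less frequent heartbeats, rows processed after the last heartbeat would be repeated on retry, which is one more reason the work itself should be idempotent.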
Best Practices
Workflow Design
- Keep workflows focused - Single responsibility per workflow
- Small workflows - Use child workflows for scalability
- Clear boundaries - Workflow orchestrates, activities execute
- Test locally - Use time-skipping test environment
Activity Design
- Idempotent operations - Safe to retry
- Short-lived - Seconds to minutes, not hours
- Timeout configuration - Always set timeouts
- Heartbeat for long tasks - Report progress
- Error handling - Distinguish retryable vs non-retryable
Common Pitfalls
Workflow Violations:
- Using datetime.now() instead of workflow.now()
- Threads or unmanaged concurrency in workflow code (use only the concurrency primitives the SDK provides)
- Calling external APIs directly from workflow
- Non-deterministic logic in workflows
Activity Mistakes:
- Non-idempotent operations (can't handle retries)
- Missing timeouts (activities run forever)
- No error classification (retry validation errors)
- Ignoring payload limits (2MB per argument)
Operational Considerations
Monitoring:
- Workflow execution duration
- Activity failure rates
- Retry attempts and backoff
- Pending workflow counts
Scalability:
- Horizontal scaling with workers
- Task queue partitioning
- Child workflow decomposition
- Activity batching when appropriate
Additional Resources
Official Documentation:
- Temporal Core Concepts: docs.temporal.io/workflows
- Workflow Patterns: docs.temporal.io/evaluate/use-cases-design-patterns
- Best Practices: docs.temporal.io/develop/best-practices
- Saga Pattern: temporal.io/blog/saga-pattern-made-easy
Key Principles:
- Workflows = orchestration, Activities = external calls
- Determinism is non-negotiable for workflows
- Idempotency is critical for activities
- State preservation is automatic
- Design for failure and recovery