Error Handling

Things break. Databases go down, APIs return errors, and data doesn't always match your expectations. This guide covers how Acme handles failures and how to configure resilient pipelines.

Error strategies

Acme supports three error strategies:

graph TD
    E[Error occurs] --> S{Strategy?}
    S -->|fail| F[Stop pipeline immediately]
    S -->|retry| R[Retry up to N times]
    S -->|skip| K[Log error, skip row]

    R -->|success| C[Continue processing]
    R -->|exhausted| F

    style F fill:#fecaca,stroke:#dc2626
    style C fill:#dcfce7,stroke:#16a34a
    style K fill:#fef9c3,stroke:#ca8a04

fail (default)

The pipeline stops immediately on the first error. Best for pipelines where data integrity is critical.

error_handling:
  strategy: fail

retry

Retry the failed operation up to N times with configurable backoff.

error_handling:
  strategy: retry
  max_retries: 3
  retry_delay: 10s
  backoff: exponential # linear | exponential | constant
Exponential backoff

With exponential backoff and a base delay of 10s, retries happen at: 10s, 20s, 40s. This helps when the downstream system is temporarily overloaded.

skip

Skip the failing row and continue processing. Failed rows are logged and optionally sent to a dead-letter queue.

error_handling:
  strategy: skip
  max_errors: 100 # stop after 100 errors in a single run
  dead_letter:
    type: json
    path: ./errors/${pipeline_name}_${run_id}.json

Dead-letter queues

When using the skip strategy, failed rows can be captured for later inspection:

error_handling:
  strategy: skip
  dead_letter:
    type: json
    path: ./errors/

    # Or send to a database
    # type: postgres
    # connection: ${ERROR_DB_URL}
    # table: pipeline_errors

Each failed row is stored with metadata:

{
  "row": { "id": 42, "email": "bad-data", "age": "not_a_number" },
  "error": "ValueError: invalid literal for int(): 'not_a_number'",
  "pipeline": "user-analytics",
  "transform": "map",
  "timestamp": "2026-02-15T06:12:34Z"
}
Reprocessing failed rows

You can use a dead-letter JSON file as a source for a recovery pipeline:

sources:
  - type: json
    path: ./errors/user-analytics_run_001.json

Notifications

Get alerted when pipelines fail:

notifications:
  on_failure:
    - type: slack
      webhook: ${SLACK_WEBHOOK}
      channel: "#data-alerts"
    - type: email
      to: team@example.com
  on_success:
    - type: slack
      webhook: ${SLACK_WEBHOOK}
      channel: "#data-logs"
      # Only notify on the first success after a failure
      only_after_failure: true

Common errors and solutions

ErrorCauseSolution
ConnectionRefusedDatabase is down or unreachableCheck host/port, firewall rules, SSL config
AuthenticationFailedInvalid credentialsVerify ${DATABASE_URL}, check user permissions
SchemaValidationErrorRow doesn't match expected schemaCheck source data, add a filter before validation
RateLimitExceededAPI destination is throttlingReduce batch_size, add retry with backoff
OutOfMemoryBatch too large for available RAMReduce batch_size or workers
Warning

The skip strategy can silently drop data if misconfigured. Always set a max_errors limit and monitor your dead-letter queue.

  • Monitoring — set up dashboards for error rates
  • Pipelines — pipeline error handling configuration
  • Testing Pipelines — catch errors before they reach production
Built with LogoFlowershow