Rate Limiting and Pagination

OpenETL provides configuration options for managing API rate limits and paginating large datasets. This document explains how to configure each one and shows both working together in a single pipeline.

Rate Limiting

Definition

Rate limiting controls the frequency and concurrency of requests to data sources. This helps avoid API throttling, service bans, and HTTP 429 (Too Many Requests) errors. OpenETL implements rate limiting at the pipeline level with configurable retry logic.

Pagination

Definition

Pagination divides large result sets into smaller chunks retrieved sequentially. OpenETL supports multiple pagination strategies (offset, cursor, page-based) to accommodate different API implementations.

Configuring Rate Limiting

Rate limiting is set in the pipeline's rate_limiting option, controlling request pace and retries:

Property                    Description
requests_per_second         Max requests per second
concurrent_requests         Max simultaneous requests
max_retries_on_rate_limit   Retry attempts on rate limit errors

rate_limiting: {
  requests_per_second: 10,
  concurrent_requests: 5,
  max_retries_on_rate_limit: 3,
}

The example above sets these limits:

  • Maximum 10 requests per second
  • Maximum 5 concurrent requests
  • Up to 3 retry attempts on rate limit errors
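To make the interaction between these three settings concrete, here is a minimal sketch of how such limits could be applied. It is not OpenETL's actual implementation: planStartTimes and shouldRetry are hypothetical helper names, the fixed request duration is a simplifying assumption, and only the RateLimiting field names come from the configuration above.

```typescript
// Shape of the rate_limiting option described above.
interface RateLimiting {
  requests_per_second: number;
  concurrent_requests: number;
  max_retries_on_rate_limit: number;
}

// Hypothetical helper: simulate the earliest start time (in ms) of each of
// `count` requests, assuming every request takes exactly `durationMs`.
function planStartTimes(count: number, cfg: RateLimiting, durationMs: number): number[] {
  const spacingMs = 1000 / cfg.requests_per_second;
  const starts: number[] = [];
  for (let i = 0; i < count; i++) {
    // Pace: start no sooner than one spacing interval after the previous request.
    const paced = i === 0 ? 0 : starts[i - 1] + spacingMs;
    // Concurrency: once all slots are in use, wait until the request that
    // started `concurrent_requests` positions earlier has finished.
    const slotFree =
      i < cfg.concurrent_requests ? 0 : starts[i - cfg.concurrent_requests] + durationMs;
    starts.push(Math.max(paced, slotFree));
  }
  return starts;
}

// Hypothetical helper: retry only on HTTP 429, and only while the number of
// retries already made is below max_retries_on_rate_limit.
function shouldRetry(status: number, retriesSoFar: number, cfg: RateLimiting): boolean {
  return status === 429 && retriesSoFar < cfg.max_retries_on_rate_limit;
}

const exampleLimits: RateLimiting = {
  requests_per_second: 10,
  concurrent_requests: 5,
  max_retries_on_rate_limit: 3,
};

// Seven requests that each take one second:
console.log(planStartTimes(7, exampleLimits, 1000));
// → [ 0, 100, 200, 300, 400, 1000, 1100 ]
```

In this simulation the first five requests are spaced 100 ms apart by requests_per_second, while the sixth must wait for a free slot because concurrent_requests caps the number in flight at five.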

Configuring Pagination

Pagination is configured in a connector's pagination option, defining how data is chunked:

  • type: 'offset', 'cursor', or 'page'.
  • itemsPerPage?: Items per fetch (default: 100).
  • pageOffsetKey?: Starting offset (e.g., '0').
  • cursorKey?: Cursor token for cursor-based APIs.

Offset Example:

pagination: {
  type: 'offset',
  itemsPerPage: 50,
  pageOffsetKey: '0',
}

Cursor Example:

pagination: {
  type: 'cursor',
  itemsPerPage: 100,
}

Offset pagination uses numeric offsets; cursor pagination uses opaque tokens returned in the API response.
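The difference between the two strategies can be sketched as two fetch loops. This is an illustration only, not how OpenETL adapters are written: collectByOffset, collectByCursor, the in-memory records array, and the nextCursor field are all assumptions made for the example, and the fetchers are synchronous to keep it short (a real API call would be asynchronous).

```typescript
// Hypothetical page shape for a cursor-based API.
type CursorPage<T> = { items: T[]; nextCursor: string | undefined };

// Offset strategy: advance a numeric offset until a short page is returned.
function collectByOffset<T>(
  fetchPage: (offset: number, limit: number) => T[],
  itemsPerPage: number,
  startOffset = 0,
): T[] {
  const all: T[] = [];
  let offset = startOffset;
  while (true) {
    const items = fetchPage(offset, itemsPerPage);
    all.push(...items);
    if (items.length < itemsPerPage) break; // a short (or empty) page is the last one
    offset += itemsPerPage;
  }
  return all;
}

// Cursor strategy: pass back the opaque token from each response until none is returned.
function collectByCursor<T>(
  fetchPage: (cursor: string | undefined, limit: number) => CursorPage<T>,
  itemsPerPage: number,
): T[] {
  const all: T[] = [];
  let cursor: string | undefined = undefined;
  do {
    const page: CursorPage<T> = fetchPage(cursor, itemsPerPage);
    all.push(...page.items);
    cursor = page.nextCursor;
  } while (cursor !== undefined);
  return all;
}

// In-memory stand-ins for a remote API holding 120 records.
const records: number[] = Array.from({ length: 120 }, (_, i) => i);

const offsetApi = (offset: number, limit: number): number[] =>
  records.slice(offset, offset + limit);

const cursorApi = (cursor: string | undefined, limit: number): CursorPage<number> => {
  const start = cursor === undefined ? 0 : Number(cursor);
  const end = start + limit;
  return {
    items: records.slice(start, end),
    nextCursor: end < records.length ? String(end) : undefined,
  };
};

console.log(collectByOffset(offsetApi, 50).length); // → 120
console.log(collectByCursor(cursorApi, 50).length); // → 120
```

The offset loop computes each position itself, whereas the cursor loop never interprets the token and simply echoes it back, which is why cursor-based configurations do not need a starting offset.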

Combined Configuration

Example combining rate limiting and pagination:

import Orchestrator from 'openetl';
import { hubspot } from '@openetl/hubspot';

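// Vault: maps credential ids to credentials; 'hs-auth' is referenced by credential_id below.
const vault = { 'hs-auth': { type: 'oauth2', credentials: { /* ... */ } } };
// Register the HubSpot adapter with the orchestrator.
const orchestrator = Orchestrator(vault, { hubspot });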

orchestrator.runPipeline({
  id: 'hs-contacts',
  source: {
    adapter_id: 'hubspot',
    endpoint_id: 'contacts',
    credential_id: 'hs-auth',
    fields: ['firstname'],
    pagination: { type: 'cursor', itemsPerPage: 50 },
  },
  rate_limiting: {
    requests_per_second: 5,
    concurrent_requests: 2,
    max_retries_on_rate_limit: 2,
  },
  logging: event => console.log(event),
});

This configuration:

  • Retrieves 50 records per page using cursor pagination
  • Limits to 5 requests per second
  • Allows 2 concurrent requests
  • Retries twice on rate limit errors
  • Logs events for monitoring

Adapters manage pagination tokens (e.g., nextOffset), while the Orchestrator enforces rate limits.

Additional Resources