Rate Limiting and Pagination

Managing data fetching in OpenETL often involves handling API rate limits and paginating large datasets. This section explains these concepts and how to configure them effectively.

What is Rate Limiting?

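Rate limiting caps how often, or how many, requests are sent to a data source, so a pipeline neither overloads the source nor gets blocked by it. APIs typically signal an exceeded limit with an HTTP "429 Too Many Requests" response. In OpenETL, rate limiting keeps pipelines within the source's constraints and retries throttled requests when needed.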
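The retry behavior described above can be pictured as a small decision function. This is a minimal sketch, not OpenETL's internal implementation: decideRetry, its parameters, and the exponential backoff schedule are assumptions made for illustration. Retry-After is a standard HTTP header that a server may send with a 429 response.

```typescript
// Hypothetical helper, not part of OpenETL: decides whether a throttled
// request should be retried and how long to wait first.
interface RetryDecision {
  retry: boolean;
  waitMs: number;
}

function decideRetry(
  status: number,             // HTTP status of the last response
  retriesSoFar: number,       // retries already made for this request
  maxRetries: number,         // e.g. the value of max_retries_on_rate_limit
  retryAfterSeconds?: number, // parsed Retry-After header, if the server sent one
): RetryDecision {
  // Only 429 is treated as a rate-limit error in this sketch.
  if (status !== 429 || retriesSoFar >= maxRetries) {
    return { retry: false, waitMs: 0 };
  }
  // Prefer the server's hint; otherwise assume exponential backoff: 1s, 2s, 4s, ...
  const waitMs =
    retryAfterSeconds !== undefined
      ? retryAfterSeconds * 1000
      : 1000 * Math.pow(2, retriesSoFar);
  return { retry: true, waitMs };
}

console.log(decideRetry(429, 0, 3));    // { retry: true, waitMs: 1000 }
console.log(decideRetry(429, 3, 3));    // { retry: false, waitMs: 0 }
console.log(decideRetry(429, 1, 3, 5)); // { retry: true, waitMs: 5000 }
```

Keeping the decision separate from the actual waiting makes the policy easy to test without timers.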

What is Pagination?

Pagination splits large datasets into smaller chunks that are fetched page by page (e.g., 100 records at a time). OpenETL uses pagination to process large datasets without exhausting memory or running into request timeouts.
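To make the memory argument concrete, here is a minimal sketch of page-by-page processing. It is not OpenETL code: forEachPage, FetchPage, and the in-memory data source are hypothetical stand-ins for a real API call.

```typescript
// Hypothetical page fetcher: returns up to `limit` records starting at `offset`.
type FetchPage<T> = (offset: number, limit: number) => T[];

// Calls `onPage` once per page, so only one page is held in memory at a time.
// Returns the number of pages processed.
function forEachPage<T>(
  fetchPage: FetchPage<T>,
  itemsPerPage: number,
  onPage: (page: T[]) => void,
): number {
  let offset = 0;
  let pages = 0;
  while (true) {
    const page = fetchPage(offset, itemsPerPage);
    if (page.length === 0) break;          // nothing left
    onPage(page);
    pages++;
    if (page.length < itemsPerPage) break; // short page: this was the last one
    offset += itemsPerPage;
  }
  return pages;
}

// Stand-in data source with 250 records.
const sourceRecords: number[] = [];
for (let i = 0; i < 250; i++) sourceRecords.push(i);

const pageSizes: number[] = [];
const pageCount = forEachPage(
  (offset, limit) => sourceRecords.slice(offset, offset + limit),
  100,
  page => { pageSizes.push(page.length); },
);
console.log(pageCount, pageSizes); // 3 [ 100, 100, 50 ]
```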

Configuring Rate Limiting

Rate limiting is set in the pipeline's rate_limiting option, controlling request pace and retries:

Property                     Description
requests_per_second          Max requests per second
concurrent_requests          Max simultaneous requests
max_retries_on_rate_limit    Retry attempts on rate limit errors

rate_limiting: {
  requests_per_second: 10,
  concurrent_requests: 5,
  max_retries_on_rate_limit: 3,
}

This caps requests at 10/sec, allows 5 at once, and retries up to 3 times if throttled.
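One way to picture how the first two settings interact is a gate that a request must pass before it starts. The sketch below is an assumption about the general technique (a one-second sliding window plus an in-flight counter), not a description of how the Orchestrator is implemented; RequestGate and its methods are hypothetical names. Retries are outside the scope of this sketch.

```typescript
interface RateLimiting {
  requests_per_second: number;
  concurrent_requests: number;
  max_retries_on_rate_limit: number;
}

// Hypothetical gate: a request may start only if both the per-second budget
// and the concurrency budget have room.
class RequestGate {
  private cfg: RateLimiting;
  private startedAt: number[] = []; // start times (ms) within the last second
  private inFlight = 0;             // requests started but not yet finished

  constructor(cfg: RateLimiting) {
    this.cfg = cfg;
  }

  // `nowMs` is passed in so the gate can be exercised without real timers.
  tryStart(nowMs: number): boolean {
    // Forget requests that started a full second or more ago.
    this.startedAt = this.startedAt.filter(t => nowMs - t < 1000);
    if (this.startedAt.length >= this.cfg.requests_per_second) return false;
    if (this.inFlight >= this.cfg.concurrent_requests) return false;
    this.startedAt.push(nowMs);
    this.inFlight++;
    return true;
  }

  // Call when a request completes to free a concurrency slot.
  finish(): void {
    if (this.inFlight > 0) this.inFlight--;
  }
}

const gate = new RequestGate({
  requests_per_second: 10,
  concurrent_requests: 5,
  max_retries_on_rate_limit: 3,
});

const firstBurst: boolean[] = [];
for (let i = 0; i < 6; i++) firstBurst.push(gate.tryStart(0));
console.log(firstBurst); // five true, then false: the concurrency cap of 5 is reached
```

With these settings the concurrency cap is hit first in a burst; once requests finish, the 10-per-second budget becomes the limiting factor.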

Configuring Pagination

Pagination is configured in a connector's pagination option, defining how data is chunked:

  • type: 'offset', 'cursor', or 'page'.
  • itemsPerPage?: Items per fetch (default: 100).
  • pageOffsetKey?: Starting offset (e.g., '0').
  • cursorKey?: Cursor token for cursor-based APIs.

Offset Example:

pagination: {
  type: 'offset',
  itemsPerPage: 50,
  pageOffsetKey: '0',
}

Cursor Example:

pagination: {
  type: 'cursor',
  itemsPerPage: 100,
}

Offset pagination advances through the dataset in numeric steps; cursor pagination follows a token returned with each page (e.g., HubSpot's after).
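The difference between the two styles shows up in how the fetch loop advances and when it stops. The sketch below uses an in-memory dataset in place of a real API; listByOffset, listByCursor, and the response shape are simplified assumptions for illustration and do not mirror any adapter's actual request or response format.

```typescript
interface Contact { id: number; }
interface CursorPage { results: Contact[]; after: string | undefined; }

// Stand-in dataset of 120 contacts.
const contacts: Contact[] = [];
for (let i = 0; i < 120; i++) contacts.push({ id: i });

// Offset-style endpoint: the caller says where to start.
function listByOffset(offset: number, limit: number): Contact[] {
  return contacts.slice(offset, offset + limit);
}

// Cursor-style endpoint: the server returns a token for the next page,
// or undefined when there are no more pages.
function listByCursor(limit: number, after?: string): CursorPage {
  const start = after === undefined ? 0 : Number(after);
  const end = start + limit;
  return {
    results: contacts.slice(start, end),
    after: end < contacts.length ? String(end) : undefined,
  };
}

function fetchAllByOffset(limit: number): Contact[] {
  let all: Contact[] = [];
  let offset = 0;
  while (true) {
    const page = listByOffset(offset, limit);
    all = all.concat(page);
    if (page.length < limit) break; // short page: done
    offset += limit;                // numeric step
  }
  return all;
}

function fetchAllByCursor(limit: number): Contact[] {
  let all: Contact[] = [];
  let after: string | undefined = undefined;
  do {
    const page: CursorPage = listByCursor(limit, after);
    all = all.concat(page.results);
    after = page.after;             // token taken from the response
  } while (after !== undefined);
  return all;
}

console.log(fetchAllByOffset(50).length, fetchAllByCursor(50).length); // 120 120
```

With offsets the client computes the next position itself; with cursors it must wait for each response to learn the next token, so pages are fetched strictly in sequence.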

Handling Rate Limits and Pagination in Pipelines

Combine both in a pipeline for robust data fetching:

import Orchestrator from 'openetl';
import { hubspot } from '@openetl/hubspot';

// Credentials are looked up by ID; 'hs-auth' is referenced by the source below.
const vault = { 'hs-auth': { type: 'oauth2', credentials: { /* ... */ } } };
const orchestrator = Orchestrator(vault, { hubspot });

orchestrator.runPipeline({
  id: 'hs-contacts',
  source: {
    adapter_id: 'hubspot',
    endpoint_id: 'contacts',
    credential_id: 'hs-auth',
    fields: ['firstname'],
    // Fetch 50 contacts per request using cursor pagination.
    pagination: { type: 'cursor', itemsPerPage: 50 },
  },
  // At most 5 requests/sec, 2 at once, and 2 retries when throttled.
  rate_limiting: {
    requests_per_second: 5,
    concurrent_requests: 2,
    max_retries_on_rate_limit: 2,
  },
  // Print each pipeline event as it occurs.
  logging: event => console.log(event),
});

This fetches HubSpot contacts 50 at a time using cursor pagination, limits requests to 5/sec with at most 2 running concurrently, retries up to twice when rate-limited, and logs each event. Adapters interpret pagination responses (e.g., nextOffset), while the Orchestrator enforces the rate limits.
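When sizing a run, it can help to work out how many requests a pipeline will make and how long the rate limit alone forces it to take. The following is rough planning arithmetic, not an OpenETL API: estimateRun is a hypothetical helper, and the figure of 10,000 contacts is an invented example.

```typescript
// Rough planning arithmetic; the function and its names are illustrative only.
function estimateRun(
  totalRecords: number,
  itemsPerPage: number,
  requestsPerSecond: number,
): { requests: number; minSeconds: number } {
  // One request per page, rounding up for a final partial page.
  const requests = Math.ceil(totalRecords / itemsPerPage);
  // Approximate lower bound imposed by the per-second cap, ignoring latency and retries.
  const minSeconds = requests / requestsPerSecond;
  return { requests, minSeconds };
}

console.log(estimateRun(10000, 50, 5)); // { requests: 200, minSeconds: 40 }
```

Actual run time will be longer once network latency and any retries are included.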
