Rate Limiting and Pagination
OpenETL provides configuration options for managing API rate limits and paginating large datasets. This document explains what each mechanism does and how to configure it.
Rate Limiting
Definition
Rate limiting controls the frequency and concurrency of requests to data sources. Staying within a source's limits helps avoid API throttling, HTTP 429 errors, and service bans. OpenETL implements rate limiting at the pipeline level with configurable retry logic.
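To make the retry idea concrete, here is a minimal sketch of retrying a request that is answered with HTTP 429. It is not OpenETL's internal implementation: `SketchResponse`, `sendWithRetries`, and the fake endpoint are hypothetical names introduced for illustration, and the sketch retries immediately, whereas a real client would normally wait between attempts.

```typescript
// Hypothetical response shape; only the HTTP status matters for this sketch.
interface SketchResponse {
  status: number;
}

// Calls `send` until it returns a non-429 status or the retry budget is spent.
// `maxRetries` plays the role of a retry limit (an assumption, not OpenETL code).
function sendWithRetries(
  send: (attempt: number) => SketchResponse,
  maxRetries: number,
): { response: SketchResponse; attempts: number } {
  let attempt = 0;
  let response = send(attempt);
  while (response.status === 429 && attempt < maxRetries) {
    attempt += 1;
    response = send(attempt);
  }
  // Total calls made = the initial attempt plus the retries used.
  return { response, attempts: attempt + 1 };
}

// Usage: a fake endpoint that is throttled twice, then succeeds.
let calls = 0;
const flaky = (): SketchResponse => {
  calls += 1;
  return calls <= 2 ? { status: 429 } : { status: 200 };
};

const result = sendWithRetries(flaky, 3);
console.log(result.attempts, result.response.status); // 3 200
```

With a retry limit of 3, a request can be sent at most four times: once initially and three more times after rate limit responses.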
Pagination
Definition
Pagination divides large result sets into smaller chunks that are retrieved sequentially. OpenETL supports multiple pagination strategies (offset, cursor, and page-based) to accommodate different API implementations.
Configuring Rate Limiting
Rate limiting is set in the pipeline's rate_limiting option, controlling request pace and retries:
| Property | Description |
|---|---|
| requests_per_second | Max requests per second |
| concurrent_requests | Max simultaneous requests |
| max_retries_on_rate_limit | Retry attempts on rate limit errors |
rate_limiting: {
  requests_per_second: 10,
  concurrent_requests: 5,
  max_retries_on_rate_limit: 3,
}
This configuration sets the following limits:
- Maximum 10 requests per second
- Maximum 5 concurrent requests
- Up to 3 retry attempts on rate limit errors
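The interaction between the per-second pace and the concurrency cap can be illustrated with a small simulation. This is a sketch under stated assumptions, not OpenETL's scheduler: `planStartTimes` is a hypothetical helper that spaces request starts evenly (1000 / requests_per_second milliseconds apart) and delays a start whenever the concurrency cap is already reached.

```typescript
// Computes the earliest start time (in ms) for each request, given how long
// each request takes, a per-second pace, and a cap on simultaneous requests.
function planStartTimes(
  durationsMs: number[],
  requestsPerSecond: number,
  concurrentRequests: number,
): number[] {
  const minGapMs = 1000 / requestsPerSecond; // spacing between consecutive starts
  const starts: number[] = [];
  let inFlight: number[] = []; // finish times of requests still running
  let lastStart = -minGapMs; // so that the first request starts at 0

  for (const duration of durationsMs) {
    let start = lastStart + minGapMs;
    // Keep only requests that are still running at the candidate start time.
    let active = inFlight.filter((finish) => finish > start);
    if (active.length >= concurrentRequests) {
      // All slots are busy: wait until the earliest running request finishes.
      start = Math.min(...active);
      active = active.filter((finish) => finish > start);
    }
    active.push(start + duration);
    inFlight = active;
    lastStart = start;
    starts.push(start);
  }
  return starts;
}

// Seven requests of 1000 ms each, using the limits shown above (10 per second, 5 concurrent).
const plan = planStartTimes([1000, 1000, 1000, 1000, 1000, 1000, 1000], 10, 5);
console.log(plan); // [0, 100, 200, 300, 400, 1000, 1100]
```

The first five requests are spaced 100 ms apart by the pace limit. The sixth has to wait until 1000 ms because five requests are already in flight, which shows that whichever limit is tighter at a given moment determines the actual throughput.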
Configuring Pagination
Pagination is configured in a connector's pagination option, defining how data is chunked:
- type: 'offset', 'cursor', or 'page'.
- itemsPerPage?: Items per fetch (default: 100).
- pageOffsetKey?: Starting offset (e.g., '0').
- cursorKey?: Cursor token for cursor-based APIs.
Offset Example:
pagination: {
  type: 'offset',
  itemsPerPage: 50,
  pageOffsetKey: '0',
}
Cursor Example:
pagination: {
  type: 'cursor',
  itemsPerPage: 100,
}
Offset pagination uses numeric offsets supplied by the client, while cursor pagination uses opaque tokens returned in the API response.
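The difference between the two strategies can be sketched against an in-memory dataset. Everything here is hypothetical and for illustration only: `fetchByOffset`, `fetchByCursor`, the `tok:` token format, and the `nextCursor` field are assumptions, not part of OpenETL or any specific API.

```typescript
// In-memory stand-in for a remote collection of seven records.
const records: string[] = Array.from({ length: 7 }, (_, i) => `record-${i + 1}`);

// Hypothetical offset-style endpoint: the caller supplies a numeric offset.
function fetchByOffset(offset: number, limit: number): string[] {
  return records.slice(offset, offset + limit);
}

// Hypothetical cursor-style endpoint: the server hands back a token that the
// caller passes along unchanged, without interpreting it.
interface CursorPage {
  items: string[];
  nextCursor: string | null;
}

function fetchByCursor(cursor: string | null, limit: number): CursorPage {
  const start = cursor === null ? 0 : Number(cursor.slice("tok:".length));
  const end = start + limit;
  return {
    items: records.slice(start, end),
    nextCursor: end < records.length ? `tok:${end}` : null,
  };
}

// Offset loop: the client advances the offset itself.
function readAllByOffset(limit: number): string[] {
  const all: string[] = [];
  for (let offset = 0; ; offset += limit) {
    const page = fetchByOffset(offset, limit);
    all.push(...page);
    if (page.length < limit) break; // a short page means the data is exhausted
  }
  return all;
}

// Cursor loop: the client follows tokens until the server stops returning one.
function readAllByCursor(limit: number): string[] {
  const all: string[] = [];
  let cursor: string | null = null;
  do {
    const page: CursorPage = fetchByCursor(cursor, limit);
    all.push(...page.items);
    cursor = page.nextCursor;
  } while (cursor !== null);
  return all;
}

console.log(readAllByOffset(3).length, readAllByCursor(3).length); // 7 7
```

In the offset loop the client computes each position, so it can jump to any page. In the cursor loop each request depends on the token from the previous response, so pages from a single cursor chain have to be fetched one after another.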
Combined Configuration
Example combining rate limiting and pagination:
import Orchestrator from 'openetl';
import { hubspot } from '@openetl/hubspot';

const vault = { 'hs-auth': { type: 'oauth2', credentials: { /* ... */ } } };
const orchestrator = Orchestrator(vault, { hubspot });

orchestrator.runPipeline({
  id: 'hs-contacts',
  source: {
    adapter_id: 'hubspot',
    endpoint_id: 'contacts',
    credential_id: 'hs-auth',
    fields: ['firstname'],
    pagination: { type: 'cursor', itemsPerPage: 50 },
  },
  rate_limiting: {
    requests_per_second: 5,
    concurrent_requests: 2,
    max_retries_on_rate_limit: 2,
  },
  logging: event => console.log(event),
});
This configuration:
- Retrieves 50 records per page using cursor pagination
- Limits to 5 requests per second
- Allows 2 concurrent requests
- Retries twice on rate limit errors
- Logs events for monitoring
Adapters manage pagination tokens (e.g., nextOffset), while the Orchestrator enforces rate limits.
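As a rough planning aid, the page size and request pace together put a lower bound on how long a run can take. The sketch below works through that arithmetic for the combined configuration. `estimateMinSeconds` is a hypothetical helper and the total of 10,000 records is an assumed figure; the estimate ignores network latency, retries, and the sequential nature of cursor pagination, so real runs will take longer.

```typescript
// Lower bound on run time implied by page size and request pace alone.
function estimateMinSeconds(
  totalRecords: number,
  itemsPerPage: number,
  requestsPerSecond: number,
): { pages: number; minSeconds: number } {
  const pages = Math.ceil(totalRecords / itemsPerPage); // one request per page
  return { pages, minSeconds: pages / requestsPerSecond };
}

// 10,000 records (assumed), 50 per page, at most 5 requests per second.
const estimate = estimateMinSeconds(10000, 50, 5);
console.log(estimate); // { pages: 200, minSeconds: 40 }
```

Doubling itemsPerPage halves the number of requests, which is often a simpler way to shorten a run than raising requests_per_second, provided the source API accepts the larger page size.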
Additional Resources
- Error Handling: Error management and retry configuration
- Pipelines: Pipeline configuration options
- Adapters: Adapter-specific pagination implementation