Security

OpenETL implements multiple security layers to protect against common vulnerabilities. This document covers SQL injection protection, credential management, and secure development practices.

Overview

Security features in OpenETL include:

  • SQL Injection Protection - Parameterized queries for all database operations
  • Operator Validation - Whitelisted SQL operators prevent injection attacks
  • Identifier Escaping - Schema, table, and column names properly escaped
  • Credential Isolation - Vault-based credential management
  • Input Validation - Type checking and sanitization

SQL Injection Protection

Parameterized Queries

All database adapters use parameterized queries to prevent SQL injection. Values are never interpolated directly into SQL strings.

Vulnerable approach (NOT used):

// DANGEROUS - Never do this
const query = `SELECT * FROM users WHERE status = '${userInput}'`;

OpenETL approach:

// Safe - Values passed as parameters
const query = {
  text: 'SELECT * FROM "public"."users" WHERE "status" = $1 LIMIT $2 OFFSET $3',
  values: [userInput, limit, offset]
};
await pool.query(query.text, query.values);

PostgreSQL Example

// User-provided filter
filters: [
  { field: 'status', operator: '=', value: "active'; DROP TABLE users;--" }
]

// Generated query (safe)
// SELECT "id", "name" FROM "public"."users" WHERE "status" = $1
// Parameters: ["active'; DROP TABLE users;--"]

The malicious input is treated as a literal string value, not as SQL code.
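The translation from filters to placeholders can be sketched as follows. This is a simplified illustration, not OpenETL's actual implementation; `buildWhereClause` is a hypothetical helper, and the operator is assumed to have already passed operator validation (see below):

```typescript
interface Filter {
  field: string;
  operator: string;
  value: unknown;
}

// Hypothetical sketch: values never enter the SQL text; they travel
// in a separate array and are bound by the driver.
function buildWhereClause(filters: Filter[]): { text: string; values: unknown[] } {
  const clauses: string[] = [];
  const values: unknown[] = [];
  for (const f of filters) {
    values.push(f.value);
    // $1, $2, ... placeholders in insertion order (PostgreSQL style);
    // the field name is escaped, the operator is assumed pre-validated
    clauses.push(`"${f.field.replace(/"/g, '""')}" ${f.operator} $${values.length}`);
  }
  return { text: clauses.join(' AND '), values };
}
```

Feeding the malicious filter above through this sketch yields the text `"status" = $1` with the attack string confined to the values array, where the driver binds it as data.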

MySQL Example

The MySQL adapter uses ? placeholders:

// Generated query
// SELECT `id`, `name` FROM `mydb`.`users` WHERE `status` = ? LIMIT ? OFFSET ?
// Parameters: ['active', 10, 0]

MongoDB Example

The MongoDB adapter validates collection names and field names:

// Collection name validation
if (!/^[a-zA-Z_][a-zA-Z0-9_]*$/.test(collection)) {
  throw new Error('Invalid collection name');
}

// Query built safely using the MongoDB driver
const query = { status: userInput };  // Driver treats the value as data, not query syntax

Operator Validation

Database adapters validate SQL operators against a whitelist:

const ALLOWED_OPERATORS = new Set([
  '=', '!=', '<>', '<', '>', '<=', '>=',
  'LIKE', 'ILIKE', 'NOT LIKE', 'NOT ILIKE',
  'IN', 'NOT IN', 'IS NULL', 'IS NOT NULL',
  'BETWEEN', 'NOT BETWEEN'
]);

function validateOperator(operator: string): string {
  const normalized = operator.trim().toUpperCase();
  if (!ALLOWED_OPERATORS.has(normalized)) {
    throw new Error(`Invalid SQL operator: '${operator}'`);
  }
  return normalized;
}

Example attack prevented:

// Attempted injection via operator
filters: [
  { field: 'id', operator: '1; DROP TABLE users; --', value: '1' }
]
// Throws: Invalid SQL operator: '1; DROP TABLE users; --'

Identifier Escaping

Schema, table, and column names are escaped to prevent injection via identifiers:

PostgreSQL (Double Quotes)

function escapeIdentifier(identifier: string): string {
  if (!identifier || typeof identifier !== 'string') {
    throw new Error('Invalid identifier');
  }
  // Reject control characters
  if (/[\x00-\x1f\x7f]/.test(identifier)) {
    throw new Error('Invalid identifier: contains control characters');
  }
  // Escape double quotes by doubling them
  return `"${identifier.replace(/"/g, '""')}"`;
}

Example:

config: { schema: 'my"schema', table: 'user data' }
// Escaped: FROM "my""schema"."user data"

MySQL (Backticks)

function escapeIdentifier(identifier: string): string {
  if (!identifier || typeof identifier !== 'string') {
    throw new Error('Invalid identifier');
  }
  // Escape backticks by doubling them
  return `\`${identifier.replace(/`/g, '``')}\``;
}

Credential Management

Vault Isolation

Credentials are stored in the Vault and referenced by ID in pipelines:

// Vault stores sensitive credentials
const vault: Vault = {
  'production-db': {
    id: 'production-db',
    type: 'basic',
    credentials: {
      host: 'db.example.com',
      database: 'production',
      username: process.env.DB_USER!,
      password: process.env.DB_PASSWORD!,
    },
  },
};

// Pipeline references credentials by ID only
const pipeline: Pipeline = {
  id: 'sync',
  source: {
    adapter_id: 'postgresql',
    credential_id: 'production-db',  // Reference, not credentials
    // ...
  },
};
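At run time the engine resolves the reference back to the Vault entry. A minimal sketch of that lookup, using simplified stand-in types (`resolveCredential` is a hypothetical helper, not part of the openetl API):

```typescript
// Simplified stand-in types for illustration; the real types live in openetl.
interface VaultEntry {
  id: string;
  type: string;
  credentials: Record<string, string>;
}
type Vault = Record<string, VaultEntry>;

// Hypothetical resolver: fails fast when a pipeline references a missing ID,
// so a misconfigured pipeline never runs with undefined credentials.
function resolveCredential(vault: Vault, credentialId: string): VaultEntry {
  const entry = vault[credentialId];
  if (!entry) {
    throw new Error(`Unknown credential_id: ${credentialId}`);
  }
  return entry;
}
```

Because the pipeline only ever carries the ID, serializing or logging a pipeline configuration cannot leak the secrets themselves.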

Benefits

  1. Separation of Concerns - Credentials managed separately from pipeline logic
  2. No Credential Exposure - Pipeline configurations don't contain secrets
  3. Environment-Based - Use environment variables for actual credentials
  4. Audit Trail - Credential access tracked by ID

OAuth2 Token Refresh

OAuth2 credentials support automatic token refresh:

const vault: Vault = {
  'hubspot': {
    id: 'hubspot',
    type: 'oauth2',
    credentials: {
      client_id: process.env.HUBSPOT_CLIENT_ID!,
      client_secret: process.env.HUBSPOT_CLIENT_SECRET!,
      access_token: currentAccessToken,
      refresh_token: process.env.HUBSPOT_REFRESH_TOKEN!,
    },
    expires_at: '2024-01-15T10:00:00Z',  // Token expiration
  },
};

When the access token expires, adapters automatically refresh using the refresh token.
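The refresh decision can be sketched like this. This is an illustrative helper, not the adapters' actual code; the 60-second skew margin and the `requestNewToken` callback are assumptions for the example:

```typescript
// Hypothetical check: refresh slightly before the recorded expiry to
// absorb clock skew and request latency (60s margin is an assumption).
function needsRefresh(expiresAt: string, now: Date = new Date(), skewMs = 60_000): boolean {
  return now.getTime() >= Date.parse(expiresAt) - skewMs;
}

// Sketch of the refresh flow; `requestNewToken` stands in for the
// provider-specific call to the OAuth2 token endpoint.
async function ensureFreshToken(
  entry: { credentials: { access_token: string; refresh_token: string }; expires_at: string },
  requestNewToken: (refreshToken: string) => Promise<{ access_token: string; expires_at: string }>
): Promise<string> {
  if (needsRefresh(entry.expires_at)) {
    const fresh = await requestNewToken(entry.credentials.refresh_token);
    entry.credentials.access_token = fresh.access_token;
    entry.expires_at = fresh.expires_at;
  }
  return entry.credentials.access_token;
}
```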

Input Validation

Filter Validation

The validatePipeline function performs pre-flight checks on a pipeline configuration, including its filters and its credential and adapter references:

import { validatePipeline } from 'openetl';

const validation = validatePipeline(pipeline, adapters, vault);

// Checks performed:
// - Filter field is non-empty
// - Filter operator is non-empty
// - Credential references exist
// - Adapter references exist

Type Checking

OpenETL provides TypeScript definitions for type-safe configurations:

import { Pipeline, Connector, Filter } from 'openetl';

// TypeScript catches type errors at compile time
const filter: Filter = {
  field: 'status',
  operator: '=',
  value: 'active',  // Type checked
};

Custom Query Security

custom_query endpoints execute raw SQL, so the parameterized-query and operator protections described above do not apply:

// CAUTION: custom_query executes raw SQL
source: {
  adapter_id: 'postgresql',
  endpoint_id: 'custom_query',
  config: {
    custom_query: `
      SELECT u.id, u.name
      FROM users u
      WHERE u.created_at > '2024-01-01'
    `,
  },
}

Guidelines for custom queries:

  1. Never interpolate user input into custom queries
  2. Use custom queries only for complex joins or aggregations
  3. Validate any dynamic parts before inclusion
  4. Consider using table_query with filters instead
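Guideline 3 can be made concrete: when a date boundary must vary, validate it against a strict pattern before splicing it into the query text. `assertIsoDate` is a hypothetical helper for this example:

```typescript
// Hypothetical validator: only a literal YYYY-MM-DD date may be spliced
// into a custom query; anything else is rejected outright.
function assertIsoDate(value: string): string {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(value)) {
    throw new Error(`Not a valid ISO date: ${value}`);
  }
  return value;
}

const since = assertIsoDate('2024-01-01');
const customQuery = `
  SELECT u.id, u.name
  FROM users u
  WHERE u.created_at > '${since}'
`;
```

An injection payload such as `2024-01-01'; DROP TABLE users;--` fails the pattern and never reaches the query string.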

Secure Adapter Development

When building custom adapters, follow these practices:

1. Use Parameterized Queries

// Database adapter implementation
async download({ limit, offset }) {
  // escapedFields and escapedTable were produced by escapeIdentifier();
  // only values travel through parameter placeholders
  const query = {
    text: `SELECT ${escapedFields} FROM ${escapedTable} LIMIT $1 OFFSET $2`,
    values: [limit, offset]
  };
  return await pool.query(query.text, query.values);
}

2. Validate All Identifiers

function escapeIdentifier(identifier: string): string {
  if (!identifier || typeof identifier !== 'string') {
    throw new Error('Invalid identifier');
  }
  if (/[\x00-\x1f\x7f]/.test(identifier)) {
    throw new Error('Contains control characters');
  }
  return `"${identifier.replace(/"/g, '""')}"`;
}

3. Whitelist Operators

const ALLOWED_OPERATORS = new Set(['=', '!=', '<', '>', '<=', '>=', 'LIKE', 'IN']);

function validateOperator(op: string): string {
  const normalized = op.trim().toUpperCase();
  if (!ALLOWED_OPERATORS.has(normalized)) {
    throw new Error(`Invalid operator: ${op}`);
  }
  return normalized;
}

4. Never Log Credentials

// Bad - exposes credentials
console.log('Connecting with:', auth);

// Good - log without sensitive data
console.log('Connecting to:', auth.credentials.host);

5. Use AdapterError for Security Failures

import { AdapterError } from 'openetl';

// Authentication failure
throw new AdapterError(
  'Authentication failed',
  'AUTHENTICATION_FAILED',
  false  // Not retryable
);

Security Checklist

For Pipeline Authors

  • [ ] Store credentials in environment variables
  • [ ] Use Vault for credential management
  • [ ] Avoid custom_query unless necessary
  • [ ] Validate pipeline configuration before execution
  • [ ] Use validatePipeline() for pre-flight checks

For Adapter Developers

  • [ ] Use parameterized queries for all database operations
  • [ ] Escape all identifiers (tables, columns, schemas)
  • [ ] Validate operators against whitelist
  • [ ] Never interpolate user input into SQL
  • [ ] Never log credential values
  • [ ] Use AdapterError for security-related failures
  • [ ] Validate collection/table names in NoSQL adapters

Additional Resources