Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.conversion.ai/llms.txt

Use this file to discover all available pages before exploring further.

This guide walks you through connecting Databricks to Conversion and setting up your first sync.

Before You Begin

Before connecting Databricks, we recommend the following:
  • Create a dedicated service principal: Set up a Databricks service principal specifically for Conversion with read-only access to the tables you need. This limits exposure and makes it easy to revoke access if needed.
  • Use a dedicated SQL warehouse: Consider creating a dedicated SQL warehouse for Conversion queries to prevent sync operations from impacting other workloads.
  • Sync only the data you need: Select only the columns you’ll use, and always filter with last_sync_time to limit results to changed rows.
Queries that return millions of rows should not run more frequently than once per day. Large, frequent syncs can impact both your Databricks costs and Conversion performance.

Step 1: Set Up Databricks Access

Conversion connects to Databricks using OAuth authentication with a service principal. You’ll need to create a service principal, generate OAuth credentials, and grant the appropriate permissions.

Create a Service Principal

  1. In your Databricks workspace, click your user dropdown at the top right and select Settings
  2. Navigate to Identity and accessService Principals and click Manage
  3. Click Add new, give your service principal a name (e.g., conversion_sync), and click Add
  4. Note the Application ID (a UUID like 63fc7e90-a8a2-4639-afd8-36ef6bb67cfa); you’ll need this later

Generate OAuth Credentials

  1. On the Service Principal page, go to the Secrets tab
  2. Click Generate secret
  3. Save the Client ID and Secret securely; the secret is only displayed once
Keep your OAuth credentials secure. You’ll enter them into Conversion when setting up the connection.

Grant Catalog Permissions

Grant your service principal read access to the tables you want to sync:
  1. In the Databricks sidebar, click Catalog and locate your schema
  2. Select the schema and open the Permissions page
  3. Click Grant and enter your service principal name
  4. Grant the following permissions:
    • USE SCHEMA
    • EXECUTE
    • READ VOLUME
    • SELECT
  5. Enable “Also grant USE CATALOG” at the bottom
-- Alternatively, run these commands in a SQL worksheet
GRANT USE CATALOG ON CATALOG my_catalog TO `conversion_sync`;
GRANT USE SCHEMA ON SCHEMA my_catalog.my_schema TO `conversion_sync`;
GRANT SELECT ON SCHEMA my_catalog.my_schema TO `conversion_sync`;

Grant Warehouse Permissions

Ensure your service principal can use the SQL warehouse:
  1. In the Databricks sidebar, click SQL Warehouses
  2. Find your warehouse, click the menu, then Permissions
  3. Add your service principal and set permissions to Can use

Step 2: Allow Conversion’s IP Addresses

If your Databricks workspace uses IP access lists, allow connections from Conversion’s IP addresses:
RegionIP Addresses
US35.239.90.161, 35.188.167.166, 34.56.101.43, 34.122.97.230, 34.29.176.66, 35.226.154.44

Step 3: Gather Connection Details

Before connecting to Conversion, gather these details from your Databricks workspace:
  1. Go to SQL Warehouses and select your warehouse
  2. Click the Connection details tab
  3. Note the following:
    • Server hostname (e.g., dbc-a1b2c3d4-e5f6.cloud.databricks.com)
    • Port (typically 443)
    • HTTP path (e.g., /sql/1.0/warehouses/abc123def456)

Step 4: Connect Databricks to Conversion

  1. In Conversion, go to Settings → Integrations → Data Warehouse
  2. Click Add Integration and select Databricks
  3. Enter your connection details:
    • Name: A friendly name for this connection (e.g., “Production Databricks”)
    • Server hostname: Your Databricks server hostname
    • Port: The port number (default: 443)
    • HTTP path: The HTTP path to your SQL warehouse
    • Catalog: The Unity Catalog name (optional, uses default catalog if empty)
    • Schema: The default schema
    • Client ID: The OAuth Client ID from your service principal
    • Client Secret: The OAuth secret you generated
  4. Click Connect
Conversion will verify the connection. If successful, you’re ready to create syncs.

Step 5: Create a Sync

After connecting, open your new BigQuery connection and go to the Syncs tab to create a sync. Read more about Setting Up a Sync.

Databricks SQL Reference

Converting Timestamps

Conversion expects Unix timestamps for date/time fields. Use unix_timestamp() to convert Databricks TIMESTAMP values:
SELECT 
  email,
  unix_timestamp(created_at) AS created_at,
  unix_timestamp(last_login) AS last_login
FROM users
WHERE updated_at >= from_unixtime({{last_sync_time}})

Using last_sync_time

The last_sync_time variable is a Unix timestamp. Convert it to a timestamp for comparison:
WHERE updated_at >= from_unixtime({{last_sync_time}})
Or compare directly if your column stores Unix timestamps:
WHERE updated_at_epoch >= {{last_sync_time}}

Building Nested Objects with named_struct

Use named_struct to create nested objects like relationshipFields:
named_struct(
  'role', role,
  'quantity', quantity,
  'started_at', unix_timestamp(started_at)
) AS relationshipFields
Alternatively, use to_json with a struct:
to_json(named_struct(
  'role', role,
  'quantity', quantity
)) AS relationshipFields

Converting Booleans

Databricks booleans work directly, but you can convert to strings if needed:
SELECT 
  email,
  CASE WHEN is_active THEN 'true' ELSE 'false' END AS is_active,
  CASE WHEN is_admin THEN 'true' ELSE 'false' END AS is_admin
FROM users
WHERE updated_at >= from_unixtime({{last_sync_time}})

Handling NULLs

Use COALESCE or NVL to provide default values:
SELECT 
  email,
  COALESCE(first_name, '') AS first_name,
  NVL(phone, '') AS phone
FROM users

Casting Types

Use CAST to convert between types:
SELECT 
  CAST(user_id AS STRING) AS id,
  CAST(score AS INT) AS lead_score
FROM users

Querying Across Catalogs

Use fully-qualified table names to query from multiple catalogs:
SELECT u.email, o.order_id
FROM my_catalog.users_schema.users u
JOIN my_catalog.orders_schema.orders o ON u.id = o.user_id

Troubleshooting

”Permission denied” errors

Ensure your service principal has the required grants:
  • USE CATALOG on the catalog
  • USE SCHEMA on the schema
  • SELECT on each table you want to query
  • Can use permission on the SQL warehouse

”Could not connect” errors

Verify that:
  • Your server hostname is correct
  • The HTTP path points to a valid SQL warehouse
  • The OAuth Client ID and Secret are correct
  • Conversion’s IP addresses are allowed if you use IP access lists

”Unauthorized/Forbidden: 403” errors

This typically indicates expired or invalid OAuth credentials. Generate a new secret for your service principal and update the connection in Conversion.

Sync taking too long

  • Ensure you’re filtering by last_sync_time to reduce rows
  • Select only the columns you need
  • Consider using a larger SQL warehouse size
  • Reduce sync frequency for large datasets

Subquery errors

If you encounter errors like Subquery has not finished, try rewriting your query to remove subqueries. Use Common Table Expressions (CTEs) instead:
-- Instead of subqueries
WITH filtered_users AS (
  SELECT * FROM users
  WHERE status = 'active'
)
SELECT * FROM filtered_users
WHERE updated_at >= from_unixtime({{last_sync_time}})

Frequently Asked Questions

The service principal needs USE CATALOG, USE SCHEMA, and SELECT on the specific tables you want to sync, plus Can use permission on the SQL warehouse. Conversion only reads data; it never writes to your Databricks workspace.
We recommend using OAuth with a service principal for better security and manageability. However, if you need to use a Personal Access Token:
  1. Go to User Settings → Developer → Access tokens in Databricks
  2. Click Generate new token
  3. Use this token in place of the OAuth credentials when connecting
Use fully-qualified table names in your query:
SELECT u.email, o.order_id
FROM my_catalog.users_schema.users u
JOIN my_catalog.orders_schema.orders o ON u.id = o.user_id
If you sync a custom object with an email that doesn’t exist in Conversion, we create the contact automatically. You can enrich that contact’s profile through a separate contacts sync.