This guide walks you through connecting Databricks to Conversion and setting up your first sync.

Before You Begin

Before connecting Databricks, we recommend the following:
  • Create a dedicated service principal — Set up a Databricks service principal specifically for Conversion with read-only access to the tables you need. This limits exposure and makes it easy to revoke access if needed.
  • Use a dedicated SQL warehouse — Consider creating a dedicated SQL warehouse for Conversion queries to prevent sync operations from impacting other workloads.
  • Sync only the data you need — Select only the columns you’ll use, and always filter with last_sync_time to limit results to changed rows.
Queries that return millions of rows should not run more frequently than once per day. Large, frequent syncs can impact both your Databricks costs and Conversion performance.

Step 1: Set Up Databricks Access

Conversion connects to Databricks using OAuth authentication with a service principal. You’ll need to create a service principal, generate OAuth credentials, and grant the appropriate permissions.

Create a Service Principal

  1. In your Databricks workspace, click your user dropdown at the top right and select Settings
  2. Navigate to Identity and access → Service Principals and click Manage
  3. Click Add new, give your service principal a name (e.g., conversion_sync), and click Add
  4. Note the Application ID (a UUID like 63fc7e90-a8a2-4639-afd8-36ef6bb67cfa) — you’ll need this later

Generate OAuth Credentials

  1. On the Service Principal page, go to the Secrets tab
  2. Click Generate secret
  3. Save the Client ID and Secret securely — the secret is only displayed once
Keep your OAuth credentials secure. You’ll enter them into Conversion when setting up the connection.

Grant Catalog Permissions

Grant your service principal read access to the tables you want to sync:
  1. In the Databricks sidebar, click Catalog and locate your schema
  2. Select the schema and open the Permissions page
  3. Click Grant and enter your service principal name
  4. Grant the following permissions:
    • USE SCHEMA
    • EXECUTE
    • READ VOLUME
    • SELECT
  5. Enable “Also grant USE CATALOG” at the bottom
-- Alternatively, run these commands in a SQL worksheet
GRANT USE CATALOG ON CATALOG my_catalog TO `conversion_sync`;
GRANT USE SCHEMA ON SCHEMA my_catalog.my_schema TO `conversion_sync`;
GRANT SELECT ON SCHEMA my_catalog.my_schema TO `conversion_sync`;
GRANT EXECUTE ON SCHEMA my_catalog.my_schema TO `conversion_sync`;
GRANT READ VOLUME ON SCHEMA my_catalog.my_schema TO `conversion_sync`;
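To confirm the grants took effect, you can check them from a SQL worksheet, using the example catalog, schema, and principal names from above:
-- Verify the privileges held by the service principal
SHOW GRANTS `conversion_sync` ON SCHEMA my_catalog.my_schema;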

Grant Warehouse Permissions

Ensure your service principal can use the SQL warehouse:
  1. In the Databricks sidebar, click SQL Warehouses
  2. Find your warehouse, click the menu, then Permissions
  3. Add your service principal and set permissions to Can use

Step 2: Allow Conversion’s IP Addresses

If your Databricks workspace uses IP access lists, allow connections from Conversion’s IP addresses:
Region | IP Addresses
US | 35.239.90.161, 35.188.167.166, 34.56.101.43, 34.122.97.230, 34.29.176.66, 35.226.154.44

Step 3: Gather Connection Details

Before connecting to Conversion, gather these details from your Databricks workspace:
  1. Go to SQL Warehouses and select your warehouse
  2. Click the Connection details tab
  3. Note the following:
    • Server hostname (e.g., dbc-a1b2c3d4-e5f6.cloud.databricks.com)
    • Port (typically 443)
    • HTTP path (e.g., /sql/1.0/warehouses/abc123def456)

Step 4: Connect Databricks to Conversion

  1. In Conversion, go to Settings → Integrations → Data Warehouse
  2. Click Add Integration and select Databricks
  3. Enter your connection details:
    • Name — A friendly name for this connection (e.g., “Production Databricks”)
    • Server hostname — Your Databricks server hostname
    • Port — The port number (default: 443)
    • HTTP path — The HTTP path to your SQL warehouse
    • Catalog — The Unity Catalog name (optional, uses default catalog if empty)
    • Schema — The default schema
    • Client ID — The OAuth Client ID from your service principal
    • Client Secret — The OAuth secret you generated
  4. Click Connect
Conversion will verify the connection. If successful, you’re ready to create syncs.

Step 5: Create a Sync

After connecting, create a sync to start pulling data:
  1. Click Add Sync
  2. Select the destination type:
    • Contacts — Create or update contact records
    • Custom Objects — Create or update custom object records (with optional relationships)
    • Events — Record events on contacts
  3. For Custom Objects, select the specific object type
  4. Enter your SQL query (see examples below)
  5. Click Run Query to preview results
  6. Set the sync schedule (how often to run)
  7. Click Enable

Writing Queries for Databricks

Contact Sync

SELECT 
  email,
  first_name,
  last_name,
  title AS job_title,
  phone,
  city
FROM users
WHERE updated_at >= from_unixtime({{last_sync_time}})

Custom Objects (Without Relationships)

To sync objects without relating them to contacts, omit the email column:
SELECT 
  id,
  name,
  status,
  plan_tier,
  monthly_price
FROM subscriptions
WHERE updated_at >= from_unixtime({{last_sync_time}})

Custom Objects with Relationships

To sync custom objects and their relationships to contacts in a single query, include the contact’s email and use named_struct or to_json for relationship-specific fields:
SELECT 
  -- Object fields
  p.id,
  p.name,
  p.sku,
  p.category,
  p.price,
  
  -- Contact to relate this object to
  u.email,
  
  -- Relationship-specific fields
  named_struct(
    'role', up.role,
    'quantity', up.quantity,
    'purchased_at', unix_timestamp(up.purchased_at)
  ) AS relationshipFields
  
FROM user_products up
JOIN products p ON up.product_id = p.id
JOIN users u ON up.user_id = u.id
WHERE up.updated_at >= from_unixtime({{last_sync_time}})
In this example:
  • Object fields (id, name, sku, category, price) are stored on the Product object
  • email identifies which contact to create a relationship with
  • relationshipFields contains fields stored on the relationship itself (this contact’s role, quantity, etc.)
If multiple rows share the same object id but have different email values, Conversion creates one object with multiple relationships—one per contact.
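As a contrived illustration of that shape (inline rows built with VALUES, not real tables), the following result would create a single Widget object related to two contacts:
-- Hypothetical rows: same object id, different emails
SELECT * FROM VALUES
  (1, 'Widget', 'alice@example.com'),
  (1, 'Widget', 'bob@example.com')
AS t(id, name, email)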

Events

SELECT 
  u.email,
  'purchase_completed' AS event_name,
  unix_timestamp(o.created_at) AS timestamp,
  CAST(o.id AS STRING) AS message_id,
  o.total_amount,
  o.product_name,
  o.currency
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.created_at >= from_unixtime({{last_sync_time}})
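If your source can emit several rows per order (for example, one per line item), you can keep a single row per order with Databricks SQL's QUALIFY clause. This is a sketch; the partitioning column is an assumption about your schema:
-- Keep only the latest row per order id (schema assumed)
SELECT 
  u.email,
  'purchase_completed' AS event_name,
  unix_timestamp(o.created_at) AS timestamp,
  CAST(o.id AS STRING) AS message_id
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.created_at >= from_unixtime({{last_sync_time}})
QUALIFY row_number() OVER (PARTITION BY o.id ORDER BY o.created_at DESC) = 1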

Databricks SQL Reference

Converting Timestamps

Conversion expects Unix timestamps for date/time fields. Use unix_timestamp() to convert Databricks TIMESTAMP values:
SELECT 
  email,
  unix_timestamp(created_at) AS created_at,
  unix_timestamp(last_login) AS last_login
FROM users
WHERE updated_at >= from_unixtime({{last_sync_time}})
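If your dates are stored as strings rather than TIMESTAMP values, unix_timestamp also accepts a format pattern as a second argument (the column name here is illustrative):
-- Parse a string like '2024-01-31' into a Unix timestamp
SELECT 
  email,
  unix_timestamp(signup_date, 'yyyy-MM-dd') AS signup_date
FROM users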

Using last_sync_time

The last_sync_time variable is a Unix timestamp. Convert it to a timestamp for comparison:
WHERE updated_at >= from_unixtime({{last_sync_time}})
Or compare directly if your column stores Unix timestamps:
WHERE updated_at_epoch >= {{last_sync_time}}
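If the column is a DATE rather than a TIMESTAMP, comparing against a converted date also works (the column name is illustrative):
WHERE signup_date >= to_date(from_unixtime({{last_sync_time}}))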

Building Nested Objects with named_struct

Use named_struct to create nested objects like relationshipFields:
named_struct(
  'role', role,
  'quantity', quantity,
  'started_at', unix_timestamp(started_at)
) AS relationshipFields
Alternatively, use to_json with a struct:
to_json(named_struct(
  'role', role,
  'quantity', quantity
)) AS relationshipFields

Converting Booleans

Databricks booleans work directly, but you can convert to strings if needed:
SELECT 
  email,
  CASE WHEN is_active THEN 'true' ELSE 'false' END AS is_active,
  CASE WHEN is_admin THEN 'true' ELSE 'false' END AS is_admin
FROM users
WHERE updated_at >= from_unixtime({{last_sync_time}})

Handling NULLs

Use COALESCE or NVL to provide default values:
SELECT 
  email,
  COALESCE(first_name, '') AS first_name,
  NVL(phone, '') AS phone
FROM users

Casting Types

Use CAST to convert between types:
SELECT 
  CAST(user_id AS STRING) AS id,
  CAST(score AS INT) AS lead_score
FROM users

Querying Across Schemas and Catalogs

Use fully qualified three-part names (catalog.schema.table) to join tables across schemas or catalogs:
SELECT u.email, o.order_id
FROM my_catalog.users_schema.users u
JOIN my_catalog.orders_schema.orders o ON u.id = o.user_id

Troubleshooting

"Permission denied" errors

Ensure your service principal has the required grants:
  • USE CATALOG on the catalog
  • USE SCHEMA on the schema
  • SELECT on each table you want to query
  • Can use permission on the SQL warehouse
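If you granted access table by table rather than on the whole schema, each table needs its own grant; for example, using the sample names from earlier:
-- Grant read access to a single table
GRANT SELECT ON TABLE my_catalog.my_schema.users TO `conversion_sync`;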

"Could not connect" errors

Verify that:
  • Your server hostname is correct
  • The HTTP path points to a valid SQL warehouse
  • The OAuth Client ID and Secret are correct
  • Conversion’s IP addresses are allowed if you use IP access lists

"Unauthorized/Forbidden: 403" errors

This typically indicates expired or invalid OAuth credentials. Generate a new secret for your service principal and update the connection in Conversion.

Sync taking too long

  • Ensure you’re filtering by last_sync_time to reduce rows
  • Select only the columns you need
  • Consider using a larger SQL warehouse size
  • Reduce sync frequency for large datasets

Subquery errors

If you encounter errors like Subquery has not finished, try rewriting your query to remove subqueries. Use Common Table Expressions (CTEs) instead:
-- Instead of subqueries
WITH filtered_users AS (
  SELECT * FROM users
  WHERE status = 'active'
)
SELECT * FROM filtered_users
WHERE updated_at >= from_unixtime({{last_sync_time}})

Frequently Asked Questions

What permissions does Conversion need?

The service principal needs USE CATALOG, USE SCHEMA, and SELECT on the specific tables you want to sync, plus Can use permission on the SQL warehouse. Conversion only reads data; it never writes to your Databricks workspace.

Can I use a Personal Access Token instead of OAuth?

We recommend using OAuth with a service principal for better security and manageability. However, if you need to use a Personal Access Token:
  1. Go to User Settings → Developer → Access tokens in Databricks
  2. Click Generate new token
  3. Use this token in place of the OAuth credentials when connecting

How do I query tables in different schemas or catalogs?

Use fully qualified table names in your query:
SELECT u.email, o.order_id
FROM my_catalog.users_schema.users u
JOIN my_catalog.orders_schema.orders o ON u.id = o.user_id

What happens if a synced email doesn’t exist in Conversion?

If you sync a custom object with an email that doesn’t exist in Conversion, we create the contact automatically. You can enrich that contact’s profile through a separate contacts sync.