This guide walks you through connecting Databricks to Conversion and setting up your first sync.

Before You Begin

Before connecting Databricks, we recommend the following:
  • Create a dedicated service principal — Set up a Databricks service principal specifically for Conversion with read-only access to the tables you need. This limits exposure and makes it easy to revoke access if needed.
  • Use a dedicated SQL warehouse — Consider creating a dedicated SQL warehouse for Conversion queries to prevent sync operations from impacting other workloads.
  • Sync only the data you need — Select only the columns you’ll use, and always filter with last_sync_time to limit results to changed rows.
Queries that return millions of rows should not run more frequently than once per day. Large, frequent syncs can impact both your Databricks costs and Conversion performance.

Step 1: Set Up Databricks Access

Conversion connects to Databricks using OAuth authentication with a service principal. You’ll need to create a service principal, generate OAuth credentials, and grant the appropriate permissions.

Create a Service Principal

  1. In your Databricks workspace, click your user dropdown at the top right and select Settings
  2. Navigate to Identity and access → Service Principals and click Manage
  3. Click Add new, give your service principal a name (e.g., conversion_sync), and click Add
  4. Note the Application ID (a UUID like 63fc7e90-a8a2-4639-afd8-36ef6bb67cfa) — you’ll need this later

Generate OAuth Credentials

  1. On the Service Principal page, go to the Secrets tab
  2. Click Generate secret
  3. Save the Client ID and Secret securely — the secret is only displayed once
Keep your OAuth credentials secure. You’ll enter them into Conversion when setting up the connection.

Grant Catalog Permissions

Grant your service principal read access to the tables you want to sync:
  1. In the Databricks sidebar, click Catalog and locate your schema
  2. Select the schema and open the Permissions page
  3. Click Grant and enter your service principal name
  4. Grant the following permissions:
    • USE SCHEMA
    • EXECUTE
    • READ VOLUME
    • SELECT
  5. Enable “Also grant USE CATALOG” at the bottom
-- Alternatively, run these commands in a SQL worksheet
GRANT USE CATALOG ON CATALOG my_catalog TO `conversion_sync`;
GRANT USE SCHEMA ON SCHEMA my_catalog.my_schema TO `conversion_sync`;
GRANT SELECT ON SCHEMA my_catalog.my_schema TO `conversion_sync`;
GRANT EXECUTE ON SCHEMA my_catalog.my_schema TO `conversion_sync`;
GRANT READ VOLUME ON SCHEMA my_catalog.my_schema TO `conversion_sync`;
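To confirm the grants took effect, you can check them from a SQL worksheet, using the example catalog, schema, and principal names from above:
-- Verify the privileges held by the service principal
SHOW GRANTS `conversion_sync` ON SCHEMA my_catalog.my_schema;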

Grant Warehouse Permissions

Ensure your service principal can use the SQL warehouse:
  1. In the Databricks sidebar, click SQL Warehouses
  2. Find your warehouse, click the menu, then Permissions
  3. Add your service principal and set permissions to Can use

Step 2: Allow Conversion’s IP Addresses

If your Databricks workspace uses IP access lists, allow connections from Conversion’s IP addresses:
Region | IP Addresses
US | 35.239.90.161, 35.188.167.166, 34.56.101.43, 34.122.97.230, 34.29.176.66, 35.226.154.44

Step 3: Gather Connection Details

Before connecting to Conversion, gather these details from your Databricks workspace:
  1. Go to SQL Warehouses and select your warehouse
  2. Click the Connection details tab
  3. Note the following:
    • Server hostname (e.g., dbc-a1b2c3d4-e5f6.cloud.databricks.com)
    • Port (typically 443)
    • HTTP path (e.g., /sql/1.0/warehouses/abc123def456)

Step 4: Connect Databricks to Conversion

  1. In Conversion, go to Settings → Integrations → Data Warehouse
  2. Click Add Integration and select Databricks
  3. Enter your connection details:
    • Name — A friendly name for this connection (e.g., “Production Databricks”)
    • Server hostname — Your Databricks server hostname
    • Port — The port number (default: 443)
    • HTTP path — The HTTP path to your SQL warehouse
    • Catalog — The Unity Catalog name (optional, uses default catalog if empty)
    • Schema — The default schema
    • Client ID — The OAuth Client ID from your service principal
    • Client Secret — The OAuth secret you generated
  4. Click Connect
Conversion will verify the connection. If successful, you’re ready to create syncs.

Step 5: Create a Sync

After connecting, create a sync to start pulling data:
  1. Click Add Sync
  2. Select the destination type:
    • Contacts — Create or update contact records
    • Custom Objects — Create or update custom object records (with optional relationships)
    • Events — Record events on contacts
  3. For Custom Objects, select the specific object type
  4. Enter your SQL query (see examples below)
  5. Click Run Query to preview results
  6. Set the sync schedule (how often to run)
  7. Click Enable

Writing Queries for Databricks

Contact Sync

SELECT 
  email,
  first_name,
  last_name,
  title AS job_title,
  phone,
  city
FROM users
WHERE updated_at >= from_unixtime({{last_sync_time}})

Custom Objects (Without Relationships)

To sync objects without relating them to contacts, omit the email column:
SELECT 
  id,
  name,
  status,
  plan_tier,
  monthly_price
FROM subscriptions
WHERE updated_at >= from_unixtime({{last_sync_time}})

Custom Objects with Relationships

To sync custom objects and their relationships to contacts in a single query, include the contact’s email and use named_struct or to_json for relationship-specific fields:
SELECT 
  -- Object fields
  p.id,
  p.name,
  p.sku,
  p.category,
  p.price,
  
  -- Contact to relate this object to
  u.email,
  
  -- Relationship-specific fields
  named_struct(
    'role', up.role,
    'quantity', up.quantity,
    'purchased_at', unix_timestamp(up.purchased_at)
  ) AS relationshipFields
  
FROM user_products up
JOIN products p ON up.product_id = p.id
JOIN users u ON up.user_id = u.id
WHERE up.updated_at >= from_unixtime({{last_sync_time}})
In this example:
  • Object fields (id, name, sku, category, price) are stored on the Product object
  • email identifies which contact to create a relationship with
  • relationshipFields contains fields stored on the relationship itself (this contact’s role, quantity, etc.)
If multiple rows share the same object id but have different email values, Conversion creates one object with multiple relationships—one per contact.
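As a contrived illustration of that shape (inline rows built with VALUES, not real tables), the following result would create a single Widget object related to two contacts:
-- Hypothetical rows: same object id, different emails
SELECT * FROM VALUES
  (1, 'Widget', 'alice@example.com'),
  (1, 'Widget', 'bob@example.com')
AS t(id, name, email)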

Events

SELECT 
  u.email,
  'purchase_completed' AS event_name,
  unix_timestamp(o.created_at) AS timestamp,
  CAST(o.id AS STRING) AS message_id,
  o.total_amount,
  o.product_name,
  o.currency
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.created_at >= from_unixtime({{last_sync_time}})
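If your source can emit several rows per order (for example, one per line item), you can keep a single row per order with Databricks SQL's QUALIFY clause. This is a sketch; the partitioning column is an assumption about your schema:
-- Keep only the latest row per order id (schema assumed)
SELECT 
  u.email,
  'purchase_completed' AS event_name,
  unix_timestamp(o.created_at) AS timestamp,
  CAST(o.id AS STRING) AS message_id
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.created_at >= from_unixtime({{last_sync_time}})
QUALIFY row_number() OVER (PARTITION BY o.id ORDER BY o.created_at DESC) = 1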

Databricks SQL Reference

Converting Timestamps

Conversion expects Unix timestamps for date/time fields. Use unix_timestamp() to convert Databricks TIMESTAMP values:
SELECT 
  email,
  unix_timestamp(created_at) AS created_at,
  unix_timestamp(last_login) AS last_login
FROM users
WHERE updated_at >= from_unixtime({{last_sync_time}})
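If your dates are stored as strings rather than TIMESTAMP values, unix_timestamp also accepts a format pattern as a second argument (the column name here is illustrative):
-- Parse a string like '2024-01-31' into a Unix timestamp
SELECT 
  email,
  unix_timestamp(signup_date, 'yyyy-MM-dd') AS signup_date
FROM users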

Using last_sync_time

The last_sync_time variable is a Unix timestamp. Convert it to a timestamp for comparison:
WHERE updated_at >= from_unixtime({{last_sync_time}})
Or compare directly if your column stores Unix timestamps:
WHERE updated_at_epoch >= {{last_sync_time}}
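If the column is a DATE rather than a TIMESTAMP, comparing against a converted date also works (the column name is illustrative):
WHERE signup_date >= to_date(from_unixtime({{last_sync_time}}))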

Building Nested Objects with named_struct

Use named_struct to create nested objects like relationshipFields:
named_struct(
  'role', role,
  'quantity', quantity,
  'started_at', unix_timestamp(started_at)
) AS relationshipFields
Alternatively, use to_json with a struct:
to_json(named_struct(
  'role', role,
  'quantity', quantity
)) AS relationshipFields

Converting Booleans

Databricks booleans work directly, but you can convert to strings if needed:
SELECT 
  email,
  CASE WHEN is_active THEN 'true' ELSE 'false' END AS is_active,
  CASE WHEN is_admin THEN 'true' ELSE 'false' END AS is_admin
FROM users
WHERE updated_at >= from_unixtime({{last_sync_time}})

Handling NULLs

Use COALESCE or NVL to provide default values:
SELECT 
  email,
  COALESCE(first_name, '') AS first_name,
  NVL(phone, '') AS phone
FROM users

Casting Types

Use CAST to convert between types:
SELECT 
  CAST(user_id AS STRING) AS id,
  CAST(score AS INT) AS lead_score
FROM users

Querying Across Schemas and Catalogs

Use fully qualified three-part names (catalog.schema.table) to join tables across schemas or catalogs:
SELECT u.email, o.order_id
FROM my_catalog.users_schema.users u
JOIN my_catalog.orders_schema.orders o ON u.id = o.user_id

Troubleshooting

"Permission denied" errors

Ensure your service principal has the required grants:
  • USE CATALOG on the catalog
  • USE SCHEMA on the schema
  • SELECT on each table you want to query
  • Can use permission on the SQL warehouse
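If you granted access table by table rather than on the whole schema, each table needs its own grant; for example, using the sample names from earlier:
-- Grant read access to a single table
GRANT SELECT ON TABLE my_catalog.my_schema.users TO `conversion_sync`;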

"Could not connect" errors

Verify that:
  • Your server hostname is correct
  • The HTTP path points to a valid SQL warehouse
  • The OAuth Client ID and Secret are correct
  • Conversion’s IP addresses are allowed if you use IP access lists

"Unauthorized/Forbidden: 403" errors

This typically indicates expired or invalid OAuth credentials. Generate a new secret for your service principal and update the connection in Conversion.

Sync taking too long

  • Ensure you’re filtering by last_sync_time to reduce rows
  • Select only the columns you need
  • Consider using a larger SQL warehouse size
  • Reduce sync frequency for large datasets

Subquery errors

If you encounter errors like Subquery has not finished, try rewriting your query to remove subqueries. Use Common Table Expressions (CTEs) instead:
-- Instead of subqueries
WITH filtered_users AS (
  SELECT * FROM users
  WHERE status = 'active'
)
SELECT * FROM filtered_users
WHERE updated_at >= from_unixtime({{last_sync_time}})

Frequently Asked Questions

What permissions does Conversion need?

The service principal needs USE CATALOG, USE SCHEMA, and SELECT on the specific tables you want to sync, plus Can use permission on the SQL warehouse. Conversion only reads data; it never writes to your Databricks workspace.

Can I use a Personal Access Token instead of OAuth?

We recommend using OAuth with a service principal for better security and manageability. However, if you need to use a Personal Access Token:
  1. Go to User Settings → Developer → Access tokens in Databricks
  2. Click Generate new token
  3. Use this token in place of the OAuth credentials when connecting

How do I query tables in different schemas or catalogs?

Use fully qualified table names in your query:
SELECT u.email, o.order_id
FROM my_catalog.users_schema.users u
JOIN my_catalog.orders_schema.orders o ON u.id = o.user_id

What happens if a synced email doesn’t exist in Conversion?

If you sync a custom object with an email that doesn’t exist in Conversion, we create the contact automatically. You can enrich that contact’s profile through a separate contacts sync.