Before You Begin
Before connecting Databricks, we recommend the following:

- Create a dedicated service principal — Set up a Databricks service principal specifically for Conversion with read-only access to the tables you need. This limits exposure and makes it easy to revoke access if needed.
- Use a dedicated SQL warehouse — Consider creating a dedicated SQL warehouse for Conversion queries to prevent sync operations from impacting other workloads.
- Sync only the data you need — Select only the columns you’ll use, and always filter with `last_sync_time` to limit results to changed rows.
Step 1: Set Up Databricks Access
Conversion connects to Databricks using OAuth authentication with a service principal. You’ll need to create a service principal, generate OAuth credentials, and grant the appropriate permissions.

Create a Service Principal
- In your Databricks workspace, click your user dropdown at the top right and select Settings
- Navigate to Identity and access → Service Principals and click Manage
- Click Add new, give your service principal a name (e.g., `conversion_sync`), and click Add
- Note the Application ID (a UUID like `63fc7e90-a8a2-4639-afd8-36ef6bb67cfa`) — you’ll need this later
Generate OAuth Credentials
- On the Service Principal page, go to the Secrets tab
- Click Generate secret
- Save the Client ID and Secret securely — the secret is only displayed once
Keep your OAuth credentials secure. You’ll enter them into Conversion when setting up the connection.
Grant Catalog Permissions
Grant your service principal read access to the tables you want to sync:

- In the Databricks sidebar, click Catalog and locate your schema
- Select the schema and open the Permissions page
- Click Grant and enter your service principal name
- Grant the following permissions:
  - USE SCHEMA
  - EXECUTE
  - READ VOLUME
  - SELECT
- Enable “Also grant USE CATALOG” at the bottom
Grant Warehouse Permissions
Ensure your service principal can use the SQL warehouse:

- In the Databricks sidebar, click SQL Warehouses
- Find your warehouse, click the ⋮ menu, then Permissions
- Add your service principal and set permissions to Can use
Step 2: Allow Conversion’s IP Addresses
If your Databricks workspace uses IP access lists, allow connections from Conversion’s IP addresses:

| Region | IP Addresses |
|---|---|
| US | 35.239.90.161, 35.188.167.166, 34.56.101.43, 34.122.97.230, 34.29.176.66, 35.226.154.44 |
Step 3: Gather Connection Details
Before connecting to Conversion, gather these details from your Databricks workspace:

- Go to SQL Warehouses and select your warehouse
- Click the Connection details tab
- Note the following:
  - Server hostname (e.g., `dbc-a1b2c3d4-e5f6.cloud.databricks.com`)
  - Port (typically `443`)
  - HTTP path (e.g., `/sql/1.0/warehouses/abc123def456`)
Step 4: Connect Databricks to Conversion
- In Conversion, go to Settings → Integrations → Data Warehouse
- Click Add Integration and select Databricks
- Enter your connection details:
- Name — A friendly name for this connection (e.g., “Production Databricks”)
- Server hostname — Your Databricks server hostname
- Port — The port number (default: `443`)
- HTTP path — The HTTP path to your SQL warehouse
- Catalog — The Unity Catalog name (optional, uses default catalog if empty)
- Schema — The default schema
- Client ID — The OAuth Client ID from your service principal
- Client Secret — The OAuth secret you generated
- Click Connect
Step 5: Create a Sync
After connecting, create a sync to start pulling data:

- Click Add Sync
- Select the destination type:
- Contacts — Create or update contact records
- Custom Objects — Create or update custom object records (with optional relationships)
- Events — Record events on contacts
- For Custom Objects, select the specific object type
- Enter your SQL query (see examples below)
- Click Run Query to preview results
- Set the sync schedule (how often to run)
- Click Enable
Writing Queries for Databricks
Contact Sync
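A minimal contact sync query might look like the following sketch; the table and column names (`main.crm.contacts`, `updated_at`, and so on) are placeholders for your own schema:

```sql
SELECT
  email,                                    -- identifies the contact
  first_name AS firstName,
  last_name  AS lastName,
  plan       AS plan
FROM main.crm.contacts                      -- placeholder table name
WHERE unix_timestamp(updated_at) > last_sync_time  -- only changed rows
```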
Custom Objects (Without Relationships)
To sync objects without relating them to contacts, omit the `email` column:
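For example, a product sync with no contact relationships (table and column names are illustrative):

```sql
SELECT
  id,          -- unique identifier for the custom object
  name,
  sku,
  category,
  price
FROM main.products.catalog                  -- placeholder table name
WHERE unix_timestamp(updated_at) > last_sync_time
```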
Custom Objects with Relationships
To sync custom objects and their relationships to contacts in a single query, include the contact’s `email` and use `named_struct` or `to_json` for relationship-specific fields:
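A sketch of such a query, joining a hypothetical products table to contacts through a join table (all table names are placeholders):

```sql
SELECT
  p.id,
  p.name,
  p.sku,
  p.category,
  p.price,
  c.email,                                  -- contact to relate this object to
  named_struct(
    'role',     pc.role,                    -- fields stored on the relationship
    'quantity', pc.quantity
  ) AS relationshipFields
FROM main.shop.products p
JOIN main.shop.product_contacts pc ON pc.product_id = p.id
JOIN main.crm.contacts c           ON c.id = pc.contact_id
WHERE unix_timestamp(p.updated_at) > last_sync_time
```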
- Object fields (`id`, `name`, `sku`, `category`, `price`) are stored on the Product object
- `email` identifies which contact to create a relationship with
- `relationshipFields` contains fields stored on the relationship itself (this contact’s role, quantity, etc.)
If multiple rows share the same object `id` but have different `email` values, Conversion creates one object with multiple relationships, one per contact.

Events
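An events query generally returns the contact’s email, an event name, and a timestamp; the output column names below (`name`, `timestamp`, `properties`) and the table are assumptions for illustration:

```sql
SELECT
  email,
  'purchase_completed'          AS name,       -- event name
  unix_timestamp(purchased_at)  AS timestamp,  -- Unix timestamp
  to_json(named_struct(
    'order_id', order_id,
    'total',    total
  ))                            AS properties  -- optional event properties
FROM main.sales.orders
WHERE unix_timestamp(purchased_at) > last_sync_time
```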
Databricks SQL Reference
Converting Timestamps
Conversion expects Unix timestamps for date/time fields. Use `unix_timestamp()` to convert Databricks TIMESTAMP values:
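For example (table and column names are illustrative):

```sql
SELECT
  email,
  unix_timestamp(created_at)   AS createdAt,   -- TIMESTAMP → Unix seconds
  unix_timestamp(last_seen_at) AS lastSeenAt
FROM main.crm.contacts
```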
Using last_sync_time
The `last_sync_time` variable is a Unix timestamp. Convert it to a timestamp for comparison:
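For example, using `timestamp_seconds()` to turn the Unix value back into a TIMESTAMP (the table name is a placeholder):

```sql
SELECT email, first_name
FROM main.crm.contacts
WHERE updated_at > timestamp_seconds(last_sync_time)  -- compare as TIMESTAMPs
```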
Building Nested Objects with named_struct
Use `named_struct` to create nested objects like `relationshipFields`:
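For example (table and column names are placeholders):

```sql
SELECT
  id,
  email,
  named_struct(
    'role',     contact_role,   -- becomes relationshipFields.role
    'quantity', quantity        -- becomes relationshipFields.quantity
  ) AS relationshipFields
FROM main.shop.product_contacts -- placeholder table
```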
Alternatively, use `to_json` with a struct:
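For example, the same shape serialized as a JSON string (placeholder names again):

```sql
SELECT
  id,
  email,
  to_json(named_struct(
    'role',     contact_role,
    'quantity', quantity
  )) AS relationshipFields      -- JSON string instead of a struct
FROM main.shop.product_contacts -- placeholder table
```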
Converting Booleans
Databricks booleans work directly, but you can convert to strings if needed:

Handling NULLs
Use `COALESCE` or `NVL` to provide default values:
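For example (column names are illustrative):

```sql
SELECT
  email,
  COALESCE(first_name, 'Unknown') AS firstName,    -- default when NULL
  NVL(lifetime_value, 0)          AS lifetimeValue -- NVL is equivalent
FROM main.crm.contacts
```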
Casting Types
Use `CAST` to convert between types:
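For example (column names are illustrative):

```sql
SELECT
  CAST(user_id   AS STRING) AS id,        -- number → string
  CAST(is_active AS STRING) AS isActive,  -- boolean → 'true' / 'false'
  CAST(price     AS DOUBLE) AS price      -- decimal → double
FROM main.crm.contacts
```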
Querying Across Catalogs
Use fully-qualified table names to query from multiple catalogs:

Troubleshooting
”Permission denied” errors
Ensure your service principal has the required grants:

- `USE CATALOG` on the catalog
- `USE SCHEMA` on the schema
- `SELECT` on each table you want to query
- Can use permission on the SQL warehouse
”Could not connect” errors
Verify that:

- Your server hostname is correct
- The HTTP path points to a valid SQL warehouse
- The OAuth Client ID and Secret are correct
- Conversion’s IP addresses are allowed if you use IP access lists
”Unauthorized/Forbidden: 403” errors
This typically indicates expired or invalid OAuth credentials. Generate a new secret for your service principal and update the connection in Conversion.

Sync taking too long
- Ensure you’re filtering by `last_sync_time` to reduce rows
- Select only the columns you need
- Consider using a larger SQL warehouse size
- Reduce sync frequency for large datasets
Subquery errors
If you encounter errors like “Subquery has not finished”, try rewriting your query to remove subqueries. Use Common Table Expressions (CTEs) instead:
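For example, an `IN` subquery can usually be rewritten as a CTE plus a join (table names are placeholders):

```sql
-- Instead of:
--   SELECT email FROM contacts
--   WHERE id IN (SELECT contact_id FROM orders WHERE total > 100)
WITH big_spenders AS (
  SELECT DISTINCT contact_id
  FROM main.sales.orders
  WHERE total > 100
)
SELECT c.email
FROM main.crm.contacts c
JOIN big_spenders b ON b.contact_id = c.id
```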
Frequently Asked Questions
What Databricks permissions does Conversion need?
The service principal needs `USE CATALOG`, `USE SCHEMA`, and `SELECT` on the specific tables you want to sync, plus Can use permission on the SQL warehouse. Conversion only reads data; it never writes to your Databricks workspace.

Can I use a Personal Access Token instead of OAuth?
We recommend using OAuth with a service principal for better security and manageability. However, if you need to use a Personal Access Token:
- Go to User Settings → Developer → Access tokens in Databricks
- Click Generate new token
- Use this token in place of the OAuth credentials when connecting
How do I sync from multiple schemas?
Use fully-qualified table names in your query:
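For example, joining tables from two schemas in the same catalog (names are placeholders):

```sql
SELECT
  c.email,
  p.plan_name AS plan
FROM main.crm.contacts  c            -- catalog.schema.table
JOIN main.billing.plans p ON p.contact_id = c.id
```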
What if a contact doesn't exist yet?
If you sync a custom object with an `email` that doesn’t exist in Conversion, we create the contact automatically. You can enrich that contact’s profile through a separate contacts sync.