Enable built-in CDC in SQL Server

Preview

LakeFlow Connect is in gated Public Preview. To participate in the preview, contact your Databricks account team.

This article describes how to enable built-in change data capture (CDC) in SQL Server. Either change tracking or CDC is required for ingestion into Databricks. For guidance on which option to choose, see Change tracking vs. change data capture.

Enable built-in CDC for the source database

To enable the source database for CDC, run the following stored procedure in Azure SQL, replacing the value for <database-name>. You must be logged into the database you want to enable for CDC.

EXEC sys.sp_cdc_enable_db

To enable CDC in a database in Amazon RDS for SQL Server, run the following:

EXEC msdb.dbo.rds_cdc_enable_db '<database-name>'

For more information, see Enable change data capture for a database in the SQL Server documentation.

Enable built-in CDC on the source table

To enable CDC on the source table, run the following stored procedure in Azure SQL:

EXEC sys.sp_cdc_enable_table
@source_schema = N'MySchema',
@source_name   = N'MyTable',
@role_name     = NULL,
@supports_net_changes = 1
  • Replace the values for source_schema, source_name, and role_name.

  • @support_net_changes only supports a value of 1 if the table has a primary key.

For more information, see Enable change data capture for a table in the SQL Server documentation.

Grant SELECT on the CDC schema

In addition to the privileges described in the source setup, the database user needs the SELECT privilege on the schema cdc that contains the change tables that are created when CDC is enabled.

GRANT SELECT ON SCHEMA::cdc to [cdc-username];

Set up DDL capture and schema evolution

The SQL Server connector can track the data definition language (DDL) on replicated database objects and apply relevant table schema changes to the destination tables or add new tables in case of full schema replication.

To perform DDL capture, additional database object setup is required (for example, internal tables, stored procedures, and triggers). The T-SQL script provided in this section drops any pre-existing DDL support objects and creates the DDL support objects required to capture DDL changes happening on the database.

To set up DDL capture and schema evolution, do the following:

  1. Download the following T-SQL script:

    ddl_support_objects.sql

  2. Modify the script to set the mode value:

    • BOTH: Initializes both CT and CDC objects (default)

    • CT: Initializes CT objects

    • CDC: Initializes CDC objects

    • NONE: Deletes all pre-existing CT and CDC objects

  3. Run the script on each database that contains tables you want to replicate.

Grant additional user privileges

Grant the following permissions to the database user:

GRANT VIEW DEFINITION ON object::dbo.lakeflowDisableOldCaptureInstance_1_1 TO <database-user>;
GRANT VIEW DEFINITION ON object::dbo.lakeflowRefreshCaptureInstance_1_1 TO <database-user>;
GRANT VIEW DEFINITION ON object::dbo.lakeflowMergeCaptureInstances_1_1 TO <database-user>;
GRANT VIEW DEFINITION TO <database-user>;
GRANT VIEW DATABASE PERFORMANCE STATE TO <database-user>;
GRANT UPDATE ON object::dbo.lakeflowCaptureInstanceInfo_1_1 TO <database-user>;
GRANT EXECUTE ON schema :: dbo TO <database-user>;
GRANT EXECUTE ON object::dbo.lakeflowMergeCaptureInstances_1_1 TO <database-user>;
GRANT EXECUTE ON object::dbo.lakeflowDisableOldCaptureInstance_1_1 TO <database-user>;
GRANT EXECUTE ON object::dbo.lakeflowRefreshCaptureInstance_1_1 TO <database-user>;