Change Data Capture (CDC) in SQL

Arunachalam C

Collecting the information about frequent data modification on the objects (tables), it is very useful in maintaining the Data security.

By using these information, we can analysis about amount of events (Insert, Update and Delete) operations are happening on tables.

It will helpful for the reporting purpose, for example HR department need to pull out the report who are joined the organization on yesterday. As the capturing the event, from that we can get the filtered report.

And also if any bulk inserts and update or delete events are happening, it may lead to data to be scattered on the disk. If data are scattered, it leads to performance issue and if we have any indexes present on the table, and again index rebuild and reorg needs to be performed.

By using this analysis, we can move the affected table to different storage that has more efficient I/O operation.

Usual way to track the data changed information are using triggers, user defined programs.

From SQL server 2008, Microsoft introduced new feature called CDC.

Below are the steps to enable the CDC for the particular database on SQL server 2016 (for e.g GeopitsDB)

Before enabling, check whether CDC already enabled on the Database or not by using below mentioned TSQL statement.

USE master

GO

SELECT [name], database_id, is_cdc_enabled

FROM sys.databases

GO

Is_cdc_enabled = 1, then its enabled. Now it’s not enabled.

Please fire the below mentioned command to enable the CDC.

USE GeopitsDB

GO

EXEC sys.sp_cdc_enable_db

GO

Once enabled, kindly check whether following system tables are created on database where we enable the CDC.

To enable the CDC on the table (Vacation_details).

Following command is to check, which tables are participating in CDC

Use GeopitsDB

SELECT [name], is_tracked_by_cdc

FROM sys.tables

GO

As the is_tracked_by_cdc column values is 0. CDC is not enabled at the table level. It will get enabled by following TSQL command.

Use GeopitsDB

EXEC sys.sp_cdc_enable_table

@source_schema = N’dbo’,

@source_name = N’vacation_details’,

@role_name = NULL

GO

After it get enabled, check whether table named cdc.dbo_vacation_details_CT are created under the system table.

When any Modification or changes happen on the table vacation_details. That information is collect on the table cdc.dbo_vacation_details_CT.

Before inserting the values to the table, we will select the both tables (Vacation_details and cdc.dbo_vacation_details_ct)

Select * from cdc.dbo_vacation_details_CT

Use GeopitsDB

Select * from dbo.vacation_detail

As it shows, we have three records. Now I am going to insert a new row.

Use geopitsDB

insert into Vacation_Details values (‘04’,’Clara’,’Yes’, ‘2018–12–20’, ‘2018–12–27’,’HR’);

Now we check whether these changes are captured by table cdc.dbo_vacation_details_CT

Now we find one record created, in this we need to check the column (_$operation, _$update_mask).

_$operation column stores the information about what event have been performed on the table.

_$update_mask column holds the information about which are all column are affected.

How it is calculated?

Convert this Hexa-decimal to binary, then we will get the value 0000 0000 0011 1111. We need to consider from right to left, 1 represent column gets affected. As we have performed the insert operation and we have six columns get affected. so all are one’s.

We run all kinds of database services that vow your success!!