Introduction
In many projects, databases need to be migrated or synchronized between different systems. In our case, the requirement was to move data from Microsoft SQL Server to PostgreSQL while the application was still running. The data in the source database was continuously changing, so performing a simple export and import would not work. We needed a solution that could capture ongoing changes and send them to the target database in near real time.
Problem statement
The main challenge was to replicate data from MSSQL to PostgreSQL without stopping the application. The database contained tables that were actively receiving inserts and updates. If we performed a normal database dump, the target system would become outdated as soon as new transactions arrived.
Another complication was that the setup was done on a Linux-based virtual machine. Services such as MSSQL, Kafka, and Kafka Connect had to be configured manually. This increased the chances of configuration errors and required careful troubleshooting.
Proposed solution
To solve this problem, a streaming architecture was implemented using Change Data Capture (CDC), Debezium, and Apache Kafka. The goal was to capture row-level changes in MSSQL and send them through a streaming platform before writing them into PostgreSQL.
Architecture overview
The architecture worked in the following way:
- MSSQL with CDC enabled captures changes made to selected tables.
- Debezium reads these change tables and converts them into Kafka events.
- Apache Kafka acts as the message broker that transports these events.
- Kafka Connect consumes the events and writes them into PostgreSQL.
Implementation steps
Step 1 – Enabling CDC in MSSQL
CDC was enabled at the database level and then for the specific tables that required replication. After enabling CDC, SQL Server created change tables that store insert, update, and delete events.
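As a sketch of this step, the following T-SQL enables CDC first for the database and then for one table. The database name `SourceDB` and the table `dbo.Orders` are placeholders, not names from the actual setup:

```sql
-- Enable CDC at the database level (run inside the source database)
USE SourceDB;
EXEC sys.sp_cdc_enable_db;

-- Enable CDC for one table; SourceDB and dbo.Orders are placeholder names
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Orders',
    @role_name     = NULL;  -- no gating role here; restrict access in production
```

After this, SQL Server creates a change table (here it would be `cdc.dbo_Orders_CT`) that records every insert, update, and delete on the source table.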
During testing, one issue encountered was that the CDC capture job was not running. Even though CDC was enabled, the change tables were not receiving data. After investigation, it was found that the capture job had stopped. Restarting the job solved the issue.
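To diagnose and fix a stopped capture job, SQL Server's built-in CDC procedures can be used. A minimal sketch:

```sql
-- List the CDC jobs registered for the current database
EXEC sys.sp_cdc_help_jobs;

-- Restart the capture job if it has stopped
EXEC sys.sp_cdc_stop_job  @job_type = N'capture';
EXEC sys.sp_cdc_start_job @job_type = N'capture';
```

Note that the capture job depends on SQL Server Agent, so it is also worth confirming that the Agent service itself is running.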
Step 2 – Setting up Kafka
Kafka was installed on the Linux virtual machine and configured to run in KRaft mode. This mode allows Kafka to run without ZooKeeper. While configuring Kafka, an error related to cluster ID mismatch occurred. The error happened because the log directory already contained metadata from a previous setup.
The fix was simple: remove the existing log directory, generate a new cluster ID, format the storage again, and restart Kafka.
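The fix described above maps to the standard KRaft storage commands. In this sketch, the log directory path and the Kafka installation layout are assumptions; substitute the `log.dirs` value from your own `server.properties`:

```shell
# Stop Kafka first, then clear the stale metadata
# (/tmp/kraft-combined-logs is the default path in the sample KRaft config)
rm -rf /tmp/kraft-combined-logs

# Generate a fresh cluster ID and format the storage directory
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
bin/kafka-storage.sh format -t "$KAFKA_CLUSTER_ID" \
    -c config/kraft/server.properties

# Start the broker again
bin/kafka-server-start.sh config/kraft/server.properties
```

The cluster ID mismatch occurs because `kafka-storage.sh format` stamps the ID into `meta.properties` inside the log directory; if that file survives from an earlier setup, the broker refuses to start with a different ID.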
Step 3 – Configuring Kafka Connect and Debezium
Kafka Connect was used to run the Debezium source connector. The connector reads changes captured by CDC and publishes them as messages to Kafka topics.
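A source connector of this kind is registered by POSTing a JSON config to the Kafka Connect REST API. The sketch below uses Debezium 2.x property names for the SQL Server connector; all hostnames, credentials, and the database/table names are placeholders:

```json
{
  "name": "mssql-source",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "database.hostname": "localhost",
    "database.port": "1433",
    "database.user": "debezium",
    "database.password": "********",
    "database.names": "SourceDB",
    "topic.prefix": "mssql",
    "table.include.list": "dbo.Orders",
    "schema.history.internal.kafka.bootstrap.servers": "localhost:9092",
    "schema.history.internal.kafka.topic": "schema-changes.sourcedb"
  }
}
```

Saved as `mssql-source.json`, it can be registered with `curl -X POST -H "Content-Type: application/json" --data @mssql-source.json http://localhost:8083/connectors`. Each captured table then gets its own topic, named `topic.prefix` plus the database, schema, and table name.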
Initially, the Kafka Connect REST API was not responding. The issue was traced back to the service not starting properly due to an incorrect plugin path configuration. After correcting the plugin path and restarting the service, the connectors were successfully detected.
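The plugin path fix amounts to one line in the Connect worker configuration. The directory below is a placeholder; it should point at the parent directory containing the unpacked Debezium (and sink) connector folders:

```properties
# connect-distributed.properties (path is a placeholder)
plugin.path=/opt/kafka/plugins
```

After restarting the worker, `curl http://localhost:8083/connector-plugins` should list the installed connector classes; an empty or incomplete list usually means the path is wrong or the plugin archives were not unpacked.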
Step 4 – Writing data to PostgreSQL
A sink connector was configured to read messages from Kafka topics and insert them into PostgreSQL tables. Proper primary keys had to be defined in PostgreSQL to allow updates and deletes to work correctly.
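One common choice for this role is the Confluent JDBC sink connector (the article does not name the exact sink used, so treat this as an illustrative sketch; topic, connection, and key names are placeholders):

```json
{
  "name": "pg-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "topics": "mssql.SourceDB.dbo.Orders",
    "connection.url": "jdbc:postgresql://localhost:5432/targetdb",
    "connection.user": "postgres",
    "connection.password": "********",
    "insert.mode": "upsert",
    "pk.mode": "record_key",
    "pk.fields": "id",
    "delete.enabled": "true",
    "auto.create": "true"
  }
}
```

The combination of `insert.mode=upsert` and `pk.mode=record_key` is what makes updates idempotent, and `delete.enabled=true` (which requires a record key) turns Debezium tombstone events into PostgreSQL deletes.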
Challenges faced
One of the biggest challenges during the setup was service connectivity. Many services were initially configured with 'localhost'. When accessing them from different containers or environments, the connections failed because the services were not bound to external interfaces.
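The usual fix is to bind the broker to all interfaces while advertising an address other hosts can actually reach. A minimal sketch for a KRaft combined-mode `server.properties`, where the advertised IP is a placeholder for the VM's real address:

```properties
# Bind on all interfaces (broker and controller listeners)
listeners=PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093

# Advertise an address reachable from other machines/containers,
# not localhost (192.168.1.50 is a placeholder)
advertised.listeners=PLAINTEXT://192.168.1.50:9092
```

Clients use `advertised.listeners` to decide where to connect after the initial bootstrap, which is why a `localhost` value works on the VM itself but fails from anywhere else.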
Another issue involved connector plugins not appearing in the Kafka Connect API. This was caused by incorrect plugin directory configuration. Once the correct path was added to the configuration file and the service restarted, the connectors became visible.
Results
After resolving these configuration and troubleshooting issues, the pipeline started working as expected. Data inserted or updated in MSSQL was captured through CDC and streamed through Kafka. Kafka Connect then applied the same changes to PostgreSQL.
The system was able to replicate data with minimal delay, making it suitable for real-time data synchronization scenarios.
Conclusion
This implementation demonstrated that real-time data migration between different database systems is possible using CDC and streaming technologies. However, the setup requires careful configuration and monitoring. Understanding how each component works - CDC, Kafka, connectors, and the target database - is essential for building a reliable data pipeline.



