Introduction
In many projects, databases need to be migrated or synchronized between different systems. In our case, the requirement was to move data from Microsoft SQL Server to PostgreSQL while the application was still running. The data in the source database was continuously changing, so performing a simple export and import would not work. We needed a solution that could capture ongoing changes and send them to the target database in near real time.
Problem statement
The main challenge was to replicate data from MSSQL to PostgreSQL without stopping the application. The database contained tables that were actively receiving inserts and updates. If we performed a normal database dump, the target system would become outdated as soon as new transactions arrived.
Another complication was that the setup was done on a Linux-based virtual machine. Services such as MSSQL, Kafka, and Kafka Connect had to be configured manually. This increased the chances of configuration errors and required careful troubleshooting.
Proposed solution
To solve this problem, a streaming architecture was implemented using Change Data Capture (CDC), Debezium, and Apache Kafka. The goal was to capture row-level changes in MSSQL and send them through a streaming platform before writing them into PostgreSQL.
Architecture overview
The architecture worked in the following way:
- MSSQL with CDC enabled captures changes made to selected tables.
- Debezium reads these change tables and converts them into Kafka events.
- Apache Kafka acts as the message broker that transports these events.
- Kafka Connect consumes the events and writes them into PostgreSQL.
Implementation steps
Step 1 – Enabling CDC in MSSQL
CDC was enabled at the database level and then for the specific tables that required replication. After enabling CDC, SQL Server created change tables that store insert, update, and delete events.
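As a sketch of this step, the following T-SQL enables CDC first for the database and then for one table. The database name `SourceDB` and the table `dbo.Orders` are placeholders, not names from the actual setup:

```sql
-- Enable CDC at the database level (run inside the source database)
USE SourceDB;
EXEC sys.sp_cdc_enable_db;

-- Enable CDC for one table; SourceDB and dbo.Orders are placeholder names
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Orders',
    @role_name     = NULL;  -- no gating role here; restrict access in production
```

After this, SQL Server creates a change table (here it would be `cdc.dbo_Orders_CT`) that records every insert, update, and delete on the source table.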
During testing, one issue encountered was that the CDC capture job was not running. Even though CDC was enabled, the change tables were not receiving data. After investigation, it was found that the capture job had stopped. Restarting the job solved the issue.
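To diagnose and fix a stopped capture job, SQL Server's built-in CDC procedures can be used. A minimal sketch:

```sql
-- List the CDC jobs registered for the current database
EXEC sys.sp_cdc_help_jobs;

-- Restart the capture job if it has stopped
EXEC sys.sp_cdc_stop_job  @job_type = N'capture';
EXEC sys.sp_cdc_start_job @job_type = N'capture';
```

Note that the capture job depends on SQL Server Agent, so it is also worth confirming that the Agent service itself is running.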
Step 2 – Setting up Kafka
Kafka was installed on the Linux virtual machine and configured to run in KRaft mode. This mode allows Kafka to run without ZooKeeper. While configuring Kafka, an error related to cluster ID mismatch occurred. The error happened because the log directory already contained metadata from a previous setup.
The fix was simple: remove the existing log directory, generate a new cluster ID, format the storage again, and restart Kafka.
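The fix described above maps to the standard KRaft storage commands. In this sketch, the log directory path and the Kafka installation layout are assumptions; substitute the `log.dirs` value from your own `server.properties`:

```shell
# Stop Kafka first, then clear the stale metadata
# (/tmp/kraft-combined-logs is the default path in the sample KRaft config)
rm -rf /tmp/kraft-combined-logs

# Generate a fresh cluster ID and format the storage directory
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
bin/kafka-storage.sh format -t "$KAFKA_CLUSTER_ID" \
    -c config/kraft/server.properties

# Start the broker again
bin/kafka-server-start.sh config/kraft/server.properties
```

The cluster ID mismatch occurs because `kafka-storage.sh format` stamps the ID into `meta.properties` inside the log directory; if that file survives from an earlier setup, the broker refuses to start with a different ID.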
Step 3 – Configuring Kafka Connect and Debezium
Kafka Connect was used to run the Debezium source connector. The connector reads changes captured by CDC and publishes them as messages to Kafka topics.
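A source connector of this kind is registered by POSTing a JSON config to the Kafka Connect REST API. The sketch below uses Debezium 2.x property names for the SQL Server connector; all hostnames, credentials, and the database/table names are placeholders:

```json
{
  "name": "mssql-source",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "database.hostname": "localhost",
    "database.port": "1433",
    "database.user": "debezium",
    "database.password": "********",
    "database.names": "SourceDB",
    "topic.prefix": "mssql",
    "table.include.list": "dbo.Orders",
    "schema.history.internal.kafka.bootstrap.servers": "localhost:9092",
    "schema.history.internal.kafka.topic": "schema-changes.sourcedb"
  }
}
```

Saved as `mssql-source.json`, it can be registered with `curl -X POST -H "Content-Type: application/json" --data @mssql-source.json http://localhost:8083/connectors`. Each captured table then gets its own topic, named `topic.prefix` plus the database, schema, and table name.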
Initially, the Kafka Connect REST API was not responding. The issue was traced back to the service not starting properly due to an incorrect plugin path configuration. After correcting the plugin path and restarting the service, the connectors were successfully detected.
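The plugin path fix amounts to one line in the Connect worker configuration. The directory below is a placeholder; it should point at the parent directory containing the unpacked Debezium (and sink) connector folders:

```properties
# connect-distributed.properties (path is a placeholder)
plugin.path=/opt/kafka/plugins
```

After restarting the worker, `curl http://localhost:8083/connector-plugins` should list the installed connector classes; an empty or incomplete list usually means the path is wrong or the plugin archives were not unpacked.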
Step 4 – Writing data to PostgreSQL
A sink connector was configured to read messages from Kafka topics and insert them into PostgreSQL tables. Proper primary keys had to be defined in PostgreSQL to allow updates and deletes to work correctly.
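One common choice for this role is the Confluent JDBC sink connector (the article does not name the exact sink used, so treat this as an illustrative sketch; topic, connection, and key names are placeholders):

```json
{
  "name": "pg-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "topics": "mssql.SourceDB.dbo.Orders",
    "connection.url": "jdbc:postgresql://localhost:5432/targetdb",
    "connection.user": "postgres",
    "connection.password": "********",
    "insert.mode": "upsert",
    "pk.mode": "record_key",
    "pk.fields": "id",
    "delete.enabled": "true",
    "auto.create": "true"
  }
}
```

The combination of `insert.mode=upsert` and `pk.mode=record_key` is what makes updates idempotent, and `delete.enabled=true` (which requires a record key) turns Debezium tombstone events into PostgreSQL deletes.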
Challenges faced
One of the biggest challenges during the setup was service connectivity. Many services were initially configured with 'localhost'. When accessing them from different containers or environments, the connections failed because the services were not bound to external interfaces.
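The usual fix is to bind the broker to all interfaces while advertising an address other hosts can actually reach. A minimal sketch for a KRaft combined-mode `server.properties`, where the advertised IP is a placeholder for the VM's real address:

```properties
# Bind on all interfaces (broker and controller listeners)
listeners=PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093

# Advertise an address reachable from other machines/containers,
# not localhost (192.168.1.50 is a placeholder)
advertised.listeners=PLAINTEXT://192.168.1.50:9092
```

Clients use `advertised.listeners` to decide where to connect after the initial bootstrap, which is why a `localhost` value works on the VM itself but fails from anywhere else.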
Another issue involved connector plugins not appearing in the Kafka Connect API. This was caused by incorrect plugin directory configuration. Once the correct path was added to the configuration file and the service restarted, the connectors became visible.
Results
After resolving these configuration and troubleshooting issues, the pipeline started working as expected. Data inserted or updated in MSSQL was captured through CDC and streamed through Kafka. Kafka Connect then applied the same changes to PostgreSQL.
The system was able to replicate data with minimal delay, making it suitable for real-time data synchronization scenarios.
Conclusion
This implementation demonstrated that real-time data migration between different database systems is possible using CDC and streaming technologies. However, the setup requires careful configuration and monitoring. Understanding how each component works - CDC, Kafka, connectors, and the target database - is essential for building a reliable data pipeline.



