Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases.
Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
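For orientation, the sketch below shows one way to tail Debezium-style change events from Kafka using the plain Java consumer client rather than Debezium's own embedded API. The broker address, consumer group, and topic name (Debezium conventionally publishes one topic per captured table, e.g. server.schema.table) are placeholder assumptions.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CdcEventReader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("group.id", "cdc-demo");                   // assumed consumer group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Illustrative topic name only; Debezium topics follow <server>.<schema>.<table>.
            consumer.subscribe(List.of("dbserver1.inventory.customers"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Each value is a JSON change event carrying "before", "after", and "op" fields.
                    System.out.printf("key=%s value=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```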
Flink CDC is a streaming data integration tool
BitSail is a distributed, high-performance data integration engine that supports batch, streaming, and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of records every day.
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Flexible development framework for building streaming data applications in SQL with Kafka, Flink, Postgres, GraphQL, and more.
Kafka Streams made easy with a YAML file
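Since the tool's YAML schema isn't reproduced here, the following is a minimal plain Kafka Streams topology in Java showing the kind of read-transform-write pipeline such YAML-driven wrappers typically generate; the application id, broker address, and topic names are placeholder assumptions.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class UppercaseTopology {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-demo");     // assumed app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read from an input topic, transform each value, and write to an output topic.
        KStream<String, String> input = builder.stream("input-topic");        // assumed topic
        input.mapValues(value -> value.toUpperCase())
             .to("output-topic");                                             // assumed topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```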
cron replacement to schedule complex data workflows
Data pipeline using Apache Kafka, Apache Spark and HDFS
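A Kafka-to-Spark-to-HDFS pipeline of this shape can be sketched with Spark Structured Streaming. The Java example below is a generic illustration rather than that repository's actual job; the broker address, topic, and HDFS paths are assumptions.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class KafkaToHdfsJob {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("kafka-to-hdfs")                  // assumed app name
                .getOrCreate();

        // Read raw Kafka records; topic and broker address are placeholders.
        Dataset<Row> events = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "events")
                .load()
                .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value");

        // Continuously append the decoded records to HDFS as Parquet files.
        StreamingQuery query = events.writeStream()
                .format("parquet")
                .option("path", "hdfs:///data/events")                  // assumed HDFS path
                .option("checkpointLocation", "hdfs:///checkpoints/events")
                .start();

        query.awaitTermination();
    }
}
```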
Toolkit for describing data transformation pipelines by composing simple reusable components.
An end-to-end data pipeline with Kafka and Spark Streaming integration
LinkedIn's previous generation Kafka to HDFS pipeline.
Data-processing and common libraries used in the main project, all available under Apache 2.0
⚡ Data integration | DataLink is a lightweight data integration framework built on top of DataX, Spark, and Flink
A real-time data pipeline using Kafka, Spark, and Cassandra for processing and storing credit card expenses. Includes a Spring Boot application for retrieving personnel data from MySQL, storing images in S3, and displaying employee details with expense reports on a web interface.
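The Kafka, Spark, and Cassandra flow described above is commonly wired together with Structured Streaming's foreachBatch sink. The sketch below is a hedged illustration rather than the repository's code: it assumes the DataStax Spark Cassandra Connector is on the classpath, and the topic, keyspace, table, and checkpoint names are placeholders.

```java
import org.apache.spark.api.java.function.VoidFunction2;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ExpensesToCassandraJob {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("expenses-to-cassandra")           // assumed app name
                .getOrCreate();

        // Stream credit card expense events from Kafka; broker and topic names are placeholders.
        Dataset<Row> expenses = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "expenses")
                .load()
                .selectExpr("CAST(value AS STRING) AS json");

        // Write each micro-batch to Cassandra through the Spark Cassandra Connector.
        VoidFunction2<Dataset<Row>, Long> writeBatch = (batch, batchId) ->
                batch.write()
                        .format("org.apache.spark.sql.cassandra")
                        .option("keyspace", "finance")       // assumed keyspace
                        .option("table", "expenses")         // assumed table
                        .mode("append")
                        .save();

        expenses.writeStream()
                .foreachBatch(writeBatch)
                .option("checkpointLocation", "/tmp/checkpoints/expenses")
                .start()
                .awaitTermination();
    }
}
```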
Real Time Data Streaming Pipeline
A real-time cryptocurrency data streaming pipeline.
CS502Capstone