Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Updated Apr 14, 2025 - Python
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Making data lake work for time series
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
Extract, Transform, Load: Any SQL Database in 4 lines of Code.
A tool for building feature stores.
A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
Structured Data Extractor for AI Agents. Search your documents or the web for specific data and get it back in JSON or Markdown in a single tool call.
Data pipelines from re-usable components
Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications
Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Lightweight library to write, orchestrate and test your SQL ETL. Writing ETL with data integrity in mind.
Python MLS and Real-Estate Data Scraper for the Realtor.ca Website
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, along with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Airflow DAGs for the Stellar ETL project
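SmartETL: a simple, flexible, configurable, out-of-the-box Python ETL framework with domain-specific features that avoids reinventing the wheel. Provides processing pipelines for open data sources such as Wikidata, Wikipedia, and GDELT; supports file formats such as txt, json, csv, and excel, and databases such as MySQL, PostgreSQL, MongoDB, ClickHouse, and ElasticSearch as inputs and outputs; provides processing operators for large language models, Web APIs, and more.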
TAC is an Airflow plugin that helps you extract, transform, and load your data a bit more easily.
Flowrunner is a lightweight package to organize and represent Data Engineering/Science workflows