Track Awesome Streaming Updates Weekly
a curated list of awesome streaming frameworks, applications, etc
😺 manuzhang/awesome-streaming · ⭐ 2.7K · 🏷️ Big Data
Nov 18 - Nov 24, 2024
Table of Contents / Streaming Engine
- RisingWave (⭐7k) [Rust] - A PostgreSQL-compatible streaming database that is designed to build event-driven applications, real-time ETL pipelines, continuous analytics services, and feature stores for AI applications. It excels in extracting fresh and consistent insights from real-time event streams, database CDC, and time series data within sub-seconds. It unifies streaming and batch processing, enabling users to ingest, join, and analyze both live and historical data at a cloud scale.
Aug 19 - Aug 25, 2024
Table of Contents / Data Pipeline
- AutoMQ (⭐3.8k) [Scala/Java] - cloud-first alternative to Kafka by decoupling durability to S3 and EBS. 100% Kafka compatible. 10x cost-effective. Autoscale in seconds. Single-digit ms latency.
Jul 22 - Jul 28, 2024
Table of Contents / Streaming Library
- SwimOS (⭐314) [Rust] - A framework for building real-time streaming data processing applications written in Rust.
Feb 19 - Feb 25, 2024
Table of Contents / Toolkit
- Streamdal [Go/Node.js/Python] - A tool to embed privacy controls in your application code to detect PII as it enters and leaves your systems, preventing it from reaching unintended data streams or pipelines.
Feb 12 - Feb 18, 2024
Table of Contents / Toolkit
- Apache Pekko (⭐1.2k) [Scala, Java] - Fork of Akka 2.6.x, prior to the Akka project's adoption of the Business Source License.
Feb 05 - Feb 11, 2024
Table of Contents / Streaming Engine
- Numaflow (⭐1.3k) [Java/Python/Go/Rust] - Kubernetes-native stream processing platform with a language-agnostic framework. Scalable and cost-efficient.
Table of Contents / Online Machine Learning
- [Numalogic](https://github.com/numaproj/numalogic) (⭐167) [Python] - Collection of ML models and libraries for real-time anomaly detection and forecasting on time series data. Built on Numaflow, a K8s native stream processing platform.
Dec 25 - Dec 31, 2023
Table of Contents / Streaming SQL
- Proton (⭐1.6k) [C++] - A unified streaming and historical data analytics database in a single binary, powered by ClickHouse.
Nov 06 - Nov 12, 2023
Table of Contents / Streaming Engine
- Pathway (⭐4.3k) [Python] - The fastest data processing engine supporting unified workflows for batch, streaming data, and LLM applications.
Sep 18 - Sep 24, 2023
Table of Contents / Streaming Library
- FastStream (⭐3.1k) [Python] - powerful and easy-to-use Python library simplifying the process of writing producers and consumers for message queues, handling all the parsing, networking and documentation generation automatically. Supports multiple protocols such as Apache Kafka, RabbitMQ and alike.
Jul 03 - Jul 09, 2023
Table of Contents / Streaming Application
- javactrl-kafka (⭐9) [Java] - An application of a stateful stream processing for workflow as Java code (microservices orchestration, business process automation, and more).
Jun 05 - Jun 11, 2023
Table of Contents / Streaming Application
- Zilla (⭐546) [Java] - Cross-platform, API gateway built for event-driven architectures and streaming that supports standard protocols such as HTTP, SSE, gRPC, MQTT and the native Kafka protocol.
May 08 - May 14, 2023
Table of Contents / Streaming Library
- Substation (⭐330) [Go] - Substation is a cloud native data pipeline and transformation toolkit written in Go.
Feb 20 - Feb 26, 2023
Table of Contents / Streaming Library
- Quix Streams (⭐1.2k) [Python] - a streaming library originally designed for the McLaren Formula 1 racing team that can process high volumes of time-series data with up to nanosecond precision using Apache Kafka as a message broker.
Dec 19 - Dec 25, 2022
Table of Contents / Streaming Library
- Mediapipe (⭐28k) - Cross-platform, customizable ML solutions for live and streaming media.
Table of Contents / Closed Source
- NVIDIA Deep Stream [Python/C/C++] - a platform for real-time image, video and audio processing, preferably on edge devices or in the cloud.
Nov 21 - Nov 27, 2022
Table of Contents / Streaming Engine
- Bytewax (⭐1.6k) [Python] - data parallel, distributed, stateful stream processing framework.
Table of Contents / Online Machine Learning
- River (⭐5.1k) [Python] - online machine learning library.
Sep 05 - Sep 11, 2022
Table of Contents / Streaming Library
- Streamiz (⭐469) [C#] - a .Net Stream Processing Library for Apache Kafka
Aug 01 - Aug 07, 2022
Table of Contents / Streaming Engine
- Apache Ballista (⭐1.5k) [Rust] - distributed compute platform powered by Apache Arrow.
- Scramjet Cloud Platform (⭐67) [Python/JavaScript/Node.js] - data processing engine for running multiple data processing apps (sequences) written in Python, JavaScript or TypeScript
Table of Contents / Streaming Library
- Scramjet Node.js (⭐38) [Node.js] - functional reactive stream programming framework written on top of Node.js object streams + the legacy Scramjet.js version (⭐253)
- Scramjet Python (⭐35) [Python] - functional reactive stream programming framework written from scratch operating on object, string and buffer streams.
- Scramjet C++ (⭐3) [C++] - functional reactive stream programming framework written on top of Node.js object streams.
Mar 14 - Mar 20, 2022
Table of Contents / Data Pipeline
- Redpanda (⭐9.6k) [C++] - Redpanda is Kafka compatible, ZooKeeper-free, JVM-free and source available.
Feb 07 - Feb 13, 2022
Table of Contents / Data Pipeline
- Apache RocketMQ (⭐21k) [Java] - distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.
Table of Contents / Toolkit
- Nussknacker (⭐659) [Scala] - A visual tool to define and run real-time decision algorithms.
Jan 31 - Feb 06, 2022
Table of Contents / Streaming Library
- Akka Streams (⭐13k) [Scala] - stream processing library on Akka Actors.
- Daggy (⭐152) [C++] - real-time streams aggregation and catching.
Dec 20 - Dec 26, 2021
Table of Contents / Streaming Engine
- WindFlow [C++] - A C++17 Data Stream Processing Parallel Library for Multicores and GPUs.
Oct 25 - Oct 31, 2021
Table of Contents / Readings
- Data Pipelines with Apache Airflow by Bas P. Harenslak and Julian Rutger de Ruiter
Oct 04 - Oct 10, 2021
Table of Contents / Streaming Library
- YoMo (⭐1.7k) [Go] - An open source Streaming Serverless Framework for building low-latency geo-distributed systems. YoMo is built atop the QUIC transport protocol and a Functional Reactive Programming interface.
Table of Contents / Data Pipeline
- fluvio (⭐3.9k) [Rust/WASM] - Real-time programmable data streaming platform with in-line computation capabilities.
Sep 27 - Oct 03, 2021
Table of Contents / Readings
- Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing by Reuven Lax, Slava Chernyak, and Tyler Akidau
May 03 - May 09, 2021
Table of Contents / Data Pipeline
- StreamSets Data Collector (⭐90) [Java] - continuous big data ingestion infrastructure that reads from and writes to a large number of end-points, including S3, JDBC, Hadoop, Kafka, Cassandra and many others.
Table of Contents / Streaming SQL
- Siddhi (⭐1.5k) [Java] - A cloud native Streaming and Complex Event Processing engine that understands Streaming SQL queries in order to capture events from diverse data sources, process them, detect complex conditions, and publish output to various endpoints in real time.
Mar 29 - Apr 04, 2021
Table of Contents / Streaming SQL
- ksqlDB (⭐127) [Java] - A cloud-native, source-available database purpose-built for stream processing applications
- Materialize [Rust] - A source-available streaming SQL engine for maintaining materialized views on data from message brokers and databases.
Mar 22 - Mar 28, 2021
Table of Contents / Streaming Engine
- HStreamDB (⭐708) [Haskell] - The streaming database built for IoT data storage and real-time processing.
- Kuiper (⭐1.5k) [Golang] - Lightweight IoT data analytics and streaming software for the edge, implemented in Golang, that can run on all kinds of resource-constrained edge devices.
Feb 15 - Feb 21, 2021
Table of Contents / Streaming Engine
- Maki Nage (⭐39) [Python] - A stream processing framework for data scientists, based on Kafka and ReactiveX.
Aug 17 - Aug 23, 2020
Table of Contents / Streaming Library
- Tributary (⭐442) [Python] - A python library for constructing dataflow graphs. Supports synchronous, reactive data streams built using python generators that mimic complex event processors, as well as lazily-evaluated acyclic graphs and functional currying streams.
Jun 15 - Jun 21, 2020
Table of Contents / Readings
- Grokking Streaming Systems by Josh Fischer & Ning Wang
May 11 - May 17, 2020
Table of Contents / Data Pipeline
- RudderStack (⭐4.1k) [Go] - an open source customer data infrastructure (segment, mparticle alternative).
May 04 - May 10, 2020
Table of Contents / Data Pipeline
- Gazette (⭐719) [golang] - Distributed streaming infrastructure built on cloud storage which makes it easy to mix and match batch and streaming paradigms.
Mar 23 - Mar 29, 2020
Table of Contents / Streaming Engine
- LightSaber (⭐70) [C++] - Multi-core Window-Based Stream Processing Engine. LightSaber uses code generation for efficient window aggregation.
Jan 06 - Jan 12, 2020
Table of Contents / IoT
- Apache StreamPipes (⭐608) [Java] - a self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams.
Dec 16 - Dec 22, 2019
Table of Contents / Streaming Engine
- Apache Heron (incubating) (⭐3.6k) [Java] - a realtime, distributed, fault-tolerant stream processing engine from Twitter.
Oct 28 - Nov 03, 2019
Table of Contents / Streaming Engine
- mantis (⭐1.4k) [Java] - Netflix's platform to build an ecosystem of realtime stream processing applications
Table of Contents / Data Pipeline
- LogDevice [C++] - a high-performance distributed system by Facebook for streaming and storing sequential data, using a log structure.
Oct 14 - Oct 20, 2019
Table of Contents / DSL
- Apache Beam (⭐7.9k) [Java, Python, SQL, Scala, Go] - unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Domain Specific Languages (DSLs), open sourced by Google.
Table of Contents / Closed Source
- Cloud Dataflow [Java, Python, SQL, Scala] - Google's managed stream and batch data processing engine. Supports running Beam pipelines.
Sep 09 - Sep 15, 2019
Table of Contents / Streaming Library
- Stream Ops (⭐48) [Java] - A fully embeddable data streaming engine and stream processing API for Java.
Sep 02 - Sep 08, 2019
Table of Contents / Streaming Engine
- Gearpump (⭐763) [Scala] - lightweight real-time distributed streaming engine built on Akka.
Aug 12 - Aug 18, 2019
Table of Contents / Streaming Engine
- Trill (⭐1.2k) [.NET/C#] - Trill is a high-performance one-pass in-memory streaming analytics engine from Microsoft Research.
- Wallaroo (⭐1.5k) [Python] - A fast, stream-processing framework. Wallaroo makes it easy to react to data in real-time. By eliminating infrastructure complexity, going from prototype to production has never been simpler.
Table of Contents / Streaming Library
- Streamz (⭐1.2k) [Python] - A lightweight library for building pipelines to manage continuous streams of data; supports complex pipelines that involve branching, joining, flow control, feedback, back pressure, and so on.
Jul 29 - Aug 04, 2019
Table of Contents / Data Pipeline
- brooklin (⭐920) [Java] - a distributed system from LinkedIn for streaming data between various heterogeneous source and destination systems with high reliability and throughput at scale (replaced databus).
Table of Contents / Online Machine Learning
- streamDM (⭐492) [Scala] - mining Big Data streams using Spark Streaming from Huawei.
- StormCV (⭐167) [Java] - enables the use of Apache Storm for video processing by adding computer vision (CV) specific operations and data model.
- trident-ml (⭐382) [Java] - realtime online machine learning library based on Trident.
- yurita (⭐107) [Scala] - Anomaly detection framework built on Spark Structured Streaming from PayPal.
Jul 15 - Jul 21, 2019
Table of Contents / Streaming SQL
- StreamCQL (⭐0) [Java] - Continuous Query Language on RealTime Computation System.
Table of Contents / Closed Source
- Amazon Kinesis Streams [Java] - real-time, fully managed and scalable data stream engine provided by AWS.
- Azure Stream Analytics [.NET] - a massively scalable, fully managed, real-time data stream engine provided by Microsoft Azure.
- concord [C++] - a distributed stream processing framework built in C++ on top of Apache.
- IBM Streams [Python/Java/Scala] - platform for distributed processing and real-time analytics. Provides toolkits for advanced analytics like geospatial, time series, etc. out of the box.
- jubatus [C++] - distributed processing framework and streaming machine learning library.
- millwheel - framework for building low-latency data-processing applications that is widely used at Google.
Apr 15 - Apr 21, 2019
Table of Contents / Streaming Engine
- Apache Apex (⭐350) [Java] - unified platform for big data stream and batch processing.
- Apache Flink (⭐24k) [Java] - system for high-throughput, low-latency data stream processing that supports stateful computation, data-driven windowing semantics and iterative stream processing.
- Apache Samza (⭐820) [Scala/Java] - distributed stream processing framework that builds on Kafka (messaging, storage) and YARN (fault tolerance, processor isolation, security and resource management).
- Apache Spark Streaming (⭐40k) [Scala] - makes it easy to build scalable fault-tolerant streaming applications.
- Apache Storm (⭐6.6k) [Clojure/Java] - distributed real-time computation system. Storm is to stream processing what Hadoop is to batch processing.
- AthenaX (⭐1.2k) [Java] - Uber's Stream Analytics Framework used in production
- Faust (⭐6.7k) [Python] - stream processing library, porting the ideas from Kafka Streams to Python
- Hazelcast Jet (⭐1.1k) [Java] - A general purpose distributed data processing engine, built on top of Hazelcast.
- hailstorm (⭐90) [Haskell] - distributed stream processing with exactly-once semantics based on Storm.
- mupd8 (muppet) (⭐126) [Scala/Java] - MapReduce-style framework for processing fast/streaming data.
- Onyx (⭐2k) [Clojure] - Distributed, masterless, high performance, fault tolerant data processing.
- s4 (⭐42) [Java] - general-purpose, distributed, scalable, fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data.
- SABER (⭐38) [Java/C] - Window-Based Hybrid CPU/GPU Stream Processing Engine.
- SPQR (⭐29) [Java] - dynamic framework for processing high-volume data streams through pipelines.
- tigon (⭐284) [C++/Java] - high throughput real-time streaming processing framework built on Hadoop and HBase.
- Teknek (⭐8) [Java] - Simple elegant stream processing with interactive prototyping shell SOL (Stream Operator Language) Mesos, designed for high performance data processing jobs that require flexibility & control.
Table of Contents / Streaming Library
- Apache Kafka Streams (⭐29k) [Java] - lightweight stream processing library included in Apache Kafka (since version 0.10).
- Benthos (⭐8.1k) [Go] - Benthos is a high performance and resilient message streaming service, able to connect various sources and sinks and perform arbitrary actions, transformations and filters on payloads
- FS2 (prev. 'Scalaz-Stream') (⭐2.4k) [Scala] - Compositional, streaming I/O library for Scala.
- monix (⭐1.9k) [Scala] - high-performance Scala / Scala.js library for composing asynchronous and event-based programs.
- Streamline (⭐164) [Java] - Stream Analytics Framework by Hortonworks, designed as a wrapper around existing streaming solutions like Storm. Aimed to allow users to drag-and-drop streaming components to focus on business logic.
- StreamAlert (⭐2.9k) [Python] - Airbnb's Real-time Data Analysis and Alerting.
- Swave (⭐171) [Scala] - A lightweight Reactive Streams Infrastructure Toolkit for Scala.
Table of Contents / Streaming Application
- straw (⭐103) [Python/Java] - A platform for real-time streaming search.
- storm-crawler (⭐891) [Java] - Web crawler SDK based on Apache Storm.
Table of Contents / IoT
- sensorbee (⭐231) [Go] - lightweight stream processing engine for IoT.
- Apache Edgent (⭐217) [Java] - a programming model and runtime that enables continuous streaming analytics on gateways and edge devices, which can work with centralized systems to provide efficient and timely analytics across the whole IoT ecosystem, from the center to the edge. Open sourced by IBM.
Table of Contents / DSL
- coast (⭐60) [Scala] - a DSL that builds DAGs on top of Samza and provides exactly-once semantics.
- Esper (⭐840) [Java] - component for complex event processing (CEP) and event series analysis.
- Streamparse (⭐1.5k) [Python] - lets you run Python code against real-time streams of data via Apache Storm.
- summingbird (⭐2.1k) [Scala] - library that lets you write MapReduce programs that look like native Scala or Java collection transformations and execute them on a number of well-known distributed MapReduce platforms, including Storm and Scalding.
Table of Contents / Data Pipeline
- Apache Kafka (⭐29k) [Scala/Java] - distributed, partitioned, replicated commit log service, which provides the functionality of a messaging system, but with a unique design.
- Apache Pulsar (⭐14k) [Java] - distributed pub-sub messaging platform with a very flexible messaging model and an intuitive client API.
- camus (⭐881) [Java] - LinkedIn's Kafka -> HDFS pipeline.
- databus (⭐3.6k) [Java] - LinkedIn's source-agnostic distributed change data capture system.
- flume (⭐2.5k) [Java] - distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
- metaq (⭐1.3k) [Java] - Taobao's highly available, high performance distributed messaging system.
- NATS streaming (⭐2.5k) [Go] - fast disk-backed messaging solution
- nsq (⭐25k) [Go] - realtime distributed messaging platform designed to operate at scale, handling billions of messages per day.
- suro (⭐794) [Java] - data pipeline service for collecting, aggregating, and dispatching large volume of application events including log data.
Table of Contents / Online Machine Learning
- Apache Samoa (⭐248) [Java] - distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms.
- DataSketches (⭐896) [Java] - sketches library from Yahoo!.
- StreamingBandit (⭐79) [Python] - Provides a webserver to quickly setup and evaluate possible solutions to contextual multi-armed bandit (cMAB) problems.
Table of Contents / Streaming SQL
- pipelinedb (⭐2.6k) [C] - An open-source relational database that runs SQL queries continuously on streams, incrementally storing results in tables.
- squall (⭐270) [Java] - Squall executes SQL queries on top of Storm for doing online processing.
Table of Contents / Benchmark
- storm-perf-test (⭐76) [Java] - a simple storm performance/stress test.
- streaming-benchmarks (⭐633) [Java] - Benchmarks for Low Latency (Streaming) solutions including Apache Storm, Apache Spark, Apache Flink, etc.
- flotilla (⭐236) [Go] - Automated message queue orchestration for scaled-up benchmarking.
Table of Contents / Toolkit
- akka (⭐13k) [Scala] - toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM.
- pulsar (⭐1.9k) [Python] - Actor based event driven concurrent framework for Python.
- aeron (⭐7.4k) [Java/C++] - efficient reliable unicast and multicast message transport.
- StreamFlow (⭐253) [Java] - stream processing tool designed to help build and monitor processing workflows.
- samza-luwak (⭐99) [Java] - uses Luwak, a stored-query engine built on Lucene, to implement full-text search on streams.
- Turbine (⭐835) [Java] - tool for aggregating streams of Server-Sent Event (SSE) JSON data into a single stream.
Jun 13 - Jun 19, 2016
Table of Contents / Readings
Feb 08 - Feb 14, 2016
Table of Contents / Readings
- The world beyond batch: Streaming 101 by Tyler Akidau.