Awesome List Updates on Jul 08, 2014
3 awesome lists updated today.
1. Awesome Bigdata
NewSQL Databases
- Haeinsa (⭐157) - linearly scalable multi-row, multi-table transaction library for HBase based on Percolator.
- Amazon RedShift - data warehouse service, based on PostgreSQL.
- H-Store - experimental main-memory, parallel database management system optimized for on-line transaction processing (OLTP) applications.
- InfiniSQL - infinitely scalable RDBMS.
- MemSQL - in-memory SQL database with optimized columnar storage on flash.
- NuoDB - SQL/ACID compliant distributed database.
- Sky - database used for flexible, high performance analysis of behavioral data.
- SymmetricDS - open source software for both file and database synchronization.
Distributed Programming
- AMPLab SIMR - run Spark on Hadoop MapReduce v1.
- Apache Crunch - a simple Java API for tasks like joining and data aggregation that are tedious to implement on plain MapReduce.
- Apache Gora - framework for in-memory data model and persistence.
- Apache Hama - BSP (Bulk Synchronous Parallel) computing framework.
- Apache Pig - high level language to express data analysis programs for Hadoop.
- Apache Twill - abstraction over YARN that reduces the complexity of developing distributed applications.
- Cascalog - data processing and querying library.
- Cheetah - High Performance, Custom Data Warehouse on Top of MapReduce.
- Concurrent Cascading - framework for data management/analytics on Hadoop.
- Damballa Parkour (⭐258) - MapReduce library for Clojure.
- Datasalt Pangool (⭐57) - alternative MapReduce paradigm.
- Facebook Corona - Hadoop enhancement which removes single point of failure.
- Facebook Peregrine - Map Reduce framework.
- Facebook Scuba - distributed in-memory datastore.
- JAQL - declarative programming language for working with structured, semi-structured and unstructured data.
- Kite - a set of libraries, tools, examples, and documentation focused on making it easier to build systems on top of the Hadoop ecosystem.
- Nokia Disco - MapReduce framework developed by Nokia.
- Stratosphere - general purpose cluster computing framework.
- Twitter Scalding (⭐3.4k) - Scala library for Map Reduce jobs, built on Cascading.
- Twitter Summingbird (⭐2.1k) - Streaming MapReduce with Scalding and Storm, by Twitter.
Distributed Filesystem
- Apache HDFS - a way to store large files across multiple machines.
- Ceph Filesystem - software storage platform.
- Facebook Haystack - object storage system.
Document Data Model
- Crate Data - open source, massively scalable data store that requires zero administration.
- jumboDB - document oriented datastore over Hadoop.
- MarkLogic - Schema-agnostic Enterprise NoSQL database technology.
Key-value Data Model
- ElephantDB (⭐553) - Distributed database specialized in exporting data from Hadoop.
- LinkedIn Krati (⭐26) - simple persistent data store with very low latency and high throughput.
- LinkedIn Voldemort - distributed key/value storage system.
- Storehaus (⭐466) - library to work with asynchronous key value stores, by Twitter.
Graph Data Model
- Apache Giraph - implementation of Pregel, based on Hadoop.
- Facebook TAO - distributed data store widely used at Facebook to store and serve the social graph.
- Google Pregel - graph processing framework.
- GraphX - resilient distributed graph system on Spark.
- Intel GraphBuilder - tools to construct large-scale graphs on top of Hadoop.
- Phoebus (⭐382) - framework for large scale graph processing.
- Titan - distributed graph database, built over Cassandra.
2. Awesome D
Lexers, Parsers, Parser Generators / Bare metal / kernel development
- Martin Nowak's Lexer (⭐12) - A lexer generator.
- Goldie - Goldie Parsing System.
Web Frameworks / Bare metal / kernel development
- cmsed (⭐17) - A component library for Vibe that functions as a CMS.
Command Line / XML
- scriptlike (⭐90) - Utility library to aid writing script-like programs in D.
- todod (⭐15) - Todod is a command line based todo list manager. It also has support for shell interaction based on linenoise (⭐3.5k).
3. Awesome Hadoop
Hadoop
- Apache Hadoop - Apache Hadoop
- SpatialHadoop - SpatialHadoop is a MapReduce extension to Apache Hadoop designed specially to work with spatial data.
- GIS Tools for Hadoop - Big Data Spatial Analytics for the Hadoop Framework
NoSQL
- Apache HBase - Apache HBase
- happybase (⭐595) - A developer-friendly Python library to interact with Apache HBase.
- Hannibal (⭐170) - Hannibal is a tool to help monitor and maintain HBase clusters that are configured for manual splitting.
- hindex (⭐588) - Secondary Index for HBase
Workflow, Lifecycle and Governance
Data Ingestion and Integration
- Apache Flume - Apache Flume
DSL
- Apache Pig - Apache Pig
- Apache DataFu - A collection of libraries for working with large-scale data in Hadoop
- packetpig (⭐301) - Open Source Big Data Security Analytics
- akela (⭐76) - Mozilla's utility library for Hadoop, HBase, Pig, etc.
- seqpig - Simple and scalable scripting for large sequencing data sets (e.g., bioinformatics) in Hadoop
- Lipstick (⭐460) - Pig workflow visualization tool. Introducing Lipstick on A(pache) Pig
Libraries and Tools
- Kite Software Development Kit - A set of libraries, tools, examples, and documentation
- gohadoop (⭐307) - Native Go clients for Apache Hadoop YARN.
Packaging, Provisioning and Monitoring
Benchmark