Awesome List Updates on Aug 02, 2014
3 awesome lists updated today.
1. Awesome Bigdata
Frameworks
- Apache Hadoop - framework for distributed processing. Integrates MapReduce (parallel processing), YARN (job scheduling) and HDFS (distributed file system).
Distributed Programming
- AddThis Hydra (⭐438) - distributed data processing and storage system originally developed at AddThis.
- Apache DataFu - collection of user-defined functions for Hadoop and Pig developed by LinkedIn.
- DataTorrent StrAM - real-time engine designed to enable distributed, asynchronous, real-time, in-memory big-data computations in as unblocked a way as possible, with minimal overhead and impact on performance.
Document Data Model
- Facebook Apollo - Facebook’s Paxos-like NoSQL database.
Key-value Data Model
- Oracle NoSQL Database - distributed key-value database by Oracle Corporation.
Graph Data Model
- Gremlin (⭐1.9k) - graph traversal language.
- Infovore (⭐148) - RDF-centric Map/Reduce framework.
NewSQL Databases
- Actian Ingres - commercially supported, open-source SQL relational database management system.
- Cockroach (⭐27k) - Scalable, Geo-Replicated, Transactional Datastore.
- FoundationDB - distributed database, inspired by F1.
- Oracle TimesTen In-Memory Database - in-memory, relational database management system with persistence and recoverability.
Time-Series Databases
- OpenTSDB - distributed time series database on top of HBase.
SQL-like processing
- RainstorDB - database for storing petabyte-scale volumes of structured and semi-structured data.
- Trafodion - enterprise-class SQL-on-HBase solution targeting big data transactional or operational workloads.
Data Ingestion
- LinkedIn White Elephant (⭐190) - log aggregator and dashboard.
Service Programming
- Spotify Luigi (⭐17k) - a Python package for building complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, failure handling, command-line integration, and much more.
Scheduling
- Apache Oozie - workflow job scheduler.
Benchmarking
- PUMA Benchmarking - benchmark suite for MapReduce applications.
Security
- Apache Sentry - security module for data stored in Hadoop.
System Deployment
- Brooklyn - library that simplifies application deployment and management.
Applications
- Apache OODT - capturing, processing and sharing of data for NASA's scientific archives.
Search engine and framework
- HBase Coprocessor - implementation of Percolator, part of HBase.
- Lily HBase Indexer - quickly and easily search for any content stored in HBase.
PostgreSQL forks and evolutions
- HadoopDB - hybrid of MapReduce and DBMS.
- IBM Netezza - high-performance data warehouse appliances.
- Postgres-XL - Scalable Open Source PostgreSQL-based Database Cluster.
- RecDB - Open Source Recommendation Engine Built Entirely Inside PostgreSQL.
- Stado - open source MPP database system solely targeted at data warehousing and data mart applications.
Memcached forks and evolutions
- Twemproxy (⭐12k) - A fast, light-weight proxy for memcached and redis.
Embedded Databases
- Actian PSQL - ACID-compliant DBMS developed by Pervasive Software, optimized for embedding in applications.
Data Visualization
- Cytoscape - JavaScript library for visualizing complex networks.
- Google Charts - simple charting API.
- Peity (⭐4.2k) - Progressive SVG bar, line and pie charts.
2. Awesome Laravel
Guidelines / Meetups
- Please make an individual pull request for each suggestion
3. Awesome Machine Learning
Clojure / Data Analysis
- PigPen (⭐565) - Map-Reduce for Clojure.
Python / Data Analysis / Data Visualization
- Petrel (⭐246) - Tools for writing, submitting, debugging, and monitoring Storm topologies in pure Python.
- emcee (⭐1.5k) - The Python ensemble sampling toolkit for affine-invariant MCMC.
Python / Misc Scripts / iPython Notebooks / Codebases
- Allen Downey’s Data Science Course (⭐42) - Code for Data Science at Olin College, Spring 2014.
- Allen Downey’s Think Bayes Code (⭐1.6k) - Code repository for Think Bayes.
- Allen Downey’s Think Complexity Code (⭐96) - Code for Allen Downey's book Think Complexity.
- Allen Downey’s Think OS Code (⭐548) - Text and supporting code for Think OS: A Brief Introduction to Operating Systems.