Awesome List Updates on Jul 09, 2014
5 awesome lists updated today.
1. Awesome Shell
Customization / Directory Navigation
- bash-powerline (⭐874) - Powerline-style Bash prompt in pure Bash script
Games / Directory Navigation
- bash2048 (⭐885) - Bash implementation of 2048 game
2. Awesome Clojure
Build Automation and Package management
Date and Time
Audio
Database
ORM and SQL generation
RESTful API
HTML Manipulation
Data Validation
Async processing
Monads
WebSocket
Code Analysis and Linter
Science and Data Analysis
Websites / YouTube
Twitter / YouTube
3. Awesome Bigdata
SQL-like processing
- Apache Hive - SQL-like data warehouse system for Hadoop.
- Datasalt Splout SQL - full SQL query engine for big datasets.
- Spark Catalyst (⭐36k) - query optimization framework for Spark and Shark.
Data Ingestion
- Apache Flume - service to manage large amounts of log data.
- Apache Kafka - distributed publish-subscribe messaging system.
- Apache Sqoop - tool to transfer data between Hadoop and a structured datastore.
- HIHO (⭐90) - framework for connecting disparate data sources with Hadoop.
- LinkedIn Kamikaze (⭐22) - utility package for compressing sorted integer arrays.
- Netflix Suro (⭐777) - log aggregator like Storm and Samza based on Chukwa.
- Pinterest Secor (⭐1.8k) - service implementing Kafka log persistence.
Service Programming
- Akka Toolkit - runtime for distributed and fault-tolerant event-driven applications on the JVM.
- Apache Avro - data serialization system.
- Apache Curator - Java libraries for Apache ZooKeeper.
- Apache Karaf - OSGi runtime that runs on top of any OSGi framework.
- Apache Thrift - framework to build binary protocols.
- Apache ZooKeeper - centralized service for process management.
- Spring XD (⭐479) - distributed and extensible system for data ingestion, real time analytics, batch processing, and data export.
- Twitter Finagle - asynchronous network stack for the JVM.
Scheduling
- Sparrow (⭐314) - scheduling platform.
Machine Learning
- etcML - text classification with machine learning.
- Etsy Conjecture (⭐358) - scalable Machine Learning in Scalding.
- MLbase - distributed machine learning libraries for the BDAS stack.
- Spark MLlib - a Spark implementation of some common machine learning (ML) functionality.
- Vowpal Wabbit (⭐8.2k) - learning system sponsored by Microsoft and Yahoo!.
- WEKA - suite of machine learning software.
Benchmarking
- Apache Hadoop Benchmarking - micro-benchmarks for testing Hadoop performance.
- Berkeley SWIM Benchmark (⭐126) - real-world big data workload benchmark.
- Intel HiBench (⭐1.4k) - a Hadoop benchmark suite.
Security
- Apache Knox Gateway - single point of secure access for Hadoop clusters.
System Deployment
- Apache Ambari - operational framework for Hadoop management.
- Apache Bigtop - system deployment framework for the Hadoop ecosystem.
- Apache Helix - cluster management framework.
- Apache Mesos - cluster manager.
- Apache Whirr - set of libraries for running cloud services.
- Buildoop - similar to Apache Bigtop, based on the Groovy language.
- Cloudera HUE - web application for interacting with Hadoop.
- Facebook Prism - multi-datacenter replication system.
- Google Omega - job scheduling and monitoring system.
- Marathon (⭐4.1k) - Mesos framework for long-running services.
Applications
- Apache Nutch - open source web crawler.
- Apache Tika - content analysis toolkit.
- Eclipse BIRT - Eclipse-based reporting system.
- SparkR - R frontend for Spark.
Search engine and framework
- Apache Lucene - Search engine library.
- Apache Solr - Search platform for Apache Lucene.
- LinkedIn Bobo - faceted search implementation written purely in Java, an extension to Apache Lucene.
- LinkedIn Cleo (⭐558) - flexible software library for enabling rapid development of partial, out-of-order and real-time typeahead search.
- LinkedIn Zoie (⭐362) - realtime search/indexing system written in Java.
MySQL forks and evolutions
- Drizzle - evolution of MySQL 6.0.
- MariaDB - enhanced, drop-in replacement for MySQL.
- ProxySQL (⭐24) - High Performance Proxy for MySQL.
- WebScaleSQL - collaboration among engineers from several companies that face similar challenges in running MySQL at scale.
Memcached forks and evolutions
- Facebook McDipper - key/value cache for flash storage.
- Facebook Memcached - fork of Memcached.
- Twitter Fatcache (⭐1.3k) - key/value cache for flash storage.
- Twitter Twemcache (⭐930) - fork of Memcached.
Embedded Databases
- HanoiDB (⭐298) - Erlang LSM BTree Storage.
- RocksDB - embeddable persistent key-value store for fast storage based on LevelDB.
Business Intelligence
- Jaspersoft - powerful business intelligence suite.
- Microsoft - business intelligence software and platform.
- Pentaho - business intelligence platform.
Data Visualization
- Chart.js - open source HTML5 Charts visualizations.
- NVD3 - chart components for d3.js.
Interesting Readings
- Big Data Benchmark - Benchmark of Redshift, Hive, Shark, Impala and Stinger/Tez.
2013 - 2014
- 2013 - AMPLab - Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices.
- 2013 - AMPLab - MLbase: A Distributed Machine-learning System.
- 2013 - AMPLab - Shark: SQL and Rich Analytics at Scale.
- 2013 - AMPLab - GraphX: A Resilient Distributed Graph System on Spark.
- 2013 - Microsoft - Scalable Progressive Analytics on Big Data in the Cloud.
- 2013 - Metamarkets - Druid: A Real-time Analytical Data Store.
- 2013 - Google - Online, Asynchronous Schema Change in F1.
- 2013 - Google - MillWheel: Fault-Tolerant Stream Processing at Internet Scale.
- 2013 - Facebook - Scuba: Diving into Data at Facebook.
- 2013 - Facebook - Unicorn: A System for Searching the Social Graph.
- 2013 - Facebook - Scaling Memcache at Facebook.
2011 - 2012
- 2012 - AMPLab - Blink and It’s Done: Interactive Queries on Very Large Data.
- 2012 - AMPLab - Fast and Interactive Analytics over Hadoop Data with Spark.
- 2012 - AMPLab - Shark: Fast Data Analysis Using Coarse-grained Distributed Memory.
- 2012 - Microsoft - Paxos Replicated State Machines as the Basis of a High-Performance Data Store.
- 2012 - Microsoft - Paxos Made Parallel.
- 2012 - Google - Processing a trillion cells per mouse click.
- 2011 - AMPLab - Scarlett: Coping with Skewed Popularity Content in MapReduce Clusters.
- 2011 - AMPLab - Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center.
2001 - 2010
- 2010 - Facebook - Finding a needle in Haystack: Facebook’s photo storage.
- 2010 - AMPLab - Spark: Cluster Computing with Working Sets.
- 2010 - Google - Pregel: A System for Large-Scale Graph Processing.
- 2007 - Amazon - Dynamo: Amazon’s Highly Available Key-value Store.
- 2006 - Google - Bigtable: A Distributed Storage System for Structured Data.
4. Awesome Hadoop
Hadoop
- hadoopy (⭐243) - Python MapReduce library written in Cython.
- mrjob (⭐2.6k) - mrjob is a Python 2.5+ package that helps you write and run Hadoop Streaming jobs.
- pydoop - Pydoop is a package that provides a Python API for Hadoop.
- hdfs-du (⭐228) - HDFS-DU is an interactive visualization of the Hadoop distributed file system.
- White Elephant (⭐191) - Hadoop log aggregator and dashboard
Packaging, Provisioning and Monitoring
- Apache Bigtop - Packaging and tests of the Apache Hadoop ecosystem
Websites
5. Awesome Javascript
Editors / Runner
- ace (⭐27k) - Ace (Ajax.org Cloud9 Editor).
- esprima (⭐409) - ECMAScript parsing infrastructure for multipurpose analysis.
- quill (⭐42k) - A cross browser rich text editor with an API.
- pen (⭐4.8k) - enjoy live editing (+markdown).
- jquery-notebook (⭐1.7k) - A simple, clean and elegant text editor. Inspired by the awesomeness of Medium.
- ckeditor-releases (⭐518) - The best web text editor for everyone.
- editor (⭐2.8k) - A Markdown editor, still in development.
- EpicEditor (⭐4.3k) - An embeddable JavaScript Markdown editor with split fullscreen editing, live previewing, automatic draft saving, offline support, and more.
- jsoneditor (⭐11k) - A web-based tool to view, edit and format JSON.
Color / Runner
- colors (⭐9.3k) - Smarter defaults for colors on the web.