Track NLP with Ruby Updates Daily
Curated List: Practical Natural Language Processing done in Ruby
😺 arbox/nlp-with-ruby · ⭐ 967 · 🏷️ Computer Science
Aug 27, 2022
Multipurpose Engines
- ruby-spacy (⭐34) - Wrapper module for the spaCy NLP library via PyCall (⭐866).
Mar 30, 2021
Language Aware String Manipulation / Constituency Parsing
- iuliia (⭐8) - Transliteration of Cyrillic to Latin in many possible ways (defined by the reference implementation (⭐59)).
Jan 17, 2019
Spelling and Error Correction / Constituency Parsing
- gingerice (⭐479) - Spelling and Grammar corrections via the Ginger API.
Articles, Posts, Talks, and Presentations / Constituency Parsing
- 2018
- Natural Language Processing and Tweet Sentiment Analysis by Cassandra Corrales [post]
- 2017
- The Google NLP API Meets Ruby by Aja Hammerly [post]
- Syntax Isn't Everything: NLP For Rubyists by Aja Hammerly [slides]
- Scientific Computing on JRuby by Prasun Anand [slides | video | slides | slides]
- Unicode Normalization in Ruby by Starr Horne [post]
Related Resources / Constituency Parsing
Sep 02, 2017
Machine Learning Libraries / Constituency Parsing
- rblearn (⭐1) - Feature Extraction and Crossvalidation library.
Jul 27, 2017
Language Aware String Manipulation / Constituency Parsing
- regex_sample (⭐1) - Sample string generation from a given Regular Expression.
Jul 24, 2017
Language Aware String Manipulation / Constituency Parsing
- translit_kit (⭐5) - Transliterate Hebrew & Yiddish text into Latin characters.
- re2 (⭐87) - High-speed Regular Expression library for Text Mining and Text Extraction.
Jun 14, 2017
Related Resources / Constituency Parsing
May 24, 2017
Community / Constituency Parsing
May 19, 2017
Dialog Agents, Assistants, and Chatbots / Constituency Parsing
- chatterbot (⭐493) - Straightforward Ruby-based Twitter Bot Framework, using OAuth to authenticate.
- lita (⭐1.7k) - Highly extensible chat operations bot framework with persistent storage in Redis.
May 17, 2017
Articles, Posts, Talks, and Presentations / Constituency Parsing
- 2011
- Ruby one-liners by Benoit Hamelin [post]
- Clustering in Ruby by Colin Drake [post]
May 16, 2017
Multipurpose Engines / On-line APIs
- google-cloud-language (⭐1.2k) - Google's Natural Language service API for Ruby.
May 07, 2017
Pipeline Generation
- parallel (⭐3.9k) - Supervisor for parallel execution on multiple CPUs or in many threads.
- pwrake (⭐57) - Rake extensions to run local and remote tasks in parallel.
May 02, 2017
Projects and Code Examples / Constituency Parsing
- RSyntaxTree - Web based demonstration of the syntactic tree visualization.
Apr 21, 2017
Articles, Posts, Talks, and Presentations / Constituency Parsing
- 2006
- Speak My Language: Natural Language Processing With Ruby by Michael Granger [slides | write-up | write-up]
Apr 16, 2017
Lexical Processing / Lexical Statistics: Counting Types and Tokens
- words_counted (⭐156) - Pure Ruby library counting word statistics with different custom options.
Full Text Search, Information Retrieval, Indexing / Constituency Parsing
- google-api-client (⭐2.6k) - Ruby API library for Google services.
Machine Translation / Constituency Parsing
- zipf (⭐2) - Implementation of BLEU and other base algorithms.
Text Extraction / Constituency Parsing
- yomu (⭐478) - Library for extracting text and metadata from files and documents using the Apache Tika content analysis toolkit.
Projects and Code Examples / Constituency Parsing
- Words Counted - examples of customizable word statistics powered by words_counted (⭐156).
Apr 14, 2017
Optical Character Recognition / Constituency Parsing
- tesseract-ocr (⭐605) - FFI based wrapper over the Tesseract OCR Engine (⭐47k).
Language Aware String Manipulation / Constituency Parsing
- fuzzy_match (⭐650) - Fuzzy string comparison with Distance measures and Regular Expression.
- fuzzy_tools (⭐22) - Toolset for fuzzy searches in Ruby tuned for accuracy.
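Fuzzy comparison of the kind these gems offer is usually built on a string distance measure. As a rough illustration of the idea, and not the implementation or API of any gem listed here, the sketch below computes Levenshtein distance in plain Ruby and uses it to pick the closest candidate. The method names `levenshtein` and `closest` are hypothetical.

```ruby
# Levenshtein distance: the minimum number of single-character insertions,
# deletions, and substitutions needed to turn string a into string b.
# Hypothetical helper for illustration only.
def levenshtein(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  # previous[j] holds the distance between the processed prefix of a
  # and the first j characters of b.
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,        # deletion
        current[j - 1] + 1,     # insertion
        previous[j - 1] + cost  # substitution (or match)
      ].min
    end
    previous = current
  end
  previous.last
end

# Pick the candidate with the smallest distance to the query.
def closest(query, candidates)
  candidates.min_by { |candidate| levenshtein(query, candidate) }
end

puts levenshtein("kitten", "sitting")          # => 3
puts closest("rubby", %w[python ruby rust])    # => ruby
```

This version runs in time proportional to the product of the two string lengths, which is fine for short strings; the native and FFI-based gems exist largely because that cost adds up on large candidate sets.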
Needs your Help! / Constituency Parsing
- summarize (⭐205) - Ruby native wrapper for Open Text Summarizer (⭐220).
Apr 11, 2017
Machine Learning Libraries / Constituency Parsing
- ruby-fann (⭐451) - Ruby bindings to the Fast Artificial Neural Network Library (FANN).
Full Text Search, Information Retrieval, Indexing / Constituency Parsing
- rsolr (⭐416) - Ruby and Rails client library for Apache Solr.
- sunspot (⭐3k) - Rails centric client for Apache Solr.
- thinking-sphinx (⭐1.6k) - Active Record plugin for using Sphinx in (not only) Rails based projects.
- elasticsearch (⭐1.9k) - Ruby client and API for Elasticsearch.
- elasticsearch-rails (⭐3k) - Ruby and Rails integrations for Elasticsearch.
Articles, Posts, Talks, and Presentations / Constituency Parsing
- 2009
- Porting the UEA-Lite Stemmer to Ruby by Jason Adams [post]
- NLP Resources for Ruby by Jason Adams [post]
Projects and Code Examples / Constituency Parsing
- Going the Distance (⭐60) - Implementations of various distance algorithms with example calculations.
- Named entity recognition with Stanford NER and Ruby (⭐17) - NER Examples in Ruby and Java with some explanations.
Needs your Help! / Constituency Parsing
- ferret (⭐280) - Information Retrieval in C and Ruby.
Apr 10, 2017
Articles, Posts, Talks, and Presentations / Constituency Parsing
- 2016
- Quickly Create a Telegram Bot in Ruby by Ardian Haxha [tutorial]
- Deep Learning: An Introduction for Ruby Developers by Geoffrey Litt [slides]
- How I made a pure-Ruby word2vec program more than 3x faster by Kei Sawada [slides]
- Dōmo arigatō, Mr. Roboto: Machine Learning with Ruby by Eric Weinstein [slides | video]
Apr 06, 2017
Machine Learning Libraries / Constituency Parsing
- decisiontree (⭐1.4k) - Decision Tree ID3 Algorithm in pure Ruby [post].
Articles, Posts, Talks, and Presentations / Constituency Parsing
- 2007
- Decision Tree Learning in Ruby by Ilya Grigorik [post]
Apr 05, 2017
Multipurpose Engines / On-line APIs
- wlapi (⭐19) - Ruby client library for Wortschatz Leipzig web services.
Lexical Processing / Filtering Stop Words
- stopwords-filter (⭐68) - Filter and Stop Word Lexicon based on the SnowBall lemmatizer.
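For readers new to the technique, stop-word filtering simply drops very frequent function words before further processing. The following is a minimal sketch in plain Ruby, assuming a tiny hand-written stop list; it does not use stopwords-filter, and `STOP_WORDS` and `remove_stop_words` are hypothetical names.

```ruby
require "set"

# A tiny illustrative stop list (an assumption for this example);
# real lexicons such as Snowball's are much larger and language specific.
STOP_WORDS = Set.new(%w[a an and the of in on is to]).freeze

# Lowercase the text, extract runs of letters, and drop stop words.
def remove_stop_words(text, stop_words = STOP_WORDS)
  text.downcase.scan(/\p{L}+/).reject { |word| stop_words.include?(word) }
end

p remove_stop_words("The cat is on the roof of a house")
# => ["cat", "roof", "house"]
```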
Sentiment Analysis / Constituency Parsing
- stimmung (⭐20) - Semantic Polarity based on the SentiWS lexicon.
Mar 03, 2017
Numbers, Dates, and Time Parsing / Constituency Parsing
- numerizer (⭐34) - Ruby parser for English number expressions.
Articles, Posts, Talks, and Presentations / Constituency Parsing
- 2015
- N-gram Analysis for Fun and Profit by Jesus Castello [tutorial]
- Machine Learning made simple with Ruby by Lorenzo Masini [tutorial]
- Using Ruby Machine Learning to Find Paris Hilton Quotes by Rick Carlino [tutorial]
- Exploring Natural Language Processing in Ruby by Kevin Dias [slides]
- Machine Learning made simple with Ruby by Lorenzo Masini [post]
- Practical Data Science in Ruby by Bobby Grayson [slides]
Feb 24, 2017
Spelling and Error Correction / Constituency Parsing
- hunspell-i18n (⭐4) - Ruby bindings to the standard Hunspell Spell Checker.
- ffi-hunspell (⭐49) - FFI based Ruby bindings for Hunspell.
- hunspell (⭐34) - Ruby bindings to Hunspell via Ruby C API.
Feb 17, 2017
Pipeline Generation
- phobos (⭐211) - Simplified Ruby Client for Apache Kafka.
Feb 15, 2017
Language Identification / On-line APIs
- scylla (⭐34) - Language Categorization and Identification.
Feb 10, 2017
Linguistic Resources / Constituency Parsing
- rwordnet (⭐88) - Pure Ruby self contained API library for the Princeton WordNet®.
- wordnet (⭐134) - Performance tuned bindings for the Princeton WordNet®.
Feb 03, 2017
Multipurpose Engines
- open_nlp (⭐11) - JRuby Bindings for the OpenNLP Toolkit.
Jan 29, 2017
Multipurpose Engines
- nlp_toolz (⭐2) - Wrapper over some OpenNLP classes and the original Berkeley Parser (⭐176).
Machine Learning Libraries / Constituency Parsing
- lda-ruby (⭐132) - Ruby implementation of the LDA (Latent Dirichlet Allocation) for automatic Topic Modelling and Document Clustering.
Jan 23, 2017
Articles, Posts, Talks, and Presentations / Constituency Parsing
- 2010
- bayes_motel – Bayesian classification for Ruby by Mike Perham [post]
Jan 10, 2017
Multipurpose Engines / On-line APIs
- monkeylearn-ruby (⭐79) - Sentiment Analysis, Topic Modelling, Language Detection, Named Entity Recognition via a Ruby based Web API client.
Jan 09, 2017
Multipurpose Engines
- open-nlp (⭐89) - Ruby Bindings for the OpenNLP Toolkit.
- stanford-core-nlp (⭐429) - Ruby Bindings for the Stanford CoreNLP (⭐8.7k) tools.
Numbers, Dates, and Time Parsing / Constituency Parsing
- chronic (⭐3.1k) - Pure Ruby natural language date parser.
Articles, Posts, Talks, and Presentations / Constituency Parsing
- 2014
- Natural Language Parsing with Ruby by Glauco Custódio [tutorial]
- Demystifying Data Science: Analyzing Conference Talks with Rails and Ngrams by Todd Schneider [video | code (⭐33)]
- Natural Language Processing with Ruby by Konstantin Tennhard [video | video | video | slides]
Jan 06, 2017
Multipurpose Engines
- treat (⭐1.4k) - Natural Language Processing framework for Ruby (like NLTK for Python).
Multipurpose Engines / On-line APIs
- wit-ruby (⭐280) - Ruby client library for the Wit.ai Language Understanding Platform.
Related Resources / Constituency Parsing
- Awesome TensorFlow (⭐17k) - Machine Learning with TensorFlow libraries.
Jan 04, 2017
Multipurpose Engines / On-line APIs
- alchemyapi_ruby (⭐36) - Legacy Ruby SDK for AlchemyAPI/Bluemix.
Related Resources / Constituency Parsing
- Awesome Ruby (⭐12k) - Among other awesome items a short list of NLP related projects.
- Awesome OCR (⭐2.2k) - Multitude of OCR (Optical Character Recognition) resources.
Dec 19, 2016
Pipeline Generation
- ruby-spark (⭐224) - Spark bindings with an easy to understand DSL.
Dec 12, 2016
Pipeline Generation
- composable_operations (⭐47) - Definition framework for operation pipelines.
Dec 08, 2016
Articles, Posts, Talks, and Presentations / Constituency Parsing
- 2012
- Machine Learning with Ruby, Part One by Vasily Vasinov [tutorial]
Dec 07, 2016
Articles, Posts, Talks, and Presentations / Constituency Parsing
- 2013
- How to parse 'go' - Natural Language Processing in Ruby by Tom Cartwright [slides | video]
- Natural Language Processing in Ruby by Brandon Black [slides | video]
- Natural Language Processing with Ruby: n-grams by Nathan Kleyn [tutorial | code (⭐33)]
- Seeking Lovecraft, Part 1: An introduction to NLP and the Treat Gem by Robert Qualls [tutorial]
Related Resources / Constituency Parsing
- Speech and Natural Language Processing (⭐2.1k) - General List of NLP related resources (mostly not for Ruby programmers).
Dec 06, 2016
Related Resources / Constituency Parsing
- Ruby NLP (⭐1.2k) - State-of-the-art collection of Ruby libraries for NLP.
- Scientific Ruby - Linear Algebra, Visualization and Scientific Computing for Ruby.
- iRuby (⭐728) - IRuby kernel for Jupyter (formerly IPython).
Nov 30, 2016
Community / Constituency Parsing
Nov 29, 2016
Language Aware String Manipulation / Constituency Parsing
- active_support (⭐52k) - RoR ActiveSupport gem has various string extensions that can handle case.
Books / Constituency Parsing
- Miller, Rob. Text Processing with Ruby: Extract Value from the Data That Surrounds You. Pragmatic Programmers, 2015. [link]
- Watson, Mark. Practical Semantic Web and Linked Data Applications. Lulu, 2010. [link]
Nov 27, 2016
Segmentation / On-line APIs
- tokenizer (⭐44) - Simple multilingual tokenizer. [tutorial]
- pragmatic_tokenizer (⭐87) - Multilingual tokenizer to split a string into tokens.
- textoken (⭐31) - Simple and customizable text tokenization library.
- pragmatic_segmenter (⭐503) - Word Boundary Disambiguation with many cookies.
- punkt-segmenter (⭐89) - Pure Ruby implementation of the Punkt Segmenter.
- tactful_tokenizer (⭐79) - RegExp based tokenizer for different languages.
- scalpel (⭐53) - Sentence Boundary Disambiguation tool.
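To show what segmentation involves, and why dedicated libraries are worth using, here is a deliberately naive sketch in plain Ruby. It is not how any of the gems above work; `naive_sentences` and `naive_tokens` are hypothetical helpers built on regular expressions only.

```ruby
# Split after '.', '!' or '?' when followed by whitespace.
# Naive by design: it has no notion of abbreviations.
def naive_sentences(text)
  text.strip.split(/(?<=[.!?])\s+/)
end

# Words (with inner apostrophes), digit runs, and single punctuation marks.
def naive_tokens(sentence)
  sentence.scan(/\p{L}+(?:'\p{L}+)*|\d+|[^\s\p{L}\d]/)
end

p naive_sentences("Hello there. How are you? Fine!")
# => ["Hello there.", "How are you?", "Fine!"]

p naive_tokens("Don't panic, it's 42.")
# => ["Don't", "panic", ",", "it's", "42", "."]

# The weakness of the regex approach: abbreviations end sentences too early.
p naive_sentences("Dr. Smith arrived.")
# => ["Dr.", "Smith arrived."]
```

The last call shows the kind of error that boundary disambiguation tools are designed to avoid.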
Lexical Processing / Stemming
- ruby-stemmer (⭐255) - Ruby-Stemmer exposes the SnowBall API to Ruby.
- uea-stemmer (⭐50) - Conservative stemmer for search and indexing.
Lexical Processing / Lemmatization
- lemmatizer (⭐102) - WordNet based Lemmatizer for English texts.
Lexical Processing / Lexical Statistics: Counting Types and Tokens
- wc (⭐6) - Facilities to count word occurrences in a text.
- word_count (⭐4) - Word counter for String and Hash objects.
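The distinction between tokens (running words) and types (distinct words) can be sketched in a few lines of plain Ruby. This is an illustration only and does not reflect the API of the gems above; `token_stats` is a hypothetical helper, and `Array#tally` assumes Ruby 2.7 or newer.

```ruby
# Count tokens (every word occurrence) and types (distinct words).
def token_stats(text)
  # Lowercase, then extract runs of letters, allowing inner apostrophes.
  tokens = text.downcase.scan(/\p{L}+(?:'\p{L}+)*/)
  counts = tokens.tally
  {
    tokens: tokens.size,
    types: counts.size,
    most_common: counts.max_by { |_word, n| n }
  }
end

stats = token_stats("The cat sat on the mat. The mat was flat.")
puts stats[:tokens]     # => 10
puts stats[:types]      # => 7
p stats[:most_common]   # => ["the", 3]
```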
Phrasal Level Processing / Filtering Stop Words
- n_gram (⭐36) - N-Gram generator.
- ruby-ngram (⭐11) - Break words and phrases into ngrams.
- raingrams (⭐69) - Flexible and general-purpose ngrams library written in pure Ruby.
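An n-gram is a sliding window of n consecutive items. As a minimal sketch, assuming whitespace tokenization and using only Ruby's built-in `Enumerable#each_cons` (not the API of any gem above), word and character n-grams can be produced like this; `word_ngrams` and `char_ngrams` are hypothetical names.

```ruby
# Word-level n-grams: slide a window of n words across the text.
def word_ngrams(text, n)
  text.split.each_cons(n).map { |words| words.join(" ") }
end

# Character-level n-grams of a single word.
def char_ngrams(word, n)
  word.chars.each_cons(n).map(&:join)
end

p word_ngrams("to be or not to be", 2)
# => ["to be", "be or", "or not", "not to", "to be"]

p char_ngrams("ruby", 2)
# => ["ru", "ub", "by"]
```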
Semantic Analysis / Constituency Parsing
- amatch (⭐354) - Set of five distance types between strings (including Levenshtein, Sellers, Jaro-Winkler, 'pair distance').
- damerau-levenshtein (⭐127) - Calculates edit distance using the Damerau-Levenshtein algorithm.
- hotwater (⭐80) - Fast Ruby FFI string edit distance algorithms.
- levenshtein-ffi (⭐148) - Fast string edit distance computation, using the Damerau-Levenshtein algorithm.
- tf_idf (⭐36) - Term Frequency / Inverse Document Frequency in pure Ruby.
- tf-idf-similarity (⭐652) - Calculate the similarity between texts using TF/IDF.
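TF-IDF weights a term by how often it occurs in a document, discounted by how many documents contain it. The sketch below uses one common formulation (relative term frequency times `log(N / df)`); the gems above may use different weighting variants. `tf_idf` is a hypothetical helper, and the code assumes Ruby 2.7 or newer for `Array#tally`.

```ruby
# Returns one Hash per document, mapping each term to its tf-idf score.
def tf_idf(documents)
  tokenized = documents.map { |doc| doc.downcase.scan(/\w+/) }

  # Document frequency: in how many documents does each term appear?
  doc_freq = Hash.new(0)
  tokenized.each { |tokens| tokens.uniq.each { |term| doc_freq[term] += 1 } }

  total_docs = documents.size.to_f
  tokenized.map do |tokens|
    tokens.tally.to_h do |term, count|
      tf = count.to_f / tokens.size
      idf = Math.log(total_docs / doc_freq[term])
      [term, tf * idf]
    end
  end
end

scores = tf_idf(["the cat sat", "the dog sat", "the cat ran"])
p scores[0]["the"]                      # => 0.0 (appears in every document)
p scores[1]["dog"] > scores[0]["cat"]   # => true (rarer terms score higher)
```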
Pragmatical Analysis / Constituency Parsing
- SentimentLib (⭐13) - Simple extensible sentiment analysis gem.
Text Alignment / Constituency Parsing
- alignment (⭐1) - Alignment routines for bilingual texts (Gale-Church implementation).
Machine Translation / Constituency Parsing
- microsoft_translator (⭐21) - Ruby client for the Microsoft Translator API.
- termit (⭐507) - Google Translate with speech synthesis in your terminal.
Numbers, Dates, and Time Parsing / Constituency Parsing
- chronic_between (⭐27) - Simple Ruby natural language parser for date and time ranges.
- chronic_duration (⭐347) - Pure Ruby parser for elapsed time.
- kronic (⭐151) - Methods for parsing and formatting human readable dates.
- nickel (⭐107) - Extracts date, time, and message information from naturally worded text.
- tickle (⭐75) - Parser for recurring and repeating events.
Named Entity Recognition / Constituency Parsing
- ruby-ner (⭐17) - Named Entity Recognition with Stanford NER and Ruby.
- ruby-nlp (⭐87) - Ruby binding for the Stanford POS Tagger and Named Entity Recognizer.
Text-to-Speech-to-Text / Constituency Parsing
- espeak-ruby (⭐186) - Small Ruby API for utilizing 'espeak' and 'lame' to create text-to-speech mp3 files.
- tts (⭐89) - Text-to-Speech conversion using the Google Translate service.
- att_speech (⭐20) - Ruby wrapper over the AT&T Speech API for speech to text.
- pocketsphinx-ruby (⭐253) - Pocketsphinx bindings.
Machine Learning Libraries / Constituency Parsing
- rb-libsvm (⭐276) - Support Vector Machines with Ruby.
- rtimbl (⭐5) - Memory based learners from the Timbl framework.
- classifier-reborn (⭐531) - General classifier module to allow Bayesian and other types of classifications.
- liblinear-ruby-swig (⭐82) - Ruby interface to LIBLINEAR (much more efficient than LIBSVM for text classification).
- linnaeus (⭐37) - Redis-backed Bayesian classifier.
- maxent_string_classifier (⭐9) - JRuby maximum entropy classifier for string data, based on the OpenNLP Maxent framework.
- naive_bayes (⭐46) - Simple Naive Bayes classifier.
- nbayes (⭐149) - Full-featured, Ruby implementation of Naive Bayes.
- omnicat (⭐11) - Generalized rack framework for text classifications.
- omnicat-bayes (⭐32) - Naive Bayes text classification implementation as an OmniCat classifier strategy.
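Several entries above implement Naive Bayes text classification. For orientation, here is a minimal multinomial Naive Bayes sketch with add-one smoothing in plain Ruby. It is not the API of classifier-reborn, nbayes, or any other listed gem; the class name `TinyNaiveBayes` and its methods are hypothetical.

```ruby
# Multinomial Naive Bayes with add-one (Laplace) smoothing.
class TinyNaiveBayes
  def initialize
    @word_counts = Hash.new { |hash, label| hash[label] = Hash.new(0) }
    @doc_counts = Hash.new(0)
    @vocabulary = {}
  end

  # Record one labelled training document.
  def train(label, text)
    @doc_counts[label] += 1
    tokenize(text).each do |word|
      @word_counts[label][word] += 1
      @vocabulary[word] = true
    end
  end

  # Return the label with the highest posterior score for the text.
  def classify(text)
    total_docs = @doc_counts.values.sum.to_f
    words = tokenize(text)
    @doc_counts.keys.max_by do |label|
      total_words = @word_counts[label].values.sum
      # Work in log space to avoid underflow on long texts.
      log_prior = Math.log(@doc_counts[label] / total_docs)
      log_likelihood = words.sum do |word|
        Math.log((@word_counts[label][word] + 1.0) / (total_words + @vocabulary.size))
      end
      log_prior + log_likelihood
    end
  end

  private

  def tokenize(text)
    text.downcase.scan(/\w+/)
  end
end

classifier = TinyNaiveBayes.new
classifier.train(:spam, "win money now")
classifier.train(:spam, "free money offer")
classifier.train(:ham, "meeting at noon")
classifier.train(:ham, "lunch meeting tomorrow")

puts classifier.classify("free money")        # => spam
puts classifier.classify("meeting tomorrow")  # => ham
```

The training data here is invented for the example; a real classifier needs far more text per class, which is where persistent or Redis-backed implementations become useful.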
Language Aware String Manipulation / Constituency Parsing
- fuzzy-string-match (⭐268) - Fuzzy string matching library for Ruby.
- u - U extends Ruby’s Unicode support.
- unicode (⭐82) - Unicode normalization library.
- CommonRegexRuby (⭐78) - Find many kinds of common information in a string.
- regexp-examples (⭐508) - Generate strings that match a given regular expression.
- verbal_expressions (⭐570) - Make difficult regular expressions easy.
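On the Unicode entries in this list: the reason normalization matters is that visually identical strings can differ at the code point level. The short sketch below uses `String#unicode_normalize` from Ruby's standard library rather than any gem above, and the sample strings are chosen for the example.

```ruby
composed   = "\u00e9"   # "é" as a single code point
decomposed = "e\u0301"  # "e" followed by a combining acute accent

# The two strings render the same but compare as different.
puts composed == decomposed                          # => false

# Normalizing both to the same form (here NFC) makes them comparable.
puts composed == decomposed.unicode_normalize(:nfc)  # => true

# NFC composes, NFD decomposes.
puts decomposed.unicode_normalize(:nfc).length       # => 1
puts composed.unicode_normalize(:nfd).length         # => 2
```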
Apr 22, 2016
Segmentation / On-line APIs
- nlp-pure (⭐19) - Natural language processing algorithms implemented in pure Ruby with minimal dependencies.
Machine Learning Libraries / Constituency Parsing
- weka (⭐67) - JRuby bindings for Weka, different ML algorithms implemented through Weka.