Track Awesome Linguistics Updates Weekly
A curated list of anything remotely related to linguistics
😺 theimpossibleastronaut/awesome-linguistics · ⭐ 357 · 🏷️ Computer Science
Dec 05 - Dec 11, 2022
Platforms and toolkits
- CLARIN-D web tools - Tools for Analysing Research Data
- CorpusExplorer - Software for corpus linguists and text/data mining enthusiasts. The CorpusExplorer combines over 50 interactive visualizations under a user-friendly interface.
- Snowball - Snowball is a language in which stemming algorithms can be easily represented.
- Mate Tools, webservice via WebLicht
- textblob-de (⭐102) - A nice alternative to spaCy.
Data sets
- CEHugeWebCorpus - German corpus based on CommonCrawl
- GC4 Corpus (CommonCrawl)
- IDS Corpora - German Reference Corpus
- Leipzig Corpora Collection - sampled sentences in different languages.
- SdeWaC - Large German web corpus
Deep learning models and transformers
- GermLM (⭐14) (NER exploration)
On Wikipedia
Books
- Natural Language Processing with Python - The book accompanying the NLTK package.
Standards
Lists
- GitHub topics corpus-linguistics & nlp
Aug 08 - Aug 14, 2022
Books
- Essentials of Linguistics, 2nd edition - An introductory textbook.
Mar 07 - Mar 13, 2022
Resources
- Language Science Press - Language Science Press is a born-digital scholar-led open access publisher in linguistics.
Jul 05 - Jul 11, 2021
Platforms and toolkits
- UBIAI - Easy-to-use text annotation tool for teams, with auto-annotation features. Supports NER, relation extraction, and document classification, as well as OCR annotation for invoice labeling.
Dec 30 - Jan 05, 2019
Platforms and toolkits
- spaCy - Industrial-strength Natural Language Processing in Python.
Dec 02 - Dec 08, 2019
Resources
- How To Label Data - Guide to managing large-scale linguistic annotation projects.
Communities
Nov 18 - Nov 24, 2019
Resources
- Low Resource Languages (⭐380) - A list of resources for the conservation, development, and documentation of low-resource (human) languages.
Nov 04 - Nov 10, 2019
On Youtube
- The Virtual Linguistics Campus - CC-licensed educational videos interconnected with Marburg University's e-learning platform of the same name.
Communities
Jun 03 - Jun 09, 2019
Books
Apr 08 - Apr 14, 2019
Platforms and toolkits
- UralicNLP (⭐70) - An open source Python library for processing morphologically rich and, for the most part, endangered Uralic languages. It can do morphological analysis, generation, lemmatization, disambiguation and lexical lookup for a great many Uralic languages.
Mar 06 - Mar 12, 2017
Data sets
- EuroRomCom Data (⭐20) - JSON-formatted Pan-Romance word lists.
On Youtube
- Computational Linguistics Lecture Playlist (YouTube) - Lectures for a University of Maryland class on computational linguistics.
Jan 12 - Jan 18, 2015
Books
Oct 20 - Oct 26, 2014
Platforms and toolkits
- Haxe-linguistics (⭐26) - Early-stage linguistic analysis and natural language processing library for Haxe.
- Natural (⭐11k) - General natural language tools for Node.js.
- Natural Language ToolKit (NLTK) - The most complete platform for building Python programs to work with human language data.
Algorithms
- Stemming algorithms for various European languages - Various stemming algorithms from Snowball.
- The Porter Stemmer Algorithm - The ‘official’ home page for distribution of the Porter Stemming Algorithm, written and maintained by its author, Martin Porter.