Awesome Linguistics Overview
😺 theimpossibleastronaut/awesome-linguistics · ⭐ 380 · 🏷️ Computer Science
Awesome Linguistics
A curated list of anything remotely related to linguistics, sorted in alphabetical order.
Libraries, frameworks and applications useful for building language-processing software.
Platforms and toolkits
- CLARIN-D web tools - Tools for Analysing Research Data
- CorpusExplorer - Software for corpus linguists and text/data mining enthusiasts. The CorpusExplorer combines over 50 interactive visualizations under a user-friendly interface.
- Haxe-linguistics (⭐26) - An early-stage linguistic analysis and natural language processing library for Haxe.
- Natural (⭐11k) - General natural language tools for Node.js.
- Natural Language ToolKit (NLTK) - The most complete platform for building Python programs to work with human language data.
- Snowball - Snowball is a language in which stemming algorithms can be easily represented.
- spaCy - Industrial-strength Natural Language Processing in Python.
- Mate Tools - Available as a web service via WebLicht.
- UBIAI - An easy-to-use text annotation tool for teams, with extensive auto-annotation features. Supports NER, relations and document classification, as well as OCR annotation for invoice labeling.
- textblob-de (⭐105) - German language support for TextBlob; an alternative to spaCy (see above).
- UralicNLP (⭐72) - An open source Python library for processing morphologically rich and, for the most part, endangered Uralic languages. It can do morphological analysis, generation, lemmatization, disambiguation and lexical lookup for a great many Uralic languages.
- Stemming algorithms for various European languages - Various stemming algorithms from Snowball.
- The Porter Stemmer Algorithm - The ‘official’ home page for distribution of the Porter Stemming Algorithm, written and maintained by its author, Martin Porter.
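The stemmers above are rule-based suffix strippers. As a minimal illustration, the Python sketch below implements only Step 1a of Martin Porter's algorithm, which covers the plural rules. The function name `porter_step1a` is hypothetical. The remaining steps, which check the stem's "measure" before stripping longer suffixes, are omitted. For real work, use an existing implementation such as the ones linked above.

```python
def porter_step1a(word: str) -> str:
    """Apply Step 1a of the Porter stemmer (plural suffixes) to a lower-case word.

    Rules are tried longest suffix first, and only the first match fires:
      SSES -> SS, IES -> I, SS -> SS, S -> (removed)
    This is a partial sketch: Porter's later steps are not implemented.
    """
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress (unchanged)
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word            # no plural suffix: leave the word alone


for w in ["caresses", "ponies", "ties", "caress", "cats"]:
    print(w, "->", porter_step1a(w))
```

Stems such as "poni" need not be dictionary words. In the full algorithm, a later step (1c) also maps "pony" to "poni", so the singular and plural forms end up sharing one stem.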
Data sets
- EuroRomCom Data (⭐20) - JSON formatted Pan-Romance word lists.
- Araneum Germanicum
- CEHugeWebCorpus - German corpus based on CommonCrawl
- Digitales Wörterbuch der deutschen Sprache (DWDS)
- GC4 Corpus (CommonCrawl)
- IDS Corpora - German Reference Corpus
- Leipzig Corpora Collection - sampled sentences in different languages.
- SdeWaC - A large German web corpus.
- DysList (⭐5) - A list of dyslexic errors.
- Falko
- Litkey
- OpinionSpam (⭐2)
- Low Resource Languages (⭐395) - A list of resources for conservation, development, and documentation of low resource (human) languages.
- Language Science Press - Language Science Press is a born-digital scholar-led open access publisher in linguistics.
Deep learning models and transformers
- dbmdz BERT models (⭐155)
- Deepset German BERT model
- Evaluating German Transformer Language Models with Syntactic Agreement Tests (⭐7)
- German ELMo Model (⭐28)
- german-transformer-training (⭐23)
- GermLM (⭐14) (NER exploration)
- GerPT2 (⭐20)
- Sentence Transformers (⭐16k)
On Wikipedia
- Bag of words model
- Document classification
- Language models
- Naive Bayes classification
- Natural language processing
- Outline of natural language processing
- Part-of-speech tagging
- Sentiment analysis
- Term frequency–inverse document frequency (tf–idf)
- Vector space model
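Several of the articles above fit together in a few lines of code: bag of words, tf-idf and the vector space model. The following is a minimal, stdlib-only Python sketch and is not code from any listed library. Three things in it are assumptions for illustration. The tokenizer simply lower-cases and splits on whitespace. The weighting is raw count × log(N/df), which is one common tf-idf variant; smoothed variants are also widely used. The function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(doc: str) -> Counter:
    """Lower-case, split on whitespace and count terms (a deliberately naive tokenizer)."""
    return Counter(doc.lower().split())


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Weight each term by raw count * log(N / df), one common tf-idf variant."""
    bags = [bag_of_words(d) for d in docs]
    n = len(bags)
    df = Counter()
    for bag in bags:
        df.update(bag.keys())  # document frequency: count each term once per document
    return [{t: c * math.log(n / df[t]) for t, c in bag.items()} for bag in bags]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors (vector space model)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_u * norm_v)


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
vectors = tf_idf(docs)
print(cosine(vectors[0], vectors[1]))  # shared terms: the, sat, on
print(cosine(vectors[0], vectors[2]))  # no shared terms
```

A term that occurs in every document gets weight 0 under this weighting. Nothing is stemmed here, so "cat" and "cats" are treated as unrelated terms.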
On YouTube
- Computational Linguistics Lecture Playlist (YouTube) - Lectures for University of Maryland class on computational linguistics.
- The Virtual Linguistics Campus - CC-licensed educational videos interconnected with Marburg University's e-learning platform of the same name.
Some of the more interesting and complete books.
- Essentials of Linguistics, 2nd edition - An introductory linguistics textbook.
- Introduction to Linguistics
- Natural Language Processing with Python - The book from the NLTK package.
- Text Mining with R
Non-free
- Foundations of Computational Linguistics
- Foundations of Statistical Natural Language Processing
- Semisupervised Learning for Computational Linguistics
- Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition
- The Oxford Handbook of Computational Linguistics
- The 15 most popular books on Goodreads
- GitHub topics: corpus-linguistics & nlp
- nlp-datasets (⭐5.8k)
- NLP-progress (⭐23k)
- /r/LanguageTechnology/
- awesome-nlp (⭐17k)
- Awesome Community-Curated NLP List (⭐197)
- awesome-chinese-nlp (⭐7.8k)
- awesome-danish (⭐169)
- awesome-hungarian-nlp (⭐229)
- awesome Information Retrieval (⭐1.1k)
- Indonesian NLP (⭐279)
- Norwegian NLP resources (⭐178)
- German NLP resources (⭐454)
- awesome-nlp-polish (⭐294)
- awesome-spanish-nlp (⭐333)
- M. Weisser's list of NLP/Computational Linguistics Resources