Track Awesome Python Data Science Updates Daily
Probably the best curated list of data science software in Python.
😺 krzjoa/awesome-python-data-science · ⭐ 2.5K · 🏷️ Programming Languages
Oct 04, 2024
Data Validation
- DataComPy (⭐470) - A library to compare Pandas, Polars, and Spark data frames. It provides stats and lets users adjust for match accuracy.
Aug 29, 2024
Time Series / Others
- skforecast (⭐1.1k) - Time series forecasting with machine learning models.
May 10, 2024
Genetic Programming / Others
- PyGAD (⭐1.8k) - Genetic Algorithm in Python.
May 06, 2024
Optimization / Others
- pymoo (⭐2.2k) - Multi-objective Optimization in Python.
- pycma (⭐1.1k) - Python implementation of CMA-ES.
Oct 19, 2023
Machine Learning / General Purpose Machine Learning
- PyCaret (⭐8.9k) - An open-source, low-code machine learning library in Python.
Reinforcement Learning / Others
- DI-engine (⭐3k) - OpenDILab Decision AI Engine.
- Imitation (⭐1.3k) - Clean PyTorch implementations of imitation and reward learning algorithms.
Oct 17, 2023
Computer Vision / Others
- PyTorch3D (⭐8.7k) - PyTorch3D is FAIR's library of reusable components for deep learning with 3D data.
- Decord (⭐1.8k) - An efficient video loader for deep learning with smart shuffling that's super easy to digest.
- MMEngine (⭐1.2k) - OpenMMLab Foundational Library for Training Deep Learning Models.
- LAVIS (⭐9.7k) - A One-stop Library for Language-Vision Intelligence.
Reinforcement Learning / Others
- MAgent2 (⭐219) - An engine for high performance multi-agent environments with very large numbers of agents, along with a set of reference environments.
Learning-to-Rank & Recommender Systems / Others
- LightFM (⭐4.7k) - A Python implementation of LightFM, a hybrid recommendation algorithm.
- Spotlight - Deep recommender models using PyTorch.
- Surprise (⭐6.4k) - A Python scikit for building and analyzing recommender systems.
- RecBole (⭐3.4k) - A unified, comprehensive and efficient recommendation library.
- allRank (⭐857) - allRank is a framework for training learning-to-rank neural models based on PyTorch.
- TensorFlow Recommenders (⭐1.8k) - A library for building recommender system models using TensorFlow.
- TensorFlow Ranking (⭐2.7k) - Learning to Rank in TensorFlow.
Deployment
- streamsync (⭐1.3k) - No-code in the front, Python in the back. An open-source framework for creating data apps.
- Vizro (⭐2.6k) - A toolkit for creating modular data visualization applications.
Conversion
- treelite (⭐730) - Universal model exchange and serialization format for decision tree forests.
Sep 25, 2023
Automated Machine Learning / Others
- Auto-PyTorch (⭐2.4k) - Automatic architecture search and hyperparameter optimization for PyTorch.
Reinforcement Learning / Others
- PettingZoo (⭐2.6k) - An API standard for multi-agent reinforcement learning environments, with popular reference environments and related utilities.
Sep 24, 2023
Reinforcement Learning / Others
- Shimmy (⭐133) - An API conversion tool for popular external reinforcement learning environments.
- EnvPool (⭐1.1k) - C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.
Sep 22, 2023
Deep Learning / JAX
- JAX (⭐30k) - Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more.
- FLAX (⭐6k) - A neural network library for JAX that is designed for flexibility.
- Optax (⭐1.6k) - A gradient processing and optimization library for JAX.
Reinforcement Learning / Others
- rlpyt (⭐2.2k) - Reinforcement Learning in PyTorch.
- cleanrl (⭐5.4k) - High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG).
- Machin (⭐397) - A reinforcement learning library designed for PyTorch.
- SKRL (⭐518) - Modular reinforcement learning library (on PyTorch and JAX) with support for NVIDIA Isaac Gym, Isaac Orbit and Omniverse Isaac Gym.
Graph Machine Learning / Others
- PyTorch Geometric Signed Directed (⭐121) - A signed/directed graph neural network extension library for PyTorch Geometric.
- StellarGraph (⭐2.9k) - Machine Learning on Graphs.
- Graph Nets (⭐5.3k) - Build Graph Nets in TensorFlow.
- TensorFlow GNN (⭐1.3k) - A library to build Graph Neural Networks on the TensorFlow platform.
- Auto Graph Learning (⭐1.1k) - An autoML framework & toolkit for machine learning on graphs.
- PyTorch-BigGraph (⭐3.4k) - Generate embeddings from large-scale graph-structured data.
- GreatX (⭐83) - A graph reliability toolbox based on PyTorch and PyTorch Geometric (PyG).
- Jraph (⭐1.4k) - A Graph Neural Network Library in JAX.
Sep 21, 2023
Machine Learning / General Purpose Machine Learning
- Shogun (⭐3k) - Machine learning toolbox.
Machine Learning / Gradient Boosting
- NGBoost (⭐1.6k) - Natural Gradient Boosting for Probabilistic Prediction.
- TensorFlow Decision Forests (⭐658) - A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras.
Deep Learning / Others
- transformers (⭐133k) - State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Automated Machine Learning / Others
- AutoKeras (⭐9.1k) - AutoML library for deep learning.
Natural Language Processing / Others
- KerasNLP (⭐766) - Modular Natural Language Processing workflows with Keras.
Computer Vision / Others
- KerasCV (⭐1k) - Industry-strength Computer Vision workflows with Keras.
Feature Engineering / General
- OpenFE (⭐775) - Automated feature generation with expert-level performance.
Sep 20, 2023
Graph Machine Learning / Others
- dgl (⭐13k) - Python package built to ease deep learning on graphs, on top of existing DL frameworks.
Sep 18, 2023
Reinforcement Learning / Others
- Gymnasium (⭐6.9k) - An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym (⭐35k)).
- Stable Baselines3 (⭐8.8k) - A set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines.
- Tianshou (⭐7.8k) - An elegant PyTorch deep reinforcement learning library.
- Acme (⭐3.5k) - A library of reinforcement learning components and agents.
- Catalyst-RL (⭐46) - PyTorch framework for RL research.
- d3rlpy (⭐1.3k) - An offline deep reinforcement learning library.
Probabilistic Graphical Models / Others
- pyAgrum - A GRaphical Universal Modeler.
Aug 24, 2023
Data Manipulation / Pipelines
- Hamilton (⭐1.8k) - A microframework for dataframe generation that applies Directed Acyclic Graphs specified by a flow of lazily evaluated Python functions.
May 26, 2023
Quantum Computing
- qiskit (⭐5.1k) - Qiskit is an open-source SDK for working with quantum computers at the level of circuits, algorithms, and application modules.
Feb 23, 2023
Optimization / Others
- Optuna (⭐11k) - A hyperparameter optimization framework.
Feature Engineering / General
- dirty_cat (⭐15) - Machine learning on dirty tabular data (especially: string-based variables for classification and regression).
- NitroFE (⭐106) - Moving window features.
Feature Engineering / Feature Selection
- zoofs (⭐240) - A feature selection library based on evolutionary algorithms.
Jan 30, 2023
Data Manipulation / Data Frames
- polars (⭐30k) - A fast multi-threaded, hybrid-out-of-core DataFrame library.
Jan 08, 2023
Deployment
- gradio (⭐33k) - Create UIs for your machine learning model in Python in 3 minutes.
Dec 22, 2022
Data Validation
- great_expectations (⭐9.9k) - Always know what to expect from your data.
- pandera (⭐3.3k) - A lightweight, flexible, and expressive statistical data testing library.
- deepchecks (⭐3.6k) - Validation & testing of ML models and data during model development, deployment, and production.
- evidently (⭐5.2k) - Evaluate and monitor ML models from validation to production.
- TensorFlow Data Validation (⭐758) - Library for exploring and validating machine learning data.
Dec 17, 2022
Deep Learning / PyTorch
- pytorch-lightning (⭐28k) - PyTorch Lightning is just organized PyTorch.
Model Explanation / Others
- dalex (⭐1.4k) - moDel Agnostic Language for Exploration and eXplanation.
Optimization / Others
- OR-Tools - An open-source software suite for optimization by Google; provides a unified programming interface to a half dozen solvers: SCIP, GLPK, GLOP, CP-SAT, CPLEX, and Gurobi.
Feature Engineering / General
- sk-transformer (⭐8) - A collection of various pandas & scikit-learn compatible transformers for all kinds of preprocessing and feature engineering steps.
Data Manipulation / Data Frames
- xarray (⭐3.6k) - Xarray combines the best features of NumPy and pandas for multidimensional data selection by supplementing numerical axis labels with named dimensions for more intuitive, concise, and less error-prone indexing routines.
Data Manipulation / Synthetic Data
- ydata-synthetic (⭐1.4k) - A package to generate synthetic tabular and time-series data leveraging state-of-the-art generative models.
Experimentation
- mlflow (⭐18k) - Open source platform for the machine learning lifecycle.
- dvc (⭐14k) - Data Version Control | Git for Data & Models | ML Experiments Management.
Computations
- NumExpr (⭐2.2k) - A fast numerical expression evaluator for NumPy that comes with an integrated computing virtual machine to speed calculations up by avoiding memory allocation for intermediate results.
Quantum Computing
- cirq (⭐4.2k) - A Python framework for creating, editing, and invoking Noisy Intermediate Scale Quantum (NISQ) circuits.
Nov 16, 2022
Automated Machine Learning / Others
- AutoGluon (⭐7.7k) - AutoML for Image, Text, Tabular, Time-Series, and MultiModal Data.
Data Manipulation / Data-centric AI
- cleanlab (⭐9.4k) - The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
- snorkel (⭐5.8k) - A system for quickly generating training data with weak supervision.
- dataprep (⭐2k) - Collect, clean, and visualize your data in Python with a few lines of code.
Aug 31, 2022
Optimization / Others
- sklearn-genetic-opt (⭐307) - Hyperparameter tuning and feature selection using evolutionary algorithms.
Aug 24, 2022
Feature Engineering / General
- Feature Engine (⭐1.9k) - Feature engineering package with sklearn-like functionality.
Aug 10, 2022
Probabilistic Graphical Models / Others
- pomegranate (⭐3.4k) - Probabilistic and graphical models for Python.
Jul 29, 2022
Deep Learning / PyTorch
- ChemicalX (⭐708) - A PyTorch-based deep learning library for drug pair scoring.
Time Series / Others
- darts (⭐8k) - A Python library for easy manipulation and forecasting of time series.
- statsforecast (⭐3.9k) - Lightning fast forecasting with statistical and econometric models.
- mlforecast (⭐858) - Scalable machine learning-based time series forecasting.
- neuralforecast (⭐3k) - Scalable and user-friendly neural forecasting algorithms.
- greykite (⭐1.8k) - A flexible, intuitive, and fast forecasting library.
- Chaos Genius (⭐728) - ML-powered analytics engine for outlier/anomaly detection and root cause analysis.
Experimentation
- envd (⭐2k) - 🏕️ machine learning development environment for data science and AI/ML engineering teams.
Jan 12, 2022
Machine Learning / General Purpose Machine Learning
- sklearn-expertsys (⭐488) - Highly interpretable classifiers for scikit-learn.
Dec 03, 2021
Time Series / Others
- sktime (⭐7.8k) - A unified framework for machine learning with time series.
- tslearn (⭐2.9k) - Machine learning toolkit dedicated to time-series data.
- tick (⭐487) - Module for statistical learning, with a particular emphasis on time-dependent modeling.
- Prophet (⭐18k) - Automatic Forecasting Procedure.
- PyFlux (⭐2.1k) - Open source time series library for Python.
- bayesloop (⭐152) - Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.
- luminol (⭐1.2k) - Anomaly Detection and Correlation library.
- dateutil - Powerful extensions to the standard datetime module.
- maya (⭐3.4k) - Makes it easy to parse strings and change time zones.
Sep 02, 2021
Experimentation
- Neptune - A lightweight ML experiment tracking, results visualization, and management tool.
Mar 25, 2021
Visualization / Interactive plots
- pyecharts (⭐15k) - A Python interactive visualization library ported from ECharts (⭐60k), a charting and visualization library.
Jan 01, 2021
Model Explanation / Others
- Shapley (⭐218) - A data-driven framework to quantify the value of classifiers in a machine learning ensemble.
Oct 13, 2020
Deep Learning / TensorFlow
- Keras - A high-level neural networks API running on top of TensorFlow.
Deep Learning / Others
- Tangent (⭐2.3k) - Source-to-Source Debuggable Derivatives in Pure Python.
- autograd (⭐7k) - Efficiently computes derivatives of NumPy code.
- Caffe (⭐34k) - A fast open framework for deep learning.
- nnabla (⭐2.7k) - Neural Network Libraries by Sony.
Sep 25, 2020
Reinforcement Learning / Others
- TF-Agents (⭐2.8k) - A library for Reinforcement Learning in TensorFlow.
Deployment
- fastapi - Modern, fast (high-performance) web framework for building APIs with Python.
- binder - Enables sharing and executing Jupyter Notebooks.
Web Scraping
- Pattern (⭐8.7k) - High-level scraping for well-established websites such as Google, Twitter, and Wikipedia. Also includes NLP, machine learning algorithms, and visualization.
Jul 31, 2020
Visualization / Interactive plots
- Bokeh (⭐19k) - Interactive Web Plotting for Python.
- Altair - Declarative statistical visualization library for Python. Many data transformations can be performed within the chart code itself.
- bqplot (⭐3.6k) - Plotting library for IPython/Jupyter notebooks.
Visualization / Automatic Plotting
- HoloViews (⭐2.7k) - Stop plotting your data - annotate your data and let it visualize itself.
- AutoViz (⭐1.7k) - Visualize data automatically with 1 line of code (ideal for machine learning).
- SweetViz (⭐2.9k) - Visualize and compare datasets, target values and associations, with one line of code.
Visualization / NLP
- pyLDAvis (⭐1.8k) - Interactive topic model visualization.
Data Manipulation / Data Frames
- pandas_profiling (⭐12k) - Create HTML profiling reports from pandas DataFrame objects.
Web Scraping
- BeautifulSoup - A beginner-friendly library for scraping static websites.
- Scrapy - Fast and extensible scraping library. Lets you write rules and create customized scrapers without touching the core.
- Selenium - Use the Selenium Python API to access all functionality of Selenium WebDriver intuitively, interacting with pages as a real user would.
- twitterscraper (⭐2.4k) - Efficient library to scrape Twitter.
Jul 25, 2020
Graph Machine Learning / Others
- pytorch_geometric_temporal (⭐2.6k) - Temporal Extension Library for PyTorch Geometric.
Jul 23, 2020
Visualization / Map
- folium - Makes it easy to visualize data on an interactive OpenStreetMap map.
- geemap (⭐3.4k) - Python package for interactive mapping with Google Earth Engine (GEE).
Jul 21, 2020
Deployment
- streamlit - Makes it easy to deploy machine learning models as interactive web apps.
- datapane - A collection of APIs to turn scripts and notebooks into interactive reports.
Jun 17, 2020
Machine Learning / General Purpose Machine Learning
- causalml (⭐5k) - Uplift modeling and causal inference with machine learning algorithms.
Data Manipulation / Data Frames
- vaex (⭐8.3k) - Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second.
May 18, 2020
Graph Machine Learning / Others
- Little Ball of Fur (⭐700) - A library for sampling graph structured data.
Jan 25, 2020
Graph Machine Learning / Others
- Karate Club (⭐2.1k) - An unsupervised machine learning library for graph-structured data.
Nov 20, 2019
Deep Learning / PyTorch
- Catalyst (⭐3.3k) - High-level utils for PyTorch DL & RL research.
Nov 10, 2019
Data Manipulation / Pipelines
- dopanda (⭐473) - Hints and tips for using pandas in an analysis environment.
Oct 29, 2019
Optimization / Others
- scikit-opt (⭐5.2k) - Heuristic Algorithms for optimization.
Oct 28, 2019
Data Manipulation / Data Frames
- pandas-log (⭐214) - A package that provides feedback about basic pandas operations and helps find both business-logic and performance issues.
Oct 26, 2019
Visualization / Interactive plots
- plotly - A Python library that makes interactive and publication-quality graphs.
Oct 06, 2019
Machine Learning / General Purpose Machine Learning
- hyperlearn (⭐1.8k) - A rewrite of scikit-learn and Statsmodels with GPU support, 50%+ faster and with 50%+ less RAM usage.
Natural Language Processing / Others
- spaCy - Industrial-Strength Natural Language Processing.
Sep 24, 2019
Deep Learning / TensorFlow
- tensorpack (⭐6.3k) - A Neural Net Training Interface on TensorFlow.
Sep 23, 2019
Reinforcement Learning / Others
- Dopamine (⭐11k) - A research framework for fast prototyping of reinforcement learning algorithms.
Statistics
- weightedcalcs (⭐103) - A pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.
Distributed Computing
- PaddlePaddle (⭐22k) - PArallel Distributed Deep LEarning.
Evaluation
- sklearn-evaluation (⭐3) - Model evaluation made easy: plots, tables, and markdown reports.
Sep 15, 2019
Statistics
- statsmodels (⭐10k) - Statistical modeling and econometrics in Python.
Sep 05, 2019
Quantum Computing
- PennyLane (⭐2.3k) - Quantum machine learning, automatic differentiation, and optimization of hybrid quantum-classical computations.
Sep 04, 2019
Deep Learning / TensorFlow
- keras-contrib (⭐1.6k) - Keras community contributions.
- Hyperas (⭐2.2k) - Keras + Hyperopt: A straightforward wrapper for convenient hyperparameter optimization.
- Elephas (⭐1.6k) - Distributed Deep learning with Keras & Spark.
- qkeras (⭐532) - A quantization deep learning library.
Reinforcement Learning / Others
- RLlib - Scalable Reinforcement Learning.
- TensorForce (⭐3.3k) - A TensorFlow library for applied reinforcement learning.
- TRFL (⭐3.1k) - TensorFlow Reinforcement Learning.
- keras-rl (⭐5.5k) - Deep Reinforcement Learning for Keras.
- garage (⭐1.9k) - A toolkit for reproducible reinforcement learning research.
- Horizon (⭐3.6k) - A platform for Applied Reinforcement Learning.
Graph Machine Learning / Others
- Spektral (⭐2.4k) - Deep learning on graphs.
Sep 03, 2019
Distributed Computing
- Horovod (⭐14k) - Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
- PySpark - Exposes the Spark programming model to Python.
- Veles (⭐906) - Distributed machine learning platform.
- Jubatus (⭐705) - Framework and Library for Distributed Online Machine Learning.
- DMTK (⭐2.7k) - Microsoft Distributed Machine Learning Toolkit.
- dask-ml (⭐894) - Distributed and parallel machine learning.
- Distributed (⭐1.6k) - Distributed computation in Python.
Sep 02, 2019
Visualization / General Purposes
- chartify (⭐3.5k) - Python library that makes it easy for data scientists to create charts.
- physt (⭐131) - Improved histograms.
Visualization / Interactive plots
- animatplot (⭐410) - A Python package for animating plots built on matplotlib.
Aug 31, 2019
Machine Learning / General Purpose Machine Learning
- scikit-learn - Machine learning in Python.
- cuML (⭐4.2k) - RAPIDS Machine Learning Library.
- modAL (⭐2.2k) - Modular active learning framework for Python3.
- Sparkit-learn (⭐1.2k) - PySpark + scikit-learn = Sparkit-learn.
- MLxtend (⭐4.9k) - Extension and helper modules for Python's data analysis and machine learning libraries.
- Reproducible Experiment Platform (REP) (⭐687) - Machine Learning toolbox for Humans.
- scikit-multilearn (⭐918) - Multi-label classification for Python.
- seqlearn (⭐686) - Sequence classification toolkit for Python.
- pystruct (⭐665) - Simple structured learning framework for Python.
- RuleFit (⭐408) - Implementation of the RuleFit algorithm.
- metric-learn (⭐1.4k) - Metric learning algorithms in Python.
Machine Learning / Gradient Boosting
- XGBoost (⭐26k) - Scalable, Portable, and Distributed Gradient Boosting.
- LightGBM (⭐17k) - A fast, distributed, high-performance gradient boosting framework.
- CatBoost (⭐8k) - An open-source gradient boosting on decision trees library.
- ThunderGBM (⭐692) - Fast GBDTs and Random Forests on GPUs.
Machine Learning / Ensemble Methods
- ML-Ensemble - High performance ensemble learning.
- Stacking (⭐217) - Simple and useful stacking library written in Python.
- stacked_generalization (⭐117) - Library for machine learning stacking generalization.
- vecstack (⭐683) - Python package for stacking (machine learning technique).
Machine Learning / Imbalanced Datasets
- imbalanced-learn (⭐6.8k) - Module to perform under-sampling and over-sampling with various techniques.
- imbalanced-algorithms (⭐234) - Python-based implementations of algorithms for learning on imbalanced data.
Machine Learning / Random Forests
- rpforest (⭐223) - A forest of random projection trees.
- sklearn-random-bits-forest (⭐9) - Wrapper of the Random Bits Forest program written by Wang et al. (2016).
- rgf_python (⭐373) - Python Wrapper of Regularized Greedy Forest.
Machine Learning / Kernel Methods
- pyFM (⭐921) - Factorization machines in Python.
- fastFM (⭐1.1k) - A library for Factorization Machines.
- tffm (⭐780) - TensorFlow implementation of an arbitrary order Factorization Machine.
- scikit-rvm (⭐229) - Relevance Vector Machine implementation using the scikit-learn API.
- ThunderSVM (⭐1.6k) - A fast SVM Library on GPUs and CPUs.
Deep Learning / PyTorch
- PyTorch (⭐82k) - Tensors and Dynamic neural networks in Python with strong GPU acceleration.
- ignite (⭐4.5k) - High-level library to help with training neural networks in PyTorch.
- skorch (⭐5.8k) - A scikit-learn compatible neural network library that wraps PyTorch.
Deep Learning / TensorFlow
- TensorFlow (⭐186k) - Computation using data flow graphs for scalable machine learning by Google.
- TensorLayer (⭐7.3k) - Deep Learning and Reinforcement Learning Library for Researchers and Engineers.
- TFLearn (⭐9.6k) - Deep learning library featuring a higher-level API for TensorFlow.
- Sonnet (⭐9.8k) - TensorFlow-based neural network library.
- Polyaxon (⭐3.6k) - A platform that helps you build, manage and monitor deep learning models.
- tfdeploy (⭐352) - Deploy TensorFlow graphs for fast evaluation and export to TensorFlow-less environments running numpy.
- tensorflow-upstream (⭐686) - TensorFlow ROCm port.
- TensorFlow Fold (⭐1.8k) - Deep learning with dynamic computation graphs in TensorFlow.
- TensorLight (⭐11) - A high-level framework for TensorFlow.
- Mesh TensorFlow (⭐1.6k) - Model Parallelism Made Easier.
- Ludwig (⭐11k) - A toolbox that allows one to train and test deep learning models without the need to write code.
Deep Learning / MXNet
- MXNet (⭐21k) - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler.
- Gluon (⭐2.3k) - A clear, concise, simple yet powerful and efficient API for deep learning (now included in MXNet).
- Xfer (⭐253) - Transfer Learning library for Deep Neural Networks.
- MXNet (⭐28) - HIP Port of MXNet.
Automated Machine Learning / Others
- auto-sklearn (⭐7.6k) - An AutoML toolkit and a drop-in replacement for a scikit-learn estimator.
- TPOT (⭐9.7k) - AutoML tool that optimizes machine learning pipelines using genetic programming.
Natural Language Processing / Others
- torchtext (⭐3.5k) - Data loaders and abstractions for text and NLP.
- gluon-nlp (⭐2.6k) - NLP made easy.
- pyMorfologik (⭐18) - Python binding for Morfologik.
- skift (⭐234) - Scikit-learn wrappers for Python fastText.
- flair (⭐14k) - Very simple framework for state-of-the-art NLP.
Computer Audition / Others
- torchaudio (⭐2.5k) - An audio library for PyTorch.
Computer Vision / Others
- torchvision (⭐16k) - Datasets, Transforms, and Models specific to Computer Vision.
- gluon-cv (⭐5.8k) - Provides implementations of the state-of-the-art deep learning models in computer vision.
Graph Machine Learning / Others
- pytorch_geometric (⭐21k) - Geometric Deep Learning Extension Library for PyTorch.
Probabilistic Methods / Others
- pyro (⭐8.5k) - A flexible, scalable deep probabilistic programming library built on PyTorch.
- ZhuSuan - Bayesian Deep Learning.
- GPflow - Gaussian processes in TensorFlow.
- InferPy (⭐146) - Deep Probabilistic Modelling Made Easy.
- sklearn-bayes (⭐513) - Python package for Bayesian Machine Learning with scikit-learn API.
- skpro (⭐232) - Supervised domain-agnostic prediction framework for probabilistic modelling by The Alan Turing Institute.
- PyVarInf (⭐358) - Bayesian Deep Learning methods with Variational Inference for PyTorch.
- GPyTorch (⭐3.5k) - A highly efficient and modular implementation of Gaussian Processes in PyTorch.
- sklearn-crfsuite (⭐426) - A scikit-learn-inspired API for CRFsuite.
Model Explanation / Others
- Contrastive Explanation (⭐44) - Contrastive Explanation (Foil Trees).
- yellowbrick (⭐4.3k) - Visual analysis and diagnostic tools to facilitate machine learning model selection.
- scikit-plot (⭐2.4k) - An intuitive library to add plotting functionality to scikit-learn objects.
- shap (⭐23k) - A unified approach to explain the output of any machine learning model.
- Lime (⭐12k) - Explaining the predictions of any machine learning classifier.
- FairML (⭐359) - A Python toolbox for auditing machine learning models for bias.
- model-analysis (⭐1.3k) - Model analysis tools for TensorFlow.
- themis-ml (⭐124) - A library that implements fairness-aware machine learning algorithms.
- treeinterpreter (⭐744) - Interpreting scikit-learn's decision tree and random forest predictions.
Genetic Programming / Others
- gplearn (⭐1.6k) - Genetic Programming in Python.
- karoo_gp (⭐157) - A Genetic Programming platform for Python with GPU support.
- sklearn-genetic (⭐324) - Genetic feature selection module for scikit-learn.
Optimization / Others
- BoTorch (⭐3.1k) - Bayesian optimization in PyTorch.
- hyperopt-sklearn (⭐1.6k) - Hyper-parameter optimization for sklearn.
- sklearn-deap (⭐772) - Use evolutionary algorithms instead of gridsearch in scikit-learn.
- sigopt_sklearn (⭐75) - SigOpt wrappers for scikit-learn methods.
- GPflowOpt (⭐270) - Bayesian Optimization using GPflow.
Feature Engineering / General
- skl-groups (⭐41) - A scikit-learn addon to operate on set/"group"-based features.
- Feature Forge (⭐382) - A set of tools for creating and testing machine learning features.
- few (⭐51) - A feature engineering wrapper for sklearn.
- scikit-mdr (⭐126) - A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction.
- tsfresh (⭐8.4k) - Automatic extraction of relevant features from time series.
Feature Engineering / Feature Selection
- scikit-feature (⭐1.5k) - Feature selection repository in Python.
- boruta_py (⭐1.5k) - Implementations of the Boruta all-relevant feature selection method.
- BoostARoota (⭐218) - A fast xgboost feature selection algorithm.
- scikit-rebate (⭐408) - A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.
Statistics
- pandas_summary (⭐501) - An extension to the pandas DataFrame describe function.
- Pandas Profiling (⭐12k) - Create HTML profiling reports from pandas DataFrame objects.
- Alphalens (⭐3.3k) - Performance analysis of predictive (alpha) stock factors.
Data Manipulation / Data Frames
- datatable (⭐1.8k) - Data.table for Python.
- cuDF (⭐8.3k) - GPU DataFrame Library.
- blaze (⭐3.2k) - NumPy and pandas interface to Big Data.
- pandasql (⭐1.3k) - Allows you to query pandas DataFrames using SQL syntax.
- pandas-gbq (⭐446) - pandas Google Big Query.
- pysparkling (⭐261) - A pure Python implementation of Apache Spark's RDD and DStream interfaces.
- modin (⭐9.8k) - Speed up your pandas workflows by changing a single line of code.
Data Manipulation / Pipelines
- pandas-ply (⭐200) - Functional data manipulation for pandas.
- Dplython (⭐764) - Dplyr for Python.
- sklearn-pandas (⭐2.8k) - pandas integration with sklearn.
- pyjanitor (⭐1.3k) - Clean APIs for data cleaning.
Experimentation
- Sacred (⭐4.2k) - A tool to help you configure, organize, log, and reproduce experiments.
- Ax (⭐2.4k) - Adaptive Experimentation Platform.
Computations
- Dask (⭐12k) - Parallel computing with task scheduling.
Spatial Analysis
- GeoPandas (⭐4.5k) - Python tools for geographic data.
Aug 30, 2019
Model Explanation / Others
- Auralisation (⭐42) - Auralisation of learned features in CNN (for audio).
- CapsNet-Visualization (⭐393) - A visualization of the CapsNet layers to better understand how it works.
- lucid (⭐4.7k) - A collection of infrastructure and tools for research in neural network interpretability.
- Netron (⭐28k) - Visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks).
- FlashLight - Visualization Tool for your Neural Network.
- tensorboard-pytorch (⭐7.9k) - Tensorboard for PyTorch (and chainer, mxnet, numpy, ...).
Data Manipulation / Data Frames
- swifter (⭐2.5k) - A package that efficiently applies any function to a pandas dataframe or series in the fastest available manner.
Data Manipulation / Pipelines
- meza (⭐414) - A Python toolkit for processing tabular data.
Aug 27, 2019
Machine Learning / General Purpose Machine Learning
- xLearn (⭐3.1k) - High Performance, Easy-to-use, and Scalable Machine Learning Package.
- mlpack (⭐5k) - A scalable C++ machine learning library (Python bindings).
- dlib (⭐13k) - Toolkit for making real-world machine learning and data analysis applications in C++ (Python bindings).
- pyGAM (⭐866) - Generalized Additive Models in Python.
Machine Learning / Kernel Methods
- liquidSVM (⭐66) - An implementation of SVMs.
Automated Machine Learning / Others
- MLBox (⭐1.5k) - A powerful Automated Machine Learning python library.
Natural Language Processing / Others
- NLTK (⭐13k) - Modules, data sets, and tutorials supporting research and development in Natural Language Processing.
- CLTK (⭐835) - The Classical Language Toolkit.
- gensim - Topic Modelling for Humans.
- Phonemizer (⭐1.2k) - Simple text-to-phonemes converter for multiple languages.
Computer Audition / Others
- librosa (⭐7.1k) - Python library for audio and music analysis.
- Yaafe (⭐243) - Audio features extraction.
- aubio (⭐3.3k) - A library for audio and music analysis.
- Essentia (⭐2.8k) - Library for audio and music analysis, description, and synthesis.
- LibXtract (⭐226) - A simple, portable, lightweight library of audio feature extraction functions.
- Marsyas (⭐406) - Music Analysis, Retrieval, and Synthesis for Audio Signals.
- muda (⭐231) - A library for augmenting annotated audio data.
- madmom (⭐1.3k) - Python audio and music signal processing library.
Computer Vision / Others
- OpenCV (⭐78k) - Open Source Computer Vision Library.
- scikit-image (⭐6k) - Image Processing SciKit (Toolbox for SciPy).
- imgaug (⭐14k) - Image augmentation for machine learning experiments.
- imgaug_extension - Additional augmentations for imgaug.
- Augmentor (⭐5.1k) - Image augmentation library in Python for machine learning.
- albumentations (⭐14k) - Fast image augmentation library and easy-to-use wrapper around other libraries.
Probabilistic Graphical Models / Others
- pgmpy (⭐2.7k) - A Python library for working with Probabilistic Graphical Models.
Probabilistic Methods / Others
- PyMC (⭐8.7k) - Bayesian Stochastic Modelling in Python.
- PyStan (⭐338) - Bayesian inference using the No-U-Turn sampler (Python interface).
- emcee (⭐1.5k) - The Python ensemble sampling toolkit for affine-invariant MCMC.
- hsmmlearn (⭐78) - A library for hidden semi-Markov models with explicit durations.
- pyhsmm (⭐548) - Bayesian inference in HSMMs and HMMs.
Model Explanation / Others
- Alibi (⭐2.4k) - Algorithms for monitoring and explaining machine learning models.
- anchor (⭐795) - Code for "High-Precision Model-Agnostic Explanations" paper.
- aequitas (⭐679) - Bias and Fairness Audit Toolkit.
- ELI5 (⭐2.8k) - A library for debugging/inspecting machine learning classifiers and explaining their predictions.
- L2X (⭐124) - Code for replicating the experiments in the paper Learning to Explain: An Information-Theoretic Perspective on Model Interpretation.
- PDPbox (⭐841) - Partial dependence plot toolbox.
- PyCEbox (⭐164) - Python Individual Conditional Expectation Plot Toolbox.
- Skater - Python Library for Model Interpretation.
- AI Explainability 360 (⭐1.6k) - Interpretability and explainability of data and machine learning models.
Genetic Programming / Others
- DEAP (⭐5.8k) - Distributed Evolutionary Algorithms in Python.
- monkeys (⭐122) - A strongly-typed genetic programming framework for Python.
Optimization / Others
- Spearmint (⭐1.5k) - Bayesian optimization.
- SMAC3 (⭐1.1k) - Sequential Model-based Algorithm Configuration.
- Optunity (⭐415) - A library containing various optimizers for hyperparameter tuning.
- hyperopt (⭐7.2k) - Distributed Asynchronous Hyperparameter Optimization in Python.
- SafeOpt (⭐140) - Safe Bayesian Optimization.
- scikit-optimize (⭐2.7k) - Sequential model-based optimization with a scipy.optimize interface.
- Solid (⭐576) - A comprehensive gradient-free optimization framework written in Python.
- PySwarms (⭐1.3k) - A research toolkit for particle swarm optimization in Python.
- Platypus (⭐565) - A Free and Open Source Python Library for Multiobjective Optimization.
- POT (⭐2.4k) - Python Optimal Transport library.
- Talos (⭐1.6k) - Hyperparameter Optimization for Keras Models.
- nlopt (⭐1.9k) - Library for nonlinear optimization (global and local, constrained or unconstrained).
Feature Engineering / General
- Featuretools (⭐7.2k) - Automated feature engineering.
Visualization / General Purposes
- Matplotlib (⭐20k) - Plotting with Python.
- seaborn (⭐12k) - Statistical data visualization using matplotlib.
- prettyplotlib (⭐1.7k) - Painlessly create beautiful matplotlib plots.
- python-ternary (⭐726) - Ternary plotting library for Python with matplotlib.
- missingno (⭐3.9k) - Missing data visualization module for Python.
Statistics / NLP
- scikit-posthocs (⭐339) - Pairwise Multiple Comparisons Post-hoc Tests.
Data Manipulation / Data Frames
- pandas - Powerful Python data analysis toolkit.
- Arctic (⭐3.1k) - High-performance datastore for time series and tick data.
- xpandas (⭐26) - Universal 1d/2d data containers with Transformers functionality for data analysis by The Alan Turing Institute.
Data Manipulation / Pipelines
- pdpipe (⭐713) - Easy pipelines for pandas DataFrames.
- SSPipe - Python pipe (|) operator with support for DataFrames, NumPy, and PyTorch.
- Dataset (⭐200) - Helps you conveniently work with random or sequential batches of your data and define data processing.
- Prodmodel (⭐59) - Build system for data science pipelines.
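The pipe-operator style that SSPipe offers rests on Python's reflected-operator protocol: when the left operand of `|` cannot handle the right operand, Python calls the right operand's `__ror__`. Below is a minimal, stdlib-only sketch of that mechanism. The `Pipe` class is hypothetical and is not SSPipe's actual API; it only illustrates the idea.

```python
class Pipe:
    """Hypothetical wrapper: `value | Pipe(f, *args)` evaluates to `f(value, *args)`."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __ror__(self, value):
        # Python calls this for `value | pipe` when the left operand has no
        # __or__ or its __or__ returns NotImplemented for a Pipe object.
        return self.func(value, *self.args, **self.kwargs)


# `|` is left-associative, so each stage receives the previous stage's result.
total = [3, 1, 2] | Pipe(sorted) | Pipe(sum)
parts = "a,b,c" | Pipe(str.split, ",")
```

Here `total` is `6` and `parts` is `['a', 'b', 'c']`. Left operands that overload `|` themselves (NumPy arrays, for example) may intercept the operator before `__ror__` is reached, which is one of the cases a dedicated library has to deal with.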
Evaluation / Synthetic Data
- recmetrics (⭐565) - Library of useful metrics and plots for evaluating recommender systems.
- Metrics (⭐1.6k) - Machine learning evaluation metrics.
- AI Fairness 360 (⭐2.4k) - Fairness metrics for datasets and ML models, explanations, and algorithms to mitigate bias in datasets and models.
Computations / Synthetic Data
- numpy - The fundamental package needed for scientific computing with Python.
- bottleneck (⭐1.1k) - Fast NumPy array functions written in C.
- CuPy (⭐9.3k) - NumPy-like API accelerated with CUDA.
- scikit-tensor (⭐401) - Python library for multilinear algebra and tensor factorizations.
- numdifftools (⭐253) - Solve automatic numerical differentiation problems in one or more variables.
- quaternion (⭐609) - Add built-in support for quaternions to numpy.
- adaptive (⭐1.2k) - Tools for adaptive and parallel sampling of mathematical functions.
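Numerical differentiation, the task numdifftools targets, is built on finite-difference quotients. The sketch below shows only the basic symmetric (central) difference with a fixed step `h`; the function name `central_difference` and the default step are illustrative choices. numdifftools layers adaptive step selection and Richardson extrapolation on top of this idea, none of which is shown here.

```python
import math


def central_difference(f, x, h=1e-5):
    """Approximate f'(x) with the symmetric quotient (f(x+h) - f(x-h)) / (2h).

    The truncation error shrinks like h**2, but a very small h amplifies
    floating-point rounding error, so h is a trade-off.
    """
    return (f(x + h) - f(x - h)) / (2 * h)


# d/dx sin(x) at 0 is cos(0) = 1; d/dx x**3 at 2 is 3 * 2**2 = 12.
slope_sin = central_difference(math.sin, 0.0)
slope_cube = central_difference(lambda x: x ** 3, 2.0)
```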
Spatial Analysis / Synthetic Data
- PySal (⭐1.3k) - Python Spatial Analysis Library.
Quantum Computing / Synthetic Data
- QML (⭐199) - A Python Toolkit for Quantum Machine Learning.
Conversion / Synthetic Data
- sklearn-porter (⭐1.3k) - Transpile trained scikit-learn estimators to C, Java, JavaScript, and others.
- ONNX (⭐18k) - Open Neural Network Exchange.
- MMdnn (⭐5.8k) - A set of tools to help users inter-operate among different deep learning frameworks.
Dec 22, 2017
Optimization / Others
- Bayesian Optimization (⭐7.8k) - A Python implementation of global optimization with Gaussian processes.
Statistics / NLP
- stockstats (⭐1.3k) - Supply a wrapper StockDataFrame based on the pandas.DataFrame with inline stock statistics/indicators support.