Awesome List Updates on Dec 14, 2023
2 awesome lists updated today.
1. Awesome Azure OpenAI LLM
RAG Pipeline & Advanced RAG
- Demystifying Advanced RAG Pipelines: An LLM-powered advanced RAG pipeline built from scratch git (⭐776) [19 Oct 2023]
Vector Database Comparison
- Faiss: Facebook AI Similarity Search (Faiss) is a library for efficient similarity search and clustering of dense vectors. It can be used in place of a vector database during development, or as the algorithm library underlying one. It is developed by Facebook AI Research. git (⭐30k) [Feb 2017]
Semantic Kernel / Azure AI Search
- Microsoft's LangChain-style library (Semantic Kernel) supports C# and Python. It offers several features, some of which are still in development and lack clear implementation guidance. However, it is simple, stable, and faster than Python-based open-source software. The features listed on the link include: Semantic Kernel Feature Matrix / doc:ref / blog:ref / git [Feb 2023]
LangChain features and related libraries / DSPy optimizer
- LangChain Expression Language: A declarative way to easily compose chains together [Aug 2023]
- OpenGPTs (⭐6.4k): An open source effort to create a similar experience to OpenAI's GPTs [Nov 2023]
- langflow (⭐24k): LangFlow is a UI for LangChain, designed with react-flow. [Feb 2023]
- Flowise (⭐29k): Drag & drop UI to build your customized LLM flow [Apr 2023]
Prompt Engineering / Prompt Template Language
Power of Prompting
- GPT-4 with Medprompt: GPT-4, using a method called Medprompt that combines several prompting strategies, has surpassed MedPaLM 2 on the MedQA dataset without the need for fine-tuning. ref [28 Nov 2023]
- promptbase (⭐5.3k): Scripts demonstrating the Medprompt methodology [Dec 2023]
Prompt Guide & Leaked prompts / Prompt Template Language
- Prompts for Education (⭐1.5k): Microsoft Prompts for Education [Jul 2023]
RLHF (Reinforcement Learning from Human Feedback) & SFT (Supervised Fine-Tuning) / Llama Finetuning
- OpenAI Spinning Up in Deep RL!: An educational resource to help anyone learn deep reinforcement learning. git (⭐9.9k) [Nov 2018]
Quantization Techniques / Llama Finetuning
- bitsandbytes: 8-bit optimizers git (⭐5.9k) [Oct 2021]
Other techniques and LLM patterns / Llama Finetuning
- Mixture of experts models: Mixtral 8x7B: Sparse mixture of experts models (SMoE) magnet [Dec 2023]
- Huggingface Mixture of Experts Explained: Mixture of Experts, or MoEs for short [Dec 2023]
- Simplifying Transformer Blocks: A simplified Transformer that removes several block components, including skip connections, projection/value matrices, sequential sub-blocks, and normalisation layers, without loss of training speed. [3 Nov 2023]
Numbers LLM / GPT series release date
- tiktoken (⭐12k): BPE tokeniser for use with OpenAI's models. Token counting. [Dec 2022]
- What are tokens and how to count them?: OpenAI Articles
- Byte-Pair Encoding (BPE): P.2015. The most widely used tokenization algorithm for text today. BPE adds an end token to each word, splits words into characters, and iteratively merges the most frequent adjacent pair until a stop criterion is met. The final tokens form the vocabulary used to encode and decode new data. [31 Aug 2015] / ref [13 Aug 2021]
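The BPE steps described above can be sketched in a few lines of Python. This is a minimal illustration only, not the tiktoken implementation: the `</w>` end marker, the function names (`learn_bpe`, `merge_pair`, `encode`), the toy corpus, and the use of a fixed number of merges as the stop criterion are all assumptions made for the example.

```python
from collections import Counter

END = "</w>"  # end-of-word marker (assumed symbol for this sketch)


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules (the stop criterion assumed here)."""
    # Each word is split into characters and terminated with the end token.
    vocab = Counter(tuple(w) + (END,) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties go to the pair seen first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Tokenize a new word by replaying the learned merges in order."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
merges = learn_bpe(corpus, num_merges=5)
tokens = encode("lowest", merges)
print(merges)
print(tokens)  # -> ['low', 'est</w>']
```

On this toy corpus the word "lowest" never appears, yet it is encoded from the learned subwords "low" and "est</w>", which is the property that makes BPE vocabularies reusable on new data.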
Trustworthy, Safe and Secure LLM / GPT series release date
- NeMo Guardrails (⭐3.9k): Building Trustworthy, Safe and Secure LLM Conversational Systems [Apr 2023]
- Hallucination Leaderboard (⭐1.1k): Evaluate how often an LLM introduces hallucinations when summarizing a document. [Nov 2023]
Large Language Model Is: Abilities / GPT series release date
- Math solving optimized LLM WizardMath: [cnt]: Developed by adapting Evol-Instruct and Reinforcement Learning techniques, these models excel at math benchmarks such as GSM8k and MATH. git (⭐9.2k) [18 Aug 2023] / Math solving Plugin: Wolfram Alpha
Build an LLM from scratch: picoGPT and lit-gpt / GPT series release date
- lit-gpt: Hackable implementation of state-of-the-art open-source LLMs based on nanoGPT. Supports flash attention, 4-bit and 8-bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed. git (⭐9.4k) [Mar 2023]
- pix2code (⭐12k): Generating Code from a Graphical User Interface Screenshot. Trained on pairs of screenshots and a simplified intermediate script for HTML, using a CNN for image embedding and an LSTM for text embedding in an encoder-decoder model. An early example of image-to-code. [May 2017] -> Screenshot to code (⭐16k): Turning Design Mockups Into Code With Deep Learning [Oct 2017] ref
LLM Materials for East Asian Languages / Japanese
- ブレインパッド社員が投稿した Qiita 記事まとめ: Collection of Qiita articles posted by BrainPad employees [Jul 2023]
- New Era of Computing - ChatGPT がもたらした新時代: The new era brought about by ChatGPT [May 2023]
- 大規模言語モデルで変わる ML システム開発: ML system development that changes with large-scale language models [Mar 2023]
- GPT-4 登場以降に出てきた ChatGPT/LLM に関する論文や技術の振り返り: Review of ChatGPT/LLM papers and technologies that have emerged since the advent of GPT-4 [Jun 2023]
- LLM を制御するには何をするべきか?: What should be done to control LLMs? [Jun 2023]
- 1. 生成 AI のマルチモーダルモデルでできること: What can be done with multimodal models of generative AI 2. 生成 AI のマルチモーダリティに関する技術調査: Technical survey on the multimodality of generative AI [Jun 2023]
- LLM の推論を効率化する量子化技術調査: Survey of quantization techniques for more efficient LLM inference [Sep 2023]
- LLM の出力制御や新モデルについて: About LLM output control and new models [Sep 2023]
- Azure OpenAI を活用したアプリケーション実装のリファレンス (⭐264): Reference for implementing applications with Azure OpenAI, a Microsoft Japan reference architecture [Jun 2023]
- 生成 AI・LLM のツール拡張に関する論文の動向調査: Survey of trends in papers on tool extensions for generative AI and LLM [Sep 2023]
- LLM の学習・推論の効率化・高速化に関する技術調査: Technical survey on improving the efficiency and speed of LLM training and inference [Sep 2023]
Learning and Supplementary Materials / Korean
- gpt4free (⭐60k) for educational purposes only [Mar 2023]
- IbrahimSobh/llms (⭐266): Language models introduction with simple code. [Jun 2023]
- DeepLearning.ai Short courses: DeepLearning.ai Short courses [2023]
- Deep Learning cheatsheets for Stanford's CS 230 (⭐6.3k): Super VIP Cheatsheet: Deep Learning [Nov 2019]
- Best-of Machine Learning with Python (⭐16k): 🏆 A ranked list of awesome machine learning Python libraries. [Nov 2020]
Section 10: General AI Tools and Extensions / OSS Alternatives for OpenAI Code Interpreter (aka. Advanced Data Analytics)
- Vercel AI: Vercel AI Playground / Vercel AI SDK git (⭐9.1k) [May 2023]
- Quora Poe: A chatbot service that gives access to GPT-4, gpt-3.5-turbo, Claude from Anthropic, and a variety of other bots. [Feb 2023]
Section 11: Datasets for LLM Training / OSS Alternatives for OpenAI Code Interpreter (aka. Advanced Data Analytics)
- SQuAD: The Stanford Question Answering Dataset (SQuAD) contains 100,000+ question-answer pairs on 500+ Wikipedia articles. [16 Jun 2016]
- 大規模言語モデルのデータセットまとめ: Summary of datasets for large language models [Apr 2023]
2. Static Analysis
Programming Languages / Other
- JET (⭐726) — Static type inference system to detect bugs and type instabilities.