Awesome List Updates on Jan 27, 2024
10 awesome lists updated today.
1. Awesome Software Patreons
Open Source Projects
- Bottles - Easily manage and run Windows apps on Linux.
- Sonic Pi - Code-based music creation and performance tool.
Open Source Projects / Operating Systems
- PostmarketOS - A real Linux distribution for phones.
2. Awesome Selfhosted
Software / Groupware
- Tine - Software for digital collaboration in companies and organizations. From powerful groupware functionalities to clever add-ons, tine combines everything to make daily team collaboration easier. (Source Code (⭐11)) AGPL-3.0 Docker
3. Awesome Vue
Projects Using Vue.js / Open Source
- vue3-realworld-app (⭐36) - 🖖 Best practices for building RealWorld with Vue3
4. Awesome Git Hooks
Git Hook Scripts / pre-commit
- dotenvx (⭐919) - Prevent committing your .env file(s) to code.
5. Awesome Azure Openai Llm
What is the RAG (Retrieval-Augmented Generation)?
RAG (Retrieval-Augmented Generation): Integrates retrieval (searching) into LLM text generation. RAG helps the model “look up” external information to improve its responses. cite [25 Aug 2023]
Retrieval-Augmented Generation: Research Papers
- Benchmarking Large Language Models in Retrieval-Augmented Generation: [cnt]: Retrieval-Augmented Generation Benchmark (RGB) is proposed to assess LLMs on 4 key abilities. [4 Sep 2023]
- Active Retrieval Augmented Generation: [cnt]: Forward-Looking Active REtrieval augmented generation (FLARE): FLARE iteratively generates a temporary next sentence and checks whether it contains low-probability tokens. If so, the system retrieves relevant documents and regenerates the sentence. Low-probability tokens are determined from token_logprobs in the OpenAI API response. git (⭐562) [11 May 2023]
- Self-RAG: [cnt] 1. Critic model C: Generates reflection tokens (IsREL (relevant, irrelevant), IsSUP (fully supported, partially supported, no support), IsUse (is useful: 5, 4, 3, 2, 1)). It is pretrained on data labeled by GPT-4. 2. Generator model M: The main language model that generates task outputs and reflection tokens. It leverages the data labeled by the critic model during training. 3. Retriever model R: Retrieves relevant passages. The LM decides if external passages (retriever) are needed for text generation. git (⭐1.7k) [17 Oct 2023]
- A Survey on Retrieval-Augmented Text Generation: [cnt]: This paper conducts a survey on retrieval-augmented text generation, highlighting its advantages and state-of-the-art performance in many NLP tasks. These tasks include dialogue response generation, machine translation, summarization, paraphrase generation, text style transfer, and data-to-text generation. [2 Feb 2022]
- Retrieval meets Long Context LLMs: [cnt]: We demonstrate that retrieval-augmentation significantly improves the performance of 4K context LLMs. Perhaps surprisingly, we find this simple retrieval-augmented baseline can perform comparable to 16K long context LLMs. [4 Oct 2023]
- FreshLLMs: [cnt]: FreshPrompt runs a Google search first, then uses the results in the prompt. Our experiments show that FreshPrompt outperforms both competing search engine-augmented prompting methods such as Self-Ask (Press et al., 2022) as well as commercial systems such as Perplexity.AI. git [5 Oct 2023]
- RECOMP: Improving Retrieval-Augmented LMs with Compressors: [cnt]: 1. We propose RECOMP (Retrieve, Compress, Prepend), an intermediate step which compresses retrieved documents into a textual summary prior to prepending them to improve retrieval-augmented language models (RALMs). 2. We present two compressors: an extractive compressor which selects useful sentences from retrieved documents and an abstractive compressor which generates summaries by synthesizing information from multiple documents. 3. Both compressors are trained. [6 Oct 2023]
- Retrieval-Augmentation for Long-form Question Answering: [cnt]: 1. The order of evidence documents affects the order of generated answers. 2. The last sentence of the answer is more likely to be unsupported by evidence. 3. Automatic methods for detecting attribution can achieve reasonable performance, but still lag behind human agreement. Attribution in the paper assesses how well answers are based on provided evidence and avoid creating non-existent information. [18 Oct 2023]
- INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning: INTERS covers 21 search tasks across three categories: query understanding, document understanding, and query-document relationship understanding. The dataset is designed for instruction tuning, a method that fine-tunes LLMs on natural language instructions. git (⭐194) [12 Jan 2024]
- RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture. [16 Jan 2024]
- The Power of Noise: Redefining Retrieval for RAG Systems: Putting no more than 2-5 relevant docs plus some amount of random noise into the LLM context maximizes the accuracy of the RAG. [26 Jan 2024]
- Corrective Retrieval Augmented Generation (CRAG): Retrieval Evaluator assesses the retrieved documents and categorizes them as Correct, Ambiguous, or Incorrect. For Ambiguous and Incorrect documents, the method uses Web Search to improve the quality of the information. The refined and distilled documents are then used to generate the final output. [29 Jan 2024] CRAG implementation by LangGraph git (⭐5.2k)
- RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval: Introduces a novel approach to retrieval-augmented language models by constructing a recursive tree structure from documents. git (⭐35k) pip install llama-index-packs-raptor / git (⭐27) [31 Jan 2024]
- CRAG: Comprehensive RAG Benchmark: A factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search. ref [7 Jun 2024]
- PlanRAG: Decision Making. Decision QA benchmark, DQA. Plan -> Retrieve -> Make a decision (PlanRAG) git (⭐112) [18 Jun 2024]
- Searching for Best Practices in Retrieval-Augmented Generation: Best Performance Practice: Query Classification, Hybrid with HyDE (retrieval), monoT5 (reranking), Reverse (repacking), Recomp (summarization). Balanced Efficiency Practice: Query Classification, Hybrid (retrieval), TILDEv2 (reranking), Reverse (repacking), Recomp (summarization). [1 Jul 2024]
- Retrieval Augmented Generation or Long-Context LLMs?: Long-Context consistently outperforms RAG in terms of average performance. However, RAG's significantly lower cost remains a distinct advantage. [23 Jul 2024]
- Graph Retrieval-Augmented Generation: A Survey [15 Aug 2024]
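The FLARE entry in the list above describes a loop that can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: draft_fn and retrieve_fn are hypothetical stand-ins for an LLM call that returns tokens with their log-probabilities (for example from token_logprobs) and for a document retriever. The 0.4 probability threshold and the use of the whole draft sentence as the search query are also assumptions made for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A draft sentence: its tokens and one log-probability per token.
Draft = Tuple[List[str], List[float]]


def low_probability_tokens(
    tokens: Sequence[str], logprobs: Sequence[float], min_prob: float = 0.4
) -> List[str]:
    """Return the tokens whose probability exp(logprob) is below min_prob."""
    return [tok for tok, lp in zip(tokens, logprobs) if math.exp(lp) < min_prob]


def flare_next_sentence(
    context: str,
    draft_fn: Callable[[str, List[str]], Draft],   # hypothetical LLM wrapper
    retrieve_fn: Callable[[str], List[str]],       # hypothetical retriever
    min_prob: float = 0.4,                         # assumed threshold
) -> str:
    """Generate one sentence, retrieving only if the first draft looks uncertain."""
    # 1. Draft a temporary next sentence without any retrieved documents.
    tokens, logprobs = draft_fn(context, [])
    if not low_probability_tokens(tokens, logprobs, min_prob):
        return " ".join(tokens)  # confident draft: keep it as-is
    # 2. Uncertain draft: use it as the query (an assumption) and regenerate.
    docs = retrieve_fn(" ".join(tokens))
    tokens, _ = draft_fn(context, docs)
    return " ".join(tokens)


# Toy stand-ins so the sketch runs without any external service.
def toy_draft(context: str, docs: List[str]) -> Draft:
    if docs:  # with evidence, the toy model answers confidently
        return ["Paris", "is", "in", "France"], [-0.1, -0.1, -0.1, -0.1]
    return ["Paris", "is", "in", "Spain"], [-0.1, -0.1, -0.1, -2.0]


def toy_retrieve(query: str) -> List[str]:
    return ["Paris is the capital of France."]
```

Calling flare_next_sentence("Where is Paris?", toy_draft, toy_retrieve) takes the retrieval branch, because the last token of the first draft has probability exp(-2.0), which is below the threshold.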
RAG Pipeline & Advanced RAG
- How to optimize RAG pipeline: Indexing optimization [24 Oct 2023]
Azure Reference Architectures / Azure AI Search
- A set of capabilities designed to improve relevance in these scenarios. We use a combination of hybrid retrieval (vector search + keyword search) + semantic ranking as the most effective approach for improved relevance out-of-the-box.
TL;DR: Retrieval Performance; Hybrid search + Semantic rank > Hybrid search > Vector only search > Keyword only
ref [18 Sep 2023]
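The entry above does not say how the keyword and vector result lists are merged. One common way to combine two rankings is Reciprocal Rank Fusion (RRF); the sketch below assumes that technique for illustration and does not model the semantic ranking step. The function name, the document IDs, and the constant k = 60 are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in (rank starts
    at 1); documents are returned by descending total score, ties broken by ID.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword query and a vector query.
keyword_hits = ["a", "b", "c"]
vector_hits = ["b", "c", "d"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that appear in both lists ("b" and "c" here) rise above documents found by only one retriever, which is the intuition behind preferring hybrid search over vector-only or keyword-only search.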
Semantic Kernel / Feature Roadmap
- .NET Semantic Kernel SDK: 1. Renamed packages and classes that used the term “Skill” to now use “Plugin”. 2. Made the OpenAI-specific parts of the Semantic Kernel core AI-service agnostic. 3. Consolidated our planner implementations into a single package. ref [10 Oct 2023]
Semantic Kernel / Code Recipes
- Chat Copilot Sample Application: A reference application for building a chat experience using Semantic Kernel, leveraging plugins, planners, and AI memories. git (⭐2k) [Apr 2023]
- Semantic Kernel Recipes: A collection of C# notebooks git (⭐165) [Mar 2023]
- Semantic Kernel-Powered OpenAI Plugin Development Lifecycle ref [30 Oct 2023]
- Semantic Kernel implementation sample for overcoming the token limits of OpenAI models: splitting a long text that exceeds the token limit, passing the pieces to a skill, and combining the results (zenn.dev) ref [06 May 2023]
Semantic Kernel / Semantic Kernel Planner
Semantic Kernel Planner ref [24 Jul 2023]
Semantic Kernel / Semantic Function
- Prompt Template language Key takeaways
MLLM (multimodal large language model) / GPT series release date
- Benchmarking Multimodal LLMs.
LLaVA-1.5 achieves SoTA on a broad range of 11 tasks incl. SEED-Bench.
SEED-Bench: [cnt]: Benchmarking Multimodal LLMs git (⭐289) [30 Jul 2023]
Learning and Supplementary Materials / Korean
- Large Language Model Course (⭐36k): Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks. [Jun 2023]
6. Awesome Kotlin
Android / Projects
- inorichi/tachiyomi - Free and open source manga reader for Android.
7. Urban and Regional Planning Resources
Public Data Resources / Equity and Environmental Justice
- STEAP - The Screening Tool for Equity Analysis of Projects (STEAP) is a census sampling tool that allows rapid screening of potential project locations anywhere in the United States to support Title VI, environmental justice, and other socioeconomic data analyses.
8. Awesome Neovim
(requires Neovim 0.5)
- lopi-py/luau-lsp.nvim (⭐38) - A luau-lsp extension to improve your experience.
9. Awesome Datascience
Deep Learning Packages / Visualization Tools
10. Awesome Rails
Gems / Other external resources
- solid_queue (⭐1.7k) - A gem providing a database-backed Active Job backend 🔴