2023 Technical Reading
Signal Processing
General
- End-to-End Signal Processing and Deep Learning (using Embedded GPUs): A specialized area focused on building full-stack solutions for deep learning and GPU-enabled signal processing systems, including edge compute hardware and custom applications. Tight hardware/software integration enables efficient performance in tasks such as radio embedded systems combining FPGA, CPU, and GPU, GPU-based signal processing algorithms, and pruned neural networks for inference on edge RF systems.
Audio
- Self-supervised learning for infant cry analysis
- Self-supervised learning can be used to learn useful representations of infant cries from unlabeled data.
- The self-supervised approach was able to achieve comparable performance to a supervised learning approach that was trained on a small amount of labeled data.
- Self-supervised learning could be a valuable tool for developing new and improved infant cry analysis systems.
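A minimal sketch of what a self-supervised objective on cry audio can look like, here a SimCLR-style contrastive (NT-Xent) loss over embeddings of two augmented views of the same recording. This is an illustrative stand-in; the paper's actual pretraining objective may differ:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    """SimCLR-style NT-Xent loss: embeddings of two augmented views of the
    same cry segment (z1[i], z2[i]) are pulled together; every other pair
    in the batch acts as a negative."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)             # (2N, D)
    sim = z @ z.t() / temperature              # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))          # exclude self-pairs
    n = z1.size(0)
    # positive for row i is row i + N (and vice versa)
    targets = torch.cat([torch.arange(n, 2 * n, device=z.device),
                         torch.arange(n, device=z.device)])
    return F.cross_entropy(sim, targets)

# Usage: encode two augmentations (e.g. time-shift, added noise) of each
# clip with the same encoder, then minimize nt_xent_loss(enc(a), enc(b)).
```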
- CryCeleb: A Speaker Verification Dataset Based on Infant Cry Sounds
- The CryCeleb dataset is a large and diverse dataset of infant cry sounds. The associated challenge focuses on verification: deciding whether two cry recordings come from the same infant.
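Once a cry encoder is trained, verification typically reduces to scoring embedding similarity. A minimal sketch, assuming precomputed embeddings and a placeholder decision threshold:

```python
import torch.nn.functional as F

def verify(emb_a, emb_b, threshold=0.5):
    """Score two cry embeddings and accept the pair as the same infant if
    their cosine similarity clears a tuned threshold (0.5 is a placeholder)."""
    score = F.cosine_similarity(emb_a, emb_b, dim=-1)
    return score, score > threshold
```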
NLP
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
- BART is a denoising autoencoder: it is trained on corrupted text and learns to reconstruct the original. The corruptions range from simple token masking and deletion to infilling whole spans, permuting sentences, and rotating the document.
- This training procedure helps BART to learn to represent the meaning of text, even when the text is corrupted. This makes BART well-suited for natural language generation, translation, and comprehension tasks.
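For illustration, Hugging Face's transformers exposes pre-trained BART; below, a span corrupted with the `<mask>` token (BART's text-infilling scheme) is reconstructed. This demonstrates the denoising objective, not the original training code:

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tok = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Text infilling: a corrupted span is replaced by a single <mask> token;
# the sequence-to-sequence model reconstructs the original text.
corrupted = "The weather <mask> nice today."
inputs = tok(corrupted, return_tensors="pt")
out = model.generate(**inputs, max_length=20)
print(tok.decode(out[0], skip_special_tokens=True))
```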
- Backpack Language Models
- The Backpack is a neural architecture that combines strong modeling performance with interpretability and control. It learns multiple sense vectors for each word, each capturing a different aspect of that word, and allows these vectors to be intervened on to change the model's behavior. It outperforms larger models on lexical similarity evaluations and enables controllable text generation and debiasing through sense-vector manipulation.
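The core computation is easy to state: position i's representation is a weighted sum over the sense vectors of every word in the sequence. A sketch of just that sum (in the paper, the weights come from a Transformer contextualization network; here they are assumed given):

```python
import torch

def backpack_representation(sense_vectors, weights):
    """Backpack word representations:
        o_i = sum_j sum_k alpha[i, j, k] * C(x_j)_k
    sense_vectors: (seq, k, dim) -- k learned senses per word
    weights:       (seq, seq, k) -- contextual weights alpha (produced by a
                                    Transformer in the paper; given here)
    returns:       (seq, dim) contextual representation per position
    """
    return torch.einsum("ijk,jkd->id", weights, sense_vectors)
```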
- Date-specific (might not age well), 05/25/23
- LLM Powered Autonomous Agents: An OpenAI researcher's musings on LLM-powered agents.
- Prompt Recipes: Great starting points for some specific prompting use-cases. Some are shockingly effective.
Large Language Models
- Parameter-Efficient Transfer Learning for NLP: Uses adapter modules as an efficient alternative to fine-tuning large pre-trained models across numerous downstream tasks. Adapter modules introduce minimal trainable parameters per task, enabling new tasks to be added without retraining the entire model. The authors apply adapter modules to 26 text classification tasks, including the GLUE benchmark, achieving near state-of-the-art performance with only a slight increase in parameters. This contrasts with traditional fine-tuning, which trains all parameters for each task, and showcases the efficiency and flexibility of adapters in handling diverse tasks.
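A minimal sketch of a bottleneck adapter in the spirit of Houlsby et al. (the bottleneck width and nonlinearity are illustrative choices):

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, plus a
    residual connection. Only these few parameters are trained per task;
    the pre-trained transformer weights stay frozen."""
    def __init__(self, hidden_dim, bottleneck_dim=64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()
        # near-identity initialization, as in the paper
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))
```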
- RLHF: Reinforcement Learning from Human Feedback (A blog by Chip Huyen)
- StackLLaMA: A hands-on guide to train LLaMA with RLHF: About the development of the StackLLaMA model, a reinforcement learning from human feedback (RLHF) fine-tuned LLaMA model for Stack Exchange question answering. The model is trained using a combination of supervised fine-tuning, reward modeling, and reinforcement learning. Challenges faced during training, such as balancing rewards and managing KL divergence, are highlighted, and the post emphasizes the ongoing effort to improve RLHF methods. StackLLaMA demonstrates the application of RL techniques to natural language processing, showcasing their potential for enhancing question-answering systems.
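A schematic of the reward typically optimized in PPO-style RLHF, showing the reward-vs-KL balance the post describes (kl_coef is a hypothetical value; in practice it is tuned or adapted):

```python
import torch

def rlhf_token_rewards(rm_score, logprobs_policy, logprobs_ref, kl_coef=0.2):
    """Schematic per-token reward for PPO-style RLHF: the reward model's
    scalar score (applied at the final token) minus a per-token KL penalty
    that keeps the policy near the frozen reference model."""
    kl = logprobs_policy - logprobs_ref    # per-token KL estimate
    rewards = -kl_coef * kl                # penalty at every token
    rewards[..., -1] += rm_score           # reward model score at the end
    return rewards
```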
- ReAct: Synergizing Reasoning and Acting in Language Models (web write-up): a novel approach that prompts Large Language Models (LLMs) to generate reasoning traces and task-specific actions in an interleaved manner. This synergy lets the model induce, track, and update action plans based on reasoning traces while interacting effectively with external sources. Results show ReAct's effectiveness over existing methods in various language and decision-making tasks, reducing hallucination and error propagation while improving human interpretability and trustworthiness.
- ReAct: Synergizing Reasoning and Acting in Language Models (paper)
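A minimal ReAct-style loop, with `llm` and `tools` as hypothetical stand-ins for a text-completion call and a dict of external tools:

```python
import re

def react(question, llm, tools, max_steps=5):
    """Minimal ReAct loop: the model alternates free-form Thoughts with
    Actions such as `Action: Search[query]`; each Action is executed
    against an external tool and its Observation is fed back in."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)             # next Thought/Action text
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        match = re.search(r"Action: (\w+)\[(.*)\]", step)
        if match:
            tool, arg = match.groups()
            observation = tools[tool](arg) # e.g. a Wikipedia search
            transcript += f"Observation: {observation}\n"
    return None
```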
- Who Wrote it and Why? Prompting Large-Language Models for Authorship Verification: The text discusses the significance of Authorship Verification (AV) in natural language processing, highlighting challenges in existing techniques such as data requirements and explainability. To address these issues, the paper introduces PromptAV, a novel approach leveraging Large-Language Models for AV. PromptAV utilizes step-by-step stylometric explanation prompts, outperforming state-of-the-art methods with limited data and providing intuitive explanations for predictions. The research aims to enhance AV effectiveness and interpretability, presenting PromptAV as a promising solution in forensic analysis, plagiarism detection, and identification of deceptive content.
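An illustrative template in the spirit of PromptAV's step-by-step stylometric prompting. The wording and the specific stylometric criteria below are assumptions for illustration, not the paper's exact prompt:

```python
# Hypothetical prompt template; {text1}/{text2} are the documents to compare.
PROMPT_AV = """You are an expert forensic linguist.

Text 1: {text1}
Text 2: {text2}

Analyze the two texts step by step along these stylometric dimensions:
punctuation habits, sentence length, vocabulary richness, and
idiosyncratic phrasing. Then state whether the same author wrote both
texts, with a confidence score between 0 and 1."""

prompt = PROMPT_AV.format(text1="...", text2="...")
```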
- Building RAG-based LLM Applications for Production: Retrieval Augmented Generation (RAG) addresses the limitation that Large Language Models (LLMs) are not trained on specific user data. It incorporates user data alongside the LLM's existing knowledge: the data is loaded and prepared for queries by creating an index, a user query filters the data down to the most relevant context, and that context is sent to the LLM together with the query to generate a response.
Key stages within RAG include (a minimal sketch follows this list):
- Loading: Retrieving user data from various sources and bringing it into the processing pipeline, supported by connectors like those provided by LlamaHub.
- Indexing: Creating a data structure, often involving vector embeddings and metadata strategies, to facilitate efficient querying and retrieval of contextually relevant data.
- Storing: Saving the indexed data and associated metadata to avoid the need for re-indexing in the future.
- Querying: Utilizing LLMs and LlamaIndex data structures for various query strategies, such as sub-queries, multi-step queries, and hybrid approaches.
- Evaluation: Assessing the effectiveness of the pipeline through objective measures, comparing strategies and evaluating the accuracy, fidelity, and speed of query responses.
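As referenced above, a minimal sketch of these stages using LlamaIndex. API names follow recent llama-index releases and can vary across versions; "data/" is a hypothetical folder of user documents:

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()    # Loading
index = VectorStoreIndex.from_documents(documents)       # Indexing (embeddings)
index.storage_context.persist(persist_dir="./storage")   # Storing

query_engine = index.as_query_engine()                   # Querying
print(query_engine.query("What does my data say about X?"))
```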
Vision
- YOLOv10: Real-Time End-to-End Object Detection: Among other efficiency improvements, the highlight may be tackling the latency caused by NMS (non-maximum suppression) post-processing.
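For context, this is the classic NMS post-processing step that YOLOv10 designs away, shown here with torchvision's built-in implementation:

```python
import torch
from torchvision.ops import nms

boxes = torch.tensor([[0., 0., 10., 10.],
                      [1., 1., 11., 11.],    # heavy overlap with box 0
                      [50., 50., 60., 60.]])
scores = torch.tensor([0.9, 0.8, 0.7])

# Keep the highest-scoring box in each overlapping cluster; this
# data-dependent step adds variable latency at inference time.
keep = nms(boxes, scores, iou_threshold=0.5)
print(keep)   # tensor([0, 2])
```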
- Faiss: A library for efficient similarity search: Facebook AI Similarity Search (Faiss), a library for quickly finding multimedia documents that are similar to each other.
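A minimal Faiss usage sketch with an exact (brute-force) L2 index; real deployments usually switch to approximate indexes such as IVF or HNSW at scale:

```python
import faiss
import numpy as np

d = 128                                            # embedding dimensionality
xb = np.random.rand(10_000, d).astype("float32")   # database vectors
xq = np.random.rand(5, d).astype("float32")        # query vectors

index = faiss.IndexFlatL2(d)           # exact L2 search (baseline index)
index.add(xb)
distances, ids = index.search(xq, 4)   # 4 nearest neighbours per query
print(ids)
```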
ML (General)
- DeepHit: A Deep Learning Approach to Survival Analysis with Competing Risks: Time-to-event analysis is widely used in economics, finance, engineering, medicine, and many other areas. Previous models rely on strong parametric assumptions that are often violated; DeepHit instead uses a deep neural network to learn the distribution of survival times directly. Comparisons with previous models on real and synthetic datasets show that DeepHit achieves statistically significant performance improvements.
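The core idea is an output layer that is one softmax over all (competing risk, discrete time bin) pairs, i.e. a joint event distribution with no parametric form. A sketch of just that output head; the full model adds shared and cause-specific subnetworks plus a ranking loss:

```python
import torch
import torch.nn as nn

class DeepHitHead(nn.Module):
    """Output head in the spirit of DeepHit: a single softmax over all
    (risk, time bin) pairs, modeling the joint event distribution directly
    with no parametric survival assumption."""
    def __init__(self, in_dim, num_risks=2, num_bins=100):
        super().__init__()
        self.num_risks, self.num_bins = num_risks, num_bins
        self.fc = nn.Linear(in_dim, num_risks * num_bins)

    def forward(self, x):
        probs = torch.softmax(self.fc(x), dim=-1)   # sums to 1 over all pairs
        return probs.view(-1, self.num_risks, self.num_bins)
```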
Recommendation and Search
- Counterfactual Evaluation for Recommendation Systems: Addresses the challenge of evaluating recommendation systems, arguing they should be treated as interventional rather than observational problems, since traditional offline evaluation may not capture the true impact of recommendations on user behavior. The article introduces counterfactual evaluation as an alternative, focusing on Inverse Propensity Scoring (IPS) and its variants Clipped IPS (CIPS) and Self-Normalized IPS (SNIPS), and highlights their advantages and limitations.
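A minimal sketch of the three estimators, assuming logged rewards and the action propensities of both the logging and target policies are available:

```python
import numpy as np

def ips(rewards, target_probs, logging_probs, clip=None):
    """IPS estimate of a target policy's value from logged interactions.
    With `clip`, this becomes Clipped IPS (CIPS), trading bias for variance."""
    w = target_probs / logging_probs      # importance weights
    if clip is not None:
        w = np.minimum(w, clip)
    return np.mean(w * rewards)

def snips(rewards, target_probs, logging_probs):
    """Self-Normalized IPS: divide by the total weight to reduce variance."""
    w = target_probs / logging_probs
    return np.sum(w * rewards) / np.sum(w)
```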
- Improving Deep Learning For Airbnb Search: Airbnb’s transition to deep learning for search ranking significantly impacted its roadmap, leading to a shift in strategy. While the initial optimism about incorporating machine learning ideas from literature surveys faded due to application-specific challenges, the focus shifted towards a process-driven approach, emphasizing the importance of iterative strategies over individual techniques for enhancing deep learning models in industrial settings.
ML Ops
- Rules of Machine Learning: Best Practices for ML Engineering: https://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf
- Continuous Adaptation for Machine Learning
- Model training as a CI/CD system: Part I
- Model training as a CI/CD system: Part II
- APIs for Model Serving
- GitHub Actions: This is a great place to start, especially since your project likely already has a GitHub repository. It covers all the basics of CI/CD: the components of a workflow and how to create one. Workflows can be triggered by events such as pushes to the repository. Each workflow run executes on a runner, which can be a hosted virtual machine or a self-hosted server. Jobs within a workflow can run in parallel or sequentially.
- The process of building a CI/CD system specifically for model training.
- Key components such as data management, environment setup, model training pipeline, and version control.
- Emphasizes the benefits of automation, highlights cloud infrastructure support, and suggests specific tools for implementing a model training CI/CD system.
- Application deployment and testing strategies (Google Cloud)
- Best Practices:
  - Backward compatibility
  - Continuous integration/continuous deployment (CI/CD)
  - Automation
  - Operating environments and configuration management
  - Rollback strategy in case things go wrong
  - Post-deployment monitoring
- Leveraging TensorFlow-TensorRT integration for Low Latency Inference: TensorRT integration in TensorFlow lets developers optimize and accelerate deep learning models for deployment on GPUs. By leveraging TensorRT's optimizations, TensorFlow users can achieve faster inference times and a reduced memory footprint, giving a seamless workflow for deploying TensorFlow models in real-time applications (a minimal conversion sketch follows below).
- Bayesian hyperparameter tuning: Nuts & bolts
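A minimal TF-TRT conversion sketch. Exact keyword arguments vary across TensorFlow 2.x versions, and the SavedModel paths are hypothetical:

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="saved_model",   # exported TF SavedModel
    precision_mode="FP16",                 # lower precision for speed
)
converter.convert()               # rewrite supported subgraphs as TRT engines
converter.save("saved_model_trt") # optimized model, ready for serving
```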