2025 Technical Reading

2025 Reading List

NLP

LLMs

Scaling Laws for Precision

Training in lower precision reduces a model’s “effective parameter count,” which lets the authors predict the additional loss incurred by low-precision training and by post-training quantization. They find that lower-precision training can be more compute-efficient but can also degrade performance. They also find a trade-off between the amount of training data and precision: a model trained on a lot of data can lose more quality when it is quantized to a lower precision after training. The takeaway is that both training precision and inference precision should be considered when choosing a precision for a language model.
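A minimal sketch of the effective-parameter idea, assuming a saturating-exponential form N_eff = N(1 - e^(-P/gamma)) for weight precision P, plugged into a Chinchilla-style loss. Every constant here (A, B, E, alpha, beta, gamma) is an illustrative placeholder rather than a value fitted in the paper, and the function names are hypothetical.

```python
import math

def effective_params(n_params: float, bits: float, gamma: float = 2.5) -> float:
    """N_eff = N * (1 - exp(-P / gamma)); gamma is a placeholder, not a fitted value."""
    return n_params * (1.0 - math.exp(-bits / gamma))

def predicted_loss(n_params: float, n_tokens: float, bits: float,
                   A: float = 400.0, B: float = 400.0, E: float = 1.7,
                   alpha: float = 0.34, beta: float = 0.28,
                   gamma: float = 2.5) -> float:
    """Chinchilla-style loss with N replaced by the effective parameter count."""
    n_eff = effective_params(n_params, bits, gamma)
    return A * n_eff ** -alpha + B * n_tokens ** -beta + E

if __name__ == "__main__":
    for bits in (4, 8, 16):
        loss = predicted_loss(1e9, 2e10, bits)
        print(f"{bits:>2}-bit weights: predicted loss {loss:.4f}")
```

Under these assumptions, lowering the bit width shrinks N_eff and raises the predicted loss, which is the qualitative behavior the summary describes.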

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

BitNet b1.58 reduces the memory and energy cost of language models by restricting each parameter to one of three values: -1, 0, or 1. At 3 billion parameters it performs as well as regular 16-bit models while using about 3.5 times less memory and running almost 3 times faster. At 70 billion parameters the gap widens: it runs about 4 times faster and uses much less memory than standard models. It also saves energy, using about 71 times less energy for matrix multiplication arithmetic than regular models. Tests show it performs well on language tasks and holds up under long training runs (2 trillion tokens).
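The "1.58" comes from log2(3) ≈ 1.58 bits, the information needed to store one of three values. A rough sketch of absmean ternary quantization follows: scale the weights by their mean absolute value, then round and clip to {-1, 0, 1}. The function name and the plain-list weight matrix are assumptions made for illustration, and this covers only the weight rounding step, not training.

```python
import math

def absmean_ternary(weights, eps=1e-8):
    """Quantize a 2-D list of floats to {-1, 0, 1} using the mean absolute value as scale."""
    flat = [w for row in weights for w in row]
    gamma = sum(abs(w) for w in flat) / len(flat)  # absmean scale

    def round_clip(x):
        # Round to the nearest integer, then clip into [-1, 1].
        return max(-1, min(1, round(x)))

    quantized = [[round_clip(w / (gamma + eps)) for w in row] for row in weights]
    return quantized, gamma

if __name__ == "__main__":
    w = [[0.9, -0.05, -1.2], [0.3, 0.0, 2.0]]
    q, scale = absmean_ternary(w)
    print(q, round(scale, 4))
    print("bits per ternary weight:", round(math.log2(3), 2))
```

Small weights collapse to 0 and large ones saturate at ±1, so the scale `gamma` has to be kept alongside the ternary matrix to approximate the original magnitudes.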

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

ModernBERT is an improved version of the BERT model. It’s designed for tasks like retrieval and classification. The model was trained on 2 trillion tokens and can handle sequences up to 8192 tokens long. It uses modern techniques like rotary positional embeddings and Gated Linear Units. This makes it faster and more memory-efficient. ModernBERT achieves top results in various evaluations, including classification tasks and retrieval in different domains, such as code. It’s also optimized to run efficiently on common GPUs.
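To make the rotary positional embedding idea concrete, here is a small pure-Python sketch. It rotates consecutive pairs of vector components by an angle proportional to the token position, so the dot product between a query and a key depends only on their relative offset. The pairing convention and the base of 10000 are common defaults assumed here, not necessarily ModernBERT's settings, and the function names are hypothetical.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate consecutive (even, odd) component pairs by position-dependent angles."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

if __name__ == "__main__":
    q = [1.0, 2.0, 3.0, 4.0]
    k = [0.5, -1.0, 2.0, 0.0]
    # Same offset (4 positions apart) at two different absolute positions.
    print(dot(rope(q, 3), rope(k, 7)))
    print(dot(rope(q, 103), rope(k, 107)))
```

The two printed scores agree up to floating-point error, which is the relative-position property that makes rotary embeddings attractive for long sequences.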

Agents (LLM-based)

AGENTiGraph: An Interactive Knowledge Graph Platform for LLM-based Chatbots Utilizing Private Data

LLMs are powerful but can struggle with factual consistency and require complex queries. Traditional KG tools are difficult to use and require technical expertise. AGENTiGraph bridges the gap between LLMs and KGs. It uses a multi-agent system where each agent has a specific role, such as interpreting user intent or translating queries into graph operations. This allows for more natural language interaction with KGs and improved accuracy in tasks like question answering.

Agents Are Not Enough

Current AI agents are limited. They can’t handle complex tasks or adapt to different situations.

  • There are historical challenges with agents, such as limited capabilities and lack of trust.

  • To improve agents, the authors propose three things:

    • A secure version for private tasks.
    • A user representation to avoid constant user input.
    • A program to manage interactions between user and agents.

The idea is to create an ecosystem with different components working together:

  • Agents: Focus on specific tasks and can work with each other.

  • Sims: Represent users with their preferences and privacy settings.

  • Assistants: Interact with users and manage Sims and Agents to complete tasks.

Agents (Chip Huyen)

  • AI agents perceive and act on their environment, with their capabilities defined by available tools and the environment itself.

  • Tools are essential for agents to perceive (read) and act (write), augmenting knowledge, extending capabilities (like math or code execution), and enabling real-world actions.

  • Planning is crucial for complex tasks, requiring plan generation, validation (by heuristics or AI), and execution, ideally decoupled to prevent wasted resources.

  • Foundation models can be used for planning, especially when provided with information about action outcomes, and function calling enables tool use within model APIs.

  • Effective agents require careful consideration of planning granularity, reflection/error correction, tool selection, and robust evaluation to address potential failures in planning, tool usage, or efficiency.
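The decoupled plan, validate, execute loop described above can be sketched as follows. Everything here is hypothetical: the tool inventory, the `$prev` placeholder convention, and `generate_plan`, which stands in for a real model call. The point is only that a cheap heuristic check runs before any tool is invoked.

```python
from typing import Callable

# Hypothetical tool inventory: tool name -> callable taking and returning a string.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for {query!r}",
    "summarize": lambda text: text[:40],
}

def generate_plan(task: str) -> list[tuple[str, str]]:
    """Stand-in for a model call; returns (tool, argument) steps."""
    return [("search", task), ("summarize", "$prev")]

def validate_plan(plan, tools, max_steps: int = 5) -> list[str]:
    """Heuristic validation: reject overly long plans and unknown tools."""
    errors = []
    if len(plan) > max_steps:
        errors.append(f"plan has {len(plan)} steps, limit is {max_steps}")
    for i, (tool, _) in enumerate(plan):
        if tool not in tools:
            errors.append(f"step {i}: unknown tool {tool!r}")
    return errors

def execute_plan(plan, tools) -> str:
    """Run steps in order; '$prev' is replaced by the previous step's output."""
    prev = ""
    for tool, arg in plan:
        prev = tools[tool](prev if arg == "$prev" else arg)
    return prev

def run_agent(task: str) -> str:
    plan = generate_plan(task)
    errors = validate_plan(plan, TOOLS)
    if errors:
        # Fail before execution so no tool calls are wasted on a bad plan.
        raise ValueError("; ".join(errors))
    return execute_plan(plan, TOOLS)

if __name__ == "__main__":
    print(run_agent("scaling laws"))
```

Keeping validation separate from execution means a plan that references a missing tool is rejected up front, rather than failing halfway through after earlier steps have already consumed resources.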

Machine Learning

Reinforcement Learning: An Overview

This is a hefty paper that can take several hours to read through once, but it is a good refresher or overview. Released in December 2024, it provides a comprehensive overview of reinforcement learning (RL) and sequential decision making. The paper covers value-based RL, policy-gradient methods, and model-based approaches, and briefly discusses the integration of RL with large language models (LLMs). It is based on Kevin Murphy’s earlier textbook, updating and expanding chapters 34 and 35.
