1. A case study of designing and fine-tuning the Llama 3 model to support function calling

    By Michael Hu
    July 6, 2024 11:25 pm
    20 min read

    This post discusses how we designed and fine-tuned the Llama 3 model to support function calling, a capability with immense potential for adapting LLMs to real-world business cases. We also provide Llama3-FunctionCalling, an open-source project that anyone can access.
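
    As a hypothetical illustration (the exact schema used by Llama3-FunctionCalling may differ), a function-calling model emits a structured call instead of free-form text, which the application then parses and executes:

    ```python
    import json

    # Hypothetical tool schema; the real project's format may differ.
    get_weather_tool = {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }

    # Suppose the fine-tuned model responds with a JSON function call.
    model_output = '{"function": "get_weather", "arguments": {"city": "Berlin"}}'

    call = json.loads(model_output)
    assert call["function"] == get_weather_tool["name"]
    print(f"Calling {call['function']} with {call['arguments']}")
    ```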

  2. A simple and clean implementation of Retrieval Augmented Generation (RAG) with open-source embedding models and LLaMA

    By Michael Hu
    March 28, 2024 2:43 pm
    15 min read

    In this post, we delve into how Retrieval Augmented Generation (RAG) works and introduce RAG-LLaMA, an open-source project that showcases a straightforward RAG implementation built on open-source embedding models and the LLaMA chat model. We build a simple chatbot that answers questions about Tesla cars, demonstrating RAG's potential for private knowledge-base tasks.
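
    To make the retrieval step concrete, here is a minimal, self-contained sketch of the RAG loop. The `embed` function is a placeholder standing in for an open-source embedding model; this is illustrative, not the actual RAG-LLaMA API:

    ```python
    import numpy as np

    # Placeholder embedding function; a real pipeline would call an
    # open-source embedding model here (this is not the RAG-LLaMA API).
    def embed(texts: list[str]) -> np.ndarray:
        return np.stack([
            np.random.default_rng(abs(hash(t)) % 2**32).normal(size=384)
            for t in texts
        ])

    documents = [
        "The Tesla Model 3 supports DC fast charging.",
        "Tire pressure should be checked monthly.",
    ]
    doc_vecs = embed(documents)
    doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

    def retrieve(question: str, k: int = 1) -> list[str]:
        q = embed([question])[0]
        q /= np.linalg.norm(q)
        scores = doc_vecs @ q  # cosine similarity against every document
        return [documents[i] for i in np.argsort(scores)[::-1][:k]]

    question = "How often should I check tire pressure?"
    context = "\n".join(retrieve(question))
    # The assembled prompt is then passed to the LLaMA chat model to
    # generate an answer grounded in the retrieved context.
    prompt = f"Use the context to answer.\nContext:\n{context}\nQuestion: {question}"
    print(prompt)
    ```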

  3. A clean PyTorch implementation of Direct Preference Optimization (DPO) to fine-tune LLaMA models

    By Michael Hu
    March 11, 2024 4:10 pm
    2 min read

    Introducing DPO-LLaMA. This open-source project features a clean PyTorch implementation of Direct Preference Optimization (DPO) to fine-tune LLaMA models to follow human preferences.
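
    The core of DPO fits in a few lines. Below is a minimal sketch of the DPO loss from the Rafailov et al. paper, assuming per-sequence log-probabilities summed over response tokens; it is illustrative, not the exact DPO-LLaMA code:

    ```python
    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        # Log-ratios of the trainable policy vs. the frozen reference model.
        chosen_ratio = policy_chosen_logps - ref_chosen_logps
        rejected_ratio = policy_rejected_logps - ref_rejected_logps
        # Maximize the margin between preferred and rejected responses.
        logits = beta * (chosen_ratio - rejected_ratio)
        return -F.logsigmoid(logits).mean()

    # Toy usage with random per-sequence log-probabilities.
    lp = [torch.randn(4) for _ in range(4)]
    print(dpo_loss(*lp))
    ```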

  4. 4-bit Quantized Low-Rank Adaptation (LoRA) LLM fine-tuning decoupled from Hugging Face

    By Michael Hu
    December 28, 2023 9:30 pm
    10 min read

    In this post, we discuss quantized LoRA and introduce QLoRA-LLM, my new open-source project. It showcases a straightforward custom implementation of Quantized LoRA (QLoRA) for fine-tuning a large language model (LLM), using fundamental tools like PyTorch and Bitsandbytes, independent of Hugging Face. Such a tailored implementation of QLoRA proves highly beneficial for fine-tuning a personalized LLM.
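
    To give a flavor of the LoRA side, here is a minimal sketch of a LoRA wrapper around a frozen linear layer; in QLoRA, the frozen base weights would additionally be stored in 4-bit via Bitsandbytes. This is illustrative, not the actual QLoRA-LLM code:

    ```python
    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # freeze the pre-trained weights
            # B is zero-initialized so the update starts as a no-op.
            self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
            self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
            self.scale = alpha / rank

        def forward(self, x):
            # Frozen base output plus the trainable low-rank update B @ A.
            return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale

    layer = LoRALinear(nn.Linear(512, 512))
    print(layer(torch.randn(2, 512)).shape)  # torch.Size([2, 512])
    ```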

  5. A PyTorch implementation of OpenAI's InstructGPT paper to train and fine-tune LLaMA models to align with human preferences, with support for k-bit quantization and Low-Rank Adaptation (LoRA) fine-tuning

    By Michael Hu
    September 17, 2023 8:00 pm
    10 min read

    In this post, we discuss how reinforcement learning from human feedback (RLHF) works, and we introduce InstructLLaMA. This open-source project showcases a PyTorch implementation of OpenAI's InstructGPT paper, completely independent of third-party tools like Hugging Face, with Meta's LLaMA pre-trained model substituted for GPT. InstructLLaMA covers every stage, from dataset preparation and pre-training to supervised fine-tuning (SFT) and RLHF, training and fine-tuning the LLaMA 2 model to follow human instructions, akin to InstructGPT or ChatGPT but on a smaller scale.
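
    As one concrete piece of the RLHF pipeline, below is a minimal sketch of the pairwise reward-model loss from the InstructGPT paper: the reward model is trained to score the human-preferred response above the rejected one. Shapes are illustrative, not InstructLLaMA's exact code:

    ```python
    import torch
    import torch.nn.functional as F

    def reward_model_loss(chosen_rewards: torch.Tensor,
                          rejected_rewards: torch.Tensor) -> torch.Tensor:
        # Pairwise ranking loss: push the preferred reward above the rejected one.
        return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

    # Toy usage: scalar rewards for a batch of preference pairs.
    chosen = torch.tensor([1.2, 0.7, 0.3])
    rejected = torch.tensor([0.1, 0.9, -0.4])
    print(reward_model_loss(chosen, rejected))
    ```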

  6. A PyTorch implementation of pre-training and fine-tuning scripts to train and fine-tune GPT-2 models

    By Michael Hu
    July 17, 2023 10:00 pm
    2 min read

    Introducing MiniGPT. This open-source project features a PyTorch implementation of OpenAI's GPT-2 model. It supports dataset preparation, pre-training, fine-tuning, and distributed training with PyTorch FSDP.
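
    As a minimal sketch of the FSDP piece (assuming launch via torchrun, which sets the process-group environment variables; this is illustrative, not MiniGPT's actual training script):

    ```python
    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Assumes `torchrun --nproc_per_node=N train.py` so that rank/world-size
    # environment variables are already set.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = torch.nn.TransformerEncoderLayer(d_model=256, nhead=4).cuda()
    model = FSDP(model)  # shards parameters, gradients, and optimizer state

    # Build the optimizer after wrapping, so it sees the sharded parameters.
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
    x = torch.randn(8, 16, 256, device="cuda")
    loss = model(x).pow(2).mean()  # dummy loss for illustration
    loss.backward()
    optimizer.step()
    dist.destroy_process_group()
    ```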

  7. A PyTorch implementation of DeepMind's MuZero agent to do planning with a learned model

    By Michael Hu
    July 6, 2022 10:00 pm
    2 min read

    Introducing MuZero. This open-source project features a PyTorch implementation of DeepMind's MuZero agent. The agent supports turn-based, two-player, zero-sum games, as well as single-player games such as Atari.

  8. A PyTorch implementation of DeepMind's AlphaZero agent to play two-player, zero-sum strategy board games like Go and Gomoku

    By Michael Hu
    June 14, 2022 10:30 am
    2 min read

    Introducing AlphaZero. This open-source project features a PyTorch implementation of DeepMind's famous AlphaZero agent to play Go and free-style Gomoku board games.

  9. A collection of Deep Reinforcement Learning algorithms implemented with PyTorch to solve Atari games and classic control tasks like CartPole, LunarLander, and MountainCar

    By Michael Hu
    May 3, 2022 9:00 pm
    2 min read

    Introducing Deep-RL-Zoo. This open-source project features a collection of Deep Reinforcement Learning algorithms implemented with PyTorch to solve Atari games and classic control tasks like CartPole, LunarLander, and MountainCar. It includes state-of-the-art algorithms such as DQN, Rainbow, PPO, RND, R2D2, and Agent57.