Target modules for applying PEFT / LoRA on different models

Large language models (LLMs) have greatly improved the ability of software to understand complex natural-language queries. Parameter-Efficient Fine-Tuning (PEFT) and Low-Rank Adaptation (LoRA) are among the most effective techniques for fine-tuning LLMs efficiently: they provide significant efficiency gains without requiring large computational resources.

Parameter-Efficient Fine-Tuning

Parameter-Efficient Fine-Tuning (PEFT) techniques adapt large pretrained models to a range of downstream applications by fine-tuning only a small number of parameters rather than the entire model. This reduces computational and storage costs and makes it feasible to fine-tune large models even on limited hardware.

Low-Rank Adaptation (LoRA)

Low-Rank Adaptation (LoRA) is one of the most common lightweight training techniques for LLMs and significantly reduces the number of trainable parameters. It works by injecting a small number of new weights into the model and training only those. Training with LoRA is therefore faster, more memory-efficient, and produces much smaller model weights. A simplified sketch of the idea is shown below.
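To illustrate the idea (a simplified sketch, not the actual PEFT implementation): a frozen weight matrix W of shape d_out x d_in is augmented with a trainable low-rank update B @ A, where B is d_out x r and A is r x d_in for a small rank r, so only (d_out + d_in) * r parameters are trained.

import torch

d_in, d_out, r = 4096, 4096, 8                        # typical hidden size, small LoRA rank

W = torch.randn(d_out, d_in)                          # frozen pretrained weight (not trained)
A = torch.nn.Parameter(0.01 * torch.randn(r, d_in))   # trainable factor, small random init
B = torch.nn.Parameter(torch.zeros(d_out, r))         # trainable factor, zero init so the update starts at zero

x = torch.randn(1, d_in)
y = x @ (W + B @ A).T                                 # forward pass with the low-rank update added

print(W.numel())               # 16,777,216 frozen parameters
print(A.numel() + B.numel())   # 65,536 trainable parameters (about 0.4%)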

Solution 1:

Let's say that you load some model of your choice:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("some-model-checkpoint")

Then you can see the available modules by printing the model:

print(model)

You will get something like this (here for Salesforce/CodeGen25, which uses the Llama architecture):

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(51200, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
          (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=51200, bias=False)
)
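From this printout you can choose which submodules to adapt. Below is a minimal sketch that targets the attention projections; the rank, alpha, and dropout values are illustrative assumptions, not tuned recommendations:

from peft import LoraConfig, get_peft_model

# Module names taken from the printed architecture above
lora_config = LoraConfig(
    r=8,                      # illustrative rank
    lora_alpha=16,            # illustrative scaling factor
    lora_dropout=0.05,        # illustrative dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # confirms only a small fraction of weights is trainable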

Solution 2:

Here is a method to get the names of all linear modules. This version assumes the model was loaded in 4-bit with bitsandbytes, so it looks for bnb.nn.Linear4bit:

import bitsandbytes as bnb

def find_all_linear_names(model):
    lora_module_names = set()
    for name, module in model.named_modules():
        # Collect every 4-bit quantized linear layer (model loaded with load_in_4bit=True)
        if isinstance(module, bnb.nn.Linear4bit):
            names = name.split(".")
            # Keep only the leaf name, e.g. "model.layers.0.self_attn.q_proj" -> "q_proj"
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])

    if "lm_head" in lora_module_names:  # needed for 16-bit
        lora_module_names.remove("lm_head")
    return list(lora_module_names)
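A possible usage (a sketch; it assumes the model above was loaded in 4-bit and that peft is installed) is to pass the returned names straight to target_modules:

from peft import LoraConfig

linear_names = find_all_linear_names(model)
print(linear_names)
# e.g. ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj']

lora_config = LoraConfig(target_modules=linear_names, task_type="CAUSAL_LM")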

In newer PEFT releases you can directly use target_modules="all-linear" in your LoraConfig.
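For example (a minimal sketch; "all-linear" targets every linear layer except the output head):

from peft import LoraConfig

lora_config = LoraConfig(target_modules="all-linear", task_type="CAUSAL_LM")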


Solution 3:

One way to get a list of LoRA-compatible modules programmatically is to set

target_modules = 'all-linear',

which is available in recent PEFT versions. However, this raised an error when applied to the google/gemma-2b model (dropout layers were, for some reason, added to the target_modules; see below for the layer types supported by LoRA).

From the documentation of the PEFT library, LoRA supports only the following module types: `torch.nn.Linear`, `torch.nn.Embedding`, `torch.nn.Conv2d`, `transformers.pytorch_utils.Conv1D`.

The following function collects all LoRA-compatible module names from an arbitrary model:

import torch
from transformers import Conv1D

def get_specific_layer_names(model):
    # Create a list to store the layer names
    layer_names = []

    # Recursively visit all modules and submodules
    for name, module in model.named_modules():
        # Check if the module is an instance of a LoRA-compatible layer type
        if isinstance(module, (torch.nn.Linear, torch.nn.Embedding, torch.nn.Conv2d, Conv1D)):
            # Extract the leaf name, e.g. "model.layers.0.self_attn.q_proj" -> "q_proj";
            # shorter names such as "model.embed_tokens" or "lm_head" collapse to "" and are skipped
            leaf = '.'.join(name.split('.')[4:]).split('.')[0]
            if leaf:
                layer_names.append(leaf)

    return layer_names

list(set(get_specific_layer_names(model)))

This yields, for google/gemma-2b:

[
 'down_proj',
 'o_proj',
 'k_proj',
 'q_proj',
 'gate_proj',
 'up_proj',
 'v_proj']

This list was valid for a target_modules selection with the versions shown below; a usage sketch follows.

peft.__version__
'0.10.1.dev0'

transformers.__version__
'4.39.1'
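Putting it together, a minimal sketch that feeds the programmatically discovered names into a LoraConfig (the rank and alpha values are illustrative assumptions):

from peft import LoraConfig, get_peft_model

target_modules = list(set(get_specific_layer_names(model)))

lora_config = LoraConfig(
    r=8,                  # illustrative rank
    lora_alpha=16,        # illustrative scaling factor
    target_modules=target_modules,
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(model, lora_config)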

By following the steps above, you can identify the right target modules for applying PEFT / LoRA to different models. These fine-tuning techniques adapt large pre-trained models to specific tasks with minimal computational resources. Whether you are working on computer vision, natural language processing, or sequential-data tasks, applying PEFT and LoRA can significantly enhance your model's performance and efficiency.

Thank you for reading the article.
