PEFT documentation

PEFT configurations and models

You are viewing the main version, which requires installation from source. If you'd like a regular pip install, check out the latest stable version (v0.14.0).

The sheer size of today’s large pretrained models, which commonly have billions of parameters, presents a significant training challenge because they require more storage space and more computational power. You’ll need access to powerful GPUs or TPUs to train these models, which is expensive, not widely accessible, not environmentally friendly, and not very practical. PEFT methods address many of these challenges. There are several types of PEFT methods (soft prompting, matrix decomposition, adapters), but they all focus on the same thing: reducing the number of trainable parameters. This makes it more accessible to train and store large models on consumer hardware.
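To give a rough sense of the reduction, the sketch below compares the weights trained by full fine-tuning of one linear layer with the weights added by a rank-16 low-rank (LoRA-style) decomposition of the same layer. The layer size (1024 × 1024) and the count of 48 adapted layers are assumptions chosen to resemble the query and value projections of facebook/opt-350m; they are not stated on this page, and bias terms are ignored.

```python
def full_finetune_params(d_in, d_out):
    """Weights updated when a d_out x d_in linear layer is trained directly."""
    return d_in * d_out


def lora_params(d_in, d_out, r):
    """Weights added by a rank-r decomposition: A is (r, d_in), B is (d_out, r)."""
    return r * d_in + d_out * r


# Assumed layer shape and rank (illustrative, not read from a real model).
d_in = d_out = 1024
r = 16

full = full_finetune_params(d_in, d_out)
lora = lora_params(d_in, d_out, r)
ratio = lora / full

# Assumed: 24 decoder layers, each with two adapted projections.
adapted_layers = 24 * 2
total_lora = adapted_layers * lora

print(f"per layer: {lora:,} of {full:,} weights ({ratio:.3%})")
print(f"all adapted layers: {total_lora:,} trainable weights")
```

Under these assumptions the total agrees with the 1,572,864 trainable parameters reported for facebook/opt-350m later in this tutorial.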

The PEFT library is designed to help you quickly train large models on free or low-cost GPUs. In this tutorial, you’ll learn how to set up a configuration to apply a PEFT method to a pretrained base model for training. Once the PEFT configuration is set up, you can use any training framework you like (the Transformers Trainer class, Accelerate, or a custom PyTorch training loop).

PEFT configurations

Learn more about the parameters you can configure for each PEFT method in its respective API reference page.

A configuration stores important parameters that specify how a particular PEFT method should be applied.

For example, take a look at the following LoraConfig for applying LoRA and PromptEncoderConfig for applying p-tuning (these configuration files are already JSON-serialized). Whenever you load a PEFT adapter, it is a good idea to check that it has an associated adapter_config.json file, which is required.

LoraConfig
{
  "base_model_name_or_path": "facebook/opt-350m", #base model to apply LoRA to
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layers_pattern": null,
  "layers_to_transform": null,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "modules_to_save": null,
  "peft_type": "LORA", #PEFT method type
  "r": 16,
  "revision": null,
  "target_modules": [
    "q_proj", #model modules to apply LoRA to (query and value projection layers)
    "v_proj"
  ],
  "task_type": "CAUSAL_LM" #type of task to train model on
}
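The check for adapter_config.json can be sketched with the standard library alone. The snippet below writes a trimmed copy of the LoRA configuration to a temporary folder, verifies that the file exists, and reads a few fields back. The helper name `read_adapter_config` and the temporary folder are illustrative assumptions, not part of the PEFT API. The `#` annotations in the listing above are explanatory only and are left out here because they are not valid JSON.

```python
import json
import tempfile
from pathlib import Path


def read_adapter_config(adapter_dir):
    """Return the parsed adapter_config.json from adapter_dir (hypothetical helper)."""
    config_path = Path(adapter_dir) / "adapter_config.json"
    if not config_path.is_file():
        raise FileNotFoundError(f"no adapter_config.json found in {adapter_dir}")
    with config_path.open(encoding="utf-8") as f:
        return json.load(f)


# A trimmed copy of the LoRA configuration, without the # annotations.
example_config = {
    "base_model_name_or_path": "facebook/opt-350m",
    "peft_type": "LORA",
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "v_proj"],
    "task_type": "CAUSAL_LM",
}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Simulate an adapter folder that contains only the configuration file.
    config_file = Path(tmp_dir) / "adapter_config.json"
    config_file.write_text(json.dumps(example_config, indent=2), encoding="utf-8")
    loaded_config = read_adapter_config(tmp_dir)

print(loaded_config["peft_type"], loaded_config["base_model_name_or_path"])
```

Reading `base_model_name_or_path` this way tells you which base model the adapter expects before you download anything large.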

You can create your own configuration for training by initializing a LoraConfig.

from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    r=16,
    target_modules=["q_proj", "v_proj"],
    task_type=TaskType.CAUSAL_LM,
    lora_alpha=32,
    lora_dropout=0.05
)
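To make r and lora_alpha more concrete, here is a minimal pure-Python sketch of the LoRA update as it is commonly described: the frozen weight W is left untouched, and a low-rank product of B and A, scaled by lora_alpha / r, is added to its output. The matrix sizes, values, and helper names are illustrative assumptions, and this is not how the PEFT library implements its layers.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, r, lora_alpha):
    """Compute W x + (lora_alpha / r) * B (A x) for one input vector x."""
    scaling = lora_alpha / r
    base_out = matvec(W, x)             # frozen pretrained path
    low_rank = matvec(B, matvec(A, x))  # trainable low-rank path
    return [base + scaling * delta for base, delta in zip(base_out, low_rank)]


# Toy sizes: a 3x4 frozen weight and rank r=2 (the tutorial uses r=16).
r, lora_alpha = 2, 4
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
A = [[0.5, 0.5, 0.0, 0.0],   # shape (r, in_features)
     [0.0, 0.0, 0.5, 0.5]]
B_zero = [[0.0, 0.0] for _ in range(3)]  # shape (out_features, r), zero at the start
x = [1.0, 2.0, 3.0, 4.0]

# With B at zero the adapter contributes nothing, so the output equals W x.
out_init = lora_forward(W, A, B_zero, x, r, lora_alpha)

# Once B is non-zero (here set by hand), the low-rank path shifts the output.
B_trained = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
out_trained = lora_forward(W, A, B_trained, x, r, lora_alpha)

print(out_init, out_trained)
```

In this sketch only A and B would be trained, which is why a smaller r means fewer trainable weights, while lora_alpha controls how strongly the low-rank path affects the output.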

PEFT models

With a PEFT configuration in hand, you can now apply it to any pretrained model to create a PeftModel. You can use any of the state-of-the-art models from the Transformers library, a custom model, or even a new and unsupported transformer architecture.

For this tutorial, load a base facebook/opt-350m model to finetune.

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

Use the get_peft_model() function to create a PeftModel from the base facebook/opt-350m model and the lora_config you created earlier.

from peft import get_peft_model

lora_model = get_peft_model(model, lora_config)
lora_model.print_trainable_parameters()
"trainable params: 1,572,864 || all params: 332,769,280 || trainable%: 0.472659014678278"

Calling get_peft_model() modifies the base model in-place. If you call get_peft_model() on a model that was already modified this way, the model is mutated further. Therefore, to change your PEFT configuration after calling get_peft_model(), first unload the model with unload() and then call get_peft_model() with the new configuration. Alternatively, you can re-initialize the model to ensure a fresh, unmodified state before applying a new PEFT configuration.
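The in-place behaviour can be illustrated with a small pure-Python analogy. Every class and function below is a hypothetical stand-in rather than PEFT code. The toy only shows why the caller's reference to the base model also sees the injected layers, and why rebuilding the base model gives a clean starting point.

```python
class ToyLinear:
    """Stand-in for a pretrained layer."""

    def __init__(self, name):
        self.name = name


class ToyAdapterLayer:
    """Stand-in for a layer wrapped with an adapter of a given rank."""

    def __init__(self, base_layer, rank):
        self.base_layer = base_layer
        self.rank = rank


class ToyModel:
    """Stand-in for a base model with two target layers."""

    def __init__(self):
        self.q_proj = ToyLinear("q_proj")
        self.v_proj = ToyLinear("v_proj")


class ToyAdapterModel:
    """Stand-in for the object returned after adapters are injected."""

    def __init__(self, base_model):
        self.base_model = base_model


def toy_inject(model, rank, targets=("q_proj", "v_proj")):
    """Replace the target layers on the given model object, then wrap it."""
    for target in targets:
        setattr(model, target, ToyAdapterLayer(getattr(model, target), rank))
    return ToyAdapterModel(model)


base = ToyModel()
wrapped = toy_inject(base, rank=16)

# The caller's reference to the base model now holds adapter layers.
base_was_modified = isinstance(base.q_proj, ToyAdapterLayer)

# In this toy, injecting again wraps the already wrapped layers a second time.
toy_inject(base, rank=8)
stacked = isinstance(base.q_proj.base_layer, ToyAdapterLayer)

# Building a new base model gives a clean state for the new configuration.
fresh = toy_inject(ToyModel(), rank=8)
fresh_is_clean = isinstance(fresh.base_model.q_proj.base_layer, ToyLinear)

print(base_was_modified, stacked, fresh_is_clean)
```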

Now you can train the PeftModel with your preferred training framework! After training, you can save your model locally with save_pretrained() or upload it to the Hub with push_to_hub().

# save locally
lora_model.save_pretrained("your-name/opt-350m-lora")

# push to Hub
lora_model.push_to_hub("your-name/opt-350m-lora")

To load a PeftModel for inference, you’ll need to provide the PeftConfig used to create it and the base model it was trained from.

from peft import PeftModel, PeftConfig

config = PeftConfig.from_pretrained("ybelkada/opt-350m-lora")
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
lora_model = PeftModel.from_pretrained(model, "ybelkada/opt-350m-lora")

By default, the PeftModel is set for inference, but if you’d like to train the adapter some more, you can set is_trainable=True.

lora_model = PeftModel.from_pretrained(model, "ybelkada/opt-350m-lora", is_trainable=True)

The PeftModel.from_pretrained() method is the most flexible way to load a PeftModel because it doesn’t matter what model framework was used (Transformers, timm, a generic PyTorch model). Other classes, like AutoPeftModel, are just convenient wrappers around the base PeftModel that make it easier to load PEFT models directly from the Hub or from a local folder where the PEFT weights are stored.

from peft import AutoPeftModelForCausalLM

lora_model = AutoPeftModelForCausalLM.from_pretrained("ybelkada/opt-350m-lora")

Take a look at the AutoPeftModel API reference to learn more about the AutoPeftModel classes.

Next steps

With the appropriate PeftConfig, you can apply a PEFT method to any pretrained model to create a PeftModel and train large, powerful models faster on freely available GPUs! To learn more about PEFT configurations and models, the following guide may be helpful: