
LoRA

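A type of parameter-efficient tuning.

Θ: a much smaller set of parameters than the full model weights.

The change (delta) is added to the pretrained weights. Why?

Low intrinsic dimension? The idea is that even an over-parameterized model is not as complex as it looks: underneath it there is a much lower intrinsic dimension. So let's use parameters efficiently.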

With LoRA, the pretrained weights stay frozen, and during adaptation we optimize rank decomposition matrices for the change (delta) of the dense layers. This makes it possible to train some of the network's dense layers indirectly.

Hypothesis: the weight updates during adaptation also have a low intrinsic rank.

Forward pass: the input x goes through the original transformer layer as usual (W0 x). In parallel, x is down-projected by matrix A and then multiplied by matrix B, which up-projects it back to the original dimension. The two results are added, so it is a simple format…
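The forward pass described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the PEFT implementation: the sizes `d` and `r`, the random values, and the function name `lora_forward` are assumptions. Initializing B to zero follows the LoRA paper, so the adapted layer starts out identical to the pretrained one.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4                      # hypothetical layer width and LoRA rank (r << d)

W0 = rng.normal(size=(d, d))      # frozen pretrained weight
A = rng.normal(size=(r, d))       # down-projection: d -> r (trainable)
B = np.zeros((d, r))              # up-projection: r -> d (trainable, zero at start)

def lora_forward(x, W0, A, B):
    """Return W0 x + B (A x): the frozen path plus the low-rank update path."""
    base = W0 @ x                 # original layer output
    low = A @ x                   # project the input down to r dimensions
    return base + B @ low         # project back up to d dimensions and add

x = rng.normal(size=d)
h = lora_forward(x, W0, A, B)
```

Because B starts at zero, `h` equals `W0 @ x` before any training step; only A and B would receive gradients.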

Advantages

No additional inference latency
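One way to see why there is no extra latency: after training, the low-rank update can be folded into the original weight once, so inference uses a single matrix of the original shape. The sketch below uses hypothetical sizes and random matrices in place of trained ones.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4                          # hypothetical sizes
W0 = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d))           # stands in for a trained down-projection
B = rng.normal(size=(d, r))           # stands in for a trained up-projection
x = rng.normal(size=d)

# During training: two paths, so two extra small matmuls per layer.
h_two_paths = W0 @ x + B @ (A @ x)

# For deployment: fold the update into the weight once, offline.
W_merged = W0 + B @ A
h_merged = W_merged @ x               # a single matmul, same cost as the base model

same = np.allclose(h_two_paths, h_merged)
```

In PEFT, `merge_and_unload()` on a LoRA model performs this kind of merge; the sketch only shows the underlying arithmetic.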

Prefix tuning & Adapter

Prefix tuning: updates only 0.1% of the parameters, query

Why are these solutions not enough?

Adapter layers introduce inference latency

Directly optimizing the prompt is hard

https://github.com/huggingface/peft

PEFT - Parameter-Efficient Fine-Tuning


from transformers import AutoModelForSeq2SeqLM
from peft import get_peft_model, LoraConfig, TaskType

# mt0-large is an encoder-decoder (mT5-based) model, so it is loaded as a seq2seq model
model_name_or_path = "bigscience/mt0-large"
tokenizer_name_or_path = "bigscience/mt0-large"

peft_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1
)

model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
model = get_peft_model(model, peft_config)  # wrap the base model with LoRA adapters
model.print_trainable_parameters()

Output: trainable params: 2359296 || all params: 1231940608 || trainable%: 0.19151053100118282

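trainable params: the number of trainable parameters after converting the model to a PEFT model, i.e. the number of parameters in the adapter (not the number of adapters).

all params: all parameters in the model, i.e. the pretrained weights plus the added adapter weights.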
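The reported numbers can be checked by hand. The sketch below recomputes the percentage and then estimates the trainable count from the LoRA shapes. The model shape is an assumption (hidden size 1024, 24 encoder and 24 decoder layers, LoRA on the q and v projections only), not something stated in these notes.

```python
# Numbers reported by print_trainable_parameters() above.
trainable_params = 2_359_296
all_params = 1_231_940_608

trainable_pct = 100 * trainable_params / all_params

# Assumed shape of mt0-large (mT5-large-like): hidden size 1024, 24 encoder and
# 24 decoder layers, with LoRA on the q and v projections only.
d_model, r = 1024, 8
params_per_module = r * (d_model + d_model)    # A is r x d, B is d x r
encoder_modules = 24 * 2                       # self-attention q and v
decoder_modules = 24 * 4                       # self- and cross-attention q and v
estimated = params_per_module * (encoder_modules + decoder_modules)

print(f"{trainable_pct:.4f}%")
print(estimated == trainable_params)
```

Under these assumptions each adapted projection adds r × (d + d) = 16,384 parameters, and 144 such projections account for the whole trainable count.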

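TISTORY: Understanding PEFT LoRA through its implementation

Merging this with the short LoRA config post I wrote earlier, I'm starting a new post to understand LoRA at a fundamental level. Let's understand how LoRA works through its implementation, and find out which configs to adjust, and "why", when using Hugging Face PEFT LoRA. 0. A really short summary: the model is too heavy. How can we train all of it? So during training, for some layers (in transformers, mainly just the attention blocks), we split off a separate set of weights and train only those. As the figure below shows, training the original model as-is uses every weight for training. With LoRA, on the other hand, the original weights are frozen during training and lora a..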

| From the explanation of the various parameters

[ PEFT ] Parameter-Efficient Fine-Tuning

PEFT methodology

| DoRA

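Using LoRA in practice

This code covers the full workflow: loading the Food101 dataset, fine-tuning an image classification model on it with PEFT-LoRA, uploading the result to the Hugging Face Hub, and running a prediction. Each step is explained below.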

Hugging Face: LoRA methods

Load the dataset


from datasets import load_dataset

ds = load_dataset("food101")

Label mapping


labels = ds["train"].features["label"].names
label2id, id2label = dict(), dict()
for i, label in enumerate(labels):
    label2id[label] = i
    id2label[i] = label

Load the image processor


from transformers import AutoImageProcessor

image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")

Set up the image transforms


from torchvision.transforms import (
    CenterCrop, Compose, Normalize, RandomHorizontalFlip, RandomResizedCrop, Resize, ToTensor,
)

normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
train_transforms = Compose(
    [
        RandomResizedCrop(image_processor.size["height"]),
        RandomHorizontalFlip(),
        ToTensor(),
        normalize,
    ]
)

val_transforms = Compose(
    [
        Resize(image_processor.size["height"]),
        CenterCrop(image_processor.size["height"]),
        ToTensor(),
        normalize,
    ]
)

Define the preprocessing functions


def preprocess_train(example_batch):
    example_batch["pixel_values"] = [train_transforms(image.convert("RGB")) for image in example_batch["image"]]
    return example_batch

def preprocess_val(example_batch):
    example_batch["pixel_values"] = [val_transforms(image.convert("RGB")) for image in example_batch["image"]]
    return example_batch

train_ds = ds["train"]
val_ds = ds["validation"]

train_ds.set_transform(preprocess_train)
val_ds.set_transform(preprocess_val)

Load the model and define the PEFT config


from transformers import AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    label2id=label2id,
    id2label=id2label,
    ignore_mismatched_sizes=True,
)

from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["query", "value"],
    lora_dropout=0.1,
    bias="none",
    modules_to_save=["classifier"],
)
model = get_peft_model(model, config)

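Let's look at the role of each parameter:

r (Rank):

r=16 sets the rank of the low-rank matrices to 16. This is the inner dimension shared by the two update matrices A and B, whose product approximates the weight update with far fewer parameters than a full matrix.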
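To get a feel for what the rank costs, the sketch below compares the parameters of a LoRA pair against a full square weight update. The hidden size 768 is an assumption (the usual ViT-Base width), and `lora_param_count` is a helper made up for this illustration.

```python
def lora_param_count(d_out, d_in, r):
    """Parameters in B (d_out x r) plus A (r x d_in)."""
    return r * (d_in + d_out)

d = 768                         # assumed hidden size of ViT-Base
full = d * d                    # parameters in one full d x d weight update
for r in (4, 16, 64):
    lora = lora_param_count(d, d, r)
    print(r, lora, f"{100 * lora / full:.1f}% of full")
```

The count grows linearly with r, so doubling the rank doubles the adapter size for every targeted module.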

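lora_alpha:

lora_alpha=16 is the scaling factor. The output of the low-rank matrices is multiplied by lora_alpha / r before being added to the output of the original weights, which keeps the size of the update comparable when r changes.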
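As a small worked example, standard LoRA scales the low-rank update by lora_alpha / r. The helper name `lora_scaling` is hypothetical; the two calls use the configs that appear in these notes.

```python
def lora_scaling(lora_alpha, r):
    """Factor applied to the low-rank update: delta_W = (lora_alpha / r) * B @ A."""
    return lora_alpha / r

scale_vit = lora_scaling(16, 16)   # config used for the ViT example: r=16, lora_alpha=16
scale_mt0 = lora_scaling(32, 8)    # config used for the mt0-large example: r=8, lora_alpha=32
print(scale_vit, scale_mt0)
```

So the ViT config applies the update unscaled (factor 1.0), while the mt0-large config amplifies it by a factor of 4.0.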

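target_modules:

target_modules=["query", "value"] specifies which modules in the model LoRA is applied to. Here LoRA is applied to the query and value projections of the attention blocks.

lora_dropout:

lora_dropout=0.1 is the dropout probability for the LoRA layers. During training, dropout randomly zeroes part of the input to the low-rank path, which helps prevent overfitting and improves generalization.

bias:

bias="none" controls which bias terms are trained. Possible values are "none", "all", and "lora_only"; with "none", no bias terms are updated.

modules_to_save:

modules_to_save=["classifier"] lists modules, apart from the LoRA layers, that are trained and saved with the adapter. Here that is the classifier head, which is newly initialized for the Food101 labels.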

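Query (Q): used to compare the information at the current position against all other positions.

Key (K): the information at every position that is compared against the Query.

Value (V): the information that is actually passed on, weighted according to the similarity between Query and Key.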
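The roles of Q, K, and V can be made concrete with a single-head scaled dot-product attention sketch. The sequence length, head size, and random inputs are hypothetical; in the ViT model above, LoRA adapts the linear layers that produce Q and V, not this computation itself.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention for a single head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # similarity of each query to every key
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ V, weights                     # weighted sum of values

rng = np.random.default_rng(0)
n, d_k = 5, 8                                       # hypothetical sequence length and head size
Q, K, V = (rng.normal(size=(n, d_k)) for _ in range(3))
out, weights = attention(Q, K, V)
```

Each row of `weights` sums to 1, so every output position is a weighted average of the value vectors.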

Define the collate function


import torch

def collate_fn(examples):
    # Stack per-example image tensors into one batch tensor and collect the labels
    pixel_values = torch.stack([example["pixel_values"] for example in examples])
    labels = torch.tensor([example["label"] for example in examples])
    return {"pixel_values": pixel_values, "labels": labels}

Training setup


from transformers import TrainingArguments, Trainer

account = "stevhliu"
# A Hub repo ID has the form "account/name", matching the ID loaded later
peft_model_id = f"{account}/vit-base-patch16-224-in21k-lora"
batch_size = 128

args = TrainingArguments(
    peft_model_id,
    remove_unused_columns=False,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-3,
    per_device_train_batch_size=batch_size,
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=batch_size,
    fp16=True,
    num_train_epochs=5,
    logging_steps=10,
    load_best_model_at_end=True,
    label_names=["labels"],
)

trainer = Trainer(
    model,
    args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    tokenizer=image_processor,
    data_collator=collate_fn,
)
trainer.train()

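peft_model_id:

The path where the model is saved (the output directory). The same string is used later as the Hub repository ID, which must have the form "account/name".

remove_unused_columns=False:

Does not remove unused columns from the dataset. With the default of True, columns that the model does not use are dropped; here the "image" column must be kept because the transforms need it to create pixel_values.

evaluation_strategy="epoch":

Runs evaluation at the end of every epoch. Possible values are "no", "steps", and "epoch".

save_strategy="epoch":

Saves the model at the end of every epoch. Possible values are "no", "steps", and "epoch".

learning_rate=5e-3:

Sets the learning rate to 0.005. The learning rate is the step size used when updating the model parameters.

per_device_train_batch_size=batch_size:

Sets the training batch size per device (e.g. per GPU) to 128.

gradient_accumulation_steps=4:

Accumulates gradients over 4 steps before performing an update. This effectively increases the batch size, here to 128 × 4 = 512 per device.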
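The effect of gradient accumulation on the batch size is simple arithmetic. The sketch below assumes a single GPU and the 75,750-image Food101 train split (101 classes × 750 images); the update count is a rough estimate, not the exact number the Trainer reports.

```python
import math

per_device_train_batch_size = 128
gradient_accumulation_steps = 4
num_devices = 1                      # assumption: training on a single GPU

# Samples that contribute to each optimizer update.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Assumption: the Food101 train split holds 75,750 images (101 classes x 750).
num_train_images = 101 * 750
updates_per_epoch = math.ceil(num_train_images / effective_batch_size)

print(effective_batch_size, updates_per_epoch)
```

Memory use is still set by the per-device batch of 128; only the number of samples averaged into each update changes.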

per_device_eval_batch_size=batch_size:

Sets the evaluation batch size per device (e.g. per GPU) to 128.

fp16=True:

Enables mixed-precision training. This can speed up training and reduce memory usage.

num_train_epochs=5:

Trains for 5 passes over the full dataset.

logging_steps=10:

Writes a log entry every 10 steps, so progress can be monitored during training.

load_best_model_at_end=True:

Loads the best-performing checkpoint when training ends.

label_names=["labels"]:

Specifies the names of the label inputs. Here the labels are passed under the key "labels".

model:

The model to train. Here it is the google/vit-base-patch16-224-in21k model with PEFT-LoRA applied.

args:

The TrainingArguments object holding the training arguments, including the settings described above.

train_dataset:

The training dataset. Here it is the preprocessed train_ds.

eval_dataset:

The evaluation dataset. Here it is the preprocessed val_ds.

tokenizer:

A tokenizer or image processor. Here image_processor is used.

data_collator:

The function that assembles a batch. Here collate_fn is used to build each batch.

Upload the model


from huggingface_hub import notebook_login

notebook_login()

model.push_to_hub(peft_model_id)

Load the model and predict


import torch
import requests
from PIL import Image
from peft import PeftConfig, PeftModel
from transformers import AutoImageProcessor, AutoModelForImageClassification

config = PeftConfig.from_pretrained("stevhliu/vit-base-patch16-224-in21k-lora")
model = AutoModelForImageClassification.from_pretrained(
    config.base_model_name_or_path,
    label2id=label2id,
    id2label=id2label,
    ignore_mismatched_sizes=True,
)
model = PeftModel.from_pretrained(model, "stevhliu/vit-base-patch16-224-in21k-lora")

url = "https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/beignets.jpeg"
image = Image.open(requests.get(url, stream=True).raw)

encoding = image_processor(image.convert("RGB"), return_tensors="pt")

with torch.no_grad():
    outputs = model(**encoding)
    logits = outputs.logits

predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
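The prediction above keeps only the top class. If probabilities for several classes are useful, a softmax over the logits gives them. The sketch below uses made-up logits and a made-up four-class `id2label` mapping for illustration, and `top_k` is a hypothetical helper, not part of transformers or PEFT.

```python
import numpy as np

def top_k(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs from raw logits."""
    z = logits - logits.max()                  # stabilize the exponentials
    probs = np.exp(z) / np.exp(z).sum()        # softmax
    order = np.argsort(probs)[::-1][:k]        # indices sorted by descending probability
    return [(id2label[int(i)], float(probs[i])) for i in order]

# Hypothetical logits and labels, for illustration only.
id2label = {0: "apple_pie", 1: "beignets", 2: "pizza", 3: "sushi"}
logits = np.array([0.2, 3.1, -1.0, 0.5])
preds = top_k(logits, id2label)
```

With the real model, `outputs.logits[0]` converted to a NumPy array and `model.config.id2label` could take the place of the made-up inputs.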

2024/7/14 - LoRA
