The title above is an example. When writing the minutes, please change the title to the format "Date 'Minutes' (summary of the meeting)" :)
[Meeting Topic]
[To-do]
dataset
JANGJIWON/UGRP_sketchset_textbook · Datasets at Hugging Face
xodhks/EmoSet118K · Datasets at Hugging Face
xodhks/EmoSet118K_MonetStyle · Datasets at Hugging Face
++ crawled data
train
- Fix the fine-tuning method and parameters, and test the datasets (see the dataset-mixing sketch after this list):
- emoset → form
- emoset+form → form
- emoset(GAN) → form
- emoset(GAN)+form → form
- emoset(GAN)+crawling → form
- emoset(GAN)+crawling+form → form
- form → form
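The combinations above mix Hugging Face datasets. A minimal sketch using datasets.concatenate_datasets, assuming the repos listed under "dataset" above and that both splits expose the same image/label columns; the crawled data has no repo id yet, so it is left out here.

from datasets import load_dataset, concatenate_datasets

# Minimal sketch of building the mixed training sets above (e.g. "emoset+form -> form").
emoset = load_dataset("xodhks/EmoSet118K", split="train")
form = load_dataset("JANGJIWON/UGRP_sketchset_textbook", split="train")  # "form" = our sketch set (assumption)

# Assumes both datasets share the same 'image'/'label' schema; map() them to a common schema first if not.
mixed = concatenate_datasets([emoset, form]).shuffle(seed=42)
print(len(mixed))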
- Fix the dataset and parameters, and test fine-tuning methods: original fine-tuning, adapter, LoRA
- Fix the fine-tuning method and dataset, and test parameters
- LoRA parameter tuning (see the grid sketch below)
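For the parameter test, a small hedged sketch of a LoRA grid over r / alpha / dropout; the values are illustrative, not settings we have agreed on, and the target modules are reused from the training code below.

from itertools import product
from peft import LoraConfig

# Candidate LoRA hyperparameters to sweep (illustrative values).
ranks = [4, 8, 16]
alphas = [16, 32]
dropouts = [0.05, 0.1]

configs = [
    LoraConfig(
        r=r,
        lora_alpha=alpha,
        lora_dropout=dropout,
        target_modules=["query", "key", "value"],  # ViT attention projections, as in the code below
    )
    for r, alpha, dropout in product(ranks, alphas, dropouts)
]
# For each config: reload the base model, call get_peft_model(base_model, config),
# train with the fixed dataset/method, and log (r, alpha, dropout, accuracy).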
Questions (professor meeting)
- Crawled data
- How to evaluate a small dataset
- Other ways to improve accuracy
- Incremental approach(?)
Code
import torch
import os
from transformers import AutoModelForImageClassification, AutoImageProcessor
from datasets import load_dataset
from torch.utils.data import DataLoader
from torch.optim import Adam
import torch.nn as nn
from peft import get_peft_model, LoraConfig

# GPU setup
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print("Device:", device)

# Load datasets
train_dataset = load_dataset("xodhks/EmoSet118K_MonetStyle", split="train")
test_dataset = load_dataset("xodhks/Children_Sketch", split="train")

# Valid labels for the test dataset: only label indices that exist in Children_Sketch
test_valid_label_indices = [0, 1, 4, 5]

# Load image processor and model
processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224", use_fast=True)
model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=6,  # number of emotion classes in the dataset
    ignore_mismatched_sizes=True
).to(device)

# Configure and apply LoRA
config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query", "key", "value"],
)
model = get_peft_model(model, config)

# Directory for saved checkpoints
os.makedirs("top_models", exist_ok=True)
top_models = []

# DataLoader setup
def collate_fn(batch):
    images = [item['image'] for item in batch]
    labels = [item['label'] for item in batch]
    inputs = processor(images=images, return_tensors="pt")
    inputs['labels'] = torch.tensor(labels, dtype=torch.long)
    return inputs

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, collate_fn=collate_fn, num_workers=4)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False, collate_fn=collate_fn, num_workers=4)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = Adam(model.parameters(), lr=1e-4)

# Evaluation: only predictions that fall in the valid label set are counted
def evaluate(model, data_loader):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for batch in data_loader:
            inputs = {k: v.to(device) for k, v in batch.items()}
            outputs = model(**inputs)
            _, preds = torch.max(outputs.logits, 1)
            for pred, label in zip(preds, inputs['labels']):
                if pred.item() in test_valid_label_indices:
                    if pred.item() == label.item():
                        correct += 1
                    total += 1
    accuracy = 100 * correct / total if total > 0 else 0.0  # guard against an empty valid set
    return accuracy

# Save the checkpoint and keep a top-10 leaderboard by accuracy
def save_top_models(epoch, accuracy, model, top_models):
    model_filename = f"model_epoch_{epoch + 1}_accuracy_{accuracy:.2f}.pth"
    model_path = os.path.join("top_models", model_filename)
    top_models.append((accuracy, model_path))
    top_models = sorted(top_models, key=lambda x: x[0], reverse=True)[:10]
    torch.save(model.state_dict(), model_path)
    print("\nTop 10 Models (by accuracy):")
    for i, (acc, path) in enumerate(top_models, 1):
        print(f"Rank {i}: Accuracy = {acc:.2f}%, Model Path = {path}")
    return top_models

# Training loop
num_epochs = 100
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for batch in train_loader:
        optimizer.zero_grad()
        inputs = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**inputs)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}")
    test_accuracy = evaluate(model, test_loader)
    print(f"Test Accuracy after Epoch {epoch+1}: {test_accuracy:.2f}%")
    top_models = save_top_models(epoch, test_accuracy, model, top_models)

print("Finished Training")
Code
import torch
from transformers import AutoModelForImageClassification, AutoImageProcessor
from datasets import load_dataset
from torch.utils.data import DataLoader, Dataset
from torch.optim import Adam
import torch.nn as nn
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from peft import get_peft_model, LoraConfig
from huggingface_hub import hf_hub_download
from torchvision import transforms
from tqdm import tqdm

# Load the ViT model and image processor
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224", use_fast=True)
model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=8,
    ignore_mismatched_sizes=True
).to(device)

# Preprocessing
def preprocess_image(image):
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
    ])
    return transform(image)

class CustomDataset(Dataset):
    def __init__(self, dataset, transform=None):
        self.dataset = dataset
        self.transform = transform
        self.label_encoder = LabelEncoder()
        labels = [item['label'] for item in dataset]
        self.label_encoder.fit(labels)

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        item = self.dataset[idx]
        img = item['image']
        label = item['label']
        if self.transform:
            img = self.transform(img)
        label = self.label_encoder.transform([label])[0]
        return img, torch.tensor(label, dtype=torch.long)

# Load pretrained weights from the Hub (student model pretrained on EmoSet118K Monet style)
model_weights_path = hf_hub_download(
    repo_id="JANGJIWON/EmoSet118K_MonetStyle_student",
    filename="model_epoch_5_accuracy_43.09.pth"
)
model.load_state_dict(torch.load(model_weights_path, map_location='cpu'), strict=False)

# Configure and apply the second LoRA
config2 = LoraConfig(
    r=4,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query", "key", "value"]
)
model = get_peft_model(model, config2)

# Prepare the dataset
dataset = load_dataset("JANGJIWON/UGRP_sketchset_textbook", split="train")

# Convert the dataset to a list of dictionaries for splitting
dataset_list = [dict(item) for item in dataset]

# Split into train and test sets (70% train, 30% test)
train_data, test_data = train_test_split(dataset_list, test_size=0.3, random_state=42)

# Create datasets and dataloaders
train_dataset = CustomDataset(train_data, transform=preprocess_image)
test_dataset = CustomDataset(test_data, transform=preprocess_image)
train_loader = DataLoader(train_dataset, batch_size=1, shuffle=True, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=1, shuffle=False, num_workers=2)

# Optimizer and loss function
learning_rate = 0.001  # slightly higher learning rate
optimizer = Adam(model.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()

# Train the model
num_epochs = 10
model.train()
for epoch in range(num_epochs):
    running_loss = 0.0
    correct = 0
    total = 0
    for images, labels in tqdm(train_loader):
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs.logits, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        _, predicted = torch.max(outputs.logits, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    epoch_accuracy = 100 * correct / total
    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {running_loss:.4f}, Accuracy: {epoch_accuracy:.2f}%')

# Evaluate on the held-out test split
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.logits, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
accuracy = 100 * correct / total
print(f'Final Test Accuracy after second LoRA tuning: {accuracy:.2f}%')
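The script above pulls its starting weights from the JANGJIWON/EmoSet118K_MonetStyle_student repo. A minimal, hedged sketch of how a checkpoint saved by the first training loop could be pushed there (file name reused from above; assumes write access and a configured Hugging Face token):

from huggingface_hub import HfApi

# Upload a locally saved checkpoint so the second script can fetch it with hf_hub_download.
api = HfApi()
api.upload_file(
    path_or_fileobj="top_models/model_epoch_5_accuracy_43.09.pth",  # produced by the first training loop
    path_in_repo="model_epoch_5_accuracy_43.09.pth",
    repo_id="JANGJIWON/EmoSet118K_MonetStyle_student",
    repo_type="model",
)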
Figure

Looking at the structure, this isn't meaningful.
It would be meaningful if we reused the previous LoRA, but if we only use the last LoRA there is no point.
Question
0,1,2,3,4,5
[happiness, sadness, …]
VLM (vision-language model): LLaVA
Question
Conversely, encode the test images
Train on the mixed data
Try training in both of these ways
[Detailed Notes]
Even with GAN, the output is still too far from the images we ultimately evaluate on.
Leave the last one as future work.
JIWON ← it would be good to base the code on what's here
No training (zero-shot) → 18%?
UGRP test set → 60%
LoRA → 50%
Parameter-efficient fine-tuning → seems to perform worse than updating all parameters.
First test: UGRP test
Second: pre-trained on EmoSet Monet style → accuracy came out about the same.
Plain EmoSet → not tested yet.
TABLE
- CLIP vs. ViT comparison
| | | CLIP | ViT | ResNet |
|---|---|---|---|---|
| Pretrain (PEFT) with emoset | loss | ≤ 0.2 | ≤ 0.2 | |
| | LoRA config | | | |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | |
| | optimizer | AdamW | AdamW | |
| | prompt | | - | |
| train (full finetuning) | epoch | 10 | 10 | |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | |
| | optimizer | AdamW | AdamW | |
| | prompt | | - | |
| test | random seed | | | |
| output | accuracy | | | |
| | top-N accuracy | | | |
| | t-SNE | | | |
- Pretraining method comparison
| | | CLIP (with emoset) | CLIP (with mone) | CLIP (with crawling) | CLIP (with emoset crawling) | CLIP (with mone crawling) | CLIP (with X) |
|---|---|---|---|---|---|---|---|
| Pretrain (PEFT) | loss | ≤ 0.1 | ≤ 0.1 | ≤ 0.1 | | | - |
| | LoRA config | | | | | | - |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | - |
| | optimizer | AdamW | AdamW | AdamW | AdamW | AdamW | - |
| | prompt | | - | | | | - |
| train (full finetuning) | epoch | 10 | 10 | 10 | | | 10 |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | |
| | optimizer | AdamW | AdamW | AdamW | AdamW | AdamW | |
| | prompt | | - | | | | |
| test | random seed | | | | | | |
| output | accuracy | | | | | | |
| | top-N accuracy | | | | | | |
| | t-SNE | | | | | | |
- Staged training vs. single-pass training (mixed data) comparison (a small sketch of the two regimes follows the table)
| | | CLIP (staged) | CLIP (single pass) |
|---|---|---|---|
| Pretrain (PEFT) | loss | ≤ 0.2 | ≤ 0.2 |
| | LoRA config | | |
| | loss function | CrossEntropyLoss | CrossEntropyLoss |
| | optimizer | AdamW | AdamW |
| | prompt | | - |
| train (full finetuning) | epoch | 10 | 10 |
| | loss function | CrossEntropyLoss | CrossEntropyLoss |
| | optimizer | AdamW | AdamW |
| | prompt | | - |
| test | random seed | | |
| output | accuracy | | |
| | top-N accuracy | | |
| | t-SNE | | |
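A minimal sketch of the two regimes compared in the table above, assuming the staged run merges the LoRA weights into the base model before the full fine-tuning stage; train_fn and the loaders are placeholders, not our actual training code.

from peft import LoraConfig, get_peft_model

def staged(base_model, emoset_loader, form_loader, train_fn):
    # Stage 1: PEFT (LoRA) on EmoSet, then fold the adapter back into the base weights.
    lora_model = get_peft_model(base_model, LoraConfig(
        r=8, lora_alpha=16, lora_dropout=0.1,
        target_modules=["query", "key", "value"],
    ))
    train_fn(lora_model, emoset_loader)
    merged = lora_model.merge_and_unload()
    # Stage 2: full fine-tuning on the form sketches.
    train_fn(merged, form_loader)
    return merged

def single_pass(base_model, mixed_loader, train_fn):
    # One pass over the mixed (EmoSet + form) data.
    train_fn(base_model, mixed_loader)
    return base_model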
- Parameter comparison
Compare parameters for the best-performing configuration among the above.
TABLE-1/2
[Paper Review] LoRA: Low-Rank Adaptation of Large Language Models
LoRA paper review
| | | CLIP (with emoset) | CLIP (with mone) | CLIP (with crawling) | CLIP (with emoset crawling) | CLIP (with mone crawling) | CLIP (with X) |
|---|---|---|---|---|---|---|---|
| Pretrain (PEFT) | loss | ≤ 0.2 | ≤ 0.2 | ≤ 0.2 | ≤ 0.2 | ≤ 0.2 | - |
| | LoRA config | see LoRA config (CLIP) below | same | same | same | same | - |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | - |
| | optimizer | AdamW | AdamW | AdamW | AdamW | AdamW | - |
| | prompt | see prompt below | same | same | same | same | - |
| train (full finetuning) | epoch | 10 | 10 | 10 | 10 | 10 | 10 |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss |
| | optimizer | AdamW | AdamW | AdamW | AdamW | AdamW | AdamW |
| | prompt | see prompt below | same | same | same | same | same |
| | train size | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| | random seed | 42 | 42 | 42 | 42 | 42 | 42 |
| output | accuracy | | | | | | 61.54% |
| | t-SNE | | | | | | |
| | Silhouette Score | | | | | | |

LoRA config (CLIP, all pretrained columns): r=8, alpha=16, lora_dropout=0.1, target_modules=k_proj, q_proj, v_proj, visual_projection, bias="none"
Prompt (all columns): [f"This image likely represents an emotional expression. Considering the visual details and the intention behind the image, it seems to convey a sense of {label}." for label in possible_labels]
| | | ViT (with emoset) | ViT (with mone) | ViT (with crawling) | ViT (with emoset crawling) | ViT (with mone crawling) | ViT (with X) |
|---|---|---|---|---|---|---|---|
| Pretrain (PEFT) | loss | ≤ 0.2 | ≤ 0.2 | ≤ 0.2 | ≤ 0.2 | ≤ 0.2 | - |
| | LoRA config | see LoRA config (ViT) below | same | same | same | same | same |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | - |
| | optimizer | AdamW | AdamW | AdamW | AdamW | AdamW | - |
| | prompt | | - | | | | - |
| train (full finetuning) | epoch | 10 | 10 | 10 | | | 10 |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | |
| | optimizer | AdamW | AdamW | AdamW | AdamW | AdamW | |
| | prompt | - | - | - | - | - | |
| | train size | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| | random seed | 42 | 42 | 42 | 42 | 42 | 42 |
| output | accuracy | | | | | | |
| | t-SNE | | | | | | |
| | Silhouette Score | | | | | | |

LoRA config (ViT): r=8, alpha=16, lora_dropout=0.1, target_modules=k_proj, q_proj, v_proj, output.dense, bias="none"
| | | ResNet (with emoset) | ResNet (with mone) | ResNet (with crawling) | ResNet (with emoset crawling) | ResNet (with mone crawling) | ResNet (with X) |
|---|---|---|---|---|---|---|---|
| Pretrain (PEFT) | loss | ≤ 0.2 | ≤ 0.2 | ≤ 0.2 | ≤ 0.2 | ≤ 0.2 | - |
| | LoRA config | - | - | - | - | - | - |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | - |
| | optimizer | AdamW | AdamW | AdamW | AdamW | AdamW | - |
| | prompt | - | - | - | - | - | - |
| train (full finetuning) | epoch | 10 | 10 | 10 | 10 | 10 | 10 |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | |
| | optimizer | AdamW | AdamW | AdamW | AdamW | AdamW | |
| | prompt | - | - | - | - | - | |
| | train size | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | |
| test | random seed | 42 | 42 | 42 | 42 | 42 | 42 |
| output | accuracy | | | | | | |
| | t-SNE | | | | | | |
| | Silhouette Score | | | | | | |
t-SNE: something like PCA, but for nonlinear structure?!
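Roughly, yes: t-SNE is a nonlinear projection for visualizing high-dimensional embeddings. A minimal sketch for the t-SNE rows in the tables above; features and labels are placeholders for the actual image embeddings and emotion labels.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholder embeddings (N, D) and labels; swap in the model's image features.
features = np.random.randn(200, 768)
labels = np.random.randint(0, 6, size=200)

proj = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(features)
plt.scatter(proj[:, 0], proj[:, 1], c=labels, cmap="tab10", s=10)
plt.title("t-SNE of image embeddings (sketch)")
plt.savefig("tsne.png")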
JUNSU
Organize the survey process
AHYEON
Survey data analysis in progress…
Ran the data code based on ResNet
Without LoRA: 1. tested with the survey data, 2. crawled images vs. their sketch-converted versions → the sketch-converted ones gave better results
Process the survey intent and related data so it can be fed to RAG
OpenAI key
Crawling → about 150 images per emotion come up
[Next Meeting Topics and To-do]
- Try diffusion instead of GAN
- Try changing the LoRA structure
- Try a VLM (starting with ViT)
- Retry OpenAI training → this accuracy looks off → borrow a key
- Try encoding the test set → facial expressions and similar details might be lost
ComfyUI → fine-tuning is also possible
What we use is CLIP's ViT image encoder
For what we're doing, do we even need the text encoder?
Also try prompt engineering (a zero-shot CLIP sketch follows below)
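A hedged sketch of CLIP zero-shot scoring with the prompt template from the tables above; the checkpoint and label list are assumptions, not decisions, and the last comment shows the image-encoder-only path if the text encoder turns out to be unnecessary.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")

# Label list is illustrative; the project uses 6 emotion classes (happiness, sadness, ...).
possible_labels = ["happiness", "sadness", "anger", "fear", "surprise", "disgust"]
prompts = [
    f"This image likely represents an emotional expression. Considering the visual "
    f"details and the intention behind the image, it seems to convey a sense of {label}."
    for label in possible_labels
]

image = Image.open("example_sketch.png")  # placeholder path
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(possible_labels, probs.squeeze().tolist())))

# Image-encoder-only path (no text encoder needed):
# image_features = model.get_image_features(pixel_values=inputs["pixel_values"])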
Plot t-SNE
To do next week
Jiwon: 1, 2, 6, write the AI code, t-SNE
Taewan: 1, 2, 6, finish crawling
Hojin: 1, 2, 6
Junsu: EmoSet
Ahyeon: survey process