The title above is an example. When writing the minutes, please change the title to the format "Date 'Minutes' (summary of the meeting)" :)
[Meeting Topic]
[To-do]
dataset
JANGJIWON/UGRP_sketchset_textbook · Datasets at Hugging Face
xodhks/EmoSet118K · Datasets at Hugging Face
xodhks/EmoSet118K_MonetStyle · Datasets at Hugging Face
++ crawled data
train
- Fix the fine-tuning method and parameters, and test the datasets (see the dataset-mixing sketch after this list):
- emoset → form
- emoset+form → form
- emoset(GAN) → form
- emoset(GAN)+form → form
- emoset(GAN)+crawling → form
- emoset(GAN)+crawling+form → form
- form → form
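The combinations above mix Hugging Face datasets. A minimal sketch using datasets.concatenate_datasets, assuming the repos listed under "dataset" above and that both splits expose the same image/label columns; the crawled data has no repo id yet, so it is left out here.

from datasets import load_dataset, concatenate_datasets

# Minimal sketch of building the mixed training sets above (e.g. "emoset+form -> form").
emoset = load_dataset("xodhks/EmoSet118K", split="train")
form = load_dataset("JANGJIWON/UGRP_sketchset_textbook", split="train")  # "form" = our sketch set (assumption)

# Assumes both datasets share the same 'image'/'label' schema; map() them to a common schema first if not.
mixed = concatenate_datasets([emoset, form]).shuffle(seed=42)
print(len(mixed))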
- Fix the dataset and parameters, and test fine-tuning methods: original fine-tuning, adapter, LoRA
- Fix the fine-tuning method and dataset, and test parameters
- LoRA parameter tuning (see the grid sketch below)
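For the parameter test, a small hedged sketch of a LoRA grid over r / alpha / dropout; the values are illustrative, not settings we have agreed on, and the target modules are reused from the training code below.

from itertools import product
from peft import LoraConfig

# Candidate LoRA hyperparameters to sweep (illustrative values).
ranks = [4, 8, 16]
alphas = [16, 32]
dropouts = [0.05, 0.1]

configs = [
    LoraConfig(
        r=r,
        lora_alpha=alpha,
        lora_dropout=dropout,
        target_modules=["query", "key", "value"],  # ViT attention projections, as in the code below
    )
    for r, alpha, dropout in product(ranks, alphas, dropouts)
]
# For each config: reload the base model, call get_peft_model(base_model, config),
# train with the fixed dataset/method, and log (r, alpha, dropout, accuracy).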
Questions (professor meeting)
- Crawled data
- How to evaluate a small dataset
- Other ways to improve accuracy
- Incremental approach(?)
Code
import torch
import os
from transformers import AutoModelForImageClassification, AutoImageProcessor
from datasets import load_dataset
from torch.utils.data import DataLoader
from torch.optim import Adam
import torch.nn as nn
from peft import get_peft_model, LoraConfig

# GPU setup
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print("Device:", device)

# Load datasets
train_dataset = load_dataset("xodhks/EmoSet118K_MonetStyle", split="train")
test_dataset = load_dataset("xodhks/Children_Sketch", split="train")

# Valid labels for the test dataset: only label indices that exist in Children_Sketch
test_valid_label_indices = [0, 1, 4, 5]

# Load image processor and model
processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224", use_fast=True)
model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=6,  # number of emotion classes in the dataset
    ignore_mismatched_sizes=True
).to(device)

# Configure and apply LoRA
config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query", "key", "value"],
)
model = get_peft_model(model, config)

# Directory for saved checkpoints
os.makedirs("top_models", exist_ok=True)
top_models = []

# DataLoader setup
def collate_fn(batch):
    images = [item['image'] for item in batch]
    labels = [item['label'] for item in batch]
    inputs = processor(images=images, return_tensors="pt")
    inputs['labels'] = torch.tensor(labels, dtype=torch.long)
    return inputs

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, collate_fn=collate_fn, num_workers=4)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False, collate_fn=collate_fn, num_workers=4)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = Adam(model.parameters(), lr=1e-4)

# Evaluation: only predictions that fall in the valid label set are counted
def evaluate(model, data_loader):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for batch in data_loader:
            inputs = {k: v.to(device) for k, v in batch.items()}
            outputs = model(**inputs)
            _, preds = torch.max(outputs.logits, 1)
            for pred, label in zip(preds, inputs['labels']):
                if pred.item() in test_valid_label_indices:
                    if pred.item() == label.item():
                        correct += 1
                    total += 1
    accuracy = 100 * correct / total if total > 0 else 0.0  # guard against an empty valid set
    return accuracy

# Save the checkpoint and keep a top-10 leaderboard by accuracy
def save_top_models(epoch, accuracy, model, top_models):
    model_filename = f"model_epoch_{epoch + 1}_accuracy_{accuracy:.2f}.pth"
    model_path = os.path.join("top_models", model_filename)
    top_models.append((accuracy, model_path))
    top_models = sorted(top_models, key=lambda x: x[0], reverse=True)[:10]
    torch.save(model.state_dict(), model_path)
    print("\nTop 10 Models (by accuracy):")
    for i, (acc, path) in enumerate(top_models, 1):
        print(f"Rank {i}: Accuracy = {acc:.2f}%, Model Path = {path}")
    return top_models

# Training loop
num_epochs = 100
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for batch in train_loader:
        optimizer.zero_grad()
        inputs = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**inputs)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}")
    test_accuracy = evaluate(model, test_loader)
    print(f"Test Accuracy after Epoch {epoch+1}: {test_accuracy:.2f}%")
    top_models = save_top_models(epoch, test_accuracy, model, top_models)

print("Finished Training")
Code
import torch
from transformers import AutoModelForImageClassification, AutoImageProcessor
from datasets import load_dataset
from torch.utils.data import DataLoader, Dataset
from torch.optim import Adam
import torch.nn as nn
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from peft import get_peft_model, LoraConfig
from huggingface_hub import hf_hub_download
from torchvision import transforms
from tqdm import tqdm

# Load the ViT model and image processor
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224", use_fast=True)
model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=8,
    ignore_mismatched_sizes=True
).to(device)

# Preprocessing
def preprocess_image(image):
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
    ])
    return transform(image)

class CustomDataset(Dataset):
    def __init__(self, dataset, transform=None):
        self.dataset = dataset
        self.transform = transform
        self.label_encoder = LabelEncoder()
        labels = [item['label'] for item in dataset]
        self.label_encoder.fit(labels)

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        item = self.dataset[idx]
        img = item['image']
        label = item['label']
        if self.transform:
            img = self.transform(img)
        label = self.label_encoder.transform([label])[0]
        return img, torch.tensor(label, dtype=torch.long)

# Load pretrained weights from the Hub (student model pretrained on EmoSet118K Monet style)
model_weights_path = hf_hub_download(
    repo_id="JANGJIWON/EmoSet118K_MonetStyle_student",
    filename="model_epoch_5_accuracy_43.09.pth"
)
model.load_state_dict(torch.load(model_weights_path, map_location='cpu'), strict=False)

# Configure and apply the second LoRA
config2 = LoraConfig(
    r=4,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query", "key", "value"]
)
model = get_peft_model(model, config2)

# Prepare the dataset
dataset = load_dataset("JANGJIWON/UGRP_sketchset_textbook", split="train")

# Convert the dataset to a list of dictionaries for splitting
dataset_list = [dict(item) for item in dataset]

# Split into train and test sets (70% train, 30% test)
train_data, test_data = train_test_split(dataset_list, test_size=0.3, random_state=42)

# Create datasets and dataloaders
train_dataset = CustomDataset(train_data, transform=preprocess_image)
test_dataset = CustomDataset(test_data, transform=preprocess_image)
train_loader = DataLoader(train_dataset, batch_size=1, shuffle=True, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=1, shuffle=False, num_workers=2)

# Optimizer and loss function
learning_rate = 0.001  # slightly higher learning rate
optimizer = Adam(model.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()

# Train the model
num_epochs = 10
model.train()
for epoch in range(num_epochs):
    running_loss = 0.0
    correct = 0
    total = 0
    for images, labels in tqdm(train_loader):
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs.logits, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        _, predicted = torch.max(outputs.logits, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    epoch_accuracy = 100 * correct / total
    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {running_loss:.4f}, Accuracy: {epoch_accuracy:.2f}%')

# Evaluate on the held-out test split
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.logits, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
accuracy = 100 * correct / total
print(f'Final Test Accuracy after second LoRA tuning: {accuracy:.2f}%')
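The script above pulls its starting weights from the JANGJIWON/EmoSet118K_MonetStyle_student repo. A minimal, hedged sketch of how a checkpoint saved by the first training loop could be pushed there (file name reused from above; assumes write access and a configured Hugging Face token):

from huggingface_hub import HfApi

# Upload a locally saved checkpoint so the second script can fetch it with hf_hub_download.
api = HfApi()
api.upload_file(
    path_or_fileobj="top_models/model_epoch_5_accuracy_43.09.pth",  # produced by the first training loop
    path_in_repo="model_epoch_5_accuracy_43.09.pth",
    repo_id="JANGJIWON/EmoSet118K_MonetStyle_student",
    repo_type="model",
)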
Figure

Looking at the structure, this isn't meaningful.
It would be meaningful if we reused the previous LoRA, but if we only use the last LoRA there is no point.
Question
0,1,2,3,4,5
[happiness, sadness, …]
VLM (vision-language model): LLaVA
Question
Conversely, encode the test images
Train on the mixed data
Try training in both of these ways
[Detailed Notes]
Even with GAN, the output is still too far from the images we ultimately evaluate on.
Leave the last one as future work.
JIWON ← it would be good to base the code on what's here
No training (zero-shot) → 18%?
UGRP test set → 60%
LoRA → 50%
Parameter-efficient fine-tuning → seems to perform worse than updating all parameters.
First test: UGRP test
Second: pre-trained on EmoSet Monet style → accuracy came out about the same.
Plain EmoSet → not tested yet.
TABLE
- CLIP vs. ViT comparison
| | | CLIP | ViT | ResNet |
|---|---|---|---|---|
| Pretrain (PEFT) with emoset | loss | ≤ 0.2 | ≤ 0.2 | |
| | LoRA config | | | |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | |
| | optimizer | AdamW | AdamW | |
| | prompt | | - | |
| train (full finetuning) | epoch | 10 | 10 | |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | |
| | optimizer | AdamW | AdamW | |
| | prompt | | - | |
| test | random seed | | | |
| output | accuracy | | | |
| | top-N accuracy | | | |
| | t-SNE | | | |
- Pretraining method comparison
| | | CLIP (with emoset) | CLIP (with mone) | CLIP (with crawling) | CLIP (with emoset crawling) | CLIP (with mone crawling) | CLIP (with X) |
|---|---|---|---|---|---|---|---|
| Pretrain (PEFT) | loss | ≤ 0.1 | ≤ 0.1 | ≤ 0.1 | | | - |
| | LoRA config | | | | | | - |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | - |
| | optimizer | AdamW | AdamW | AdamW | AdamW | AdamW | - |
| | prompt | | - | | | | - |
| train (full finetuning) | epoch | 10 | 10 | 10 | | | 10 |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | |
| | optimizer | AdamW | AdamW | AdamW | AdamW | AdamW | |
| | prompt | | - | | | | |
| test | random seed | | | | | | |
| output | accuracy | | | | | | |
| | top-N accuracy | | | | | | |
| | t-SNE | | | | | | |
- Staged training vs. single-pass training (mixed data) comparison (a small sketch of the two regimes follows the table)
| | | CLIP (staged) | CLIP (single pass) |
|---|---|---|---|
| Pretrain (PEFT) | loss | ≤ 0.2 | ≤ 0.2 |
| | LoRA config | | |
| | loss function | CrossEntropyLoss | CrossEntropyLoss |
| | optimizer | AdamW | AdamW |
| | prompt | | - |
| train (full finetuning) | epoch | 10 | 10 |
| | loss function | CrossEntropyLoss | CrossEntropyLoss |
| | optimizer | AdamW | AdamW |
| | prompt | | - |
| test | random seed | | |
| output | accuracy | | |
| | top-N accuracy | | |
| | t-SNE | | |
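A minimal sketch of the two regimes compared in the table above, assuming the staged run merges the LoRA weights into the base model before the full fine-tuning stage; train_fn and the loaders are placeholders, not our actual training code.

from peft import LoraConfig, get_peft_model

def staged(base_model, emoset_loader, form_loader, train_fn):
    # Stage 1: PEFT (LoRA) on EmoSet, then fold the adapter back into the base weights.
    lora_model = get_peft_model(base_model, LoraConfig(
        r=8, lora_alpha=16, lora_dropout=0.1,
        target_modules=["query", "key", "value"],
    ))
    train_fn(lora_model, emoset_loader)
    merged = lora_model.merge_and_unload()
    # Stage 2: full fine-tuning on the form sketches.
    train_fn(merged, form_loader)
    return merged

def single_pass(base_model, mixed_loader, train_fn):
    # One pass over the mixed (EmoSet + form) data.
    train_fn(base_model, mixed_loader)
    return base_model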
- Parameter comparison
Compare parameters for the best-performing configuration among the above.
TABLE-1/2
[Paper Review] LoRA: Low-Rank Adaptation of Large Language Models
LoRA paper review
| | | CLIP (with emoset) | CLIP (with mone) | CLIP (with crawling) | CLIP (with emoset crawling) | CLIP (with mone crawling) | CLIP (with X) |
|---|---|---|---|---|---|---|---|
| Pretrain (PEFT) | loss | ≤ 0.2 | ≤ 0.2 | ≤ 0.2 | ≤ 0.2 | ≤ 0.2 | - |
| | LoRA config | see LoRA config (CLIP) below | same | same | same | same | - |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | - |
| | optimizer | AdamW | AdamW | AdamW | AdamW | AdamW | - |
| | prompt | see prompt below | same | same | same | same | - |
| train (full finetuning) | epoch | 10 | 10 | 10 | 10 | 10 | 10 |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss |
| | optimizer | AdamW | AdamW | AdamW | AdamW | AdamW | AdamW |
| | prompt | see prompt below | same | same | same | same | same |
| | train size | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| | random seed | 42 | 42 | 42 | 42 | 42 | 42 |
| output | accuracy | | | | | | 61.54% |
| | t-SNE | | | | | | |
| | Silhouette Score | | | | | | |

LoRA config (CLIP, all pretrained columns): r=8, alpha=16, lora_dropout=0.1, target_modules=k_proj, q_proj, v_proj, visual_projection, bias="none"
Prompt (all columns): [f"This image likely represents an emotional expression. Considering the visual details and the intention behind the image, it seems to convey a sense of {label}." for label in possible_labels]
| | | ViT (with emoset) | ViT (with mone) | ViT (with crawling) | ViT (with emoset crawling) | ViT (with mone crawling) | ViT (with X) |
|---|---|---|---|---|---|---|---|
| Pretrain (PEFT) | loss | ≤ 0.2 | ≤ 0.2 | ≤ 0.2 | ≤ 0.2 | ≤ 0.2 | - |
| | LoRA config | see LoRA config (ViT) below | same | same | same | same | same |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | - |
| | optimizer | AdamW | AdamW | AdamW | AdamW | AdamW | - |
| | prompt | | - | | | | - |
| train (full finetuning) | epoch | 10 | 10 | 10 | | | 10 |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | |
| | optimizer | AdamW | AdamW | AdamW | AdamW | AdamW | |
| | prompt | - | - | - | - | - | |
| | train size | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| | random seed | 42 | 42 | 42 | 42 | 42 | 42 |
| output | accuracy | | | | | | |
| | t-SNE | | | | | | |
| | Silhouette Score | | | | | | |

LoRA config (ViT): r=8, alpha=16, lora_dropout=0.1, target_modules=k_proj, q_proj, v_proj, output.dense, bias="none"
| | | ResNet (with emoset) | ResNet (with mone) | ResNet (with crawling) | ResNet (with emoset crawling) | ResNet (with mone crawling) | ResNet (with X) |
|---|---|---|---|---|---|---|---|
| Pretrain (PEFT) | loss | ≤ 0.2 | ≤ 0.2 | ≤ 0.2 | ≤ 0.2 | ≤ 0.2 | - |
| | LoRA config | - | - | - | - | - | - |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | - |
| | optimizer | AdamW | AdamW | AdamW | AdamW | AdamW | - |
| | prompt | - | - | - | - | - | - |
| train (full finetuning) | epoch | 10 | 10 | 10 | 10 | 10 | 10 |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | |
| | optimizer | AdamW | AdamW | AdamW | AdamW | AdamW | |
| | prompt | - | - | - | - | - | |
| | train size | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | |
| test | random seed | 42 | 42 | 42 | 42 | 42 | 42 |
| output | accuracy | | | | | | |
| | t-SNE | | | | | | |
| | Silhouette Score | | | | | | |
t-SNE: something like PCA, but for nonlinear structure?!
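Roughly, yes: t-SNE is a nonlinear projection for visualizing high-dimensional embeddings. A minimal sketch for the t-SNE rows in the tables above; features and labels are placeholders for the actual image embeddings and emotion labels.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholder embeddings (N, D) and labels; swap in the model's image features.
features = np.random.randn(200, 768)
labels = np.random.randint(0, 6, size=200)

proj = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(features)
plt.scatter(proj[:, 0], proj[:, 1], c=labels, cmap="tab10", s=10)
plt.title("t-SNE of image embeddings (sketch)")
plt.savefig("tsne.png")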
JUNSU
Organize the survey process
AHYEON
Survey data analysis in progress…
Ran the data code based on ResNet
Without LoRA: 1. tested with the survey data, 2. crawled images vs. their sketch-converted versions → the sketch-converted ones gave better results
Process the survey intent and related data so it can be fed to RAG
OpenAI key
Crawling → about 150 images per emotion come up
[Next Meeting Topics and To-do]
- Try diffusion instead of GAN
- Try changing the LoRA structure
- Try a VLM (starting with ViT)
- Retry OpenAI training → this accuracy looks off → borrow a key
- Try encoding the test set → facial expressions and similar details might be lost
ComfyUI → fine-tuning is also possible
What we use is CLIP's ViT image encoder
For what we're doing, do we even need the text encoder?
Also try prompt engineering (a zero-shot CLIP sketch follows below)
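A hedged sketch of CLIP zero-shot scoring with the prompt template from the tables above; the checkpoint and label list are assumptions, not decisions, and the last comment shows the image-encoder-only path if the text encoder turns out to be unnecessary.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")

# Label list is illustrative; the project uses 6 emotion classes (happiness, sadness, ...).
possible_labels = ["happiness", "sadness", "anger", "fear", "surprise", "disgust"]
prompts = [
    f"This image likely represents an emotional expression. Considering the visual "
    f"details and the intention behind the image, it seems to convey a sense of {label}."
    for label in possible_labels
]

image = Image.open("example_sketch.png")  # placeholder path
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(possible_labels, probs.squeeze().tolist())))

# Image-encoder-only path (no text encoder needed):
# image_features = model.get_image_features(pixel_values=inputs["pixel_values"])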
Plot t-SNE
To do next week
Jiwon: 1, 2, 6, write the AI code, t-SNE
Taewan: 1, 2, 6, finish crawling
Hojin: 1, 2, 6
Junsu: EmoSet
Ahyeon: survey process