2024 UGRP / Member Page / 권태완
2024/11/26 -
CLIP-ViT >> model structure

CLIPModel( (text_model): CLIPTextTransformer( (embeddings): CLIPTextEmbeddings( (token_embedding): Embedding(49408, 512) (position_embedding): Embedding(77, 512) ) (encoder): CLIPEncoder( (layers): ModuleList( (0-11): 12 x CLIPEncoderLayer( (self_attn): CLIPSdpaAttention( (k_proj): Linear(in_features=512, out_features=512, bias=True) (v_proj): Linear(in_features=512, out_features=512, bias=True) (q_proj): Linear(in_features=512, out_features=512, bias=True) (out_proj): Linear(in_features=512, out_features=512, bias=True) ) (layer_norm1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) (mlp): CLIPMLP( (activation_fn): QuickGELUActivation() (fc1): Linear(in_features=512, out_features=2048, bias=True) (fc2): Linear(in_features=2048, out_features=512, bias=True) ) (layer_norm2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) ) ) (final_layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (vision_model): CLIPVisionTransformer( (embeddings): CLIPVisionEmbeddings( (patch_embedding): Conv2d(3, 768, kernel_size=(32, 32), stride=(32, 32), bias=False) (position_embedding): Embedding(50, 768) ) (pre_layrnorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (encoder): CLIPEncoder( (layers): ModuleList( (0-11): 12 x CLIPEncoderLayer( (self_attn): CLIPSdpaAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (layer_norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (mlp): CLIPMLP( (activation_fn): QuickGELUActivation() (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) ) (layer_norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) ) ) (post_layernorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (visual_projection): Linear(in_features=768, out_features=512, bias=False) (text_projection): Linear(in_features=512, out_features=512, bias=False) )

GPT: parameters accessible through target_modules

LoRA's target_modules parameter specifies which layers inside the model you want to adapt. Based on the CLIPModel structure printed above, the main modules that LoRA can be applied to are the following:

Key target modules

  1. Vision Transformer (ViT) modules
      • vision_model.encoder.layers[i].self_attn.q_proj
      • vision_model.encoder.layers[i].self_attn.k_proj
      • vision_model.encoder.layers[i].self_attn.v_proj
      • vision_model.encoder.layers[i].self_attn.out_proj
      • vision_model.encoder.layers[i].mlp.fc1
      • vision_model.encoder.layers[i].mlp.fc2
      These modules process the image input; applying LoRA here strengthens image-feature learning.
  2. Visual projection
      • visual_projection: the linear layer that maps the 768-dimensional image features to 512 dimensions.
      Applying LoRA to this module adds trainable capacity to the final mapping of the image features.
  3. Text model modules (optional)
      • text_model.encoder.layers[i].self_attn.q_proj
      • text_model.encoder.layers[i].self_attn.k_proj
      • text_model.encoder.layers[i].self_attn.v_proj
      • text_model.encoder.layers[i].self_attn.out_proj
      • text_model.encoder.layers[i].mlp.fc1
      • text_model.encoder.layers[i].mlp.fc2
      These can also be targeted if you want to strengthen text processing. For image-centric tasks, however, targeting only the vision_model modules is usually enough.
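To double-check these paths against the real model, the short sketch below (my own addition; the checkpoint id openai/clip-vit-base-patch32 is an assumption that matches the 512/768-dimensional structure printed above) lists every Linear and Conv2d module name:

import torch.nn as nn
from transformers import CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
for name, module in model.named_modules():
    # only Linear/Conv2d layers are sensible LoRA targets
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        print(name)
# prints names such as text_model.encoder.layers.0.self_attn.k_proj, ...,
# vision_model.encoder.layers.11.mlp.fc2, visual_projection, text_projection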

Example: setting target modules in a LoRA config

Below is an example target_modules setting for LoRA:
config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=[
        "vision_model.encoder.layers.*.self_attn.q_proj",
        "vision_model.encoder.layers.*.self_attn.k_proj",
        "vision_model.encoder.layers.*.self_attn.v_proj",
        "vision_model.encoder.layers.*.self_attn.out_proj",
        "visual_projection"
    ]
)

Using wildcards

Using the .*. wildcard in target_modules lets you target a given module across every layer at once. For example, vision_model.encoder.layers.*.self_attn.q_proj targets the q_proj module of every ViT layer.

Which targets to choose

  • Image-centric tasks: target only the vision_model modules and visual_projection.
  • Text-image interaction: additionally target the text_model modules.
Based on this, you can pick appropriate target modules when applying LoRA.
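As a concrete, hedged version of the config above: depending on the peft version, list entries in target_modules are matched against module-name suffixes rather than glob patterns (a single regex string is also accepted), so the sketch below uses plain suffix names. The model id and hyperparameters are illustrative, not taken from the notebook.

from transformers import CLIPModel
from peft import LoraConfig, get_peft_model

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    # suffix matching hits both the text and vision towers; to restrict LoRA to the
    # vision tower, a regex string such as
    # r".*vision_model.*(q_proj|k_proj|v_proj|out_proj)" can be passed instead of a list
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "visual_projection"],
)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()  # only the LoRA adapter matrices are trainable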

ViT >> model structure

ViTForImageClassification( (vit): ViTModel( (embeddings): ViTEmbeddings( (patch_embeddings): ViTPatchEmbeddings( (projection): Conv2d(3, 768, kernel_size=(16, 16), stride=(16, 16)) ) (dropout): Dropout(p=0.0, inplace=False) ) (encoder): ViTEncoder( (layer): ModuleList( (0-11): 12 x ViTLayer( (attention): ViTSdpaAttention( (attention): ViTSdpaSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (output): ViTSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) ) (intermediate): ViTIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): ViTOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (layernorm_before): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (layernorm_after): LayerNorm((768,), eps=1e-12, elementwise_affine=True) ) ) ) (layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) ) (classifier): Linear(in_features=768, out_features=6, bias=True) )

Parameters accessible through target_modules

Based on the ViTForImageClassification structure, the parameters usable in LoRA's target_modules are the model's trainable layers, primarily the Linear layers. The following modules in the structure above can be specified as target_modules:

1. Attention modules (ViTSdpaSelfAttention)

  • attention.attention.query: linear projection for the query
  • attention.attention.key: linear projection for the key
  • attention.attention.value: linear projection for the value
  • attention.output.dense: linear projection of the attention output
Example target_modules setting:
target_modules = [
    "vit.encoder.layer.*.attention.attention.query",
    "vit.encoder.layer.*.attention.attention.key",
    "vit.encoder.layer.*.attention.attention.value",
    "vit.encoder.layer.*.attention.output.dense",
]

2. Intermediate and output modules (feed-forward network)

  • intermediate.dense: first layer of the feed-forward network
  • output.dense: second layer of the feed-forward network
Example target_modules setting:
target_modules = [
    "vit.encoder.layer.*.intermediate.dense",
    "vit.encoder.layer.*.output.dense",
]

3. Patch embedding projection

  • vit.embeddings.patch_embeddings.projection: the convolution layer that turns the input image into patches
Example target_modules setting:
target_modules = [
    "vit.embeddings.patch_embeddings.projection",
]

4. Classifier (optional)

  • classifier: the final linear layer that produces the classification output
Example target_modules setting:
target_modules = [
    "classifier",
]

5. Using wildcards

In the LoRA config, a wildcard can target several layers at once. For example, to target every attention.query module:
target_modules = [
    "vit.encoder.layer.*.attention.attention.query",
]
  • * stands for every layer index.

Full target example

Below is an example that includes all of the main target modules:
config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=[
        "vit.encoder.layer.*.attention.attention.query",
        "vit.encoder.layer.*.attention.attention.key",
        "vit.encoder.layer.*.attention.attention.value",
        "vit.encoder.layer.*.attention.output.dense",
        "vit.encoder.layer.*.intermediate.dense",
        "vit.encoder.layer.*.output.dense",
        "vit.embeddings.patch_embeddings.projection",
    ]
)

Summary

  • Self-Attention: query, key, value, output.dense
  • Feed-Forward Network: intermediate.dense, output.dense
  • Patch Embeddings: projection
  • Classifier: classifier (optional)
Specify LoRA's target_modules based on this information.
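A hedged, runnable version of the summary above (my own sketch; the checkpoint and num_labels=6 follow the training code later on this page, the remaining hyperparameters are illustrative):

from transformers import AutoModelForImageClassification
from peft import LoraConfig, get_peft_model

model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=6,                  # 6 emotion classes, as in the experiments below
    ignore_mismatched_sizes=True,  # the 1000-way ImageNet head is replaced and re-initialized
)

config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    # plain suffixes match every encoder layer; "attention.output.dense" is written out
    # so that the feed-forward "output.dense" is not picked up as well
    target_modules=["query", "key", "value", "attention.output.dense"],
)
model = get_peft_model(model, config)
model.print_trainable_parameters()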
 
[Paper review] LoRA: Low-Rank Adaptation of Large Language Models - 전생했더니 인공지능이었던 건에 대하여
 

Parameter and optimizer settings for the final training run

  • target_modules
    • For ViT, the following parameters are available:
      • Self-Attention: query, key, value, output.dense; Feed-Forward Network: intermediate.dense, output.dense; Patch Embeddings: projection
    • Each of these is referenced as shown below. Be careful: if a name is not written out explicitly, every layer whose name contains it is affected.
      • "vit.encoder.layer.*.attention.attention.query", "vit.encoder.layer.*.attention.attention.key", "vit.encoder.layer.*.attention.attention.value", "vit.encoder.layer.*.attention.output.dense", "vit.encoder.layer.*.intermediate.dense", "vit.encoder.layer.*.output.dense", "vit.embeddings.patch_embeddings.projection"
        Here query, key, and value exist only inside the attention block, so the short (implicit) form is fine; dense also exists in layers other than attention, so it is better to write it out explicitly.
  • optimizer
    • Optimizers are accessed through torch.optim.
      • torch.optim — PyTorch 2.5 documentation
        torch.optim is a package implementing various optimization algorithms.
      • The details are documented at the link above.
      • We are going to use AdamW, which differs from Adam as follows (a short code sketch follows right after this list):
        • 💡
          Adam
          An optimizer that combines momentum with adaptive learning rates: it keeps moving averages of both the gradients and the squared gradients, adjusts the learning rate automatically, and converges quickly and stably even in high dimensions.
          AdamW
          A version of Adam that improves how weight decay is applied.
          In Adam, weight decay is added to the loss as an L2 regularization term, so the optimizer decays the weights as part of the gradient update; weight decay and the gradient update are tightly coupled, and the effect of the decay gets mixed into the gradient and learning-rate scaling.
          AdamW instead applies weight decay independently in the parameter-update step rather than putting it into the loss. The decay then acts purely as regularization, the interaction between gradients and weight decay is minimized, and optimization becomes more consistent. This is particularly effective for improving the generalization of deep models.
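A minimal sketch of that difference in code (illustrative values; the Linear layer only stands in for a real model):

import torch
import torch.nn as nn

model = nn.Linear(768, 6)  # stand-in for the classifier head

# Adam: weight_decay is folded into the gradient (L2 regularization), so the decay
# gets rescaled by the adaptive learning-rate terms.
adam = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-2)

# AdamW: weight_decay is applied directly to the weights in the update step,
# decoupled from the gradient, as described above.
adamw = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)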
           
|  |  | ViT(with emoset) | ViT(with mone) | ViT(with crawling) | ViT(with emoset crawling) | ViT(with mone crawling) | ViT(with X) |
|---|---|---|---|---|---|---|---|
| Pretrain(PEFT) | loss | ≤ 0.1 | ≤ 0.1 | ≤ 0.1 |  |  | - |
|  | LoRA config | r=8, alpha=16, lora_dropout=0.1, target_modules=k_proj, q_proj, v_proj, output.dense, bias="none" |  |  |  |  | - |
|  | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | - |
|  | optimizer | AdamW | AdamW | AdamW | AdamW | AdamW | - |
|  | prompt |  | - |  |  |  | - |
| train(full finetuning) | epoch | 10 | 10 | 10 |  |  | 10 |
|  | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss |  |
|  | optimizer | AdamW | AdamW | AdamW | AdamW | AdamW |  |
|  | prompt |  | - |  |  |  |  |
| test | random seed |  |  |  |  |  |  |
| output | accuracy |  |  |  |  |  |  |
|  | top-N accuracy |  |  |  |  |  |  |
|  | tsne |  |  |  |  |  |  |

Full-scale training

  • Review of the code pieces
import os

import numpy as np
import torch
import torch.nn as nn
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from PIL import Image
from torch.optim import AdamW
from torch.utils.data import DataLoader
from torchvision import transforms
from transformers import AutoImageProcessor, AutoModelForImageClassification


def setup_device():
    """Check for an installed GPU and select the device."""
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print("Device:", device)
    return device


def merge_datasets(dataset1, dataset2):
    """Concatenate two datasets."""
    return dataset1 + dataset2


# grayscale to RGB scale !!!
def load_datasets(train_path, test_path):
    train_dataset = load_dataset(train_path, split="train").map(lambda x: {"image": x["image"].convert("RGB")})
    test_dataset = load_dataset(test_path, split="train").map(lambda x: {"image": x["image"].convert("RGB")})
    return train_dataset, test_dataset


def prepare_model(device, num_labels=6):
    """Initialize the model and image processor."""
    processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224", use_fast=True)
    model = AutoModelForImageClassification.from_pretrained(
        "google/vit-base-patch16-224", num_labels=num_labels, ignore_mismatched_sizes=True
    ).to(device)
    config = LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.1,
        # note: list entries are matched against module-name suffixes, so the wildcard
        # entry may or may not match depending on the peft version
        target_modules=["query", "key", "value", "vit.encoder.layer.*.attention.output.dense"],
        bias="none"
    )
    model = get_peft_model(model, config)
    return model, processor


def create_dataloader(dataset, processor, batch_size=32, shuffle=True, num_workers=4):
    """Build a DataLoader."""
    def collate_fn(batch):
        images = [item['image'] for item in batch]
        labels = [item['label'] for item in batch]
        inputs = processor(images=images, return_tensors="pt")
        inputs['labels'] = torch.tensor(labels, dtype=torch.long)
        return inputs
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle, collate_fn=collate_fn, num_workers=num_workers)


def create_dataloader_with_augmentation(dataset, processor, batch_size=32, shuffle=True, num_workers=4):
    augmentation = transforms.Compose([
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(15),
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
        transforms.ToTensor()
    ])

    def collate_fn(batch):
        images = [augmentation(item['image']) for item in batch]
        labels = [item['label'] for item in batch]
        inputs = processor(images=images, return_tensors="pt")
        inputs['labels'] = torch.tensor(labels, dtype=torch.long)
        return inputs
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle, collate_fn=collate_fn, num_workers=num_workers)


def add_noise_to_images(images, noise_level=0.1):
    """Add random noise to the images."""
    noisy_images = []
    for img in images:
        img_np = np.array(img)
        noise = np.random.normal(0, noise_level, img_np.shape)
        noisy_img = np.clip(img_np + noise, 0, 255).astype(np.uint8)
        noisy_images.append(Image.fromarray(noisy_img))
    return noisy_images


# applied inside the collate_fn
def collate_fn_with_noise(batch):
    images = add_noise_to_images([item['image'] for item in batch])
    labels = [item['label'] for item in batch]
    inputs = processor(images=images, return_tensors="pt")
    inputs['labels'] = torch.tensor(labels, dtype=torch.long)
    return inputs


def evaluate_model(model, data_loader, device, valid_label_indices):
    """Evaluate the model."""
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for batch in data_loader:
            inputs = {k: v.to(device) for k, v in batch.items()}
            outputs = model(**inputs)
            _, preds = torch.max(outputs.logits, 1)
            for pred, label in zip(preds, inputs['labels']):
                if pred.item() in valid_label_indices:
                    if pred.item() == label.item():
                        correct += 1
                total += 1
    return 100 * correct / total


def save_top_models(epoch, accuracy, model, top_models, directory):
    """Save the best-performing models to a user-specified directory."""
    os.makedirs(directory, exist_ok=True)
    model_filename = f"model_epoch_{epoch + 1}_accuracy_{accuracy:.2f}.pth"
    model_path = os.path.join(directory, model_filename)
    top_models.append((accuracy, model_path))
    top_models = sorted(top_models, key=lambda x: x[0], reverse=True)[:10]
    torch.save(model.state_dict(), model_path)
    if epoch % 10 == 0:
        print("\nTop 10 Models (by accuracy):")
        for i, (acc, path) in enumerate(top_models, 1):
            print(f"Rank {i}: Accuracy = {acc:.2f}%, Model Path = {path}")
    return top_models


def train_model(num_epochs, train_loader, test_loader, model, device, optimizer, criterion, valid_label_indices, directory):
    """Training loop."""
    top_models = []
    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        for batch in train_loader:
            optimizer.zero_grad()
            inputs = {k: v.to(device) for k, v in batch.items()}
            outputs = model(**inputs)
            loss = outputs.loss
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}")
        test_accuracy = evaluate_model(model, test_loader, device, valid_label_indices)
        print(f"Test Accuracy after Epoch {epoch+1}: {test_accuracy:.2f}%")
        top_models = save_top_models(epoch, test_accuracy, model, top_models, directory)
    return top_models


# main execution
## insert_this_code_below
device = setup_device()
train_dataset, test_dataset = load_datasets("xodhks/EmoSet118K_MonetStyle", "xodhks/Children_Sketch")
valid_label_indices = [0, 1, 2, 3, 4, 5]
model, processor = prepare_model(device)

''' train dataset variants '''
# combined_dataset = merge_datasets(train_dataset, test_dataset)
# train_loader = create_dataloader(combined_dataset, processor, batch_size=32, shuffle=True)
train_loader = create_dataloader(train_dataset, processor, batch_size=32, shuffle=True)
# train_loader = create_dataloader_with_augmentation(train_dataset, processor, batch_size=32, shuffle=True)
# train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, collate_fn=collate_fn_with_noise, num_workers=4)
test_loader = create_dataloader(test_dataset, processor, batch_size=32, shuffle=False)

optimizer = AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# ask the user for the save directory
save_directory = input("Enter the directory name to save models: ")
save_directory = os.path.join(os.getcwd(), save_directory)  # created relative to the working directory

top_models = train_model(
    num_epochs=100,
    train_loader=train_loader,
    test_loader=test_loader,
    model=model,
    device=device,
    optimizer=optimizer,
    criterion=criterion,
    valid_label_indices=valid_label_indices,
    directory=save_directory
)
print("Finished Training")

ViT with EmoSet118K

device = setup_device()
train_dataset, test_dataset = load_datasets("xodhks/EmoSet118K", "xodhks/Children_Sketch")
valid_label_indices = [0, 1, 4, 5]
model, processor = prepare_model(device)

''' train dataset variants '''
# combined_dataset = merge_datasets(train_dataset, test_dataset)
# train_loader = create_dataloader(combined_dataset, processor, batch_size=32, shuffle=True)
train_loader = create_dataloader(train_dataset, processor, batch_size=32, shuffle=True)
# train_loader = create_dataloader_with_augmentation(train_dataset, processor, batch_size=32, shuffle=True)
# train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, collate_fn=collate_fn_with_noise, num_workers=4)
test_loader = create_dataloader(test_dataset, processor, batch_size=32, shuffle=False)

optimizer = AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# ask the user for the save directory
save_directory = input("Enter the directory name to save models: ")
save_directory = os.path.join(os.getcwd(), save_directory)  # created relative to the working directory

top_models = train_model(
    num_epochs=100,
    train_loader=train_loader,
    test_loader=test_loader,
    model=model,
    device=device,
    optimizer=optimizer,
    criterion=criterion,
    valid_label_indices=valid_label_indices,
    directory=save_directory
)
print("Finished Training")
result
From epoch 14 onward the loss stays below 0.1. Among the models after that point, the one with the highest accuracy is saved and uploaded (a reload sketch follows the checkpoint list below).
Device: cuda Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224 and are newly initialized because the shapes did not match: - classifier.bias: found shape torch.Size([1000]) in the checkpoint and torch.Size([6]) in the model instantiated - classifier.weight: found shape torch.Size([1000, 768]) in the checkpoint and torch.Size([6, 768]) in the model instantiated You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. Epoch [1/100], Loss: 1.1820 Test Accuracy after Epoch 1: 38.86% Top 10 Models (by accuracy): Rank 1: Accuracy = 38.86%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_1_accuracy_38.86.pth Epoch [2/100], Loss: 0.5871 Test Accuracy after Epoch 2: 39.69% Epoch [3/100], Loss: 0.4978 Test Accuracy after Epoch 3: 40.06% Epoch [4/100], Loss: 0.4388 Test Accuracy after Epoch 4: 39.87% Epoch [5/100], Loss: 0.3893 Test Accuracy after Epoch 5: 39.59% Epoch [6/100], Loss: 0.3452 Test Accuracy after Epoch 6: 38.86% Epoch [7/100], Loss: 0.3038 Test Accuracy after Epoch 7: 38.12% Epoch [8/100], Loss: 0.2666 Test Accuracy after Epoch 8: 38.95% Epoch [9/100], Loss: 0.2313 Test Accuracy after Epoch 9: 38.49% Epoch [10/100], Loss: 0.1964 Test Accuracy after Epoch 10: 38.49% Epoch [11/100], Loss: 0.1664 Test Accuracy after Epoch 11: 39.23% Top 10 Models (by accuracy): Rank 1: Accuracy = 40.06%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_3_accuracy_40.06.pth Rank 2: Accuracy = 39.87%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_4_accuracy_39.87.pth Rank 3: Accuracy = 39.69%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_2_accuracy_39.69.pth Rank 4: Accuracy = 39.59%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_5_accuracy_39.59.pth Rank 5: Accuracy = 39.23%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_11_accuracy_39.23.pth Rank 6: Accuracy = 38.95%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_8_accuracy_38.95.pth Rank 7: Accuracy = 38.86%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_1_accuracy_38.86.pth Rank 8: Accuracy = 38.86%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_6_accuracy_38.86.pth Rank 9: Accuracy = 38.49%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_9_accuracy_38.49.pth Rank 10: Accuracy = 38.49%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_10_accuracy_38.49.pth Epoch [12/100], Loss: 0.1378 Test Accuracy after Epoch 12: 38.77% Epoch [13/100], Loss: 0.1118 Test Accuracy after Epoch 13: 39.32% Epoch [14/100], Loss: 0.0900 Test Accuracy after Epoch 14: 38.58% Epoch [15/100], Loss: 0.0692 Test Accuracy after Epoch 15: 38.86% Epoch [16/100], Loss: 0.0524 Test Accuracy after Epoch 16: 38.67% Epoch [17/100], Loss: 0.0384 Test Accuracy after Epoch 17: 38.49% Epoch [18/100], Loss: 0.0277 Test Accuracy after Epoch 18: 39.13% Epoch [19/100], Loss: 0.0203 Test Accuracy after Epoch 19: 39.41% Epoch [20/100], Loss: 0.0140 Test Accuracy after Epoch 20: 38.49% Epoch [21/100], Loss: 0.0119 Test Accuracy after Epoch 21: 38.40% Top 10 Models (by accuracy): Rank 1: Accuracy = 40.06%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_3_accuracy_40.06.pth Rank 2: Accuracy = 39.87%, 
Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_4_accuracy_39.87.pth Rank 3: Accuracy = 39.69%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_2_accuracy_39.69.pth Rank 4: Accuracy = 39.59%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_5_accuracy_39.59.pth Rank 5: Accuracy = 39.41%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_19_accuracy_39.41.pth Rank 6: Accuracy = 39.32%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_13_accuracy_39.32.pth Rank 7: Accuracy = 39.23%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_11_accuracy_39.23.pth Rank 8: Accuracy = 39.13%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_18_accuracy_39.13.pth Rank 9: Accuracy = 38.95%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_8_accuracy_38.95.pth Rank 10: Accuracy = 38.86%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_1_accuracy_38.86.pth Epoch [22/100], Loss: 0.0081 Test Accuracy after Epoch 22: 38.49% Epoch [23/100], Loss: 0.0058 Test Accuracy after Epoch 23: 38.12% Epoch [24/100], Loss: 0.0075 Test Accuracy after Epoch 24: 38.77% Epoch [25/100], Loss: 0.0140 Test Accuracy after Epoch 25: 38.03% Epoch [26/100], Loss: 0.0077 Test Accuracy after Epoch 26: 38.21% Epoch [27/100], Loss: 0.0042 Test Accuracy after Epoch 27: 38.67% Epoch [28/100], Loss: 0.0029 Test Accuracy after Epoch 28: 38.67% Epoch [29/100], Loss: 0.0024 Test Accuracy after Epoch 29: 38.58% Epoch [30/100], Loss: 0.0020 Test Accuracy after Epoch 30: 38.67% Epoch [31/100], Loss: 0.0016 Test Accuracy after Epoch 31: 39.23% Top 10 Models (by accuracy): Rank 1: Accuracy = 40.06%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_3_accuracy_40.06.pth Rank 2: Accuracy = 39.87%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_4_accuracy_39.87.pth Rank 3: Accuracy = 39.69%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_2_accuracy_39.69.pth Rank 4: Accuracy = 39.59%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_5_accuracy_39.59.pth Rank 5: Accuracy = 39.41%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_19_accuracy_39.41.pth Rank 6: Accuracy = 39.32%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_13_accuracy_39.32.pth Rank 7: Accuracy = 39.23%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_11_accuracy_39.23.pth Rank 8: Accuracy = 39.23%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_31_accuracy_39.23.pth Rank 9: Accuracy = 39.13%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_18_accuracy_39.13.pth Rank 10: Accuracy = 38.95%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_8_accuracy_38.95.pth Epoch [32/100], Loss: 0.0015 Test Accuracy after Epoch 32: 38.95% Epoch [33/100], Loss: 0.0012 Test Accuracy after Epoch 33: 39.23% Epoch [34/100], Loss: 0.0239 Test Accuracy after Epoch 34: 39.69% Epoch [35/100], Loss: 0.0054 Test Accuracy after Epoch 35: 39.50% Epoch [36/100], Loss: 0.0023 Test Accuracy after Epoch 36: 38.77% Epoch [37/100], Loss: 0.0020 Test Accuracy after Epoch 37: 39.13% Epoch [38/100], Loss: 0.0023 Test Accuracy after Epoch 38: 39.50% Epoch 
[39/100], Loss: 0.0062 Test Accuracy after Epoch 39: 39.59% Epoch [40/100], Loss: 0.0024 Test Accuracy after Epoch 40: 39.69% Epoch [41/100], Loss: 0.0071 Test Accuracy after Epoch 41: 39.32% Top 10 Models (by accuracy): Rank 1: Accuracy = 40.06%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_3_accuracy_40.06.pth Rank 2: Accuracy = 39.87%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_4_accuracy_39.87.pth Rank 3: Accuracy = 39.69%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_2_accuracy_39.69.pth Rank 4: Accuracy = 39.69%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_34_accuracy_39.69.pth Rank 5: Accuracy = 39.69%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_40_accuracy_39.69.pth Rank 6: Accuracy = 39.59%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_5_accuracy_39.59.pth Rank 7: Accuracy = 39.59%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_39_accuracy_39.59.pth Rank 8: Accuracy = 39.50%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_35_accuracy_39.50.pth Rank 9: Accuracy = 39.50%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_38_accuracy_39.50.pth Rank 10: Accuracy = 39.41%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_19_accuracy_39.41.pth Epoch [42/100], Loss: 0.0045 Test Accuracy after Epoch 42: 38.67% Epoch [43/100], Loss: 0.0029 Test Accuracy after Epoch 43: 39.04% Epoch [44/100], Loss: 0.0026 Test Accuracy after Epoch 44: 39.41% Epoch [45/100], Loss: 0.0030 Test Accuracy after Epoch 45: 39.59% Epoch [46/100], Loss: 0.0011 Test Accuracy after Epoch 46: 40.42% Epoch [47/100], Loss: 0.0008 Test Accuracy after Epoch 47: 39.78% Epoch [48/100], Loss: 0.0008 Test Accuracy after Epoch 48: 39.78% Epoch [49/100], Loss: 0.0008 Test Accuracy after Epoch 49: 40.33% Epoch [50/100], Loss: 0.0111 Test Accuracy after Epoch 50: 37.57% Epoch [51/100], Loss: 0.0089 Test Accuracy after Epoch 51: 39.32% Top 10 Models (by accuracy): Rank 1: Accuracy = 40.42%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_46_accuracy_40.42.pth Rank 2: Accuracy = 40.33%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_49_accuracy_40.33.pth Rank 3: Accuracy = 40.06%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_3_accuracy_40.06.pth Rank 4: Accuracy = 39.87%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_4_accuracy_39.87.pth Rank 5: Accuracy = 39.78%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_47_accuracy_39.78.pth Rank 6: Accuracy = 39.78%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_48_accuracy_39.78.pth Rank 7: Accuracy = 39.69%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_2_accuracy_39.69.pth Rank 8: Accuracy = 39.69%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_34_accuracy_39.69.pth Rank 9: Accuracy = 39.69%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_40_accuracy_39.69.pth Rank 10: Accuracy = 39.59%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_5_accuracy_39.59.pth
model_epoch_46_accuracy_40.42.pth (337051.6 KB)
model_epoch_49_accuracy_40.33.pth (337051.6 KB)
model_epoch_3_accuracy_40.06.pth (337051.3 KB)
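To reload one of the saved checkpoints later, a minimal sketch, assuming the .pth files above hold the state_dict of the PEFT-wrapped ViT exactly as written by save_top_models, and reusing the helpers and loaders defined in the training code above:

import torch

# rebuild the same LoRA-wrapped ViT, then restore the saved weights
model, processor = prepare_model(device)
state_dict = torch.load("model_epoch_46_accuracy_40.42.pth", map_location=device)
model.load_state_dict(state_dict)
model.eval()

accuracy = evaluate_model(model, test_loader, device, valid_label_indices)
print(f"Reloaded checkpoint accuracy: {accuracy:.2f}%")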
 

ViT with MonetStyle

device = setup_device()
train_dataset, test_dataset = load_datasets("xodhks/EmoSet118K_MonetStyle", "xodhks/Children_Sketch")
valid_label_indices = [0, 1, 4, 5]
model, processor = prepare_model(device)

''' train dataset variants '''
# combined_dataset = merge_datasets(train_dataset, test_dataset)
# train_loader = create_dataloader(combined_dataset, processor, batch_size=32, shuffle=True)
train_loader = create_dataloader(train_dataset, processor, batch_size=32, shuffle=True)
# train_loader = create_dataloader_with_augmentation(train_dataset, processor, batch_size=32, shuffle=True)
# train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, collate_fn=collate_fn_with_noise, num_workers=4)
test_loader = create_dataloader(test_dataset, processor, batch_size=32, shuffle=False)

optimizer = AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# save directory (fixed here instead of asking the user)
save_directory = "ViT with EmeSet MonetStyle top models"
save_directory = os.path.join(os.getcwd(), save_directory)  # created relative to the working directory

top_models = train_model(
    num_epochs=100,
    train_loader=train_loader,
    test_loader=test_loader,
    model=model,
    device=device,
    optimizer=optimizer,
    criterion=criterion,
    valid_label_indices=valid_label_indices,
    directory=save_directory
)
print("Finished Training")
result
The loss is below 0.1 from epoch 9 onward, so the highest-accuracy models after that point are listed below.
model_epoch_21_accuracy_43.00.pth (337051.6 KB)
model_epoch_22_accuracy_42.17.pth (337051.6 KB)
model_epoch_42_accuracy_39.23.pth (337051.6 KB)

ViT with Crawling

device = setup_device()
train_dataset, test_dataset = load_datasets("xodhks/crawling-emotions-in-google-train", "xodhks/crawling-emotions-in-google-test")
valid_label_indices = [0, 1, 2, 3, 4, 5]
model, processor = prepare_model(device)

''' train dataset variants '''
# combined_dataset = merge_datasets(train_dataset, test_dataset)
# train_loader = create_dataloader(combined_dataset, processor, batch_size=32, shuffle=True)
train_loader = create_dataloader(train_dataset, processor, batch_size=32, shuffle=True)
# train_loader = create_dataloader_with_augmentation(train_dataset, processor, batch_size=32, shuffle=True)
# train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, collate_fn=collate_fn_with_noise, num_workers=4)
test_loader = create_dataloader(test_dataset, processor, batch_size=32, shuffle=False)

optimizer = AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# save directory (fixed here instead of asking the user)
save_directory = "ViT with Crawling top models"
save_directory = os.path.join(os.getcwd(), save_directory)  # created relative to the working directory

top_models = train_model(
    num_epochs=100,
    train_loader=train_loader,
    test_loader=test_loader,
    model=model,
    device=device,
    optimizer=optimizer,
    criterion=criterion,
    valid_label_indices=valid_label_indices,
    directory=save_directory
)
print("Finished Training")
result
The loss stays below 0.1 after epoch 7.
model_epoch_10_accuracy_65.33.pth (337051.6 KB)
model_epoch_45_accuracy_64.67.pth (337051.6 KB)
model_epoch_15_accuracy_64.33.pth (337051.6 KB)
 
 

Crawling → upload Images

  • We plan to gather the images crawled from Google so far, upload them to Hugging Face under the title children sketch, and use them for training and validation (a short upload sketch follows below).
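A hedged sketch of that upload (my own example; the local folder name is an assumption, and the images are expected to be organized as one subfolder per emotion label so that the imagefolder loader can infer the labels):

from datasets import load_dataset

# assumed local layout: crawled_children_sketch/<label name>/<image files>
dataset = load_dataset("imagefolder", data_dir="crawled_children_sketch")
dataset.push_to_hub("xodhks/Children_Sketch")  # requires a prior `huggingface-cli login`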