2024 UGRP / Member Page / 권태완
2024/11/26 -
CLIP-ViT >> model structure

CLIPModel( (text_model): CLIPTextTransformer( (embeddings): CLIPTextEmbeddings( (token_embedding): Embedding(49408, 512) (position_embedding): Embedding(77, 512) ) (encoder): CLIPEncoder( (layers): ModuleList( (0-11): 12 x CLIPEncoderLayer( (self_attn): CLIPSdpaAttention( (k_proj): Linear(in_features=512, out_features=512, bias=True) (v_proj): Linear(in_features=512, out_features=512, bias=True) (q_proj): Linear(in_features=512, out_features=512, bias=True) (out_proj): Linear(in_features=512, out_features=512, bias=True) ) (layer_norm1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) (mlp): CLIPMLP( (activation_fn): QuickGELUActivation() (fc1): Linear(in_features=512, out_features=2048, bias=True) (fc2): Linear(in_features=2048, out_features=512, bias=True) ) (layer_norm2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) ) ) (final_layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True) ) (vision_model): CLIPVisionTransformer( (embeddings): CLIPVisionEmbeddings( (patch_embedding): Conv2d(3, 768, kernel_size=(32, 32), stride=(32, 32), bias=False) (position_embedding): Embedding(50, 768) ) (pre_layrnorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (encoder): CLIPEncoder( (layers): ModuleList( (0-11): 12 x CLIPEncoderLayer( (self_attn): CLIPSdpaAttention( (k_proj): Linear(in_features=768, out_features=768, bias=True) (v_proj): Linear(in_features=768, out_features=768, bias=True) (q_proj): Linear(in_features=768, out_features=768, bias=True) (out_proj): Linear(in_features=768, out_features=768, bias=True) ) (layer_norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (mlp): CLIPMLP( (activation_fn): QuickGELUActivation() (fc1): Linear(in_features=768, out_features=3072, bias=True) (fc2): Linear(in_features=3072, out_features=768, bias=True) ) (layer_norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) ) ) (post_layernorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) (visual_projection): Linear(in_features=768, out_features=512, bias=False) (text_projection): Linear(in_features=512, out_features=512, bias=False) )

GPT: parameters accessible through target_modules

LoRA's target_modules parameter specifies which layers inside the model you want to adapt. Based on the CLIPModel structure printed above, the main modules that LoRA can be applied to are the following:

Key target modules

  1. Vision Transformer (ViT) modules
      • vision_model.encoder.layers[i].self_attn.q_proj
      • vision_model.encoder.layers[i].self_attn.k_proj
      • vision_model.encoder.layers[i].self_attn.v_proj
      • vision_model.encoder.layers[i].self_attn.out_proj
      • vision_model.encoder.layers[i].mlp.fc1
      • vision_model.encoder.layers[i].mlp.fc2
      These modules process the image input; applying LoRA here strengthens image-feature learning.
  2. Visual projection
      • visual_projection: the linear layer that maps the 768-dimensional image features to 512 dimensions.
      Applying LoRA to this module adds trainable capacity to the final mapping of the image features.
  3. Text model modules (optional)
      • text_model.encoder.layers[i].self_attn.q_proj
      • text_model.encoder.layers[i].self_attn.k_proj
      • text_model.encoder.layers[i].self_attn.v_proj
      • text_model.encoder.layers[i].self_attn.out_proj
      • text_model.encoder.layers[i].mlp.fc1
      • text_model.encoder.layers[i].mlp.fc2
      These can also be targeted if you want to strengthen text processing. For image-centric tasks, however, targeting only the vision_model modules is usually enough.
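To double-check these paths against the real model, the short sketch below (my own addition; the checkpoint id openai/clip-vit-base-patch32 is an assumption that matches the 512/768-dimensional structure printed above) lists every Linear and Conv2d module name:

import torch.nn as nn
from transformers import CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
for name, module in model.named_modules():
    # only Linear/Conv2d layers are sensible LoRA targets
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        print(name)
# prints names such as text_model.encoder.layers.0.self_attn.k_proj, ...,
# vision_model.encoder.layers.11.mlp.fc2, visual_projection, text_projection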

Example: setting target modules in a LoRA config

Below is an example target_modules setting for LoRA:
config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=[
        "vision_model.encoder.layers.*.self_attn.q_proj",
        "vision_model.encoder.layers.*.self_attn.k_proj",
        "vision_model.encoder.layers.*.self_attn.v_proj",
        "vision_model.encoder.layers.*.self_attn.out_proj",
        "visual_projection"
    ]
)

Using wildcards

Using the .*. wildcard in target_modules lets you target a given module across every layer at once. For example, vision_model.encoder.layers.*.self_attn.q_proj targets the q_proj module of every ViT layer.

Which targets to choose

  • Image-centric tasks: target only the vision_model modules and visual_projection.
  • Text-image interaction: additionally target the text_model modules.
Based on this, you can pick appropriate target modules when applying LoRA.
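As a concrete, hedged version of the config above: depending on the peft version, list entries in target_modules are matched against module-name suffixes rather than glob patterns (a single regex string is also accepted), so the sketch below uses plain suffix names. The model id and hyperparameters are illustrative, not taken from the notebook.

from transformers import CLIPModel
from peft import LoraConfig, get_peft_model

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    # suffix matching hits both the text and vision towers; to restrict LoRA to the
    # vision tower, a regex string such as
    # r".*vision_model.*(q_proj|k_proj|v_proj|out_proj)" can be passed instead of a list
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "visual_projection"],
)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()  # only the LoRA adapter matrices are trainable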

ViT >> model structure

ViTForImageClassification( (vit): ViTModel( (embeddings): ViTEmbeddings( (patch_embeddings): ViTPatchEmbeddings( (projection): Conv2d(3, 768, kernel_size=(16, 16), stride=(16, 16)) ) (dropout): Dropout(p=0.0, inplace=False) ) (encoder): ViTEncoder( (layer): ModuleList( (0-11): 12 x ViTLayer( (attention): ViTSdpaAttention( (attention): ViTSdpaSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (output): ViTSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) ) (intermediate): ViTIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): ViTOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (layernorm_before): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (layernorm_after): LayerNorm((768,), eps=1e-12, elementwise_affine=True) ) ) ) (layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) ) (classifier): Linear(in_features=768, out_features=6, bias=True) )

Parameters accessible through target_modules

Based on the ViTForImageClassification structure, the parameters usable in LoRA's target_modules are the model's trainable layers, primarily the Linear layers. The following modules in the structure above can be specified as target_modules:

1. Attention modules (ViTSdpaSelfAttention)

  • attention.attention.query: linear projection for the query
  • attention.attention.key: linear projection for the key
  • attention.attention.value: linear projection for the value
  • attention.output.dense: linear projection of the attention output
Example target_modules setting:
target_modules = [
    "vit.encoder.layer.*.attention.attention.query",
    "vit.encoder.layer.*.attention.attention.key",
    "vit.encoder.layer.*.attention.attention.value",
    "vit.encoder.layer.*.attention.output.dense",
]

2. Intermediate and output modules (feed-forward network)

  • intermediate.dense: first layer of the feed-forward network
  • output.dense: second layer of the feed-forward network
Example target_modules setting:
target_modules = [
    "vit.encoder.layer.*.intermediate.dense",
    "vit.encoder.layer.*.output.dense",
]

3. Patch embedding projection

  • vit.embeddings.patch_embeddings.projection: the convolution layer that turns the input image into patches
Example target_modules setting:
target_modules = [
    "vit.embeddings.patch_embeddings.projection",
]

4. Classifier (optional)

  • classifier: the final linear layer that produces the classification output
Example target_modules setting:
target_modules = [
    "classifier",
]

5. Using wildcards

In the LoRA config, a wildcard can target several layers at once. For example, to target every attention.query module:
target_modules = [
    "vit.encoder.layer.*.attention.attention.query",
]
  • * stands for every layer index.

Full target example

Below is an example that includes all of the main target modules:
config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=[
        "vit.encoder.layer.*.attention.attention.query",
        "vit.encoder.layer.*.attention.attention.key",
        "vit.encoder.layer.*.attention.attention.value",
        "vit.encoder.layer.*.attention.output.dense",
        "vit.encoder.layer.*.intermediate.dense",
        "vit.encoder.layer.*.output.dense",
        "vit.embeddings.patch_embeddings.projection",
    ]
)

Summary

  • Self-Attention: query, key, value, output.dense
  • Feed-Forward Network: intermediate.dense, output.dense
  • Patch Embeddings: projection
  • Classifier: classifier (optional)
Specify LoRA's target_modules based on this information.
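A hedged, runnable version of the summary above (my own sketch; the checkpoint and num_labels=6 follow the training code later on this page, the remaining hyperparameters are illustrative):

from transformers import AutoModelForImageClassification
from peft import LoraConfig, get_peft_model

model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=6,                  # 6 emotion classes, as in the experiments below
    ignore_mismatched_sizes=True,  # the 1000-way ImageNet head is replaced and re-initialized
)

config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    # plain suffixes match every encoder layer; "attention.output.dense" is written out
    # so that the feed-forward "output.dense" is not picked up as well
    target_modules=["query", "key", "value", "attention.output.dense"],
)
model = get_peft_model(model, config)
model.print_trainable_parameters()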
 
[Paper review] LoRA: Low-Rank Adaptation of Large Language Models - 전생했더니 인공지능이었던 건에 대하여
 

Parameter and optimizer settings for the final training run

  • target_modules
    • For ViT, the following parameters are available:
      • Self-Attention: query, key, value, output.dense; Feed-Forward Network: intermediate.dense, output.dense; Patch Embeddings: projection
    • Each of these is referenced as shown below. Be careful: if a name is not written out explicitly, every layer whose name contains it is affected.
      • "vit.encoder.layer.*.attention.attention.query", "vit.encoder.layer.*.attention.attention.key", "vit.encoder.layer.*.attention.attention.value", "vit.encoder.layer.*.attention.output.dense", "vit.encoder.layer.*.intermediate.dense", "vit.encoder.layer.*.output.dense", "vit.embeddings.patch_embeddings.projection"
        Here query, key, and value exist only inside the attention block, so the short (implicit) form is fine; dense also exists in layers other than attention, so it is better to write it out explicitly.
  • optimizer
    • Optimizers are accessed through torch.optim.
      • torch.optim — PyTorch 2.5 documentation
        torch.optim is a package implementing various optimization algorithms.
      • The details are documented at the link above.
      • We are going to use AdamW, which differs from Adam as follows (a short code sketch follows right after this list):
        • 💡
          Adam
          An optimizer that combines momentum with adaptive learning rates: it keeps moving averages of both the gradients and the squared gradients, adjusts the learning rate automatically, and converges quickly and stably even in high dimensions.
          AdamW
          A version of Adam that improves how weight decay is applied.
          In Adam, weight decay is added to the loss as an L2 regularization term, so the optimizer decays the weights as part of the gradient update; weight decay and the gradient update are tightly coupled, and the effect of the decay gets mixed into the gradient and learning-rate scaling.
          AdamW instead applies weight decay independently in the parameter-update step rather than putting it into the loss. The decay then acts purely as regularization, the interaction between gradients and weight decay is minimized, and optimization becomes more consistent. This is particularly effective for improving the generalization of deep models.
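A minimal sketch of that difference in code (illustrative values; the Linear layer only stands in for a real model):

import torch
import torch.nn as nn

model = nn.Linear(768, 6)  # stand-in for the classifier head

# Adam: weight_decay is folded into the gradient (L2 regularization), so the decay
# gets rescaled by the adaptive learning-rate terms.
adam = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-2)

# AdamW: weight_decay is applied directly to the weights in the update step,
# decoupled from the gradient, as described above.
adamw = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)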
           
|  |  | ViT(with emoset) | ViT(with mone) | ViT(with crawling) | ViT(with emoset crawling) | ViT(with mone crawling) | ViT(with X) |
|---|---|---|---|---|---|---|---|
| Pretrain(PEFT) | loss | ≤ 0.1 | ≤ 0.1 | ≤ 0.1 |  |  | - |
|  | LoRA config | r=8, alpha=16, lora_dropout=0.1, target_modules=k_proj, q_proj, v_proj, output.dense, bias="none" |  |  |  |  | - |
|  | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | - |
|  | optimizer | AdamW | AdamW | AdamW | AdamW | AdamW | - |
|  | prompt |  | - |  |  |  | - |
| train(full finetuning) | epoch | 10 | 10 | 10 |  |  | 10 |
|  | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss |  |
|  | optimizer | AdamW | AdamW | AdamW | AdamW | AdamW |  |
|  | prompt |  | - |  |  |  |  |
| test | random seed |  |  |  |  |  |  |
| output | accuracy |  |  |  |  |  |  |
|  | top-N accuracy |  |  |  |  |  |  |
|  | tsne |  |  |  |  |  |  |

Full-scale training

  • Review of the code pieces
import os

import numpy as np
import torch
import torch.nn as nn
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from PIL import Image
from torch.optim import AdamW
from torch.utils.data import DataLoader
from torchvision import transforms
from transformers import AutoImageProcessor, AutoModelForImageClassification


def setup_device():
    """Check for an installed GPU and select the device."""
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print("Device:", device)
    return device


def merge_datasets(dataset1, dataset2):
    """Concatenate two datasets."""
    return dataset1 + dataset2


# grayscale to RGB scale !!!
def load_datasets(train_path, test_path):
    train_dataset = load_dataset(train_path, split="train").map(lambda x: {"image": x["image"].convert("RGB")})
    test_dataset = load_dataset(test_path, split="train").map(lambda x: {"image": x["image"].convert("RGB")})
    return train_dataset, test_dataset


def prepare_model(device, num_labels=6):
    """Initialize the model and image processor."""
    processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224", use_fast=True)
    model = AutoModelForImageClassification.from_pretrained(
        "google/vit-base-patch16-224", num_labels=num_labels, ignore_mismatched_sizes=True
    ).to(device)
    config = LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.1,
        # note: list entries are matched against module-name suffixes, so the wildcard
        # entry may or may not match depending on the peft version
        target_modules=["query", "key", "value", "vit.encoder.layer.*.attention.output.dense"],
        bias="none"
    )
    model = get_peft_model(model, config)
    return model, processor


def create_dataloader(dataset, processor, batch_size=32, shuffle=True, num_workers=4):
    """Build a DataLoader."""
    def collate_fn(batch):
        images = [item['image'] for item in batch]
        labels = [item['label'] for item in batch]
        inputs = processor(images=images, return_tensors="pt")
        inputs['labels'] = torch.tensor(labels, dtype=torch.long)
        return inputs
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle, collate_fn=collate_fn, num_workers=num_workers)


def create_dataloader_with_augmentation(dataset, processor, batch_size=32, shuffle=True, num_workers=4):
    augmentation = transforms.Compose([
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(15),
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
        transforms.ToTensor()
    ])

    def collate_fn(batch):
        images = [augmentation(item['image']) for item in batch]
        labels = [item['label'] for item in batch]
        inputs = processor(images=images, return_tensors="pt")
        inputs['labels'] = torch.tensor(labels, dtype=torch.long)
        return inputs
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle, collate_fn=collate_fn, num_workers=num_workers)


def add_noise_to_images(images, noise_level=0.1):
    """Add random noise to the images."""
    noisy_images = []
    for img in images:
        img_np = np.array(img)
        noise = np.random.normal(0, noise_level, img_np.shape)
        noisy_img = np.clip(img_np + noise, 0, 255).astype(np.uint8)
        noisy_images.append(Image.fromarray(noisy_img))
    return noisy_images


# applied inside the collate_fn
def collate_fn_with_noise(batch):
    images = add_noise_to_images([item['image'] for item in batch])
    labels = [item['label'] for item in batch]
    inputs = processor(images=images, return_tensors="pt")
    inputs['labels'] = torch.tensor(labels, dtype=torch.long)
    return inputs


def evaluate_model(model, data_loader, device, valid_label_indices):
    """Evaluate the model."""
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for batch in data_loader:
            inputs = {k: v.to(device) for k, v in batch.items()}
            outputs = model(**inputs)
            _, preds = torch.max(outputs.logits, 1)
            for pred, label in zip(preds, inputs['labels']):
                if pred.item() in valid_label_indices:
                    if pred.item() == label.item():
                        correct += 1
                total += 1
    return 100 * correct / total


def save_top_models(epoch, accuracy, model, top_models, directory):
    """Save the best-performing models to a user-specified directory."""
    os.makedirs(directory, exist_ok=True)
    model_filename = f"model_epoch_{epoch + 1}_accuracy_{accuracy:.2f}.pth"
    model_path = os.path.join(directory, model_filename)
    top_models.append((accuracy, model_path))
    top_models = sorted(top_models, key=lambda x: x[0], reverse=True)[:10]
    torch.save(model.state_dict(), model_path)
    if epoch % 10 == 0:
        print("\nTop 10 Models (by accuracy):")
        for i, (acc, path) in enumerate(top_models, 1):
            print(f"Rank {i}: Accuracy = {acc:.2f}%, Model Path = {path}")
    return top_models


def train_model(num_epochs, train_loader, test_loader, model, device, optimizer, criterion, valid_label_indices, directory):
    """Training loop."""
    top_models = []
    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        for batch in train_loader:
            optimizer.zero_grad()
            inputs = {k: v.to(device) for k, v in batch.items()}
            outputs = model(**inputs)
            loss = outputs.loss
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}")
        test_accuracy = evaluate_model(model, test_loader, device, valid_label_indices)
        print(f"Test Accuracy after Epoch {epoch+1}: {test_accuracy:.2f}%")
        top_models = save_top_models(epoch, test_accuracy, model, top_models, directory)
    return top_models


# main execution
## insert_this_code_below
device = setup_device()
train_dataset, test_dataset = load_datasets("xodhks/EmoSet118K_MonetStyle", "xodhks/Children_Sketch")
valid_label_indices = [0, 1, 2, 3, 4, 5]
model, processor = prepare_model(device)

''' train dataset variants '''
# combined_dataset = merge_datasets(train_dataset, test_dataset)
# train_loader = create_dataloader(combined_dataset, processor, batch_size=32, shuffle=True)
train_loader = create_dataloader(train_dataset, processor, batch_size=32, shuffle=True)
# train_loader = create_dataloader_with_augmentation(train_dataset, processor, batch_size=32, shuffle=True)
# train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, collate_fn=collate_fn_with_noise, num_workers=4)
test_loader = create_dataloader(test_dataset, processor, batch_size=32, shuffle=False)

optimizer = AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# ask the user for the save directory
save_directory = input("Enter the directory name to save models: ")
save_directory = os.path.join(os.getcwd(), save_directory)  # created relative to the working directory

top_models = train_model(
    num_epochs=100,
    train_loader=train_loader,
    test_loader=test_loader,
    model=model,
    device=device,
    optimizer=optimizer,
    criterion=criterion,
    valid_label_indices=valid_label_indices,
    directory=save_directory
)
print("Finished Training")

ViT with EmoSet118K

device = setup_device()
train_dataset, test_dataset = load_datasets("xodhks/EmoSet118K", "xodhks/Children_Sketch")
valid_label_indices = [0, 1, 4, 5]
model, processor = prepare_model(device)

''' train dataset variants '''
# combined_dataset = merge_datasets(train_dataset, test_dataset)
# train_loader = create_dataloader(combined_dataset, processor, batch_size=32, shuffle=True)
train_loader = create_dataloader(train_dataset, processor, batch_size=32, shuffle=True)
# train_loader = create_dataloader_with_augmentation(train_dataset, processor, batch_size=32, shuffle=True)
# train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, collate_fn=collate_fn_with_noise, num_workers=4)
test_loader = create_dataloader(test_dataset, processor, batch_size=32, shuffle=False)

optimizer = AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# ask the user for the save directory
save_directory = input("Enter the directory name to save models: ")
save_directory = os.path.join(os.getcwd(), save_directory)  # created relative to the working directory

top_models = train_model(
    num_epochs=100,
    train_loader=train_loader,
    test_loader=test_loader,
    model=model,
    device=device,
    optimizer=optimizer,
    criterion=criterion,
    valid_label_indices=valid_label_indices,
    directory=save_directory
)
print("Finished Training")
result
From epoch 14 onward the loss stays below 0.1. Among the models after that point, the one with the highest accuracy is saved and uploaded (a reload sketch follows the checkpoint list below).
Device: cuda Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224 and are newly initialized because the shapes did not match: - classifier.bias: found shape torch.Size([1000]) in the checkpoint and torch.Size([6]) in the model instantiated - classifier.weight: found shape torch.Size([1000, 768]) in the checkpoint and torch.Size([6, 768]) in the model instantiated You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. Epoch [1/100], Loss: 1.1820 Test Accuracy after Epoch 1: 38.86% Top 10 Models (by accuracy): Rank 1: Accuracy = 38.86%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_1_accuracy_38.86.pth Epoch [2/100], Loss: 0.5871 Test Accuracy after Epoch 2: 39.69% Epoch [3/100], Loss: 0.4978 Test Accuracy after Epoch 3: 40.06% Epoch [4/100], Loss: 0.4388 Test Accuracy after Epoch 4: 39.87% Epoch [5/100], Loss: 0.3893 Test Accuracy after Epoch 5: 39.59% Epoch [6/100], Loss: 0.3452 Test Accuracy after Epoch 6: 38.86% Epoch [7/100], Loss: 0.3038 Test Accuracy after Epoch 7: 38.12% Epoch [8/100], Loss: 0.2666 Test Accuracy after Epoch 8: 38.95% Epoch [9/100], Loss: 0.2313 Test Accuracy after Epoch 9: 38.49% Epoch [10/100], Loss: 0.1964 Test Accuracy after Epoch 10: 38.49% Epoch [11/100], Loss: 0.1664 Test Accuracy after Epoch 11: 39.23% Top 10 Models (by accuracy): Rank 1: Accuracy = 40.06%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_3_accuracy_40.06.pth Rank 2: Accuracy = 39.87%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_4_accuracy_39.87.pth Rank 3: Accuracy = 39.69%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_2_accuracy_39.69.pth Rank 4: Accuracy = 39.59%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_5_accuracy_39.59.pth Rank 5: Accuracy = 39.23%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_11_accuracy_39.23.pth Rank 6: Accuracy = 38.95%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_8_accuracy_38.95.pth Rank 7: Accuracy = 38.86%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_1_accuracy_38.86.pth Rank 8: Accuracy = 38.86%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_6_accuracy_38.86.pth Rank 9: Accuracy = 38.49%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_9_accuracy_38.49.pth Rank 10: Accuracy = 38.49%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_10_accuracy_38.49.pth Epoch [12/100], Loss: 0.1378 Test Accuracy after Epoch 12: 38.77% Epoch [13/100], Loss: 0.1118 Test Accuracy after Epoch 13: 39.32% Epoch [14/100], Loss: 0.0900 Test Accuracy after Epoch 14: 38.58% Epoch [15/100], Loss: 0.0692 Test Accuracy after Epoch 15: 38.86% Epoch [16/100], Loss: 0.0524 Test Accuracy after Epoch 16: 38.67% Epoch [17/100], Loss: 0.0384 Test Accuracy after Epoch 17: 38.49% Epoch [18/100], Loss: 0.0277 Test Accuracy after Epoch 18: 39.13% Epoch [19/100], Loss: 0.0203 Test Accuracy after Epoch 19: 39.41% Epoch [20/100], Loss: 0.0140 Test Accuracy after Epoch 20: 38.49% Epoch [21/100], Loss: 0.0119 Test Accuracy after Epoch 21: 38.40% Top 10 Models (by accuracy): Rank 1: Accuracy = 40.06%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_3_accuracy_40.06.pth Rank 2: Accuracy = 39.87%, 
Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_4_accuracy_39.87.pth Rank 3: Accuracy = 39.69%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_2_accuracy_39.69.pth Rank 4: Accuracy = 39.59%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_5_accuracy_39.59.pth Rank 5: Accuracy = 39.41%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_19_accuracy_39.41.pth Rank 6: Accuracy = 39.32%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_13_accuracy_39.32.pth Rank 7: Accuracy = 39.23%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_11_accuracy_39.23.pth Rank 8: Accuracy = 39.13%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_18_accuracy_39.13.pth Rank 9: Accuracy = 38.95%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_8_accuracy_38.95.pth Rank 10: Accuracy = 38.86%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_1_accuracy_38.86.pth Epoch [22/100], Loss: 0.0081 Test Accuracy after Epoch 22: 38.49% Epoch [23/100], Loss: 0.0058 Test Accuracy after Epoch 23: 38.12% Epoch [24/100], Loss: 0.0075 Test Accuracy after Epoch 24: 38.77% Epoch [25/100], Loss: 0.0140 Test Accuracy after Epoch 25: 38.03% Epoch [26/100], Loss: 0.0077 Test Accuracy after Epoch 26: 38.21% Epoch [27/100], Loss: 0.0042 Test Accuracy after Epoch 27: 38.67% Epoch [28/100], Loss: 0.0029 Test Accuracy after Epoch 28: 38.67% Epoch [29/100], Loss: 0.0024 Test Accuracy after Epoch 29: 38.58% Epoch [30/100], Loss: 0.0020 Test Accuracy after Epoch 30: 38.67% Epoch [31/100], Loss: 0.0016 Test Accuracy after Epoch 31: 39.23% Top 10 Models (by accuracy): Rank 1: Accuracy = 40.06%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_3_accuracy_40.06.pth Rank 2: Accuracy = 39.87%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_4_accuracy_39.87.pth Rank 3: Accuracy = 39.69%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_2_accuracy_39.69.pth Rank 4: Accuracy = 39.59%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_5_accuracy_39.59.pth Rank 5: Accuracy = 39.41%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_19_accuracy_39.41.pth Rank 6: Accuracy = 39.32%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_13_accuracy_39.32.pth Rank 7: Accuracy = 39.23%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_11_accuracy_39.23.pth Rank 8: Accuracy = 39.23%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_31_accuracy_39.23.pth Rank 9: Accuracy = 39.13%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_18_accuracy_39.13.pth Rank 10: Accuracy = 38.95%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_8_accuracy_38.95.pth Epoch [32/100], Loss: 0.0015 Test Accuracy after Epoch 32: 38.95% Epoch [33/100], Loss: 0.0012 Test Accuracy after Epoch 33: 39.23% Epoch [34/100], Loss: 0.0239 Test Accuracy after Epoch 34: 39.69% Epoch [35/100], Loss: 0.0054 Test Accuracy after Epoch 35: 39.50% Epoch [36/100], Loss: 0.0023 Test Accuracy after Epoch 36: 38.77% Epoch [37/100], Loss: 0.0020 Test Accuracy after Epoch 37: 39.13% Epoch [38/100], Loss: 0.0023 Test Accuracy after Epoch 38: 39.50% Epoch 
[39/100], Loss: 0.0062 Test Accuracy after Epoch 39: 39.59% Epoch [40/100], Loss: 0.0024 Test Accuracy after Epoch 40: 39.69% Epoch [41/100], Loss: 0.0071 Test Accuracy after Epoch 41: 39.32% Top 10 Models (by accuracy): Rank 1: Accuracy = 40.06%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_3_accuracy_40.06.pth Rank 2: Accuracy = 39.87%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_4_accuracy_39.87.pth Rank 3: Accuracy = 39.69%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_2_accuracy_39.69.pth Rank 4: Accuracy = 39.69%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_34_accuracy_39.69.pth Rank 5: Accuracy = 39.69%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_40_accuracy_39.69.pth Rank 6: Accuracy = 39.59%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_5_accuracy_39.59.pth Rank 7: Accuracy = 39.59%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_39_accuracy_39.59.pth Rank 8: Accuracy = 39.50%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_35_accuracy_39.50.pth Rank 9: Accuracy = 39.50%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_38_accuracy_39.50.pth Rank 10: Accuracy = 39.41%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_19_accuracy_39.41.pth Epoch [42/100], Loss: 0.0045 Test Accuracy after Epoch 42: 38.67% Epoch [43/100], Loss: 0.0029 Test Accuracy after Epoch 43: 39.04% Epoch [44/100], Loss: 0.0026 Test Accuracy after Epoch 44: 39.41% Epoch [45/100], Loss: 0.0030 Test Accuracy after Epoch 45: 39.59% Epoch [46/100], Loss: 0.0011 Test Accuracy after Epoch 46: 40.42% Epoch [47/100], Loss: 0.0008 Test Accuracy after Epoch 47: 39.78% Epoch [48/100], Loss: 0.0008 Test Accuracy after Epoch 48: 39.78% Epoch [49/100], Loss: 0.0008 Test Accuracy after Epoch 49: 40.33% Epoch [50/100], Loss: 0.0111 Test Accuracy after Epoch 50: 37.57% Epoch [51/100], Loss: 0.0089 Test Accuracy after Epoch 51: 39.32% Top 10 Models (by accuracy): Rank 1: Accuracy = 40.42%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_46_accuracy_40.42.pth Rank 2: Accuracy = 40.33%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_49_accuracy_40.33.pth Rank 3: Accuracy = 40.06%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_3_accuracy_40.06.pth Rank 4: Accuracy = 39.87%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_4_accuracy_39.87.pth Rank 5: Accuracy = 39.78%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_47_accuracy_39.78.pth Rank 6: Accuracy = 39.78%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_48_accuracy_39.78.pth Rank 7: Accuracy = 39.69%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_2_accuracy_39.69.pth Rank 8: Accuracy = 39.69%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_34_accuracy_39.69.pth Rank 9: Accuracy = 39.69%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_40_accuracy_39.69.pth Rank 10: Accuracy = 39.59%, Model Path = /home/rnjsxodhks/code/UGRP/ViT with emoset top models/model_epoch_5_accuracy_39.59.pth
model_epoch_46_accuracy_40.42.pth (337051.6 KB)
model_epoch_49_accuracy_40.33.pth (337051.6 KB)
model_epoch_3_accuracy_40.06.pth (337051.3 KB)
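To reload one of the saved checkpoints later, a minimal sketch, assuming the .pth files above hold the state_dict of the PEFT-wrapped ViT exactly as written by save_top_models, and reusing the helpers and loaders defined in the training code above:

import torch

# rebuild the same LoRA-wrapped ViT, then restore the saved weights
model, processor = prepare_model(device)
state_dict = torch.load("model_epoch_46_accuracy_40.42.pth", map_location=device)
model.load_state_dict(state_dict)
model.eval()

accuracy = evaluate_model(model, test_loader, device, valid_label_indices)
print(f"Reloaded checkpoint accuracy: {accuracy:.2f}%")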
 

ViT with MonetStyle

device = setup_device()
train_dataset, test_dataset = load_datasets("xodhks/EmoSet118K_MonetStyle", "xodhks/Children_Sketch")
valid_label_indices = [0, 1, 4, 5]
model, processor = prepare_model(device)

''' train dataset variants '''
# combined_dataset = merge_datasets(train_dataset, test_dataset)
# train_loader = create_dataloader(combined_dataset, processor, batch_size=32, shuffle=True)
train_loader = create_dataloader(train_dataset, processor, batch_size=32, shuffle=True)
# train_loader = create_dataloader_with_augmentation(train_dataset, processor, batch_size=32, shuffle=True)
# train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, collate_fn=collate_fn_with_noise, num_workers=4)
test_loader = create_dataloader(test_dataset, processor, batch_size=32, shuffle=False)

optimizer = AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# save directory (fixed here instead of asking the user)
save_directory = "ViT with EmeSet MonetStyle top models"
save_directory = os.path.join(os.getcwd(), save_directory)  # created relative to the working directory

top_models = train_model(
    num_epochs=100,
    train_loader=train_loader,
    test_loader=test_loader,
    model=model,
    device=device,
    optimizer=optimizer,
    criterion=criterion,
    valid_label_indices=valid_label_indices,
    directory=save_directory
)
print("Finished Training")
result
The loss is below 0.1 from epoch 9 onward, so the highest-accuracy models after that point are listed below.
model_epoch_21_accuracy_43.00.pth (337051.6 KB)
model_epoch_22_accuracy_42.17.pth (337051.6 KB)
model_epoch_42_accuracy_39.23.pth (337051.6 KB)

ViT with Crawling

device = setup_device()
train_dataset, test_dataset = load_datasets("xodhks/crawling-emotions-in-google-train", "xodhks/crawling-emotions-in-google-test")
valid_label_indices = [0, 1, 2, 3, 4, 5]
model, processor = prepare_model(device)

''' train dataset variants '''
# combined_dataset = merge_datasets(train_dataset, test_dataset)
# train_loader = create_dataloader(combined_dataset, processor, batch_size=32, shuffle=True)
train_loader = create_dataloader(train_dataset, processor, batch_size=32, shuffle=True)
# train_loader = create_dataloader_with_augmentation(train_dataset, processor, batch_size=32, shuffle=True)
# train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, collate_fn=collate_fn_with_noise, num_workers=4)
test_loader = create_dataloader(test_dataset, processor, batch_size=32, shuffle=False)

optimizer = AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# save directory (fixed here instead of asking the user)
save_directory = "ViT with Crawling top models"
save_directory = os.path.join(os.getcwd(), save_directory)  # created relative to the working directory

top_models = train_model(
    num_epochs=100,
    train_loader=train_loader,
    test_loader=test_loader,
    model=model,
    device=device,
    optimizer=optimizer,
    criterion=criterion,
    valid_label_indices=valid_label_indices,
    directory=save_directory
)
print("Finished Training")
result
The loss stays below 0.1 after epoch 7.
model_epoch_10_accuracy_65.33.pth (337051.6 KB)
model_epoch_45_accuracy_64.67.pth (337051.6 KB)
model_epoch_15_accuracy_64.33.pth (337051.6 KB)
 
 

Crawling → upload Images

  • We plan to gather the images crawled from Google so far, upload them to Hugging Face under the title children sketch, and use them for training and validation (a short upload sketch follows below).
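A hedged sketch of that upload (my own example; the local folder name is an assumption, and the images are expected to be organized as one subfolder per emotion label so that the imagefolder loader can infer the labels):

from datasets import load_dataset

# assumed local layout: crawled_children_sketch/<label name>/<image files>
dataset = load_dataset("imagefolder", data_dir="crawled_children_sketch")
dataset.push_to_hub("xodhks/Children_Sketch")  # requires a prior `huggingface-cli login`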