Additional Crawling Data Collection

Report Summary
Research Objective
Collect data, use it to train a better model, and then apply that model together with an LLM.
Research Content
- DataSet
  - emoset
  - gan
  - crawling
  - survey
- Model
  - ResNet
  - ViT
  - CLIP
- PEFT
  - LoRA
- method
  - parameter-efficient pretraining on data from a similar field
  - full fine-tuning with a small amount of our own dataset
- futureWorks
  - LLM utilization
++ related work
Model Building and Test Results
<Results summary table>
| | | CLIP (with emoset) | CLIP (with mone) | CLIP (with crawling) | CLIP (with X) |
| --- | --- | --- | --- | --- | --- |
| Pretrain (PEFT) | loss | below 0.2 | below 0.2 | below 0.2 | - |
| | LoRA config | r=8, alpha=16, lora_dropout=0.1, target_modules=k_proj, q_proj, v_proj, visual_projection, bias="none" | r=8, alpha=16, lora_dropout=0.1, target_modules=k_proj, q_proj, v_proj, visual_projection, bias="none" | r=8, alpha=16, lora_dropout=0.1, target_modules=k_proj, q_proj, v_proj, visual_projection, bias="none" | - |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | - |
| | optimizer | AdamW | AdamW | AdamW | - |
| | prompt | [f"This image likely represents an emotional expression. Considering the visual details and the intention behind the image, it seems to convey a sense of {label}." for label in possible_labels] | [f"This image likely represents an emotional expression. Considering the visual details and the intention behind the image, it seems to convey a sense of {label}." for label in possible_labels] | [f"This image likely represents an emotional expression. Considering the visual details and the intention behind the image, it seems to convey a sense of {label}." for label in possible_labels] | - |
| Train (full fine-tuning) | epoch | 10 | 10 | 10 | 10 |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss |
| | optimizer | AdamW | AdamW | AdamW | AdamW |
| | prompt | [f"This image likely represents an emotional expression. Considering the visual details and the intention behind the image, it seems to convey a sense of {label}." for label in possible_labels] | [f"This image likely represents an emotional expression. Considering the visual details and the intention behind the image, it seems to convey a sense of {label}." for label in possible_labels] | [f"This image likely represents an emotional expression. Considering the visual details and the intention behind the image, it seems to convey a sense of {label}." for label in possible_labels] | [f"This image likely represents an emotional expression. Considering the visual details and the intention behind the image, it seems to convey a sense of {label}." for label in possible_labels] |
| | train size | 0.2 | 0.2 | 0.2 | 0.2 |
| | random seed | 42 | 42 | 42 | 42 |
| Output | accuracy | 61.54% | 61.54% | 61.54% | 61.54% |
| | t-SNE | O | O | O | O |
| | Silhouette Score | 0.5422 | 0.5637 | 0.6321 | 0.5056 |
| | loss | [2.0093, 2.8423, 1.0086, 1.0494, 1.7056, 1.5294, 0.9401, 0.6743, 0.5783, 1.7595] | [2.5119, 3.5481, 1.8837, 1.5159, 1.3483, 1.0847, 1.0692, 1.9154, 1.5516, 1.3077] | [1.9597, 1.9207, 1.4398, 1.9074, 1.5318, 1.4545, 1.2903, 1.2021, 1.1387, 1.0534] | [2.0093, 1.6136, 3.9201, 1.0728, 2.1742, 1.3275, 1.3556, 1.1783, 1.0104, 0.8990] |
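The PEFT stage above was run with the HuggingFace PEFT library on CLIP's k_proj, q_proj, v_proj, and visual_projection layers. As a minimal, library-free sketch of what the r=8, alpha=16, lora_dropout=0.1 configuration does to one such projection (a single `nn.Linear` stands in for the real CLIP module), assuming plain PyTorch:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear plus a rank-r update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16, dropout: float = 0.1):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # base weights stay frozen during PEFT
        self.lora_A = nn.Linear(base.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)   # delta starts at zero, so training begins from the base model
        self.dropout = nn.Dropout(dropout)
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(self.dropout(x)))

# Stand-in for one projection layer (k_proj/q_proj/v_proj/visual_projection in the real model)
layer = LoRALinear(nn.Linear(512, 512), r=8, alpha=16, dropout=0.1)
out = layer(torch.randn(4, 512))
print(out.shape)
```

Only `lora_A` and `lora_B` receive gradients, which is what makes the pretraining stage parameter-efficient.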
t-SNE for CLIP (with emoset)

t-SNE for CLIP (with mone)

t-SNE for CLIP (with crawling)

t-SNE for CLIP (with X)
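The Silhouette Scores and t-SNE plots above come from the learned image embeddings. A minimal sketch of how both are typically computed with scikit-learn, using random stand-in embeddings instead of the real encoder outputs (whether the score was taken on the raw embeddings or on the 2-D t-SNE coordinates is not stated in the tables; this sketch uses the raw embeddings):

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
# Stand-in for encoder embeddings: two loosely separated clusters
labels = rng.integers(0, 2, size=200)
embeddings = rng.normal(size=(200, 64)) + labels[:, None] * 3.0

# Silhouette score on the raw embeddings; range is [-1, 1]
score = silhouette_score(embeddings, labels)

# t-SNE projects the embeddings to 2-D for plotting
coords = TSNE(n_components=2, random_state=42, perplexity=30).fit_transform(embeddings)
print(round(float(score), 4), coords.shape)
```

Higher silhouette means tighter, better-separated emotion clusters, which matches the ranking in the table (crawling 0.6321 > mone 0.5637 > emoset 0.5422 > X 0.5056).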

| | | ViT (with emoset) | ViT (with mone) | ViT (with crawling) | ViT (with X) |
| --- | --- | --- | --- | --- | --- |
| Pretrain (PEFT) | loss | below 0.2 | below 0.2 | below 0.2 | - |
| | LoRA config | r=8, alpha=16, lora_dropout=0.1, target_modules=k_proj, q_proj, v_proj, output.dense, bias="none" | r=8, alpha=16, lora_dropout=0.1, target_modules=k_proj, q_proj, v_proj, output.dense, bias="none" | r=8, alpha=16, lora_dropout=0.1, target_modules=k_proj, q_proj, v_proj, output.dense, bias="none" | - |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | - |
| | optimizer | AdamW | AdamW | AdamW | - |
| | prompt | - | - | - | - |
| Train (full fine-tuning) | epoch | 10 | 10 | 10 | 10 |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss |
| | optimizer | AdamW | AdamW | AdamW | AdamW |
| | prompt | - | - | - | - |
| | train size | 0.2 | 0.2 | 0.2 | 0.2 |
| | random seed | 42 | 42 | 42 | 42 |
| Output | accuracy | | | | 10.26% |
| | t-SNE | | | | O |
| | Silhouette Score | | | | -0.0166 |
| | loss | | | | [1.7635, 0.7548, 0.3577, 0.1841, 0.0988, 0.0578, 0.0375, 0.0264, 0.0197, 0.0153] → overfitting |
t-SNE for ViT (with emoset): not yet
t-SNE for ViT (with mone): not yet
t-SNE for ViT (with crawling): not yet
t-SNE for ViT (with X)

| | | ResNet50 (with emoset) | ResNet50 (with mone) | ResNet50 (with crawling) | ResNet50 (with X) |
| --- | --- | --- | --- | --- | --- |
| Pretrain (PEFT) | loss | below 0.2 | below 0.2 | below 0.2 | - |
| | LoRA config | - | - | - | - |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | - |
| | optimizer | AdamW | AdamW | AdamW | - |
| | prompt | - | - | - | - |
| Train (full fine-tuning) | epoch | 10 | 10 | 10 | 10 |
| | loss function | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss | CrossEntropyLoss |
| | optimizer | AdamW | AdamW | AdamW | AdamW |
| | prompt | - | - | - | - |
| | train size | 0.2 | 0.2 | 0.2 | 0.2 |
| | random seed | 42 | 42 | 42 | 42 |
| Output | accuracy | | | | 5.13% |
| | t-SNE | | | | O |
| | Silhouette Score | | | | 0.0581 |
| | loss | | | | [1.7927, 1.7385, 1.6950, 1.6560, 1.6202, 1.5855, 1.5523, 1.5202, 1.4875, 1.4570] |
t-SNE for ResNet50 (with emoset): not yet
t-SNE for ResNet50 (with mone): not yet
t-SNE for ResNet50 (with crawling): not yet
t-SNE for ResNet50 (with X)
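The prompt row in the CLIP tables is a list comprehension that builds one natural-language sentence per emotion label; these sentences are what CLIP's text encoder scores each image against. A runnable sketch, where `possible_labels` is a stand-in (the real label set comes from the dataset):

```python
# possible_labels is a stand-in; the real label set comes from the dataset
possible_labels = ["amusement", "anger", "fear", "sadness"]

prompts = [
    f"This image likely represents an emotional expression. Considering the "
    f"visual details and the intention behind the image, it seems to convey "
    f"a sense of {label}."
    for label in possible_labels
]
print(prompts[0])
```

The predicted class is the label whose prompt embedding is most similar to the image embedding, which is why the same prompt template appears in every CLIP column.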

Further LLM Utilization
