Portfolio

[Task Description]

The task we worked on for one year in the Undergraduate Group Research Program (UGRP) was classification.

The research had the following two goals.

Collecting drawing data for an art-therapy AI

Building a model that classifies the collected drawing data well

[Reference Paper]

The main papers we referred to are as follows.

Dataset Ref

EmoSet:

ICCV 2023 Open Access Repository

(A paper on collecting an emotion photo dataset)

PEFT Ref

LoRA:

LoRA: Low-Rank Adaptation of Large Language Models (arXiv.org)

(Proposes LoRA, a parameter-efficient fine-tuning method)

[Dataset]

First, we tried collecting a drawing dataset through a survey (docs.google.com).

We had intended to use this data for training as well, but we could not collect as much as we wanted, so we used it only for testing.

xodhks/ugrp-survey-test · Datasets at Hugging Face

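For training we used three datasets: EmoSet, an augmented version of EmoSet generated with CycleGAN, and crawled data. Because the datasets differ in style, we expected them to interfere with one another during training, so we trained on each dataset separately and compared the results.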

xodhks/EmoSet118K · Datasets at Hugging Face

xodhks/EmoSet118K_MonetStyle · Datasets at Hugging Face

xodhks/crawling-emotions-in-google-train · Datasets at Hugging Face
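
The protocol described above (train on each dataset on its own, then compare the results) could be organized roughly as follows. This is only a sketch: `run_separately`, `best_dataset`, and the `train_and_evaluate` callable are hypothetical names, not part of the project code, and using the survey set as the common test set is an assumption based on the note that it was used only for testing.

```python
from typing import Callable, Dict, List

# Dataset identifiers as listed on Hugging Face above.
TRAIN_SETS: List[str] = [
    "xodhks/EmoSet118K",
    "xodhks/EmoSet118K_MonetStyle",
    "xodhks/crawling-emotions-in-google-train",
]
TEST_SET = "xodhks/ugrp-survey-test"


def run_separately(
    train_sets: List[str],
    test_set: str,
    train_and_evaluate: Callable[[str, str], float],
) -> Dict[str, float]:
    """Train one model per training set and record its test accuracy.

    `train_and_evaluate(train_set, test_set)` is a hypothetical callable that
    fine-tunes a model on `train_set` and returns its accuracy on `test_set`.
    """
    results: Dict[str, float] = {}
    for name in train_sets:
        # Each run is independent, so the image styles never mix.
        results[name] = train_and_evaluate(name, test_set)
    return results


def best_dataset(results: Dict[str, float]) -> str:
    """Return the training set whose model scored highest."""
    return max(results, key=results.get)
```

Passing the training routine in as a callable keeps the comparison loop identical for every base model, so only the dataset changes between runs.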

[Experiment]

We trained with parameter-efficient fine-tuning (PEFT). No GPU server was provided for the research, so we had to rely on Google Colab or personal laptops, which led us to choose a low-compute, parameter-efficient transfer-learning approach.
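
To make the compute argument concrete, here is a minimal sketch of the low-rank update from the LoRA paper cited above, assuming LoRA is the PEFT method that was applied. It is plain Python rather than the project's PyTorch code, and the layer size (768 x 768) and rank (8) are illustrative assumptions, not the project's settings.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lora_param_counts(d_out: int, d_in: int, rank: int) -> Tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    Full fine-tuning updates all d_out * d_in weights. LoRA freezes them and
    trains only B (d_out x rank) and A (rank x d_in).
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Multiply matrix m by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(
    w: Matrix, a: Matrix, b: Matrix, x: List[float], alpha: float
) -> List[float]:
    """Compute h = W x + (alpha / r) * B (A x), where W stays frozen."""
    rank = len(a)  # A has shape (rank x d_in)
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    scale = alpha / rank
    return [h + scale * u for h, u in zip(base, update)]


# Illustrative sizes only (assumed, not taken from the project).
full, lora = lora_param_counts(768, 768, 8)
print(full, lora, f"{lora / full:.2%}")  # 589824 12288 2.08%
```

With B initialized to zeros, as in the paper, `lora_forward` returns exactly the frozen model's output at the start of training, and only the small A and B matrices receive updates afterwards.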

The base models were CLIP, ViT, and ResNet.

We used PyTorch.

We ran the experiments with the parameters set as shown below.

Training took 4-5 hours per run on Colab (100 iter).

[Result]

The results were as follows. We achieved higher accuracy than GPT-4o-mini (58.33%).

We obtained the GPT result through the GPT API.


import json
import requests
import openai
import base64
from PIL import Image  # Image class used to load and resize the images
import os

# Set the OpenAI API key
openai.api_key = 'sk-proj-...'

# Dataset and image locations
dataset_path = r'G:\dataset\dataset_info.json'  # path to the JSON file
images_base_path = r'G:\dataset'  # base directory of the images

# Load the dataset from the JSON file
with open(dataset_path, 'r', encoding='utf-8') as f:
    dataset = json.load(f)

correct_predictions = 0

# Iterate over the dataset
for data in dataset:
    # Get the image file path
    image_filename = data['image']
    image_path = os.path.join(images_base_path, image_filename)  # full image path
    label = data['label']

    # Load the image and encode it as base64
    try:
        with Image.open(image_path) as img:
            img = img.convert("RGB")  # convert the image to RGB
            img = img.resize((256, 256))  # resize the image to 256x256
            img.save("temp_image.jpg", format="JPEG")  # save as a temporary JPEG
            with open("temp_image.jpg", "rb") as image_file:
                image_data = image_file.read()
                image_base64 = base64.b64encode(image_data).decode('utf-8')  # base64-encode the image
    except FileNotFoundError:
        print(f"File not found: {image_path}")
        continue  # skip to the next item if the image is missing
    
    # Build the API request
    possible_labels = ["Happiness", "Disgust", "Fear", "Sadness", "Surprise"]

    question = (
        "Analyze the following image and predict the emotion it represents. "
        f"Choose one of these labels only: {', '.join(possible_labels)}.\n\n"
        "Provide only the label without any explanation: "
    )

    # Send the request to the OpenAI API. The image is attached as an
    # image_url content part (a base64 data URL) so that the model receives it
    # as an image rather than as base64 characters inside the text prompt.
    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={
            "Authorization": f"Bearer {openai.api_key}",
            "Content-Type": "application/json"
        },
        json={
            "model": "gpt-4o-mini",  # GPT model with vision support
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": question},
                        {
                            "type": "image_url",
                            "image_url": {"url": f"data:image/jpeg;base64,{image_base64}"},
                        },
                    ],
                }
            ],
            "max_tokens": 50,
        }
    )

    # Print the API response (for debugging)
    # print(f"Response from API: {response.text}")

    # Extract the predicted label
    if response.status_code == 200:
        answer = response.json()
        predicted_label = answer['choices'][0]['message']['content'].strip()
    else:
        print(f"API request failed: {response.status_code} - {response.text}")
        continue

    # Check whether the prediction is correct
    if predicted_label == label:
        correct_predictions += 1

# Delete the temporary file
if os.path.exists("temp_image.jpg"):
    os.remove("temp_image.jpg")

# Print the accuracy (skipped images and failed requests count as incorrect)
accuracy = correct_predictions / len(dataset)
print(f"Model accuracy: {accuracy * 100:.2f}%")
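
The exact string comparison in the script counts a reply such as "Sadness." or "sadness" as wrong. A possible, more forgiving scoring step is sketched below. `normalize_label` and `accuracy` are hypothetical helpers that are not part of the original script, and using them would change how the reported figure is computed.

```python
from typing import List, Optional, Tuple

POSSIBLE_LABELS = ["Happiness", "Disgust", "Fear", "Sadness", "Surprise"]


def normalize_label(reply: str, labels: List[str] = POSSIBLE_LABELS) -> Optional[str]:
    """Map a free-form model reply to one of the known labels.

    Returns the label only when exactly one label name occurs in the reply
    (case-insensitive); otherwise returns None.
    """
    text = reply.lower()
    hits = [label for label in labels if label.lower() in text]
    return hits[0] if len(hits) == 1 else None


def accuracy(pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (reply, true_label) pairs whose normalized reply matches."""
    if not pairs:
        return 0.0
    correct = sum(1 for reply, truth in pairs if normalize_label(reply) == truth)
    return correct / len(pairs)
```

Replies that name no label, or more than one, are treated as wrong rather than guessed, which keeps the score conservative.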

This project also received an Encouragement Award.
