[Link]

[Code]

Set up the working directory


import os


# in case we're running on colab the code won't be available
# so we check it out
if not os.path.exists("maws"):
    os.system("git clone https://github.com/facebookresearch/maws")
    os.chdir("maws")
    %pip install timm

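Import os, clone the maws repository if it is not already present, and change the current directory to maws.

Install timm, which provides the model architectures.

Create the model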


from maws.model_builder import build_model
from maws.utils import start_inference_mode, predict_probs_and_plot

build_model builds a MAWS model.

start_inference_mode and predict_probs_and_plot are the inference helper functions.


start_inference_mode(device="cpu")

Specify the device to use for inference.


# create a base model, fastest to run but least accurate
clip_model = build_model("vit_b16_xlmr_b", "maws_clip")

# create larger but slower models
# clip_model = build_model("vit_l16_xlmr_l", "maws_clip")
# clip_model = build_model("vit_h14_xlmr_l", "maws_clip")
# clip_model = build_model("vit_2b14_xlmr_l", "maws_clip")

_ = clip_model.eval()

Create the model and switch it to evaluation mode.

Inference

Results


image_path = (
    "https://images.pexels.com/photos/3397939/pexels-photo-3397939.jpeg"
    # "https://images.pexels.com/photos/3608263/pexels-photo-3608263.jpeg"
    # "https://images.pexels.com/photos/11873002/pexels-photo-11873002.jpeg"
    # "https://images.pexels.com/photos/70955/pexels-photo-70955.jpeg"
)
display(clip_model.get_cropped_images(image_path))

Preprocess the image with get_cropped_images and display the result.


# English
texts = [
    "a dog",
    "a cat",
    "a panda",
    "a truck",
    "a plane",
]

predict_probs_and_plot(clip_model, image_path, texts, plot_image=True)

Compute the probability of each text for the image and plot the results.

[CLIP]

MAWS appears to provide an implementation of a CLIP model…

CLIP

Contrastive Language–Image Pre-training

A model that maps images and text into the same embedding space, so that the similarity between an image and a piece of text can be measured.
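To make the shared embedding space concrete, here is a minimal sketch of how similarity between one image and several texts could be scored once both are embedded. It uses NumPy with toy vectors rather than the MAWS encoders; the function names, the 4-dimensional embeddings, and the `logit_scale` value of 100 are assumptions for illustration, not taken from the MAWS code.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.clip(norm, 1e-12, None)


def zero_shot_probs(image_embedding, text_embeddings, logit_scale=100.0):
    """Return cosine similarities and softmax probabilities over the texts.

    logit_scale is an assumed temperature; real models learn or fix their own.
    """
    image_embedding = l2_normalize(np.asarray(image_embedding, dtype=np.float64))
    text_embeddings = l2_normalize(np.asarray(text_embeddings, dtype=np.float64))
    similarities = text_embeddings @ image_embedding  # shape: (num_texts,)
    logits = logit_scale * similarities
    logits = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(logits)
    return similarities, exp / exp.sum()


# Toy embeddings standing in for encoder outputs (hypothetical values).
image_embedding = np.array([0.9, 0.1, 0.0, 0.1])
text_embeddings = np.array(
    [
        [1.0, 0.0, 0.0, 0.0],  # e.g. "a dog"
        [0.0, 1.0, 0.0, 0.0],  # e.g. "a cat"
        [0.0, 0.0, 1.0, 0.0],  # e.g. "a panda"
    ]
)

similarities, probs = zero_shot_probs(image_embedding, text_embeddings)
best = int(np.argmax(probs))
print(similarities, probs, best)
```

The text whose embedding points in the direction closest to the image embedding receives the highest probability, which is the idea behind scoring a list of candidate captions against one image.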

Variables of the CLIP class

vision_encoder: image encoder
text_encoder: text encoder
text_tokenizer: nn.Module: text tokenizer → splits the text into tokens
embed_dim: int: dimension of the shared space that both modalities are mapped into
vision_encoder_width: Optional[int]:
text_encoder_width: Optional[int]:
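The notes leave the two width variables undescribed. One plausible reading, not confirmed by the source, is that each width is the output size of its encoder and that a linear projection maps it to `embed_dim`. The sketch below illustrates that reading only; the sizes, the random projection matrices, and the variable names other than those listed above are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the actual MAWS values are not given in these notes.
vision_encoder_width = 768
text_encoder_width = 512
embed_dim = 256

# Assumed linear projections from each encoder's output width to the shared space.
vision_projection = rng.normal(size=(vision_encoder_width, embed_dim))
text_projection = rng.normal(size=(text_encoder_width, embed_dim))

# Stand-ins for encoder outputs: one image and five candidate texts.
vision_features = rng.normal(size=(1, vision_encoder_width))
text_features = rng.normal(size=(5, text_encoder_width))

# After projection both modalities have embed_dim columns and can be compared.
image_embeddings = vision_features @ vision_projection
text_embeddings = text_features @ text_projection
similarity = image_embeddings @ text_embeddings.T

print(image_embeddings.shape, text_embeddings.shape, similarity.shape)
```

Under this assumption the encoders can have different output sizes, and only the projected embeddings need to agree on `embed_dim`.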

⇒ Both CoCa and this model build on CLIP.

#16. Running the MAWS model (O)
