© 2025 Sejin Cha. All rights reserved.

Did: Reviewed VOS-related papers and studied the Transformer

Date

Jan 2, 2025 → Jan 8, 2025

Status

Done

Category

Paper review

TODO:

22nd: Mamba and VOS presentation

14th: implementation

Paper tip site:

Zotero

What is VOS?:

VOS (video object segmentation) → a task where indexing is the key

TODO:

Understand the Transformer

Read the papers my senior labmate sent

When reading a paper, read only up to the method section

Read them by Monday (actually Wednesday)

DETR

loss

transformer

object queries / cross-attention
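As a note to self on how object queries read from image features: a minimal single-head cross-attention sketch in plain Python. It assumes the learned query/key/value projections and multi-head splitting are omitted, and the function names are illustrative, not taken from the DETR code.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention (illustrative sketch).

    queries: N object queries, each a list of d floats
    keys, values: M image-feature tokens (keys have d floats each)
    Returns N output vectors, one per object query.
    """
    d = len(queries[0])
    dim = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every image token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)])
    return outputs
```

Each object query ends up as a mixture of the image tokens it is most similar to, which is what lets a fixed set of queries each pick out a different object.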

Mask2Former

~~bounding box~~ → mask

masked attention

high-resolution processing → feature pyramid, fed to the decoder layers in round robin

random point sampling → used for both the matching cost and the final loss
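A rough sketch of the masked-attention idea: a query only attends to positions inside its current predicted mask, by setting the other logits to negative infinity before the softmax. This is plain Python for a single query; the function name, the boolean `mask` input, and the fallback to full attention when the mask is empty are assumptions made for this sketch.

```python
import math

def masked_attention(query, keys, values, mask):
    """Attention for one query restricted to positions where mask[i] is True.

    Illustrative sketch: no learned projections, single head.
    Returns (output_vector, attention_weights).
    """
    d = len(query)
    # Assumed safeguard: an empty mask would leave nothing to attend to,
    # so fall back to attending everywhere.
    if not any(mask):
        mask = [True] * len(keys)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) if keep else -math.inf
        for key, keep in zip(keys, mask)
    ]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return output, weights
```

Positions outside the mask get exactly zero weight, so the query update is driven only by features inside the predicted foreground region.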

MinVIS

temporal consistency

~~per-clip~~ → per-frame

query matching (cosine similarity + Hungarian matching)

cls/mask loss
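The query-matching step can be sketched as follows: compute cosine similarity between the query embeddings of consecutive frames, then pick the one-to-one assignment with the highest total similarity. To stay dependency-free, this sketch swaps the Hungarian algorithm for a brute-force search over permutations, which gives the same optimal assignment but is only feasible for a handful of queries; in practice something like `scipy.optimize.linear_sum_assignment` would be used instead. Function names are illustrative, and the embeddings are assumed to be nonzero and equal in number across the two frames.

```python
import math
from itertools import permutations

def cosine(a, b):
    # Cosine similarity of two nonzero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def match_queries(prev_queries, curr_queries):
    """Match queries of the previous frame to queries of the current frame.

    Returns a list `perm` where perm[i] is the index in curr_queries
    assigned to prev_queries[i]. Brute force stands in for Hungarian
    matching here (same optimum, factorial cost).
    """
    n = len(prev_queries)
    sim = [[cosine(p, c) for c in curr_queries] for p in prev_queries]
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(sim[i][perm[i]] for i in range(n)),
    )
    return list(best)

# Usage: the current frame's queries arrive in a different order.
prev = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
curr = [[0.0, 2.0], [3.0, 3.0], [5.0, 0.0]]
matching = match_queries(prev, curr)
```

Because cosine similarity ignores vector length, the match depends only on the direction of each embedding, so the same instance is linked across frames even if the embedding magnitude changes.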

VITA → offline?

per-clip method built on top of a per-frame method

object encoder → window attention

object decoder → learns video queries using the tokens

loss → frame-level and video-level handled together

DVIS → online?

segmentation/tracking/refining

referring tracker → Hungarian matching ⇒ TB block (denoising (RCA**))

temporal refiner → temporal decoder (long-term temporal self-attention / short-term temporal conv block)

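TODO:

Review the Mamba paper

Look for a Vim-based segmenter that uses query frames (SegMamba) → Question: do high-resolution images always give better results?

To apply Mamba at the video level, wouldn't a per-clip (offline) approach be more suitable?!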