© 2025 Sejin Cha. All rights reserved.

Ideation (4)

Date

Jan 8, 2025

Text

Tracker-only structure

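Each frame's queries are fed as input to both the TD (transformer denoising) block and the Mamba block.

The TD block processes adjacent frames.

The Mamba block handles the overall context.

The TD block and the Mamba block are trained so that they do not influence each other.

For each frame, the frame token and the context token are combined through cross attention to predict the mask and class online.

→ Let's add a prediction cell so that later context can also be taken into account. For now this is still a limitation, because the context token only holds context from before the current frame.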
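The limitation noted above (the context token only sees frames up to the current one) can be illustrated with a minimal sketch. It is not the actual Mamba block: it assumes a fixed linear recurrence `h_t = decay * h_{t-1} + (1 - decay) * x_t` as a stand-in for the state-space scan, and the names `causal_context_tokens` and `decay` are hypothetical.

```python
from typing import List

Vector = List[float]


def causal_context_tokens(frame_queries: List[Vector], decay: float = 0.9) -> List[Vector]:
    """Return one context token per frame from a causal linear recurrence.

    Assumption: this simple recurrence stands in for the Mamba block.
    The token for frame t depends only on frames 0..t.
    """
    if not frame_queries:
        return []
    state = [0.0] * len(frame_queries[0])
    tokens: List[Vector] = []
    for query in frame_queries:
        # Blend the running state with the current frame query.
        state = [decay * s + (1.0 - decay) * x for s, x in zip(state, query)]
        tokens.append(list(state))
    return tokens


# Changing only the last frame leaves the earlier context tokens untouched,
# which is why the context token cannot reflect later parts of the video.
video = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edited = [video[0], video[1], [5.0, -5.0]]
original_tokens = causal_context_tokens(video)
edited_tokens = causal_context_tokens(edited)
assert original_tokens[:2] == edited_tokens[:2]
assert original_tokens[2] != edited_tokens[2]
```

Any mechanism that adds later context, such as the proposed prediction cell, would have to supply information that this causal scan cannot produce on its own.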

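Each frame's queries are fed as input to both the TD (transformer denoising) block and the Mamba block.

The TD block processes adjacent frames.

The Mamba block handles the overall context.

The TD block and the Mamba block do not influence each other.

With the prediction cell, the Mamba block outputs a context token that also takes the later part of the video into account.

For each frame, the frame token and the context token are combined through cross attention to predict the mask and class online.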
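The per-frame cross attention step can be sketched as plain scaled dot-product attention. This is a simplified illustration under several assumptions not stated in the notes: the frame token acts as the query, the context tokens act as both keys and values, learned projections are omitted, and the function name `cross_attend` is hypothetical. The mask and class heads that would consume the result are not shown.

```python
import math
from typing import List

Vector = List[float]


def cross_attend(frame_token: Vector, context_tokens: List[Vector]) -> Vector:
    """Attend from one frame token (query) over context tokens (keys and values).

    Assumption: no learned projections; keys and values are the raw context tokens.
    """
    scale = 1.0 / math.sqrt(len(frame_token))
    scores = [
        scale * sum(q * k for q, k in zip(frame_token, ctx))
        for ctx in context_tokens
    ]
    # Numerically stable softmax over the context tokens.
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(context_tokens[0])
    # Weighted sum of the context tokens.
    return [
        sum(w * ctx[d] for w, ctx in zip(weights, context_tokens))
        for d in range(dim)
    ]


# A frame token aligned with the first context token draws almost all of
# its output from that token.
fused = cross_attend([10.0, 0.0], [[10.0, 0.0], [-10.0, 0.0]])
assert fused[0] > 9.99
```

Because this runs once per frame with whatever context tokens are available at that point, it fits the online setting described above.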
