© 2025 Sejin Cha. All rights reserved.
Did: Gave up on Mamba.. rewriting the code with an RNN


Date: Feb 5, 2025
Status: Done
Select: Implementation
Install CUDA (11.1)
Install everything else as in the DVIS install.md (no separate detectron2 clone; follow the GitHub instructions). The only extra steps are two version downgrades: pip install pillow==8.4.0 and pip install numpy==1.19.5.
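The setup above as a shell sketch (versions are from the note; the conda environment name and Python version are assumptions, and the DVIS install.md steps are elided):

```shell
# Environment setup sketch -- env name "dvis" and python 3.8 are assumptions
conda create -n dvis python=3.8 -y
conda activate dvis
# ... follow DVIS install.md for detectron2 and the remaining dependencies ...
# The two pinned downgrades added on top of install.md:
pip install pillow==8.4.0
pip install numpy==1.19.5
```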
 
 
→ Pass it through the RNN while keeping the layer dimension?
 

Suspected cause of the problem
  • The information is compressed too aggressively
 
Revised version
```python
# TD block output
outputs = torch.stack(outputs, dim=0)  # (t, l, q, b, c)
outputs_class, outputs_masks = self.prediction(outputs, mask_features)
outputs = self.decoder_norm(outputs)
temp_outputs = outputs
out = {
    'pred_logits': outputs_class[-1].transpose(1, 2),  # (b, t, q, c)
    'pred_masks': outputs_masks[-1],                   # (b, q, t, h, w)
    'aux_outputs': self._set_aux_loss(outputs_class, outputs_masks),
    'pred_embds': outputs[:, -1].permute(2, 3, 0, 1)   # (b, c, t, q)
}

frames, layers, query_len, batch_size, embedding_len = outputs.shape
temp_data = outputs[:, -1, :, :, :].reshape(batch_size, frames, query_len * embedding_len)
reduced_data = self.dim_reducer(temp_data)
input_data = reduced_data.reshape(batch_size, frames, 2560)  # (b, t, cq)
rnn_outputs, _ = self.rnn_block(input_data)
rnn_outputs = rnn_outputs.permute(1, 0, 2)  # (t, b, 1000)
expanded_outputs = self.dim_expander(rnn_outputs)
expanded_outputs = expanded_outputs.reshape(frames, batch_size, query_len, embedding_len)  # (t, b, q, c)
rnn_outputs = expanded_outputs.permute(0, 3, 2, 1)  # (t, c, q, b)
outputs = outputs[:, -1]  # (t, q, b, c)

# Final cross attention
rnn_outputs = rnn_outputs.permute(3, 2, 0, 1)  # (b, q, t, c)
rnn_outputs = rnn_outputs.reshape(batch_size * frames, query_len, embedding_len)
outputs = outputs.permute(2, 0, 1, 3)  # (b, t, q, c)
outputs = outputs.reshape(batch_size * frames, query_len, embedding_len)
final_outputs = self.final_cross_attention_layer(
    rnn_outputs, outputs,
    memory_mask=None,
    memory_key_padding_mask=None,
    pos=None,
    query_pos=None,
    scale_factor=0.5,
)
final_outputs = final_outputs.reshape(batch_size, frames, query_len, embedding_len)  # (b, t, q, c)
final_outputs = final_outputs.permute(1, 2, 0, 3)  # (t, q, b, c)
final_outputs = final_outputs.unsqueeze(1)         # (t, 1, q, b, c)
final_outputs = torch.cat([temp_outputs, final_outputs], dim=1)  # (t, l+1, q, b, c)

# Final prediction heads
final_class, final_masks = self.prediction(final_outputs, mask_features)
final_outputs = self.decoder_norm(final_outputs)
final_out = {
    'pred_logits': final_class[-1].transpose(1, 2),        # (b, t, q, c)
    'pred_masks': final_masks[-1],                         # (b, q, t, h, w)
    'aux_outputs': self._set_aux_loss(final_class, final_masks),
    'pred_embds': final_outputs[:, -1].permute(2, 3, 0, 1) # (t, b, c, q)
}
if return_indices:
    return final_out, ret_indices
else:
    return final_out
```
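The shape bookkeeping of the reduce → RNN → expand path can be sanity-checked without the model, using stand-in weight matrices in place of `dim_reducer`, `rnn_block`, and `dim_expander` (a minimal sketch; the toy sizes t=2, l=3, q=100, b=1, c=256 are assumptions, while 2560 and 1000 are the reduced/RNN widths from the code above):

```python
import numpy as np

# Toy sizes (assumed); the real values come from the model config
t, l, q, b, c = 2, 3, 100, 1, 256
reduced, hidden = 2560, 1000  # dim_reducer / rnn_block output widths

outputs = np.zeros((t, l, q, b, c))  # stands in for the TD block output

# dim_reducer: take the last layer, flatten queries x channels, project down
temp = outputs[:, -1].reshape(b, t, q * c)     # (b, t, q*c)
W_reduce = np.zeros((q * c, reduced))
input_data = temp @ W_reduce                   # (b, t, 2560)

# rnn_block: sequence over frames, simulated by another projection
W_rnn = np.zeros((reduced, hidden))
rnn_out = (input_data @ W_rnn).transpose(1, 0, 2)  # (t, b, 1000)

# dim_expander: project back up and restore the query/channel layout
W_expand = np.zeros((hidden, q * c))
expanded = (rnn_out @ W_expand).reshape(t, b, q, c)  # (t, b, q, c)
```

Note the bottleneck this makes explicit: q*c = 25,600 features per frame are squeezed into 2,560 and then 1,000, a ~25x compression, which matches the "compressed too aggressively" suspicion above.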
yvis old
[02/10 19:34:31 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl,AR1,AR10 [02/10 19:34:31 d2.evaluation.testing]: copypaste: 49.7782,75.1427,52.1098,25.3222,45.3111,65.5951,47.1885,57.1392
yvis new
[02/11 04:49:45 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl,AR1,AR10 [02/11 04:49:45 d2.evaluation.testing]: copypaste: 48.4620,72.2517,52.5028,26.9492,45.3930,63.1768,46.7606,55.6732
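The two copypaste lines can be diffed per metric (values copied from the logs above; new minus old):

```python
# Per-metric delta between the old and new yvis evaluation runs
names = "AP,AP50,AP75,APs,APm,APl,AR1,AR10".split(",")
old = [49.7782, 75.1427, 52.1098, 25.3222, 45.3111, 65.5951, 47.1885, 57.1392]
new = [48.4620, 72.2517, 52.5028, 26.9492, 45.3930, 63.1768, 46.7606, 55.6732]
for name, o, n in zip(names, old, new):
    print(f"{name:>5}: {o:7.4f} -> {n:7.4f} ({n - o:+.4f})")
```

Overall AP drops by about 1.3 points, while AP75 and APs actually improve slightly.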