Posts

WL test

📅 2022/7/28 · ☕ 1 min read

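Reference: https://davidbieber.com/post/2019-05-10-weisfeiler-lehman-isomorphism-test/ Formal name: The Weisfeiler-Lehman Isomorphism Test. An algorithm for checking whether two graphs are isomorphic. Assign an initial label $C_i = 1$ to every node $i$. Record on each node the multiset $L_i$ of its neighbors' labels. Pass the multiset $L_i$ through a hash to obtain a new $C_i$ ($C_i \leftarrow \mathrm{hash}(L_i)$). Repeat these steps and stop once the node partition $\{C_i\}$ converges. If the two graphs have the same $ ...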
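The refinement loop described above can be sketched in Python as follows. This is a minimal illustration, not the referenced post's code: the function name `wl_test`, the adjacency-dict input format, and the choice to include each node's own previous label in the hashed signature (a common variant that keeps the partition monotonically refining) are all assumptions. Instead of a real hash function, it relabels signatures through a table shared by both graphs, so equal signatures always receive equal labels.

```python
from collections import Counter


def wl_test(adj_a, adj_b):
    """Return False if 1-WL proves the graphs non-isomorphic, True otherwise.

    Graphs are dicts mapping each node to a list of its neighbors (assumed format).
    A True result means "possibly isomorphic": the test can be fooled.
    """
    if len(adj_a) != len(adj_b):
        return False
    # Tag nodes by graph so both graphs are refined with one shared label table.
    graph = {("a", v): [("a", u) for u in nbrs] for v, nbrs in adj_a.items()}
    graph.update({("b", v): [("b", u) for u in nbrs] for v, nbrs in adj_b.items()})

    colors = {v: 1 for v in graph}  # initial label C_i = 1 for every node
    for _ in range(len(graph)):
        # Signature: own label plus the sorted multiset L_i of neighbor labels.
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in graph[v]))) for v in graph}
        table = {}  # stands in for hash(): signature -> small integer label
        new_colors = {v: table.setdefault(sigs[v], len(table) + 1) for v in graph}
        # The partition only ever refines, so an unchanged class count means convergence.
        converged = len(set(new_colors.values())) == len(set(colors.values()))
        colors = new_colors
        if converged:
            break

    hist_a = Counter(c for (g, _), c in colors.items() if g == "a")
    hist_b = Counter(c for (g, _), c in colors.items() if g == "b")
    return hist_a == hist_b
```

A triangle and a three-node path are separated by this sketch, while a 6-cycle and two disjoint triangles are not, since every node in both graphs has degree 2. The latter is a standard example where the test is inconclusive.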

[Paper Notes] When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism

📅 2022/7/28 · ☕ 1 min read

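Attention is global and dynamic. On the dynamic aspect, see On the Connection between Local Attention and Dynamic Depth-wise Convolution. However: global → judging from SwinTransformer, it does not seem to matter much for ViT accuracy; dynamic → judging from MLP-Mixer, whose MLPs are static, it does not seem to matter for accuracy either. The paper therefore proposes ShiftViT. As shown in the figure above, the input ...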

#paper

p4m group

📅 2022/7/26 · ☕ 1 min read

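The group whose elements are arbitrary translations combined with arbitrary 90-degree rotations is called the p4 group. If mirror reflections are included as well, it is called the p4m group. In general, a pn group has n-fold rotational symmetry (rotations by multiples of 360°/n), and when mirror symmetry also holds, the suffix m is appended ...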
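As a rough illustration of the rotation and reflection part of these groups, the sketch below applies 90-degree rotations and a mirror flip to a small square grid. Translations are omitted because the grid is finite, and the helper names `rot90`, `mirror`, `p4_orbit`, and `p4m_orbit` are hypothetical, introduced only for this example.

```python
def rot90(grid):
    """Rotate a square grid (tuple of tuples) by 90 degrees counter-clockwise."""
    return tuple(zip(*grid))[::-1]


def mirror(grid):
    """Reflect the grid left-to-right."""
    return tuple(row[::-1] for row in grid)


def p4_orbit(grid):
    """The four images of the grid under rotations by 0, 90, 180, and 270 degrees."""
    images = [grid]
    for _ in range(3):
        images.append(rot90(images[-1]))
    return images


def p4m_orbit(grid):
    """The eight images under rotations, with and without a mirror flip."""
    return p4_orbit(grid) + p4_orbit(mirror(grid))


pattern = ((1, 2), (3, 4))
orbit = set(p4m_orbit(pattern))
# Closure: rotating or mirroring any image gives another image in the same set.
closed = all(rot90(g) in orbit and mirror(g) in orbit for g in orbit)
```

For a pattern with no symmetry of its own, such as `pattern` above, the rotation-only orbit has 4 distinct images and the orbit with reflections has 8.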

Why errno is needed

📅 2022/7/25 · ☕ 1 min read

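For functions such as fopen, which return a struct or a pointer in the first place, error handling is awkward. So why not always return something tuple-like? In plain C this can become very cumbersome, even in cases where no error handling is needed; freeing the memory is a hassle, for one thing. This is why the global errno was designed. Many modern languages can return tuples, so ...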

#C
#post
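The tuple-style alternative mentioned in the errno note can be sketched in Python, which surfaces the C-level errno value through `OSError.errno`. This is only an illustration in a different language from the note's subject (C); the helper name `try_open` and the path used below are hypothetical.

```python
import errno
import os


def try_open(path):
    """Tuple-style result: (file, 0) on success, (None, errno code) on failure."""
    try:
        return open(path, "rb"), 0
    except OSError as exc:
        # exc.errno carries the same numeric code that C code would read from errno.
        return None, exc.errno


handle, err = try_open(os.path.join("no_such_dir_for_demo", "missing.bin"))
if handle is None:
    message = os.strerror(err)  # human-readable text, like C's strerror()
else:
    handle.close()
```

Returning the pair is cheap here because the language allocates and frees the tuple automatically, which is exactly the part that is tedious in plain C.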

[Paper Notes] BoxInst: High-Performance Instance Segmentation with Box Annotations

📅 2022/7/25 · ☕ 1 min read

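Proposes a model that learns instance segmentation from bounding boxes (BBOX) alone. Because training uses only bounding boxes, the advantage is that no mask annotations are needed. Proposes new losses: Projection Loss and Pairwise affinity Loss. todo ...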

#paper

[Paper Notes] Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding

📅 2022/7/25 · ☕ 1 min read

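In typical V&L models, the image backbone network does not use language features. Such models (most likely) cannot even solve VQA tasks such as "How many apples are in the image?" The paper therefore extends SwinTransformer so that language features are mixed in along the spatial / channel directions at each stage during inference ...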

ReferItGame

📅 2022/7/25 · ☕ 1 min read

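A dataset of images paired with referring expressions. It appears to be a fairly large dataset: "the game has produced a dataset containing 130,525 expressions, referring to 96,654 distinct objects, in 19,894 photographs of natural scenes." Annotation is done as a game. Annotators work in pairs. First, player A comes up with a caption. Then the other player, B, tests whether that caption is correct: B clicks the object that A's caption refers to ...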

PCA Color Augmentation (PCACA)

📅 2022/7/25 · ☕ 1 min read

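A data augmentation technique reportedly used in AlexNet. My impression is that it is not used much. An ancient technique?? Apparently also called Fancy PCA / PCACA? (citation needed). It allows data augmentation that takes the color distribution of the image into account; for example, bright regions can be made brighter and dark regions darker. The procedure is simple: flatten $C\times H\times W$ into $C\times HW$. Each channel ...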

[Paper Notes] BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

📅 2022/7/25 · ☕ 1 min read

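The proposed method consists mainly of two mechanisms: Multimodal mixture of Encoder-Decoder (MED), and Captioning and Filtering (CapFilt). Because the dataset CLIP uses is noisy, CapFilt is introduced as a mechanism that automatically selects which captions to keep. Flow: train MED on the original, noisy dataset; run CapFilt using the pretrained MED; train MED again on the dataset produced by CapFilt. MED: Image-Text Contrastive Loss (ITC): image features ...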

The Perspective-n-Point problem

📅 2022/7/23 · ☕ 1 min read

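The problem of estimating the camera pose given a set of 3D points in the world coordinate system and their corresponding 2D image points. The camera pose has 6 DOF (translation and rotation). The Perspective-n-Point problem is often abbreviated as PnP. P3P can be solved with a minimum of three points. There are various algorithms for solving the generalized PnP: EPnP; SQPnP: A Consistently Fast and Globally Optimal Solution to the Perspective-n-Point ...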

#CV
#post

SQPnP: A Consistently Fast and Globally Optimal Solution to the Perspective-n-Point Problem

📅 2022/7/23 · ☕ 1 min read

ECCV2020 ...

#CV
#post

[Paper Notes] Large-Scale Adversarial Training for Vision-and-Language Representation Learning

📅 2022/7/21 · ☕ 1 min read

Trains with perturbations added to each modality ...

#paper

[Paper Notes] On the Connection between Local Attention and Dynamic Depth-wise Convolution

📅 2022/7/18 · ☕ 2 min read

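A paper arguing that attention and depth-wise convolution (DwConv) are similar. For the figure above, it is enough to regard the flattened or patchified image as the spatial direction. (a): Convolution: one embedding is generated from the pixel values in a window across multiple channels. (c): Depth-wise convolution and local attention: for a single channel, the output is generated only from the pixel values in the window. (Poin ...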

#paper