Vision-and-Language

How to create Matterport3D segmentation images?

📅 2022/11/24 · ☕ 1 min read

Intro: The other day, one of my labmates needed segmentation images for Matterport3D. He asked for help, and I got involved in creating them. It turned out to be a real struggle, because we were not used to 3D mesh models. After several weeks, we finished the code that creates semantic segmentation images for Matterport3D. How to create Matterport3D segmentation images: Matterport3D provides 3D segmentation but gives users no easy way to get 2D segmentation. The data only includes point clouds and meshes with ground-truth labels, so the user must color the point clouds and meshes directly to create 2D segmentations. We therefore wrote code that uses Matterport3DSimulator to place a camera at a given scan_id and viewpoint_id and creates a segmentation from the original ply file. Running our code produces the following image. (I concatenated the obtained images and converted them to a GIF.) Matterport3DSimulator takes a total of 36 pictures: 12 at the top, 12 at the perimeter, and 12 at the bottom. ...
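The 36-picture layout mentioned in the excerpt can be sketched as a mapping from a view index to camera angles. This is only an illustration: the 30-degree step, the row order (down, level, up), and the name `view_angles` are assumptions made here, not something the post states.

```python
def view_angles(view_index, step_deg=30.0):
    """Return (heading_deg, elevation_deg) for one of 36 discretized views.

    Assumed layout (hypothetical, not taken from the post):
    12 headings per row in 30-degree steps; row 0 looks down,
    row 1 is level, row 2 looks up.
    """
    if not 0 <= view_index < 36:
        raise ValueError("view_index must be in [0, 36)")
    row, col = divmod(view_index, 12)
    heading = col * step_deg
    elevation = (row - 1) * step_deg
    return heading, elevation


# Enumerate all 36 camera poses for one viewpoint.
views = [view_angles(i) for i in range(36)]
```

Each `(heading, elevation)` pair would then be passed to whatever renderer produces the segmentation image for that pose.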

Peter Anderson

📅 2022/8/26 · ☕ 1 min read

An amazing researcher. He is an author of papers I see all the time, such as SPICE: Semantic Propositional Image Caption Evaluation; REVERIE - Remote Embodied Visual Referring Expression in Real Indoor Environments; Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering; and Sim-to-Real Transfer for Vision-and-Language Navigation. Apparently he is now at Google. ...

[Paper Notes] MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering

📅 2022/8/24 · ☕ 1 min read

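CVPR22. Task: KB-VQA, the task of answering questions that require knowledge not contained in the question image. For example, the VQA example below cannot be answered without the external knowledge "kawasaki". Novelty: no knowledge graph is constructed. Instead of building a scene graph, the method uses (entity, relation, entity) triplets over an image-derived Head Entity (a region image) and a language-derived Tail Entity (described later) ...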

[Paper Notes] Generating Semantically Precise Scene Graphs from Textual Descriptions for Improved Image Retrieval

📅 2022/8/24 · ☕ 2 min read

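The paper behind the Stanford Scene Graph Parser (ACL 2015). The basic aim is to automate scene graph generation so that it can be used for image retrieval. https://nlp.stanford.edu/software/scenegraph-parser.shtml Pipeline: (1) Generate a semantic graph from a partially modified version of Universal Dependencies: fix quantificational modifiers such as "a lot of", resolve pronouns, and handle plural nouns → ...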

[Paper Notes] SPICE: Semantic Propositional Image Caption Evaluation

📅 2022/8/16 · ☕ 1 min read

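The paper introducing the SPICE evaluation metric (ECCV 2016). Metrics such as BLEU are sensitive to n-gram overlap and cannot be said to truly evaluate semantics. The paper therefore proposes SPICE, an evaluation metric based on scene graphs. It has indeed become a common metric for image captioning models. Pipeline: (1) Generate a scene graph from multiple captions. scene graph ...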
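The core idea of a scene-graph-based metric can be sketched as an F-score over tuples extracted from the candidate and reference scene graphs. This is a simplified illustration that assumes the tuples are already extracted and uses exact matching only; the real SPICE also matches synonyms, and the function name `tuple_f1` is made up here.

```python
def tuple_f1(candidate_tuples, reference_tuples):
    """F1 between two sets of scene-graph tuples (exact match only)."""
    cand = set(candidate_tuples)
    ref = set(reference_tuples)
    matched = len(cand & ref)
    if matched == 0:
        return 0.0
    precision = matched / len(cand)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)


# Objects are 1-tuples, attributes 2-tuples, relations 3-tuples.
candidate = [("girl",), ("girl", "young"), ("girl", "standing-on", "court")]
reference = [("girl",), ("court",), ("girl", "standing-on", "court"),
             ("court", "tennis")]
score = tuple_f1(candidate, reference)
```

Because the score depends on which objects, attributes, and relations match rather than on word order, two captions with little n-gram overlap can still score well if they describe the same scene.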

Japanese Caption Datasets

📅 2022/8/15 · ☕ 1 min read

STAIR: Captions added to MSCOCO, 820,310 captions in total. http://captions.stair.center/ Yuya Yoshikawa, Yutaro Shigeto, and Akikazu Takeuchi, “STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset”, Annual Meeting of the Association for Computational Linguistics (ACL), Short Paper, 2017. YJ Captions 26k Dataset: This one also adds captions to MSCOCO and was published at ACL 2016. It has roughly 1/6 as many captions as STAIR. https://github.com/yahoojapan/YJCaptions Takashi Miyazaki and Nobuyuki Shimizu. 2016. Cross-Lingual Image Caption Generation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1780 ...

[Paper Notes] OTTER: Data Efficient Language-Supervised Zero-Shot Recognition with Optimal Transport Distillation

📅 2022/8/10 · ☕ 1 min read

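Motivation: CLIP is trained with the identity matrix as its target → if the negatives within a batch are loosely correlated, training with every negative set to 0 is not quite right → use the solution of an optimal transport problem as the target instead. The paper proposes OTTER (Optimal TransporT distillation for Efficient zero-shot Recognition). It is somewhat similar in spirit to Prototypical Contrastive Learning of Unsupervised Representations. Loss: InfoNCE is extended to $$\mathcal{L}_v = -\frac{1}{N} \sum_{i=1}^N \sum_{j=1}^N \left[\alpha I_{ij} + (1-\alpha) M^{v}_{ij}\right] \log p_v(\mathbf{z}_i^v, \mathbf{z}_j^t;\tau)$$ ...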
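The loss above can be sketched in plain Python. This is a minimal illustration under stated assumptions: `sim[i][j]` is a precomputed similarity between image i and text j, $p_v$ is taken to be a softmax over j of `sim[i][j] / tau`, and the soft target matrix `M` is simply passed in rather than obtained from an optimal transport solver. The function names are hypothetical.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of a list of floats."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def soft_target_loss(sim, M, alpha, tau):
    """-1/N * sum_ij [alpha*I_ij + (1-alpha)*M_ij] * log p(i, j; tau)."""
    n = len(sim)
    total = 0.0
    for i in range(n):
        logp = log_softmax([s / tau for s in sim[i]])
        for j in range(n):
            identity = 1.0 if i == j else 0.0
            target = alpha * identity + (1.0 - alpha) * M[i][j]
            total += target * logp[j]
    return -total / n


# With alpha = 1 the soft targets are ignored and this reduces to InfoNCE.
sim = [[1.0, 0.0], [0.0, 1.0]]
M = [[0.5, 0.5], [0.5, 0.5]]
infonce = soft_target_loss(sim, M, alpha=1.0, tau=1.0)
```

Lowering `alpha` moves probability mass from the diagonal toward the off-diagonal entries of `M`, so correlated negatives are no longer pushed all the way to zero.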

[Paper Notes] Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding

📅 2022/7/25 · ☕ 1 min read

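In typical V&L models, the image backbone network does not use language features. Such models (most likely) cannot even solve a VQA task such as "How many apples are in the image?". The paper therefore extends SwinTransformer so that inference mixes in language features along the spatial / channel dimensions at each stage ...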

[Paper Notes] BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

📅 2022/7/25 · ☕ 1 min read

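The proposed method mainly consists of two mechanisms: Multimodal mixture of Encoder-Decoder (MED), and Captioning and Filtering (CapFilt). CapFilt: since the dataset used by CLIP is noisy, a mechanism is introduced that automatically decides which captions to keep. Pipeline: train MED on the original noisy dataset; run CapFilt with the pre-trained MED; train MED again on the dataset obtained by CapFilt. MED: Image-Text Contrastive Loss (ITC): image features ...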

[Paper Notes] Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation

📅 2022/7/7 · ☕ 1 min read

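VLN-DUET. Overview: the action is decided by integrating both local information and global information from a graph. Once the action is decided, the graph is built dynamically and the shortest path to the destination is searched with the Floyd-Warshall algorithm. Each node holds the features obtained from its view as an embedding. The action $a^\pi$ is expressed as a likelihood over the nodes, ...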
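The shortest-path step can be sketched with a textbook Floyd-Warshall implementation plus path reconstruction. This is a generic illustration on a small undirected graph, not the paper's code; the node indices, edge weights, and function names are made up for the example.

```python
def floyd_warshall(num_nodes, edges):
    """All-pairs shortest paths on an undirected weighted graph.

    edges: iterable of (u, v, weight). Returns (dist, nxt), where
    nxt[i][j] is the next node after i on a shortest path to j.
    """
    inf = float("inf")
    dist = [[0.0 if i == j else inf for j in range(num_nodes)]
            for i in range(num_nodes)]
    nxt = [[j if i == j else None for j in range(num_nodes)]
           for i in range(num_nodes)]
    for u, v, w in edges:
        if w < dist[u][v]:
            dist[u][v] = dist[v][u] = w
            nxt[u][v] = v
            nxt[v][u] = u
    for k in range(num_nodes):
        for i in range(num_nodes):
            for j in range(num_nodes):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt


def shortest_path(nxt, start, goal):
    """Rebuild the node sequence from start to goal ([] if unreachable)."""
    if nxt[start][goal] is None:
        return []
    path = [start]
    while start != goal:
        start = nxt[start][goal]
        path.append(start)
    return path


edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 5.0), (2, 3, 2.0)]
dist, nxt = floyd_warshall(4, edges)
route = shortest_path(nxt, 0, 3)
```

Floyd-Warshall is cubic in the number of nodes, which is acceptable when the graph only contains the viewpoints seen so far in one episode.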

[Paper Notes] REVERIE - Remote Embodied Visual Referring Expression in Real Indoor Environments

📅 2022/6/26 · ☕ 0 min read

...

Running Matterport3DSimulator with CUDA 11.1

📅 2022/6/25 · ☕ 1 min read

Dockerfile for running Matterport3DSimulator with CUDA 11.1:

    FROM nvcr.io/nvidia/pytorch:19.05-py3
    FROM php:7.1.9-apache
    FROM nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
    RUN rm /etc/apt/sources.list.d/cuda.list
    RUN rm /etc/apt/sources.list.d/nvidia-ml.list
    RUN apt-key del 7fa2af80
    RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
    RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub
    RUN apt-get update
    RUN apt-get -y upgrade
    RUN apt-get -y install nano wget curl
    # ONNX Runtime Training Module for PyTorch
    # Copyright (c) Microsoft Corporation. All rights reserved.
    # Licensed under the MIT License.
    ARG TORCH_CUDA_VERSION=cu111
    ARG TORCH_VERSION=1.8.1
    ARG TORCHVISION_VERSION=0.9.1
    # Install and update tools to minimize security vulnerabilities
    RUN apt-get update
    RUN apt-get install -y software-properties-common wget apt-utils patchelf git libprotobuf-dev protobuf-compiler cmake
    RUN unattended-upgrade
    RUN ...