Papers – 行李の底に収めたり[YuWd]

[Paper Notes] Hungry Hungry Hippos: Towards Language Modeling with State Space Models

📅 2023/3/7 · ☕ 6 min read

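I recently contributed an explainer video on H3 to SONY's nnabla channel. I recommend watching the video rather than reading this post. Overview: ICLR23. State-space models (SSMs) have proven useful across a variety of modalities, but their usefulness for language has not yet been confirmed. Moreover, despite being $\mathcal{O}(L)$, SSMs ...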

[Paper Notes] LoRA: Low-Rank Adaptation of Large Language Models

📅 2023/2/12 · ☕ 1 min read

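ICLR22. A new method for fine-tuning large models quickly and with low memory consumption. As with Hypernetworks, learnable parameters are inserted into each Transformer layer (adaptation layers). However, even with the weights frozen, they still have to sit on the GPU for the adaptation layers to be trained, so it ends up taking time anyway ...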
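The excerpt is cut off before it reaches the low-rank update itself, so here is a minimal sketch assuming the usual LoRA formulation: the frozen weight $W_0$ is left untouched and a trainable pair $B$, $A$ of rank $r$ is added as $W_0 x + \frac{\alpha}{r} B A x$. The helper names (`matvec`, `lora_forward`, `trainable_params`) and the toy sizes are made up for illustration and do not come from the post.

```python
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W0, A, B, x, alpha, r):
    """Compute W0 x + (alpha / r) * B (A x). W0 is frozen; only A and B would be trained."""
    base = matvec(W0, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

def trainable_params(d_out, d_in, r):
    """Parameter count of the low-rank pair: B is d_out x r, A is r x d_in."""
    return r * (d_out + d_in)

random.seed(0)
d_out, d_in, r, alpha = 4, 6, 2, 4.0  # illustrative sizes (assumption)
W0 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # zero init, so the update starts at zero
x = [random.gauss(0, 1) for _ in range(d_in)]

y = lora_forward(W0, A, B, x, alpha, r)
```

With `B` initialised to zero, the adapted layer reproduces the frozen layer exactly at the start of training, and only `r * (d_out + d_in)` parameters need gradients instead of `d_out * d_in`. The saving is negligible at these toy sizes and only matters when `r` is much smaller than the layer dimensions.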

#Papers

[Paper Notes] On the Versatile Uses of Partial Distance Correlation in Deep Learning

📅 2022/12/16 · ☕ 3 min read

Introduction: ECCV22 best paper (#ECCV2022 Paper Awards, tweeted by @eccvconf on October 27, 2022: https://twitter.com/eccvconf/status/1585560616688881664). Overview: Comparing the behavior of two models is extremely important, but methods for comparing models with different architectures remain under-studied. This paper therefore proposes applying (Partial) Distance Correlation to machine learning. (Partial) Distance Correlation ...
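The excerpt stops right where distance correlation is introduced. As a rough illustration, the sketch below computes the plain sample distance correlation (the biased, V-statistic form) between two sets of feature vectors. It does not implement the partial variant, and the function names and toy features are assumptions for the example.

```python
import math

def centered_distances(points):
    """Pairwise Euclidean distance matrix, double-centered."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row_mean = [sum(row) / n for row in d]
    grand_mean = sum(row_mean) / n
    # d is symmetric, so column means equal row means.
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]

def distance_correlation(xs, ys):
    """Sample distance correlation between paired samples xs and ys."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of samples")
    n = len(xs)
    a = centered_distances(xs)
    b = centered_distances(ys)

    def mean_product(u, v):
        return sum(u[i][j] * v[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    if denom == 0.0:
        return 0.0  # one side is constant, so there is no dependence to measure
    return math.sqrt(max(dcov2, 0.0) / denom)

# Hypothetical features from two "models" with different output widths.
feats_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]  # 2-D features
feats_b = [(1.0,), (3.0,), (5.0,), (7.0,)]                  # 1-D features
score = distance_correlation(feats_a, feats_b)
```

Because only pairwise distances within each feature space are used, the two feature sets may have different dimensionality, which is what makes this kind of measure attractive for comparing models with different architectures.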

#Papers

[Paper Notes] Few-shot Relational Reasoning via Connection Subgraph Pretraining

📅 2022/11/16 · ☕ 4 min read

Introduction: NeurIPS22. The paper tackles the few-shot knowledge graph completion task: as in the figure above, given a background KG (knowledge graph) and a support set, infer the relation of the query set. It proposes the Connection Subgraph Reasoner (CSR). Few-shot KG Completion: a KG is written $\mathcal{G} = (\mathcal{E}, \mathcal{R}, \mathcal{T})$, where $\mathcal{E}$ and $\mathcal{R}$ are the entities and relations, respectively, and $\mathcal{T ...

#Papers

[Paper Notes] Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers

📅 2022/11/2 · ☕ 8 min read

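Introduction: ICLR22. Residual connections are becoming indispensable in deep learning; they made it possible to build NNs with many more layers. A contradiction in how residual connections are interpreted: recent work increasingly takes the view that residual connections act like an ensemble of relatively shallow layers. However, as the name "deep" learning implies, in general "adding more layers" ...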

[Paper Notes] Lifting the Curse of Multilinguality by Pre-training Modular Transformers

📅 2022/10/19 · ☕ 1 min read

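NAACL22. In multilingual models there is a phenomenon called "the curse of multilinguality": the more languages are added, the lower the accuracy becomes. The paper proposes X-MOD as a model that addresses this curse. Outline: a bottleneck-style module is prepared for each language, and the modules are switched per language. As a result, extension is easy, and training and ...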

#Papers
#NLP

[Paper Notes] SimCSE

📅 2022/10/18 · ☕ 1 min read

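EMNLP21. Supervised SimCSE: contrastive learning with sentences in an entailment relation as positives (NLI datasets). Unsupervised SimCSE: the same sentence is embedded twice, and contrastive learning is applied to the two vectors, which differ slightly because of dropout. Source: https://www.slideshare.net/DeepLearningJP2016/dlsimcse-simple-contrastive-learning-of-sentence-embeddings-emnlp-2021 ...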
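The unsupervised variant can be sketched roughly as follows. This is a toy stand-in, not the SimCSE code: the "encoder" is just dropout applied to a fixed random vector per sentence, the loss is an in-batch InfoNCE objective over cosine similarities, and the temperature of 0.05 and dropout rate of 0.1 are assumed defaults. All function names are hypothetical.

```python
import math
import random

def dropout(vec, p, rng):
    """Inverted dropout: zero each entry with probability p, rescale the rest."""
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in vec]

def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0

def simcse_loss(views1, views2, temperature=0.05):
    """In-batch InfoNCE: views1[i] should match views2[i] against all views2[j]."""
    n = len(views1)
    total = 0.0
    for i in range(n):
        logits = [cosine(views1[i], views2[j]) / temperature for j in range(n)]
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # negative log-softmax of the positive pair
    return total / n

rng = random.Random(0)
# Stand-in for encoder outputs of three sentences (assumption: 16-dim vectors).
sentences = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(3)]
# Two passes over the same sentences with independent dropout masks.
views1 = [dropout(s, 0.1, rng) for s in sentences]
views2 = [dropout(s, 0.1, rng) for s in sentences]
loss = simcse_loss(views1, views2)
```

The positive pair for each sentence is its own second dropout view, and the other sentences in the batch serve as negatives, so no labelled data is needed.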

[Paper Notes] Can Neural Nets Learn the Same Model Twice? Investigating Reproducibility and Double Descent from the Decision Boundary Perspective

📅 2022/9/24 · ☕ 4 min read

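Introduction: CVPR22. A paper that plots decision boundaries and quantitatively examines reproducibility and generalization. Plotting decision boundaries (choosing the region): how the decision boundary is plotted matters. First, one has to consider whether to sample near the data manifold $\mathcal{M}$ (on-manifold) or in regions far from $\mathcal{M}$ (off-manifold) ...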

[Paper Notes] Test-Time Training with Self-Supervision for Generalization under Distribution Shifts

📅 2022/9/19 · ☕ 2 min read

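PMLR20. Proposes TTT (Test-Time Training), a retraining method for the case where the train and test distributions differ. First, train as usual. Next, split the model into a front half (A) and a back half (B), and run self-supervised learning on a model made of the original A plus a new B'; think of it as swapping out the head (B→B'). This self-supervised learning uses the test samples ...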

#Papers

[Paper Notes] Energy-Based Learning for Scene Graph Generation

📅 2022/9/19 · ☕ 3 min read

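Introduction: Proposes a method (framework) that uses an Energy Based Model to generate scene graphs from images. Existing methods generate scene graphs with cross-entropy as follows: $$\log p(SG|I) = \sum_{i \in O} \log p(o_i | I) + \sum_{j \in R} \log p(r_j | I).$$ Here the objects $O$ and the relations $R$ are computed independently of each other. This is the problem: they should really be weakly dependent on each other ...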
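To make the independence in that formula concrete, here is a tiny sketch of the factorized log-likelihood. The probabilities are invented numbers, and `factorized_log_likelihood` is a hypothetical helper, not something from the paper.

```python
import math

def factorized_log_likelihood(object_probs, relation_probs):
    """log p(SG|I) as a sum of per-object and per-relation log-probabilities."""
    object_term = sum(math.log(p) for p in object_probs)
    relation_term = sum(math.log(p) for p in relation_probs)
    return object_term + relation_term

# Hypothetical p(o_i | I) for two objects and p(r_j | I) for one relation.
object_probs = [0.5, 0.5]
relation_probs = [0.25]
log_p = factorized_log_likelihood(object_probs, relation_probs)
```

Each term depends only on its own prediction, so changing an object's probability never affects a relation term. That is the independence the excerpt points to as the weakness of the cross-entropy formulation.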

[Paper Notes] Your classifier is secretly an energy based model and you should treat it like one

📅 2022/8/28 · ☕ 1 min read

For classification problems, the paper proposes a training method based on the Energy Based Model used in generative modeling. Standard training: let an NN be $f_\theta(\boldsymbol{x})$ and write its $y$-th output as $f_\theta(\boldsymbol{x})[y]$; the softmax is then $$p_{\theta}(y|\boldsymbol{x}) = \frac{\exp\left(f_{\theta}(\boldsymbol{x})[y]\right)}{\sum_{y^{\prime}}\exp\left(f_{\theta}(\boldsymbol{x})[y^{\prime}]\right)}$$ An Energy Based Model, on the other hand, is defined as $$p_{\theta}(\boldsymbol{x},y) = \frac{\exp(-E_{\theta}(\boldsymbol{x},y))}{Z_{\theta}}$$ ...
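The excerpt breaks off before connecting the two formulas. As I understand the paper, the link is to read the logits as negative energies, $E_\theta(\boldsymbol{x}, y) = -f_\theta(\boldsymbol{x})[y]$, which gives $E_\theta(\boldsymbol{x}) = -\log\sum_{y}\exp(f_\theta(\boldsymbol{x})[y])$ after summing over $y$. The sketch below checks numerically that the conditional recovered from these energies is the ordinary softmax. The logits are made-up values and the function names are hypothetical.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def softmax(logits):
    """Standard softmax over a list of logits."""
    lse = logsumexp(logits)
    return [math.exp(l - lse) for l in logits]

def joint_energy(logits, y):
    """E(x, y) = -f(x)[y] (assumed identification of logits with energies)."""
    return -logits[y]

def marginal_energy(logits):
    """E(x) = -logsumexp_y f(x)[y], from summing exp(-E(x, y)) over y."""
    return -logsumexp(logits)

logits = [2.0, 0.5, -1.0]  # hypothetical f(x) for a 3-class problem

# p(y|x) = p(x, y) / p(x) = exp(-E(x, y)) / exp(-E(x)); Z cancels in the ratio.
cond = [math.exp(-joint_energy(logits, y) + marginal_energy(logits))
        for y in range(len(logits))]
```

Since the normalizer $Z_\theta$ cancels in the ratio, the classifier is unchanged by this reinterpretation; what it adds is an unnormalized density over $\boldsymbol{x}$ that comes from the same logits.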

#Papers

[Paper Notes] MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering

📅 2022/8/24 · ☕ 1 min read

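CVPR22. Task: KB-VQA, i.e. answering questions that require knowledge not contained in the question image. For example, the VQA instance below cannot be answered without the external knowledge "kawasaki". Novelty: no knowledge graph is constructed. Rather than building a scene graph, (entity, relation, entity) triplets are used over a Head Entity derived from the image (an image region) and a Tail Entity derived from language (described later) ...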

[Paper Notes] Generating Semantically Precise Scene Graphs from Textual Descriptions for Improved Image Retrieval

📅 2022/8/24 · ☕ 2 min read

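The paper behind the Stanford Scene Graph Parser (ACL 2015). The basic aim is to automate scene graph construction so that it can be used for image retrieval. https://nlp.stanford.edu/software/scenegraph-parser.shtml Pipeline: ① Generate a semantic graph from a partially modified version of Universal Dependencies: fix quantificational modifiers such as "a lot of", resolve pronouns, handle plural nouns → ...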