Posts

Attention

📅 2022/4/30 · ☕ 1 min read

Attention falls into two broad categories: Self-Attention and Source-Target Attention. ↓ Source: https://www.arithmer.co.jp/post/20210413 ...

[Paper Notes] CMO

📅 2022/4/27 · ☕ 1 min read

Proposes CMO, an augmentation method that is effective for imbalanced data. By the same authors as Influence-Balanced Loss ...

#paper

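[Paper Notes] cosFormer

📅 2022/4/24 · ☕ 1 min read

ICLR 2022. Attention in the standard Transformer: $$\mathrm{Attention}(Q, K, V)_i = \frac{\sum_{j=1}^n\exp(q_i^Tk_j)\cdot v_j}{\sum_{j=1}^n\exp(q_i^Tk_j)}$$ If the exp could be factored out, the terms in i and j could be separated → Linear Attention: Transformers are RNNs. An important property of softmax in attention: the attention matrix $A$ is non-negative. Taking ReLU as an example, setting negative values to 0 gives a non-linear weighting that can wipe out unneeded values and erroneous information. Compared with ReLU, softmax ...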
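The point about separating i and j can be checked with a small pure-Python sketch. It assumes the feature map `phi(x) = elu(x) + 1` as a stand-in for exp; this is an illustrative choice, not cosFormer's own re-weighting, and all function names here are hypothetical. Once the similarity is `phi(q_i)^T phi(k_j)`, the sums over j can be computed once and reused for every i.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def phi(x):
    # Assumed feature map elu(x) + 1, which is strictly positive.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def weighted_average(weights, V):
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, V)) / total
            for d in range(len(V[0]))]

def softmax_attention(Q, K, V):
    # The formula from the post: weights exp(q_i^T k_j), one per (i, j) pair.
    return [weighted_average([math.exp(dot(q, k)) for k in K], V) for q in Q]

def kernel_attention_quadratic(Q, K, V):
    # Same computation with exp(q^T k) replaced by phi(q)^T phi(k).
    return [weighted_average([dot(phi(q), phi(k)) for k in K], V) for q in Q]

def kernel_attention_linear(Q, K, V):
    d_k, d_v = len(K[0]), len(V[0])
    kv = [[0.0] * d_v for _ in range(d_k)]  # sum_j phi(k_j) v_j^T
    k_sum = [0.0] * d_k                     # sum_j phi(k_j)
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(d_k):
            k_sum[a] += fk[a]
            for d in range(d_v):
                kv[a][d] += fk[a] * v[d]
    out = []
    for q in Q:
        fq = phi(q)
        norm = dot(fq, k_sum)
        out.append([dot(fq, [kv[a][d] for a in range(d_k)]) / norm
                    for d in range(d_v)])
    return out

Q = [[0.1, -0.3], [0.5, 0.2], [-0.4, 0.7]]
K = [[0.2, 0.1], [-0.6, 0.4], [0.3, -0.2]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
slow = kernel_attention_quadratic(Q, K, V)
fast = kernel_attention_linear(Q, K, V)
max_gap = max(abs(s - f) for rs, rf in zip(slow, fast) for s, f in zip(rs, rf))
print("largest difference between the two orderings:", max_gap)
```

The two kernel versions agree up to floating-point error, but the second one touches each key and value only once, so its cost grows linearly with the sequence length.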

#paper

Speeding Up PyTorch

📅 2022/4/22 · ☕ 1 min read

https://qiita.com/sugulu_Ogawa_ISID/items/62f5f7adee083d96a587#31-ampautomatic-mixed-precision機能について ...

Automatic Mixed Precision

📅 2022/4/22 · ☕ 1 min read

Mixing float16 and float32 in computation (hence "mixed") reduces GPU memory usage. Computation reportedly gets somewhat faster as well. Abbreviated as AMP. https://qiita.com/Sosuke115/items/40265e6aaf2e414e2fea https://tawara.hatenablog.com/entry/2021/05/31/220936 ...
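To get a feel for the trade-off, here is a rough stdlib-only sketch. It is not AMP itself (no PyTorch involved); it only compares how many bytes a value takes in half and single precision and how much rounding error each introduces. The tensor size and the `roundtrip` helper are made up for illustration.

```python
import struct

def roundtrip(value, fmt):
    """Store a Python float in the given struct format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

BYTES_FP16 = struct.calcsize("<e")  # IEEE 754 half precision
BYTES_FP32 = struct.calcsize("<f")  # IEEE 754 single precision

n_values = 1_000_000  # hypothetical number of elements in a tensor
mem_fp16 = n_values * BYTES_FP16
mem_fp32 = n_values * BYTES_FP32

x = 0.1
err_fp16 = abs(roundtrip(x, "<e") - x)
err_fp32 = abs(roundtrip(x, "<f") - x)

print("bytes per value (fp16, fp32):", BYTES_FP16, BYTES_FP32)
print("memory for the tensor (fp16, fp32):", mem_fp16, mem_fp32)
print("rounding error for 0.1 (fp16, fp32):", err_fp16, err_fp32)
```

Half precision halves the storage but rounds more coarsely, which is presumably why mixed precision keeps float32 for the parts that need the accuracy.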

Huber loss

📅 2022/4/21 · ☕ 1 min read

Resistant to outliers and more robust than MSE ...
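A minimal sketch of why that is, using the standard Huber definition: quadratic for residuals up to `delta`, linear beyond it. The function names, the sample residuals, and `delta=1.0` are illustrative assumptions.

```python
def huber(residual, delta=1.0):
    """Huber loss for a single residual (prediction minus target)."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * residual ** 2      # quadratic near zero, like squared error
    return delta * (a - 0.5 * delta)    # linear in the tails, so outliers count less

def half_squared_error(residual):
    return 0.5 * residual ** 2

residuals = [0.1, -0.2, 0.3, 10.0]      # the last value plays the outlier
mean_huber = sum(huber(r) for r in residuals) / len(residuals)
mean_squared = sum(half_squared_error(r) for r in residuals) / len(residuals)
print("mean Huber loss:", mean_huber)
print("mean half squared error:", mean_squared)
```

The single outlier dominates the squared-error average, while its contribution to the Huber average grows only linearly.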

#loss
#post

Direction of torch.flatten

📅 2022/4/21 · ☕ 1 min read

#PyTorch ...
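As far as I know, `torch.flatten` reads elements in row-major order, meaning the last dimension varies fastest. The sketch below mimics that on nested lists so it runs without PyTorch; both helper functions are hypothetical and only illustrate the two possible directions.

```python
def flatten_row_major(matrix):
    # Walk each row left to right, then move to the next row.
    return [value for row in matrix for value in row]

def flatten_column_major(matrix):
    # The other direction, for contrast: walk each column top to bottom.
    return [row[c] for c in range(len(matrix[0])) for row in matrix]

x = [[1, 2, 3],
     [4, 5, 6]]
row_major = flatten_row_major(x)
column_major = flatten_column_major(x)
print("row-major:", row_major)
print("column-major:", column_major)
```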

#post

Distribution Alignment: A Unified Framework for Long-tail Visual Recognition

📅 2022/4/18 · ☕ 1 min read

Cites Decoupling Representation and Classifier for Long-Tailed Recognition. There are two novel contributions: Adaptive Calibration Function and Alignment with Generalized Re-weighting. Adaptive Calibration Function: applies a linear transformation to the classifier output $\boldsymbol{z}$ to weight it, and adds a margin. Alignment with Generalized Re-weighting: weights the target probabilities. https://openaccess.thecvf.com/content/CVPR2021/papers/Zhang_Distribution_Alignment_A_Unified_Framework_for_Long-Tail_Visual_Recognition_CVPR_2021_paper.pdf ...

#post

iBOT

📅 2022/4/18 · ☕ 1 min read

Token-based, like BEiT ...

#post

SimSiam

📅 2022/4/17 · ☕ 1 min read

Relation to the EM algorithm ↓ The paper itself notes that SimSiam appears to be closely related to the EM algorithm. https://speakerdeck.com/sansandsoc/simsiam-exploring-simple-siamese-representation-learning?slide=17 ...

#post

[Paper Notes] Double Descent

📅 2022/4/15 · ☕ 1 min read

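The phenomenon where the loss traces a U-shape and then falls again. For example, suppose we have a neural network with a simple structure and one with a complex structure. The former shows the long-known U-shaped curve made up of "under-fitting" and "over-fitting", whereas the latter, as it is made more complex, ...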

#paper

warmup

📅 2022/4/14 · ☕ 1 min read

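With optimizers that rely on moving averages, such as Momentum or Adam, too few gradients have accumulated early in training for the moving average to be trustworthy, so the values can be unreliable (and odd values may hurt accuracy). https://qiita.com/omiita/items/d24568a835da6911b01e ...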
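One common way to act on this is a linear warmup: keep the learning rate small while the moving averages are still filling up, then hand over to the base rate. A minimal sketch, where `base_lr=0.1` and `warmup_steps=5` are arbitrary illustrative values and `warmup_lr` is a hypothetical helper:

```python
def warmup_lr(step, base_lr=0.1, warmup_steps=5):
    """Linearly ramp the learning rate up to base_lr over the first warmup_steps steps."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

schedule = [warmup_lr(step) for step in range(8)]
print(schedule)
```

The rate rises in equal increments for the first five steps and stays at `base_lr` afterwards.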

#post

Learning Rate

📅 2022/4/14 · ☕ 1 min read

Cosine annealing, warm restarts, cyclical learning rate. The learning rate is closely tied to the batch size. How to choose the learning rate: https://www.slideshare.net/TakujiTahara/20190713-kaggle-tokyo-meetup-lt-nn-no-gokigentori-tawara-155334755 ...
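Cosine annealing with warm restarts can be sketched in a few lines using the usual formula `lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t / T))`, with the position `t` reset at the start of each cycle. The fixed cycle length and the values of `period`, `lr_max`, and `lr_min` are assumptions for illustration.

```python
import math

def cosine_annealing_lr(step, period=10, lr_max=0.1, lr_min=0.0):
    """Cosine-annealed learning rate that restarts every `period` steps."""
    t = step % period  # warm restart: position within the current cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / period))

schedule = [cosine_annealing_lr(step) for step in range(25)]
for step, lr in enumerate(schedule):
    print(step, round(lr, 4))
```

Within each cycle the rate decays along a half cosine from `lr_max` towards `lr_min`, then jumps back to `lr_max` at the restart.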

#post
