[Paper Notes] LoRA: Low-Rank Adaptation of Large Language Models

📅 2023/2/12 · ☕ 1 min read

ICLR22
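A new method for fine-tuning large models quickly and with low memory consumption
One approach is to insert trainable parameters into each Transformer layer, in the style of Hypernetworks (adaptation layers)
- However, even with the pre-trained weights frozen, the adaptation layers still have to sit on the GPU in order to be trained, so it ends up taking time anyway
- So LoRA is proposed as a new training method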
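In LoRA, what gets trained is the update $\Delta W$ to a weight $W \in \mathbb{R}^{d \times k}$.
$\Delta W$ is further decomposed into low-rank matrices $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$ (both are trained).
$y = (W + \Delta W) x = (W + B A) x$
Intuitively this is close to an encoder-decoder: $A$ projects $x$ down to $r$ dimensions and $B$ projects it back up to $d$
- The conjecture is that, as far as a specific task is concerned, many of the weights are redundant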
- When adapting to a specific task, Aghajanyan et al. (2020) shows that the pre-trained language models have a low “instrisic dimension” and can still learn efficiently despite a random projection to a smaller subspace.
- (Presumably a typo for "intrinsic dimension"...)
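The update rule $y = (W + B A) x$ can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the sizes `d`, `k`, `r` and the helper name `lora_forward` are made up for the example, the zero init for $B$ and Gaussian init for $A$ follow my reading of the paper, and the scaling factor the paper applies to $B A$ is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 64, 32, 4  # hypothetical sizes, with r much smaller than d and k

W = rng.normal(size=(d, k))              # pre-trained weight, kept frozen
B = np.zeros((d, r))                     # trainable; zero init makes BA = 0 at the start
A = rng.normal(scale=0.01, size=(r, k))  # trainable; small Gaussian init


def lora_forward(x, W, B, A):
    """Compute y = (W + BA) x without materializing the d x k matrix BA."""
    return W @ x + B @ (A @ x)


x = rng.normal(size=k)
y = lora_forward(x, W, B, A)

# Trainable parameter counts: full fine-tuning vs. the low-rank factors.
full_params = d * k        # every entry of W
lora_params = r * (d + k)  # entries of B plus entries of A
print(full_params, lora_params)
```

With these toy sizes the trainable parameters drop from `d * k` to `r * (d + k)`, and the gap widens as `d` and `k` grow while `r` stays small.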
Another advantage: since the full set of weights does not have to be replaced, the model can be switched between tasks simply by swapping out $B A$ for each task.
Another benefit is that we can switch between tasks while deployed at a much lower cost by only swapping the LoRA weights as opposed to all the parameters.
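Swapping $B A$ per task could look roughly like the sketch below. The task names, sizes, and the `merge` / `unmerge` helpers are hypothetical and only meant to illustrate the idea: one shared $W$ stays around, and each task contributes only its small pair $(B, A)$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 8, 6, 2  # hypothetical toy sizes

W = rng.normal(size=(d, k))  # shared pre-trained weight

# One small (B, A) pair per task; the task names are placeholders.
adapters = {
    "task_a": (rng.normal(size=(d, r)), rng.normal(size=(r, k))),
    "task_b": (rng.normal(size=(d, r)), rng.normal(size=(r, k))),
}


def merge(W, adapter):
    """Fold BA into the weight so serving uses a single matrix."""
    B, A = adapter
    return W + B @ A


def unmerge(W_merged, adapter):
    """Subtract BA again to recover the shared weight."""
    B, A = adapter
    return W_merged - B @ A


# Serve task_a, then switch to task_b without touching any other weights.
W_a = merge(W, adapters["task_a"])
W_b = merge(unmerge(W_a, adapters["task_a"]), adapters["task_b"])
```

Only `r * (d + k)` numbers per task need to be stored or shipped, instead of a full copy of $W$.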

Author

YuWd (Yuiga Wada)

Machine learning, competitive programming, iOS, Web
