This page looks best with JavaScript enabled

【論文メモ】Perceiver: General Perception with Iterative Attention

 ·  ☕ 1 min read


1
2
3
4
5
    scales = torch.linspace(1., max_freq / 2, num_bands, device = device, dtype = dtype)
    scales = scales[(*((None,) * (len(x.shape) - 1)), Ellipsis)]
    x = x * scales * pi
    x = torch.cat([x.sin(), x.cos()], dim = -1)
    x = torch.cat((x, orig_x), dim = -1)

We use a parameterization of Fourier features that allows us to (i) directly represent the position structure of the input data (preserving 1D temporal or 2D spatial structure for audio or images, respectively, or 3D spatiotemporal structure for videos), (ii) control the number of frequency bands in our position encoding independently of the cutoff frequency, and (iii) uniformly sample all frequencies up to a target resolution. We parametrize the frequency encoding to take the values (sin(fkπxd), cos(fkπxd)), where the frequency fk is the k th band of a bank of frequencies spaced equally between 1 and µ 2 . µ 2 can be naturally interpreted as the Nyquist frequency (Nyquist, 1928) corresponding to a target sampling rate of µ.

Share on

YuWd (Yuiga Wada)
WRITTEN BY
YuWd (Yuiga Wada)
機械学習・競プロ・iOS・Web