[R] Self-attention Does Not Need $O(n^2)$ Memory https://arxiv.org/abs/2112.05682 /r/MachineLearning https://redd.it/rfs5kq