Sparse & Flash Attention, Quantisation, Pruning, Distillation, LoRA
Neural Turing Machines have external memory that they can read and write to. Attentional Interfaces allow RNNs to focus on parts of their input. A visual overview of neural attention, and the powerful extensions of neural networks being built on top of it. Distill is dedicated to clear explan... The article referenced in the video is as follows: