🪿 QRWKV-72B and 32B: Training large attention-free models with only 8 GPUs

‼️ Attention is NOT all you need ‼️
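The post doesn't spell out QRWKV's internals, but as a rough illustration of what "attention-free" means in practice, here is a minimal linear-attention-style recurrence (a hypothetical sketch, not the actual RWKV-6 formulation): per-token cost stays constant instead of growing with context length, which is what makes training big models on a handful of GPUs plausible.

```python
import numpy as np

def linear_attention_step(state, k, v, q, decay):
    """One recurrent step of a simplified linear-attention cell.

    state: running (d x d) outer-product memory
    decay: per-channel forgetting factor in (0, 1)

    Unlike softmax attention, each token costs O(d^2) regardless of
    sequence length, and no KV cache grows with the context.
    """
    state = decay[:, None] * state + np.outer(k, v)  # accumulate k v^T into memory
    out = q @ state                                  # read memory with the query
    return state, out

d = 4
rng = np.random.default_rng(0)
decay = np.full(d, 0.9)          # illustrative fixed decay; real models learn this
state = np.zeros((d, d))
for _ in range(3):               # process a short token stream recurrently
    k, v, q = rng.standard_normal((3, d))
    state, out = linear_attention_step(state, k, v, q, decay)
print(out.shape)
```

The constant-size `state` replaces the quadratic attention matrix; that trade is the core of the "attention is not all you need" claim.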
substack.recursal.ai