Improving Hugging Face Training Efficiency Through Packing with Flash Attention
Mamba is now available in transformers. Thanks to @tridao and @albertgu for this brilliant model 🚀 and the amazing mamba-ssm kernels powering it!
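
A minimal sketch of loading the new Mamba integration through the standard transformers API, assuming transformers >= 4.39 and the `state-spaces/mamba-130m-hf` checkpoint (the checkpoint name and prompt are illustrative, not taken from the announcement):

```python
# Sketch: load a Mamba checkpoint via the transformers Mamba integration
# and run a short generation. Requires transformers >= 4.39; installing
# mamba-ssm and causal-conv1d enables the optimized CUDA kernels,
# otherwise a slower pure-PyTorch path is used.
from transformers import AutoTokenizer, MambaForCausalLM

checkpoint = "state-spaces/mamba-130m-hf"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = MambaForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("Mamba is a selective state-space model that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```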