fix-grid-limits
This fix is for people using megablocks for training, and thus with seqlen=4096. That setup yields a Triton Error [CUDA]: invalid argument at _binned_copy[(num_experts, expert_capacity)], because expert_capacity sits in the 2nd grid dim, which must be <= 65535 (as per the CUDA docs). expert_capacity gets that large because tokens_per_expert = top_k * tokens * world_size / num_experts. We can't change top_k or num_experts, as most models have been trained with those specific values. One simple fix is to swap the grid dims of the kernels: the 1st dim has a hard limit of 2^31-1, so expert_capacity fits there. num_experts then moves to the 2nd dim, and it rarely gets anywhere near 65535 anyway.
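To make the arithmetic concrete, here is a minimal sketch in plain Python (no GPU needed) that checks both grid orders against the CUDA limits. The values of top_k, num_experts, world_size and tokens are hypothetical, picked only so the capacity lands just past the limit; the helper names tokens_per_expert and grid_fits are illustrative and not part of megablocks.

```python
# CUDA launch-grid limits: x up to 2^31 - 1, y up to 65535.
MAX_GRID_X = 2**31 - 1
MAX_GRID_Y = 65535


def tokens_per_expert(top_k, tokens, world_size, num_experts):
    """Formula from the description: top_k * tokens * world_size / num_experts."""
    return top_k * tokens * world_size // num_experts


def grid_fits(grid):
    """Return True if a 2D launch grid (x, y) respects the CUDA limits."""
    x, y = grid
    return x <= MAX_GRID_X and y <= MAX_GRID_Y


# Hypothetical training setup (assumed values, not taken from a real model).
top_k = 8
num_experts = 128
world_size = 8
tokens = 32 * 4096  # 32 sequences of seqlen=4096

expert_capacity = tokens_per_expert(top_k, tokens, world_size, num_experts)

original_grid = (num_experts, expert_capacity)  # expert_capacity in the 2nd dim
swapped_grid = (expert_capacity, num_experts)   # expert_capacity in the 1st dim

print("expert_capacity:", expert_capacity)
print("original grid fits:", grid_fits(original_grid))
print("swapped grid fits:", grid_fits(swapped_grid))
```

With these assumed numbers expert_capacity comes out at 65536, one past the 2nd-dim limit, so only the swapped order launches. In the actual Triton kernels the tl.program_id axes would presumably need to be swapped to match the new grid order.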
@3outeille this change seems reasonable and makes sense, but to avoid any regressions, would you be able to add a test? Thanks!!
hello, I added tests here: https://huggingface.co/kernels-community/megablocks/discussions/2/files#d2h-153908