Instructions to use q-future/one-align with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use q-future/one-align with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("zero-shot-image-classification", model="q-future/one-align", trust_remote_code=True)
pipe(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png",
    candidate_labels=["animals", "humans", "landscape"],
)
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("q-future/one-align", trust_remote_code=True, dtype="auto")
```

A short sketch of consuming the pipeline output follows the notebook links below.

- Notebooks
- Google Colab
- Kaggle
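As referenced above, here is a hedged sketch of consuming the pipeline's output. It assumes the standard zero-shot-image-classification contract (a list of label/score dicts sorted by descending score); whether this model's remote code follows that contract is an assumption, not something the snippet above confirms.

```python
# Assumption: the pipeline returns the standard zero-shot-image-classification
# output, a list of {"label": str, "score": float} dicts sorted by score.
from transformers import pipeline

pipe = pipeline("zero-shot-image-classification", model="q-future/one-align", trust_remote_code=True)
preds = pipe(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png",
    candidate_labels=["animals", "humans", "landscape"],
)
for pred in preds:
    # e.g. "animals: 0.950"
    print(f"{pred['label']}: {pred['score']:.3f}")
```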
Fix SDPA & Flash-Attention #7
by Agnellino - opened
This PR aims to solve two issues with SDPA and flash attention:
- SDPA uses the `_unmask_unattended` method of `AttentionMaskConverter`, but this function appears nowhere; it is added in this PR.
- Flash attention uses `_get_unpad_data` from `transformers.models.llama.modeling_llama`, but the star import does not include it in more recent versions of transformers (>=4.48.0); see the import sketch just below.
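A minimal sketch of the flash-attention fix, assuming the point is to import the helper explicitly rather than rely on the star import (the exact import path in the PR may differ; newer transformers releases expose the helper in `transformers.modeling_flash_attention_utils`):

```python
# Hedged sketch: import _get_unpad_data explicitly instead of relying on the
# star import, which no longer exposes it in transformers >= 4.48.0.
try:
    # Newer releases keep the helper with the shared flash-attention utilities.
    from transformers.modeling_flash_attention_utils import _get_unpad_data
except ImportError:
    # Older releases defined it next to the Llama modeling code.
    from transformers.models.llama.modeling_llama import _get_unpad_data
```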
The implementation of `_unmask_unattended` is a raw copy-paste of the implementation given there, so nothing fancy to worry about: https://github.com/huggingface/transformers/blob/v4.37.0/src/transformers/modeling_attn_mask_utils.py#L189
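For readers who don't want to follow the link, here is a minimal sketch of the idea behind the helper. Note this is the simplified signature from later transformers releases, not the fuller v4.37.0 implementation the PR actually copies:

```python
import torch

def _unmask_unattended(expanded_mask: torch.Tensor, min_dtype: float) -> torch.Tensor:
    # Rows of the expanded 4-D attention mask that attend to nothing (e.g. pure
    # left padding) are all equal to min_dtype; multiplying by the negated
    # all-masked test zeroes those rows out, i.e. marks them fully unmasked,
    # so SDPA does not produce NaNs on them.
    return expanded_mask.mul(~torch.all(expanded_mask == min_dtype, dim=-1, keepdim=True))
```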
I don't know why, but it looks as if a lot of lines of code are changed... that's not the case: the diff is simply an import of `_get_unpad_data` and the implementation of `_unmask_unattended`.
Agnellino changed pull request status to open