Remove flash-attn from requirements and GPU inference example

by YingxuHe - opened Mar 20


MERaLiON org Mar 20

Remove flash-attn as a required dependency and remove attn_implementation="flash_attention_2" from the GPU inference example.

The model works with PyTorch's built-in SDPA attention, which transformers selects automatically when flash-attn is not installed.
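For anyone who still wants FlashAttention where it happens to be installed, the fallback described above can be made explicit. This is only a minimal sketch, not code from this PR: the helper names `pick_attn_implementation` and `build_load_kwargs` are hypothetical, and it assumes the model accepts the standard transformers values `"sdpa"` and `"flash_attention_2"` for `attn_implementation`.

```python
import importlib.util


def pick_attn_implementation(prefer_flash: bool = False) -> str:
    """Hypothetical helper: choose an attention backend name.

    Returns "flash_attention_2" only when the caller asks for it AND the
    flash_attn package is importable; otherwise returns "sdpa".
    """
    has_flash = importlib.util.find_spec("flash_attn") is not None
    if prefer_flash and has_flash:
        return "flash_attention_2"
    return "sdpa"


def build_load_kwargs(prefer_flash: bool = False) -> dict:
    """Hypothetical helper: keyword arguments to pass to from_pretrained()."""
    return {"attn_implementation": pick_attn_implementation(prefer_flash)}


if __name__ == "__main__":
    # Default path after this PR: no flash-attn needed, SDPA is used.
    print(build_load_kwargs())
    # Opt-in path: FlashAttention only if the package is actually present.
    print(build_load_kwargs(prefer_flash=True))
```

The resulting dictionary would be unpacked into the model's `from_pretrained(...)` call in the GPU inference example. Omitting `attn_implementation` entirely, as the updated example does, leaves the choice to transformers.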

YingxuHe changed pull request status to merged Mar 20
