FlashAttention Windows Wheel

Unofficial Windows wheels for flash-attention.
Supports Python 3.12-3.14 only.

Overview

This repository provides Windows-compatible wheels for FlashAttention-2 that are not officially distributed.
Pre-built version: flash_attn 2.9.0 with Python 3.12-3.14 support.
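
Before downloading a wheel, it may help to confirm that your interpreter falls inside the range stated above. The following is a minimal sketch, not part of this repository: the helper name `is_supported` is hypothetical, and the only facts it encodes are the Windows-only and Python 3.12-3.14 constraints from this README.

```python
import platform
import sys

# Supported range taken from this README: Windows only, Python 3.12 through 3.14.
MIN_PYTHON = (3, 12)
MAX_PYTHON = (3, 14)


def is_supported(python_version, system_name):
    """Return True if the (major, minor) version and OS name match the README's stated support.

    `is_supported` is a hypothetical helper for illustration, not an API of flash_attn.
    """
    major_minor = tuple(python_version[:2])
    return system_name == "Windows" and MIN_PYTHON <= major_minor <= MAX_PYTHON


if __name__ == "__main__":
    ok = is_supported(sys.version_info, platform.system())
    status = "supported" if ok else "not covered by these wheels"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} on {platform.system()}: {status}")
```

This check only covers the interpreter and operating system. Each wheel is also built against a specific PyTorch and CUDA version (see the Changelog), so the installed PyTorch build should match as well.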

!!Important!!

If you intend to use this alongside xformers, please note that v2.9.0 and later, which I developed myself, are not compatible with it. If you are using xformers, please use the latest official build, v2.8.4.
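
The rule above can be checked in an existing environment with a short script. This is a sketch under assumptions: the distribution names `flash_attn` and `xformers` are assumed to be the names the packages are installed under, and `installed_version` and `conflicts_with_xformers` are hypothetical helpers, not part of either library.

```python
import re
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def conflicts_with_xformers(flash_attn_version, xformers_installed):
    """Return True when xformers is present and flash_attn is v2.9.0 or later.

    Hypothetical helper that encodes only the compatibility note in this README.
    """
    if not xformers_installed or flash_attn_version is None:
        return False
    match = re.match(r"(\d+)\.(\d+)", flash_attn_version)
    if match is None:
        return False  # unparseable version string: make no claim
    return (int(match.group(1)), int(match.group(2))) >= (2, 9)


if __name__ == "__main__":
    # Distribution names below are assumptions; adjust them if your environment differs.
    fa_version = installed_version("flash_attn")
    has_xformers = installed_version("xformers") is not None
    if conflicts_with_xformers(fa_version, has_xformers):
        print(f"flash_attn {fa_version} is a v2.9.x fork build; use v2.8.4 with xformers.")
    else:
        print("No flash_attn/xformers conflict detected by this check.")
```

The comparison looks only at the major and minor version numbers, so local suffixes such as `+cu132` do not affect the result.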

Key Features

  • ✅ Native Windows support (Python 3.12-3.14)
  • ⚡ FlashAttention-2

Changelog

  • 15.11.2025 Uploaded v2.8.3 based on PyTorch 2.9.1+cu130
  • 12.02.2026 Uploaded v2.8.3 based on PyTorch 2.10.0+cu130
  • 29.03.2026 Uploaded v2.8.3 based on PyTorch 2.11.0+cu130
  • 14.05.2026 Uploaded v2.8.4 based on PyTorch 2.11.0+cu130
  • 13.05.2026 Uploaded v2.9.0 based on PyTorch 2.11.0+cu130 — unofficial fork-only build (not the official FlashAttention release). Includes FA2 A-1/A-2 optimizations.
  • 15.05.2026 Uploaded v2.9.0 based on PyTorch 2.12.0+cu132
  • 23.05.2026 Uploaded v2.9.1 based on PyTorch 2.12.0+cu132 — unofficial fork-only build (not the official FlashAttention release). Includes FA2 A-1/A-2 optimizations.
  • 03.07.2026 Uploaded v2.8.4 based on PyTorch 2.12.1+cu132 (built for xformers users)

About v2.9.x

v2.9.x is not an official FlashAttention release.
It is an independent fork build that continues FA2 kernel development while upstream focuses on FA3/FA4.

Disclaimer

※ Unofficial build! It works correctly in my environment, but I cannot guarantee that it will work in yours.
