FlashAttention Windows Wheel
Unofficial Windows-compatible wheel of flash-attention.
Python 3.12-3.14 only.
Overview
This repository provides Windows-compatible wheels for FlashAttention-2, which are not distributed officially.
Pre-built version: flash_attn 2.9.0 with Python 3.12-3.14 support.
!!Important!!
If you intend to use this alongside xformers, note that v2.9.0 and later, which are my own fork builds, are not compatible with it. If you are using xformers, please use the latest official build, v2.8.4.
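The compatibility note above can be checked before launching an application. The following is a minimal sketch, not part of this repository: the helper names are hypothetical, and it assumes the wheels register under the distribution names `flash_attn` and `xformers`. It only reads installed package metadata and does not import either library.

```python
import re
from importlib import metadata


def release_tuple(version):
    """Return the leading numeric release, e.g. '2.9.0+cu130' -> (2, 9, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def xformers_conflict(flash_attn_version, xformers_version):
    """True when xformers is installed next to a v2.9.0-or-later fork build."""
    if flash_attn_version is None or xformers_version is None:
        return False
    return release_tuple(flash_attn_version) >= (2, 9, 0)


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them if your wheel differs.
    fa = installed_version("flash_attn")
    xf = installed_version("xformers")
    if xformers_conflict(fa, xf):
        print(f"flash_attn {fa} is a fork build; use v2.8.4 with xformers {xf}.")
    else:
        print(f"No known conflict (flash_attn={fa}, xformers={xf}).")
```

The check is deliberately conservative: it flags every 2.9.x or later build whenever xformers is present, which matches the note above but says nothing about why the two are incompatible.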
Key Features
- ✅ Native Windows support (Python 3.12-3.14)
- ⚡ FlashAttention-2
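Before downloading a wheel, it can help to confirm that the interpreter falls in the supported range listed above. This is a small sketch using only the standard library; the function name `is_supported` is hypothetical and not something this repository ships.

```python
import platform
import sys

# Minor versions of Python 3 covered by these wheels, per the feature list.
SUPPORTED_MINORS = (12, 13, 14)


def is_supported(system, version_info):
    """True for Windows running Python 3.12, 3.13, or 3.14."""
    major, minor = version_info[0], version_info[1]
    return system == "Windows" and major == 3 and minor in SUPPORTED_MINORS


if __name__ == "__main__":
    ok = is_supported(platform.system(), sys.version_info)
    label = f"{platform.system()} / Python {sys.version_info[0]}.{sys.version_info[1]}"
    print(f"{label}: {'supported' if ok else 'not covered by these wheels'}")
```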
Changelog
- 15.11.2025 Uploaded v2.8.3 based on PyTorch 2.9.1+cu130
- 12.02.2026 Uploaded v2.8.3 based on PyTorch 2.10.0+cu130
- 29.03.2026 Uploaded v2.8.3 based on PyTorch 2.11.0+cu130
- 14.05.2026 Uploaded v2.8.4 based on PyTorch 2.11.0+cu130
- 13.05.2026 Uploaded v2.9.0 based on PyTorch 2.11.0+cu130 — unofficial fork-only build (not the official FlashAttention release). Includes FA2 A-1/A-2 optimizations.
- 15.05.2026 Uploaded v2.9.0 based on PyTorch 2.12.0+cu132
- 23.05.2026 Uploaded v2.9.1 based on PyTorch 2.12.0+cu132 — unofficial fork-only build (not the official FlashAttention release). Includes FA2 A-1/A-2 optimizations.
- 03.07.2026 Uploaded v2.8.4 based on PyTorch 2.12.1+cu132 (built for xformers users)
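The changelog above pairs each flash_attn build with the PyTorch build it was compiled against. As a rough sketch, the pairs can be transcribed into a lookup table. The function name is hypothetical, and the idea that a wheel should be matched to the exact PyTorch build string is an assumption rather than something this repository states.

```python
# Transcribed from the changelog: (flash_attn version, PyTorch build).
BUILDS = [
    ("2.8.3", "2.9.1+cu130"),
    ("2.8.3", "2.10.0+cu130"),
    ("2.8.3", "2.11.0+cu130"),
    ("2.8.4", "2.11.0+cu130"),
    ("2.9.0", "2.11.0+cu130"),
    ("2.9.0", "2.12.0+cu132"),
    ("2.9.1", "2.12.0+cu132"),
    ("2.8.4", "2.12.1+cu132"),
]


def builds_for_torch(torch_version):
    """Return the flash_attn versions uploaded for an exact PyTorch build string."""
    return sorted({fa for fa, torch_build in BUILDS if torch_build == torch_version})


if __name__ == "__main__":
    for torch_build in ("2.11.0+cu130", "2.12.0+cu132", "2.12.1+cu132"):
        print(torch_build, "->", builds_for_torch(torch_build))
```

In practice the argument would come from `torch.__version__` in the target environment. An empty list means the changelog records no upload for that PyTorch build.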
About v2.9.x
v2.9.x is not an official FlashAttention release.
It is an independent fork build that continues FA2 kernel development while upstream focuses on FA3/FA4.
- Optimization plan (GitHub): https://github.com/ussoewwin/flash-attention/blob/main/AI/FA2_BACKPORT_FROM_FA4_PLAN.md
- Kernel change notes (GitHub): https://github.com/ussoewwin/flash-attention/blob/main/md/FA2_CHANGES_v1.2.md
- About v2.9.1 A-1 / A-2 Refinements (GitHub): https://github.com/ussoewwin/flash-attention/blob/main/AI/A1_A2_REFINEMENTS_PLAN.md
Disclaimer
- No performance benchmarks have been run on this build.
- No multi-environment testing has been performed. However, it works in my own setups: Stable Diffusion A1111 (customized), Forge Nunchaku (customized), and ComfyUI.
- Fork flash_attn 2.9.0 functional test and validation log (GitHub): https://github.com/ussoewwin/flash-attention/blob/main/md/2.9.0_COMPLETE_TEST_AND_VALIDATION_GUIDE.md
- This is an unofficial fork build. Use at your own risk.
※ This is an unofficial build. It works correctly in my environment, but I cannot guarantee that it will work in yours.