FlashAttention Windows Wheel

Unofficial build of the flash-attention wheel for Windows.
Supports Python 3.12-3.14 only.

Overview

This repository provides Windows wheels for FlashAttention-2, which upstream does not officially distribute for Windows.
Pre-built version: flash_attn 2.9.0 with Python 3.12-3.14 support.
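
Before downloading a wheel, it can help to confirm that the local interpreter falls inside the supported range. The following is a minimal sketch and is not part of this repository; the helper name `is_supported_environment` is hypothetical, and it checks only the operating system and Python version stated above.

```python
import platform
import sys

# Supported range stated in this README: Windows, Python 3.12 through 3.14.
MIN_PYTHON = (3, 12)
MAX_PYTHON = (3, 14)


def is_supported_environment(system: str, python_version: tuple) -> bool:
    """Return True if the OS name and Python (major, minor) fall in the README's range.

    Hypothetical helper for illustration; it is not shipped with the wheel.
    """
    major_minor = tuple(python_version[:2])
    return system == "Windows" and MIN_PYTHON <= major_minor <= MAX_PYTHON


if __name__ == "__main__":
    ok = is_supported_environment(platform.system(), sys.version_info)
    print("supported" if ok else "not supported by these wheels")
```

The check says nothing about the installed PyTorch or CUDA version, which also has to match the wheel you pick.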

!!Important!!

If you intend to use this alongside xformers, note that v2.9.0 and later, which are my own fork builds, are not compatible with it. If you use xformers, install v2.8.4, the latest official version.
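
The compatibility rule above can be checked against an existing environment. This is a sketch under assumptions: the helper names are hypothetical, the version parser handles only plain `X.Y.Z` strings (with an optional `+local` suffix), and the distribution name `flash_attn` is assumed rather than confirmed by this README.

```python
from importlib import metadata

# First fork version that the note above reports as incompatible with xformers.
FIRST_INCOMPATIBLE = (2, 9, 0)


def parse_release(version: str) -> tuple:
    """Parse 'X.Y.Z' (ignoring any '+local' suffix) into a tuple of ints."""
    release = version.split("+", 1)[0]
    return tuple(int(part) for part in release.split(".")[:3])


def works_with_xformers(flash_attn_version: str) -> bool:
    """Return True if this flash_attn version predates the fork builds (v2.9.0+)."""
    return parse_release(flash_attn_version) < FIRST_INCOMPATIBLE


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the wheel installs under the distribution name "flash_attn".
    fa = installed_version("flash_attn")
    xf = installed_version("xformers")
    if fa and xf and not works_with_xformers(fa):
        print(f"flash_attn {fa} is a fork build; use v2.8.4 alongside xformers {xf}.")
```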

Key Features

✅ Native Windows support (Python 3.12-3.14)
⚡ FlashAttention-2

Changelog

15.11.2025 Uploaded v2.8.3 based on PyTorch 2.9.1+cu130
12.02.2026 Uploaded v2.8.3 based on PyTorch 2.10.0+cu130
29.03.2026 Uploaded v2.8.3 based on PyTorch 2.11.0+cu130
14.05.2026 Uploaded v2.8.4 based on PyTorch 2.11.0+cu130
13.05.2026 Uploaded v2.9.0 based on PyTorch 2.11.0+cu130 — unofficial fork-only build (not the official FlashAttention release). Includes FA2 A-1/A-2 optimizations.
15.05.2026 Uploaded v2.9.0 based on PyTorch 2.12.0+cu132
23.05.2026 Uploaded v2.9.1 based on PyTorch 2.12.0+cu132 — unofficial fork-only build (not the official FlashAttention release). Includes FA2 A-1/A-2 optimizations.
03.07.2026 Uploaded v2.8.4 based on PyTorch 2.12.1+cu132 (built for xformers users)
11.07.2026 Uploaded v2.9.1 based on PyTorch 2.13.0+cu132 — unofficial fork-only build (not the official FlashAttention release). Includes FA2 A-1/A-2 optimizations.
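
To find which uploads target a given PyTorch build, the changelog can be turned into a small lookup table. This is an illustrative sketch: the `UPLOADS` list is transcribed from the entries above, the function name is hypothetical, and matching on the exact build string is an assumption, since this README does not say whether a wheel works across neighbouring PyTorch versions.

```python
# Uploads transcribed from the changelog above: (flash_attn version, PyTorch build).
UPLOADS = [
    ("2.8.3", "2.9.1+cu130"),
    ("2.8.3", "2.10.0+cu130"),
    ("2.8.3", "2.11.0+cu130"),
    ("2.8.4", "2.11.0+cu130"),
    ("2.9.0", "2.11.0+cu130"),
    ("2.9.0", "2.12.0+cu132"),
    ("2.9.1", "2.12.0+cu132"),
    ("2.8.4", "2.12.1+cu132"),
    ("2.9.1", "2.13.0+cu132"),
]


def builds_for_torch(torch_version: str) -> list:
    """Return flash_attn versions uploaded for an exact PyTorch build string.

    Results keep changelog order and contain no duplicates.
    """
    found = []
    for flash_attn_version, torch_build in UPLOADS:
        if torch_build == torch_version and flash_attn_version not in found:
            found.append(flash_attn_version)
    return found


if __name__ == "__main__":
    print(builds_for_torch("2.11.0+cu130"))
```

With PyTorch installed, passing `torch.__version__` as the argument looks up the current environment.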

About v2.9.x

v2.9.x is not an official FlashAttention release.
It is an independent fork build that continues FA2 kernel development while upstream focuses on FA3/FA4.

Optimization plan (GitHub): https://github.com/ussoewwin/flash-attention/blob/main/AI/FA2_BACKPORT_FROM_FA4_PLAN.md
Kernel change notes (GitHub): https://github.com/ussoewwin/flash-attention/blob/main/md/FA2_CHANGES_v1.2.md
About v2.9.1 A-1 / A-2 Refinements (GitHub): https://github.com/ussoewwin/flash-attention/blob/main/AI/A1_A2_REFINEMENTS_PLAN.md

Disclaimer

No performance benchmarks have been run on this build.
No multi-environment testing has been performed. However, it works on my customized Stable Diffusion A1111, my customized Forge Nunchaku, and ComfyUI.
Functional test log for the fork build flash_attn 2.9.0, checking whether features work and can be used normally (GitHub): https://github.com/ussoewwin/flash-attention/blob/main/md/2.9.0_COMPLETE_TEST_AND_VALIDATION_GUIDE.md
This is an unofficial fork build. Use at your own risk.

※ Unofficial build! It works correctly in my environment, but I cannot guarantee that it will work in yours.
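
Because the build is untested outside the author's machine, a quick smoke test after installation may be worthwhile. The sketch below is not from this repository. It assumes the fork keeps the upstream `flash_attn_func` entry point, which takes tensors shaped (batch, seqlen, heads, head_dim), and it compares the result with PyTorch's own `scaled_dot_product_attention`. The function name `smoke_test`, the tensor sizes, and the tolerance are arbitrary choices.

```python
def smoke_test() -> str:
    """Run one small attention call and compare it with PyTorch's reference.

    Returns "ok", a "skipped: ..." reason, or a "mismatch: ..." message.
    """
    try:
        import torch
        from flash_attn import flash_attn_func  # assumed unchanged in the fork
    except ImportError as exc:
        return f"skipped: {exc}"
    if not torch.cuda.is_available():
        return "skipped: no CUDA device"

    # flash_attn expects (batch, seqlen, heads, head_dim) in fp16 or bf16 on the GPU.
    q, k, v = (
        torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)
        for _ in range(3)
    )
    out = flash_attn_func(q, k, v, causal=False)

    # PyTorch's reference expects (batch, heads, seqlen, head_dim), so transpose.
    ref = torch.nn.functional.scaled_dot_product_attention(
        q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
    ).transpose(1, 2)

    max_err = (out - ref).abs().max().item()
    return "ok" if max_err < 1e-2 else f"mismatch: max abs error {max_err:.4f}"


if __name__ == "__main__":
    print(smoke_test())
```

A result of "ok" shows only that one non-causal forward pass agrees with the reference; it is not a benchmark and does not cover the backward pass.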
