license: apache-2.0
library_name: diffusers
pipeline_tag: image-to-image
tags:
  - optical-flow prediction
  - motion prediction
  - diffusion

FOFPred: Language-Driven Future Optical Flow Prediction

FOFPred is a diffusion-based model that predicts future optical flow from a single image guided by natural language instructions. Given an input image and a text prompt describing a desired action (e.g., "Moving the water bottle from right to left"), FOFPred generates 4 sequential optical flow frames showing how objects would move.
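To make "sequential optical flow frames" concrete, here is a toy sketch in plain Python. It is not FOFPred code, and the flow encoding the model actually uses is not described in this card. The sketch assumes each frame stores a per-pixel `(dx, dy)` displacement in pixels from one time step to the next. The helpers `constant_flow` and `track_point` are hypothetical names introduced only for illustration.

```python
# Toy illustration only: FOFPred's actual flow encoding is not specified in this card.
# Assumption: each frame stores a per-pixel (dx, dy) displacement, in pixels,
# from one time step to the next.

Flow = list[list[tuple[float, float]]]  # flow[row][col] -> (dx, dy)


def constant_flow(height: int, width: int, dx: float, dy: float) -> Flow:
    """Build a flow frame in which every pixel moves by the same (dx, dy)."""
    return [[(dx, dy) for _ in range(width)] for _ in range(height)]


def track_point(x: float, y: float, flows: list[Flow]) -> list[tuple[float, float]]:
    """Follow one point through consecutive flow frames (nearest-pixel lookup)."""
    path = [(x, y)]
    for flow in flows:
        height, width = len(flow), len(flow[0])
        # Clamp the lookup position so it stays inside the frame.
        col = min(max(int(round(x)), 0), width - 1)
        row = min(max(int(round(y)), 0), height - 1)
        dx, dy = flow[row][col]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


# Four frames of uniform leftward motion, 2 pixels per frame,
# loosely mimicking "moving the water bottle from right to left".
flows = [constant_flow(16, 16, dx=-2.0, dy=0.0) for _ in range(4)]
print(track_point(12.0, 3.0, flows))
```

Chaining the four frames moves the tracked point steadily leftward, which is the kind of motion a sequence of flow frames is meant to describe.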

Usage

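import torch
from fofpred.pipelines.fofpred.pipeline_fofpred import FOFPredPipeline
from fofpred.schedulers.scheduling_flow_match_euler_discrete import FlowMatchEulerDiscreteScheduler
from PIL import Image

# Load the pretrained pipeline in bfloat16 and move it to the GPU.
pipeline = FOFPredPipeline.from_pretrained(
    "Salesforce/FOFPred",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Use a flow-matching Euler scheduler with its default settings.
pipeline.scheduler = FlowMatchEulerDiscreteScheduler()

results = pipeline(
    prompt="Moving the water bottle from right to left.",
    input_images=[Image.open("your_image.jpg")],  # replace with the path to your image
    width=256,
    height=256,
    num_inference_steps=1,
    num_images_per_prompt=4,
    frame_count=4,  # number of future optical flow frames to predict
    generator=torch.Generator(device="cuda").manual_seed(42),  # fixed seed for reproducibility
    output_type="pt",  # return torch tensors
)

# Tensor of predicted flow frames with shape [B, F, C, H, W]
# (batch, frames, channels, height, width).
flow_frames = results.images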
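The `[B, F, C, H, W]` output can be split into individual frames for inspection or saving. The following is a minimal sketch, not part of the FOFPred API: `split_flow_frames` is a hypothetical helper, and it assumes the tensor values lie in `[0, 1]`, which this card does not confirm. A random tensor stands in for `results.images` so the snippet runs without the model.

```python
import torch


def split_flow_frames(flow_frames: torch.Tensor) -> list[list[torch.Tensor]]:
    """Split a [B, F, C, H, W] tensor into per-sample lists of uint8 [H, W, C] frames.

    Assumption (not confirmed by the model card): values lie in [0, 1].
    """
    if flow_frames.ndim != 5:
        raise ValueError(f"expected [B, F, C, H, W], got shape {tuple(flow_frames.shape)}")
    # Cast to float32 on the CPU first; bfloat16 is awkward to use outside torch.
    frames = flow_frames.detach().float().cpu().clamp(0.0, 1.0)
    # Scale to 0..255 and move channels last: [B, F, C, H, W] -> [B, F, H, W, C].
    frames = (frames * 255.0).round().to(torch.uint8).permute(0, 1, 3, 4, 2)
    return [[frame for frame in sample] for sample in frames]


# Stand-in for results.images; the shape is chosen to mirror the call above
# (4 samples, 4 frames, 256x256) and assumes 3 channels.
dummy = torch.rand(4, 4, 3, 256, 256).to(torch.bfloat16)
per_sample = split_flow_frames(dummy)
print(len(per_sample), len(per_sample[0]), tuple(per_sample[0][0].shape))
```

If the frames do have three channels, each one can be turned into an image with `Image.fromarray(frame.numpy())` and saved with PIL.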

Architecture

Component	Model
V-LLM	Qwen2.5-VL-3B-Instruct
DiT	OmniGen2Transformer3DModel
VAE	FLUX.1-dev AutoencoderKL
Scheduler	FlowMatchEulerDiscreteScheduler
