Spaces: Running on Zero
Commit · ce80858
1 Parent(s): 06529b5

Localize UI to Korean and add CLAUDE.md

- Translated all UI elements to Korean (labels, buttons, messages)
- Added CLAUDE.md documentation for future Claude Code instances

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
CLAUDE.md
ADDED
@@ -0,0 +1,93 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a Hugging Face Gradio Space that implements a FLUX.2-dev image generation application. FLUX.2-dev is a 32B-parameter rectified flow model capable of generating, editing, and combining images based on text instructions.

The application uses a remote text encoder service and applies AOT (Ahead-of-Time) compilation to the transformer blocks to improve inference performance on ZeroGPU infrastructure.

## Architecture

### Core Components

- **app.py**: Main Gradio application entry point
  - Handles UI setup and user interactions
  - Implements the `infer()` function, decorated with `@spaces.GPU(duration=get_duration)` for dynamic GPU allocation
  - Uses `remote_text_encoder()` to offload text encoding to an external Gradio client (`multimodalart/mistral-text-encoder`); see the sketch after this list
  - Initializes the pipeline with `text_encoder=None` (text encoding happens externally)
  - Sets the attention backend to `"_flash_3_hub"` for optimized attention computation
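
In rough outline, the remote text-encoding call is a thin wrapper around `gradio_client`. The sketch below is an illustration only: the endpoint name (`api_name`) and the way the returned embeddings are deserialized are assumptions, not copied from app.py.

```python
# Hypothetical sketch of remote_text_encoder(); see app.py for the real implementation.
import torch
from gradio_client import Client

text_encoder_client = Client("multimodalart/mistral-text-encoder")

def remote_text_encoder(prompt: str) -> torch.Tensor:
    # Ask the remote Space to encode the prompt. The endpoint name and the
    # serialization format (assumed here: a file that torch.load can read)
    # must be checked against the actual service.
    result = text_encoder_client.predict(prompt, api_name="/predict")
    return torch.load(result)
```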

- **optimization.py**: AOT compilation optimization module
  - The `optimize_pipeline_()` function compiles transformer blocks using torch.export and AOT Inductor
  - Handles both 'double' and 'single' transformer block types
  - Uses dynamic shapes to support variable image sequence lengths (0-3 images at 1024x1024)
  - Leverages `spaces.aoti_capture()`, `torch.export.export()`, and `spaces.aoti_compile()` for compilation
  - Replaces block forward methods with `ZeroGPUCompiledModel` instances

### Key Design Patterns

1. **Remote Text Encoding**: Text encoding is offloaded to a separate Gradio service to reduce memory footprint and optimize GPU usage for the main diffusion pipeline.

2. **Dynamic GPU Duration**: The `get_duration()` function dynamically calculates GPU duration based on the number of input images and inference steps, optimizing resource allocation on ZeroGPU infrastructure.

3. **AOT Compilation**: Transformer blocks are compiled ahead-of-time with specific dynamic shapes and Inductor configurations to maximize performance during inference.

4. **Multi-Image Support**: The pipeline supports optional input images for image editing and combination tasks via the gallery input component.

## Development Commands

### Running the Application

```bash
python app.py
```

The Gradio app will launch and be accessible at the provided local URL (default: http://127.0.0.1:7860).

### Dependencies

Install dependencies from requirements.txt:

```bash
pip install -r requirements.txt
```

Note: The repository uses a specific diffusers commit from GitHub rather than the PyPI release.
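
A commit pin of that kind usually appears in requirements.txt as a direct Git reference. The line below only illustrates the form; `<commit-sha>` is a placeholder, not the commit the repository actually pins.

```text
diffusers @ git+https://github.com/huggingface/diffusers.git@<commit-sha>
```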

## Important Implementation Details

### Pipeline Initialization

The pipeline is initialized with `text_encoder=None` because text encoding is handled remotely. The transformer uses Flash Attention 3 (the `"_flash_3_hub"` backend) for optimized attention computation.
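
As a rough sketch (assumptions: the generic `DiffusionPipeline` loader and diffusers' `set_attention_backend()` helper; check app.py for the exact classes and call sites actually used):

```python
import torch
from diffusers import DiffusionPipeline

# Load FLUX.2-dev without its text encoder; prompts are embedded by the remote service.
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-dev",
    text_encoder=None,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Route attention through the Flash Attention 3 kernels published on the Hub.
pipe.transformer.set_attention_backend("_flash_3_hub")
```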

### GPU Allocation

The `@spaces.GPU(duration=get_duration)` decorator dynamically allocates GPU time based on:

- Base time: 65 seconds
- Additional time per inference step: (1 + 0.7 × number_of_input_images) seconds
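
Put together, the duration callback is roughly the following; the callback receives the same arguments as `infer()`, and the parameter names beyond those visible in the app.py diff below (`prompt`, `input_images`, `seed`, `randomize_seed`, `width`) are assumptions:

```python
import spaces

def get_duration(prompt, input_images=None, seed=42, randomize_seed=False,
                 width=1024, height=1024, num_inference_steps=30, *args, **kwargs):
    # 65 s base allocation, plus a per-step cost that grows by ~0.7 s
    # for every input image being edited or combined.
    num_images = len(input_images) if input_images else 0
    return 65 + num_inference_steps * (1 + 0.7 * num_images)

@spaces.GPU(duration=get_duration)
def infer(prompt, input_images=None, seed=42, randomize_seed=False,
          width=1024, height=1024, num_inference_steps=30, *args, **kwargs):
    ...
```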

### Transformer Block Compilation

When modifying the optimization logic in optimization.py:

- The `TRANSFORMER_IMAGE_DIM` ranges from 4096 (0 images) to 16384 (3 images at 1024×1024)
- Dynamic shapes are critical for supporting variable-length image sequences
- Both 'double' and 'single' transformer blocks must be compiled separately
- The compilation process takes up to 1200 seconds (20 minutes) per block type
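
The overall flow follows the ZeroGPU AoT pattern: capture one real call, export with a dynamic sequence dimension, compile, and swap the compiled module in. The sketch below captures the whole transformer rather than individual blocks; the `hidden_states` dynamic-shape spec is illustrative, and whether the swap uses `spaces.aoti_apply()` or a manual forward replacement should be checked against optimization.py.

```python
import torch
import spaces

def compile_transformer(pipe, example_call_kwargs):
    # Capture the arguments of one real forward pass through the transformer.
    with spaces.aoti_capture(pipe.transformer) as call:
        pipe(**example_call_kwargs)

    # Let the image-token sequence length vary between the 0-image case (4096)
    # and the 3-image case (16384) at 1024x1024.
    seq_len = torch.export.Dim("image_seq_len", min=4096, max=16384)
    dynamic_shapes = {"hidden_states": {1: seq_len}}  # illustrative spec

    exported = torch.export.export(
        pipe.transformer,
        args=call.args,
        kwargs=call.kwargs,
        dynamic_shapes=dynamic_shapes,
    )
    compiled = spaces.aoti_compile(exported)

    # Swap the eager module's forward for the AOT-compiled one.
    spaces.aoti_apply(compiled, pipe.transformer)
```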

### Image Input Handling

Input images are passed as a gallery component. The infer function converts the gallery format (list of tuples) to a simple list of PIL images by extracting `item[0]` from each gallery item.
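
The corresponding snippet from `infer()` (unchanged in this commit, visible as context in the app.py diff below):

```python
image_list = None
if input_images is not None and len(input_images) > 0:
    image_list = []
    for item in input_images:
        image_list.append(item[0])  # item[0] is the PIL image from each (image, caption) tuple
```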

## Configuration

- **Model**: `black-forest-labs/FLUX.2-dev` from Hugging Face Hub
- **Device**: CUDA (bfloat16 precision)
- **Max Image Size**: 1024×1024
- **Default Inference Steps**: 30
- **Default Guidance Scale**: 4.0

## Testing Notes

The application includes cached examples for both text-only and multi-image generation. Examples are cached in "lazy" mode, meaning they are computed on-demand when first accessed.
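
In Gradio, lazy caching of this kind is typically wired up through `gr.Examples`; depending on the Gradio version it is requested via `cache_examples="lazy"` or via a separate lazy cache mode. The example prompts and the inputs/outputs wiring below are placeholders, not the ones the Space actually ships:

```python
import gradio as gr

gr.Examples(
    examples=[
        ["a watercolor painting of a lighthouse at dawn"],
        ["a product photo of a ceramic mug on a wooden table"],
    ],
    inputs=[prompt],
    outputs=[result],
    fn=infer,
    cache_examples="lazy",  # each example is generated the first time it is clicked
)
```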
app.py
CHANGED
@@ -74,20 +74,20 @@ def infer(prompt, input_images=None, seed=42, randomize_seed=False, width=1024,

     if randomize_seed:
         seed = random.randint(0, MAX_SEED)
-
+
     # Get prompt embeddings from remote text encoder
-    progress(0.1, desc="
+    progress(0.1, desc="프롬프트 인코딩 중...")
     prompt_embeds = remote_text_encoder(prompt).to("cuda")
-
+
     # Prepare image list (convert None or empty gallery to None)
     image_list = None
     if input_images is not None and len(input_images) > 0:
         image_list = []
         for item in input_images:
             image_list.append(item[0])
-
+
     # Generate image
-    progress(0.3, desc="
+    progress(0.3, desc="이미지 생성 중...")
     generator = torch.Generator(device=device).manual_seed(seed)
     image = pipe(
         prompt_embeds=prompt_embeds,

@@ -121,77 +121,77 @@ css="""
 """

 with gr.Blocks() as demo:
-
+
     with gr.Column(elem_id="col-container"):
         gr.Markdown(f"""# FLUX.2 [dev]
-        FLUX.2 [dev]
+        FLUX.2 [dev]는 텍스트 지시사항을 기반으로 이미지를 생성, 편집 및 결합할 수 있는 32B 파라미터 rectified flow 모델입니다 [[모델](https://huggingface.co/black-forest-labs/FLUX.2-dev)], [[블로그](https://bfl.ai/blog/flux-2)]
         """)

-        with gr.Accordion("
+        with gr.Accordion("입력 이미지 (선택사항)", open=False):
             input_images = gr.Gallery(
-                label="
+                label="입력 이미지",
                 type="pil",
                 columns=3,
                 rows=1,
             )
-
+
         with gr.Row():
-
+
             prompt = gr.Text(
-                label="
+                label="프롬프트",
                 show_label=False,
                 max_lines=2,
-                placeholder="
+                placeholder="프롬프트를 입력하세요",
                 container=False,
                 scale=3
             )
-
+
-            run_button = gr.Button("
+            run_button = gr.Button("실행", scale=1)
-
-            result = gr.Image(label="Result", show_label=False)

-
-
+        result = gr.Image(label="결과", show_label=False)
+
+        with gr.Accordion("고급 설정", open=False):
+
             seed = gr.Slider(
-                label="
+                label="시드",
                 minimum=0,
                 maximum=MAX_SEED,
                 step=1,
                 value=0,
             )
-
+
-            randomize_seed = gr.Checkbox(label="
+            randomize_seed = gr.Checkbox(label="랜덤 시드", value=True)
-
+
             with gr.Row():
-
+
                 width = gr.Slider(
-                    label="
+                    label="너비",
                     minimum=256,
                     maximum=MAX_IMAGE_SIZE,
                     step=32,
                     value=1024,
                 )
-
+
                 height = gr.Slider(
-                    label="
+                    label="높이",
                     minimum=256,
                     maximum=MAX_IMAGE_SIZE,
                     step=32,
                     value=1024,
                 )
-
+
             with gr.Row():
-
+
                 num_inference_steps = gr.Slider(
-                    label="
+                    label="추론 단계 수",
                     minimum=1,
                     maximum=100,
                     step=1,
                     value=30,
                 )
-
+
                 guidance_scale = gr.Slider(
-                    label="
+                    label="가이던스 스케일",
                     minimum=0.0,
                     maximum=10.0,
                     step=0.1,