tchung1970 and Claude committed
Commit ce80858 · 1 Parent(s): 06529b5

Localize UI to Korean and add CLAUDE.md


- Translated all UI elements to Korean (labels, buttons, messages)
- Added CLAUDE.md documentation for future Claude Code instances

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Files changed (2)
  1. CLAUDE.md +93 -0
  2. app.py +32 -32
CLAUDE.md ADDED
@@ -0,0 +1,93 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+This is a Hugging Face Gradio Space that implements a FLUX.2-dev image generation application. FLUX.2-dev is a 32B parameter rectified flow model capable of generating, editing, and combining images based on text instructions.
+
+The application uses a remote text encoder service and applies AOT (Ahead-of-Time) compilation optimizations to the transformer blocks to improve inference performance on ZeroGPU infrastructure.
+
+## Architecture
+
+### Core Components
+
+- **app.py**: Main Gradio application entry point
+  - Handles UI setup and user interactions
+  - Implements the `infer()` function decorated with `@spaces.GPU(duration=get_duration)` for dynamic GPU allocation
+  - Uses `remote_text_encoder()` to offload text encoding to an external Gradio client (`multimodalart/mistral-text-encoder`; a hedged sketch follows this list)
+  - Initializes the pipeline with the text encoder set to None (text encoding happens externally)
+  - Sets the attention backend to `"_flash_3_hub"` for optimized attention computation
+
+- **optimization.py**: AOT compilation optimization module
+  - The `optimize_pipeline_()` function compiles transformer blocks using torch.export and AOT Inductor
+  - Handles both 'double' and 'single' transformer block types
+  - Uses dynamic shapes to support variable image sequence lengths (0-3 images at 1024×1024)
+  - Leverages `spaces.aoti_capture()`, `torch.export.export()`, and `spaces.aoti_compile()` for compilation
+  - Replaces block forward methods with `ZeroGPUCompiledModel` instances
+
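The `remote_text_encoder()` helper referenced above lives in app.py and is not shown in this commit, so the following is only a hedged sketch of how such a call to the external Space could look with `gradio_client`; the endpoint name and the tensor deserialization are assumptions, not the Space's actual code:

```python
# Hypothetical sketch -- NOT the actual app.py helper. Assumes the remote
# Space returns a path to a serialized prompt-embedding tensor.
import torch
from gradio_client import Client

_text_encoder_client = Client("multimodalart/mistral-text-encoder")

def remote_text_encoder(prompt: str) -> torch.Tensor:
    # api_name and return format are assumptions about the remote Space
    result = _text_encoder_client.predict(prompt, api_name="/predict")
    return torch.load(result)
```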
+### Key Design Patterns
+
+1. **Remote Text Encoding**: Text encoding is offloaded to a separate Gradio service to reduce memory footprint and optimize GPU usage for the main diffusion pipeline.
+
+2. **Dynamic GPU Duration**: The `get_duration()` function dynamically calculates GPU duration based on the number of input images and inference steps, optimizing resource allocation on ZeroGPU infrastructure.
+
+3. **AOT Compilation**: Transformer blocks are compiled ahead of time with specific dynamic shapes and inductor configurations to maximize performance during inference.
+
+4. **Multi-Image Support**: The pipeline supports optional input images for image editing and combination tasks via the gallery input component.
+
+## Development Commands
+
+### Running the Application
+
+```bash
+python app.py
+```
+
+The Gradio app will launch and be accessible at the provided local URL (default: http://127.0.0.1:7860).
+
+### Dependencies
+
+Install dependencies from requirements.txt:
+
+```bash
+pip install -r requirements.txt
+```
+
+Note: The repository uses a specific diffusers commit from GitHub rather than the PyPI release.
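A GitHub-pinned dependency like that typically uses pip's VCS requirement syntax. The actual commit hash is not shown in this diff, so `<commit-sha>` below is a placeholder:

```text
diffusers @ git+https://github.com/huggingface/diffusers.git@<commit-sha>
```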
+
+## Important Implementation Details
+
+### Pipeline Initialization
+
+The pipeline is initialized with `text_encoder=None` because text encoding is handled remotely. The transformer uses Flash Attention 3 (the `_flash_3_hub` backend) for optimized attention computation.
+
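As a hedged illustration of that setup: the pipeline class name and the attention-backend setter below follow diffusers conventions but are assumptions, and the repository's actual app.py may differ:

```python
# Hedged sketch of the pipeline initialization described above.
import torch
from diffusers import Flux2Pipeline  # class name assumed for FLUX.2 support

pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-dev",
    text_encoder=None,           # prompts are encoded by the remote service
    torch_dtype=torch.bfloat16,  # bfloat16 precision per the Configuration section
).to("cuda")
pipe.transformer.set_attention_backend("_flash_3_hub")  # Flash Attention 3 via the Hub
```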
+### GPU Allocation
+
+The `@spaces.GPU(duration=get_duration)` decorator dynamically allocates GPU time based on (a sketch follows this list):
+- Base time: 65 seconds
+- Additional time per inference step: 1 + 0.7 × number_of_input_images seconds
+
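Read literally, that schedule is duration = 65 + num_inference_steps × (1 + 0.7 × num_images). A minimal sketch of a matching `get_duration()`; the parameter list is assumed to mirror `infer()`'s arguments:

```python
# Hedged reconstruction of get_duration() from the schedule above; the
# signature is an assumption, not the actual app.py code.
def get_duration(prompt, input_images=None, seed=42, randomize_seed=False,
                 width=1024, height=1024, num_inference_steps=30,
                 guidance_scale=4.0, progress=None):
    num_images = len(input_images) if input_images else 0
    # 65 s base + (1 + 0.7 * num_images) s per inference step
    return 65 + num_inference_steps * (1 + 0.7 * num_images)
```

At the defaults (30 steps, no input images) this yields 65 + 30 × 1 = 95 seconds.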
+### Transformer Block Compilation
+
+When modifying the optimization logic in optimization.py, keep in mind (a dynamic-shapes sketch follows this list):
+- The `TRANSFORMER_IMAGE_DIM` ranges from 4096 (0 images) to 16384 (3 images at 1024×1024)
+- Dynamic shapes are critical for supporting variable-length image sequences
+- Both 'double' and 'single' transformer blocks must be compiled separately
+- The compilation process takes up to 1200 seconds (20 minutes) per block type
+
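The dynamic-shapes mechanism itself can be shown with a toy block; everything below is a stand-in, since the real optimization.py exports FLUX.2 transformer blocks with more inputs and inductor configuration:

```python
# Toy demonstration of torch.export with a dynamic sequence axis spanning
# the 4096-16384 token range described above. ToyBlock is NOT a FLUX block.
import torch
from torch import nn
from torch.export import Dim, export

class ToyBlock(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.proj(hidden_states)

seq = Dim("seq", min=4096, max=16384)  # 0 images -> 4096, 3 images -> 16384
exported = export(
    ToyBlock(),
    (torch.randn(1, 4096, 64),),                 # example (batch, seq, dim) input
    dynamic_shapes={"hidden_states": {1: seq}},  # mark the sequence axis dynamic
)
```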
+### Image Input Handling
+
+Input images are passed as a gallery component. The infer function converts the gallery format (a list of tuples) to a simple list of PIL images by extracting `item[0]` from each gallery item.
+
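The exact loop appears in the app.py diff below; written as a standalone helper, the conversion amounts to:

```python
# Gallery values are (image, caption) tuples, so item[0] is the PIL image.
from typing import Optional
from PIL import Image

def gallery_to_pil(input_images) -> Optional[list[Image.Image]]:
    if input_images is None or len(input_images) == 0:
        return None  # treat an empty gallery as "no input images"
    return [item[0] for item in input_images]
```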
+## Configuration
+
+- **Model**: `black-forest-labs/FLUX.2-dev` from Hugging Face Hub
+- **Device**: CUDA (bfloat16 precision)
+- **Max Image Size**: 1024×1024
+- **Default Inference Steps**: 30
+- **Default Guidance Scale**: 4.0
+
+## Testing Notes
+
+The application includes cached examples for both text-only and multi-image generation. Examples are cached in "lazy" mode, meaning they are computed on demand when first accessed.
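In current Gradio, lazy caching is configured on `gr.Examples`; below is a self-contained sketch in which the components and function are toys rather than the Space's actual wiring, and `cache_mode="lazy"` assumes a Gradio version that supports it:

```python
# Illustrative lazy example caching -- not the Space's actual example list.
import gradio as gr

def echo(text: str) -> str:
    return text

with gr.Blocks() as demo:
    inp = gr.Text(label="prompt")
    out = gr.Text(label="result")
    gr.Examples(
        examples=[["A photo of a red fox in the snow"]],  # made-up example
        inputs=[inp],
        outputs=[out],
        fn=echo,
        cache_examples=True,
        cache_mode="lazy",  # compute and cache on first access
    )
```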
app.py CHANGED
@@ -74,20 +74,20 @@ def infer(prompt, input_images=None, seed=42, randomize_seed=False, width=1024,
 
     if randomize_seed:
         seed = random.randint(0, MAX_SEED)
-
+
     # Get prompt embeddings from remote text encoder
-    progress(0.1, desc="Encoding prompt...")
+    progress(0.1, desc="프롬프트 인코딩 중...")
     prompt_embeds = remote_text_encoder(prompt).to("cuda")
-
+
     # Prepare image list (convert None or empty gallery to None)
     image_list = None
     if input_images is not None and len(input_images) > 0:
         image_list = []
         for item in input_images:
             image_list.append(item[0])
-
+
     # Generate image
-    progress(0.3, desc="Generating image...")
+    progress(0.3, desc="이미지 생성 중...")
     generator = torch.Generator(device=device).manual_seed(seed)
     image = pipe(
         prompt_embeds=prompt_embeds,
@@ -121,77 +121,77 @@ css="""
 """
 
 with gr.Blocks() as demo:
-
+
     with gr.Column(elem_id="col-container"):
         gr.Markdown(f"""# FLUX.2 [dev]
-FLUX.2 [dev] is a 32B model rectified flow capable of generating, editing and combining images based on text instructions model [[model](https://huggingface.co/black-forest-labs/FLUX.2-dev)], [[blog](https://bfl.ai/blog/flux-2)]
+FLUX.2 [dev]는 텍스트 지시사항을 기반으로 이미지를 생성, 편집 및 결합할 수 있는 32B 파라미터 rectified flow 모델입니다 [[모델](https://huggingface.co/black-forest-labs/FLUX.2-dev)], [[블로그](https://bfl.ai/blog/flux-2)]
         """)
 
-        with gr.Accordion("Input image(s) (optional)", open=False):
+        with gr.Accordion("입력 이미지 (선택사항)", open=False):
             input_images = gr.Gallery(
-                label="Input Image(s)",
+                label="입력 이미지",
                 type="pil",
                 columns=3,
                 rows=1,
             )
-
+
         with gr.Row():
-
+
             prompt = gr.Text(
-                label="Prompt",
+                label="프롬프트",
                 show_label=False,
                 max_lines=2,
-                placeholder="Enter your prompt",
+                placeholder="프롬프트를 입력하세요",
                 container=False,
                 scale=3
             )
-
-            run_button = gr.Button("Run", scale=1)
-
-        result = gr.Image(label="Result", show_label=False)
+
+            run_button = gr.Button("실행", scale=1)
 
-        with gr.Accordion("Advanced Settings", open=False):
-
+        result = gr.Image(label="결과", show_label=False)
+
+        with gr.Accordion("고급 설정", open=False):
+
             seed = gr.Slider(
-                label="Seed",
+                label="시드",
                 minimum=0,
                 maximum=MAX_SEED,
                 step=1,
                 value=0,
            )
-
-            randomize_seed = gr.Checkbox(label="Randomize seed", value=True)
-
+
+            randomize_seed = gr.Checkbox(label="랜덤 시드", value=True)
+
             with gr.Row():
-
+
                 width = gr.Slider(
-                    label="Width",
+                    label="너비",
                     minimum=256,
                     maximum=MAX_IMAGE_SIZE,
                     step=32,
                     value=1024,
                )
-
+
                 height = gr.Slider(
-                    label="Height",
+                    label="높이",
                     minimum=256,
                     maximum=MAX_IMAGE_SIZE,
                     step=32,
                     value=1024,
                )
-
+
             with gr.Row():
-
+
                 num_inference_steps = gr.Slider(
-                    label="Number of inference steps",
+                    label="추론 단계 수",
                     minimum=1,
                     maximum=100,
                     step=1,
                     value=30,
                )
-
+
                 guidance_scale = gr.Slider(
-                    label="Guidance scale",
+                    label="가이던스 스케일",
                     minimum=0.0,
                     maximum=10.0,
                     step=0.1,