Upload folder using huggingface_hub
- .gitattributes +6 -0
- README.md +80 -11
- ltx23_inpaint_masked_r2v_rank32_v1_3000steps.safetensors +3 -0
- ltx23_inpaint_masked_t2v_rank128_v1_02500steps.safetensors +3 -0
- ltx23_inpaint_masked_t2v_rank128_v1_10000steps.safetensors +3 -0
- videos/sample_1_inpaint_10000.mp4 +3 -0
- videos/sample_1_inpaint_2500.mp4 +3 -0
- videos/sample_1_masked_r2v_3000.mp4 +3 -0
- videos/sample_2_masked_r2v_3000.mp4 +3 -0
- videos/sample_3_masked_r2v_3000.mp4 +3 -0
- videos/sample_4_masked_r2v_3000.mp4 +3 -0
.gitattributes
CHANGED
|
@@ -33,3 +33,9 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
+
videos/sample_1_inpaint_10000.mp4 filter=lfs diff=lfs merge=lfs -text
|
| 37 |
+
videos/sample_1_inpaint_2500.mp4 filter=lfs diff=lfs merge=lfs -text
|
| 38 |
+
videos/sample_1_masked_r2v_3000.mp4 filter=lfs diff=lfs merge=lfs -text
|
| 39 |
+
videos/sample_2_masked_r2v_3000.mp4 filter=lfs diff=lfs merge=lfs -text
|
| 40 |
+
videos/sample_3_masked_r2v_3000.mp4 filter=lfs diff=lfs merge=lfs -text
|
| 41 |
+
videos/sample_4_masked_r2v_3000.mp4 filter=lfs diff=lfs merge=lfs -text
|
README.md
CHANGED
|
@@ -18,8 +18,9 @@ These LoRAs may cover different use cases over time, so this repository is not l
|
|
| 18 |
|
| 19 |
| File | Description |
|
| 20 |
|---|---|
|
| 21 |
-
| `ltx23_inpaint_rank128_v1_02500steps.safetensors` | Sometimes this checkpoint
|
| 22 |
-
| `ltx23_inpaint_rank128_v1_10000steps.safetensors` | Sometimes this checkpoint doesn't follow instructions quite right, because it focuses more on the size of the mask
|
|
|
|
| 23 |
|
| 24 |
Use whatever suits you best.
|
| 25 |
|
|
@@ -40,6 +41,13 @@ must be treated as **a single video**.
|
|
| 40 |
|
| 41 |
After that, you need to use the **`LTXVAddGuideMulti`** node to pass the guide video into the model.
|
| 42 |
| 43 |
## About the mask format used during training
|
| 44 |
|
| 45 |
My dataset included samples where the mask was more **blockified**. In other words, the default pattern used **8x8 blocks**.
|
|
@@ -50,6 +58,13 @@ To better reproduce the training conditions during inference, you can use:
|
|
| 50 |
|
| 51 |
This may help make the mask distribution closer to what the model saw during training.
|
| 52 |
| 53 |
## Notes
|
| 54 |
|
| 55 |
- **Base model:** `Lightricks/LTX-2.3`
|
|
@@ -57,6 +72,8 @@ This may help make the mask distribution closer to what the model saw during tra
|
|
| 57 |
- prompt adherence
|
| 58 |
- use of the masked area
|
| 59 |
- overfitting tendency
|
|
|
|
|
|
|
| 60 |
|
| 61 |
## Practical recommendations
|
| 62 |
|
|
@@ -64,6 +81,7 @@ For the inpainting LoRAs in this repo:
|
|
| 64 |
|
| 65 |
- If you want **better prompt adherence**, try the **2500 steps** checkpoint first
|
| 66 |
- If you want **better use of the masked area**, try the **10000 steps** checkpoint first
|
|
|
|
| 67 |
|
| 68 |
The best approach is to compare both in your workflow, since preference may vary depending on the scene, mask, and prompt.
|
| 69 |
|
|
@@ -75,38 +93,89 @@ The best approach is to compare both in your workflow, since preference may vary
|
|
| 75 |
|
| 76 |
**Model:** `ltx23_inpaint_rank128_v1_02500steps.safetensors`
|
| 77 |
|
| 78 |
-
**Video:**
|
|
|
|
|
|
|
| 79 |
|
| 80 |
**Prompt:**
|
| 81 |
|
| 82 |
---
|
| 83 |
|
| 84 |
-
##
|
| 85 |
|
| 86 |
-
|
| 87 |
|
| 88 |
-
**
|
| 89 |
|
| 90 |
**Prompt:**
|
| 91 |
|
| 92 |
---
|
| 93 |
|
| 94 |
-
## Examples —
|
| 95 |
|
| 96 |
### Example 1
|
| 97 |
|
| 98 |
-
**Model:** `
|
|
|
|
|
|
|
| 99 |
|
| 100 |
-
|
| 101 |
|
| 102 |
**Prompt:**
|
|
|
|
|
|
|
|
|
|
| 103 |
|
| 104 |
---
|
| 105 |
|
| 106 |
### Example 2
|
| 107 |
|
| 108 |
-
**Model:** `
|
| 109 |
|
| 110 |
-
|
| 111 |
|
| 112 |
**Prompt:**
|
|
|
|
|
|
| 18 |
|
| 19 |
| File | Description |
|
| 20 |
|---|---|
|
| 21 |
+
| `ltx23_inpaint_rank128_v1_02500steps.safetensors` | Sometimes this checkpoint follows the prompt better, probably because it experienced less overfitting. |
|
| 22 |
+
| `ltx23_inpaint_rank128_v1_10000steps.safetensors` | This checkpoint sometimes follows the prompt less closely because it focuses more on the size of the mask, but it makes better use of the masked region. This is probably because it overfit more after longer training on a more limited dataset. |
|
| 23 |
+
| `ltx23_inpaint_masked_r2v_rank32_v1_3000steps.safetensors` | Inpainting LoRA with reference support. This model allows inpainting while also using a visual reference, which can help guide the desired replacement more precisely. Prompt quality is extremely important for good results, and mask size matters even more. |
|
| 24 |
|
| 25 |
Use whatever suits you best.
|
| 26 |
|
|
|
|
| 41 |
|
| 42 |
After that, you need to use the **`LTXVAddGuideMulti`** node to pass the guide video into the model.
|
| 43 |
|
| 44 |
+
### Required colors
|
| 45 |
+
|
| 46 |
+
For inference to match the training setup, the colors are important:
|
| 47 |
+
|
| 48 |
+
- the **mask must be magenta**: **`(255, 0, 255)`**
|
| 49 |
+
- the **green area of the reference must be chroma key green**: **`(0, 255, 0)`**
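
The two colors above can be kept as constants and applied with a small helper. This is only a minimal sketch: it assumes frames are RGB `uint8` NumPy arrays and that the mask is painted directly onto the frame, and the names `paint_region`, `MASK_MAGENTA`, and `CHROMA_GREEN` are hypothetical and not part of any LTX or ComfyUI API.

```python
import numpy as np

MASK_MAGENTA = (255, 0, 255)  # mask color stated in this README
CHROMA_GREEN = (0, 255, 0)    # reference green-area color stated in this README


def paint_region(frame: np.ndarray, region: np.ndarray, color: tuple) -> np.ndarray:
    """Return a copy of an RGB frame with the boolean HxW `region` filled with `color`."""
    if frame.ndim != 3 or frame.shape[2] != 3:
        raise ValueError("frame must be an HxWx3 RGB array")
    if region.shape != frame.shape[:2]:
        raise ValueError("region must match the frame height and width")
    out = frame.copy()
    # Boolean indexing selects the masked pixels; the color broadcasts over them.
    out[region.astype(bool)] = np.array(color, dtype=frame.dtype)
    return out


# Hypothetical 64x64 gray frame with a 16x16 area to inpaint.
frame = np.full((64, 64, 3), 128, dtype=np.uint8)
region = np.zeros((64, 64), dtype=bool)
region[8:24, 8:24] = True
guide_frame = paint_region(frame, region, MASK_MAGENTA)
```

The same helper could fill the green area of a reference image by passing `CHROMA_GREEN`. Whether your workflow paints the mask this way or uses a dedicated node is something to check against your own setup.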
|
| 50 |
+
|
| 51 |
## About the mask format used during training
|
| 52 |
|
| 53 |
My dataset included samples where the mask was more **blockified**. In other words, the default pattern used **8x8 blocks**.
|
|
|
|
| 58 |
|
| 59 |
This may help make the mask distribution closer to what the model saw during training.
|
| 60 |
|
| 61 |
+
For the new reference-based inpainting LoRA, this is especially important:
|
| 62 |
+
|
| 63 |
+
- sometimes you need to use **blockify** so the mask becomes more agnostic to the previous object's shape
|
| 64 |
+
- sometimes you need to expand the mask to give the new object more room to work properly
|
| 65 |
+
- a good default recommendation is **Blockify Mask** with **size 8**
|
| 66 |
+
- you can expand up to **512**, which effectively turns the mask into a full rectangle
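
To illustrate what blockifying does to a mask, here is a rough NumPy sketch. It assumes "blockify" means that any grid cell touched by the mask becomes fully masked; the function `blockify_mask` is hypothetical and is not the actual node implementation, which may differ in details such as grid alignment.

```python
import numpy as np


def blockify_mask(mask: np.ndarray, block: int = 8) -> np.ndarray:
    """Mark every block x block cell that contains any masked pixel as fully masked."""
    if block < 1:
        raise ValueError("block must be >= 1")
    h, w = mask.shape
    # Pad with unmasked pixels so height and width are multiples of the block size.
    pad_h = (-h) % block
    pad_w = (-w) % block
    padded = np.pad(mask.astype(bool), ((0, pad_h), (0, pad_w)))
    cells = padded.reshape(padded.shape[0] // block, block,
                           padded.shape[1] // block, block)
    hit = cells.any(axis=(1, 3))  # one flag per grid cell
    full = np.repeat(np.repeat(hit, block, axis=0), block, axis=1)
    return full[:h, :w]  # crop back to the original size


# Hypothetical 32x32 mask with a single masked pixel.
mask = np.zeros((32, 32), dtype=bool)
mask[9, 10] = True
blocky = blockify_mask(mask, block=8)
```

With `block=8` the single pixel grows into one full 8x8 cell, which hides the exact outline of the original object. With a very large block size, every touched cell is so big that the result approaches a plain rectangle, which is consistent with the note about expanding up to 512.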
|
| 67 |
+
|
| 68 |
## Notes
|
| 69 |
|
| 70 |
- **Base model:** `Lightricks/LTX-2.3`
|
|
|
|
| 72 |
- prompt adherence
|
| 73 |
- use of the masked area
|
| 74 |
- overfitting tendency
|
| 75 |
+
- For the reference-based LoRA, **prompting is extremely important** for result quality
|
| 76 |
+
- For the reference-based LoRA, **mask size and mask preparation are critical**
|
| 77 |
|
| 78 |
## Practical recommendations
|
| 79 |
|
|
|
|
| 81 |
|
| 82 |
- If you want **better prompt adherence**, try the **2500 steps** checkpoint first
|
| 83 |
- If you want **better use of the masked area**, try the **10000 steps** checkpoint first
|
| 84 |
+
- If you want **an inpainting LoRA that works very well both with a visual reference and in a text-only setup**, try **`ltx23_inpaint_masked_r2v_rank32_v1_3000steps.safetensors`**
|
| 85 |
|
| 86 |
The best approach is to compare both in your workflow, since preference may vary depending on the scene, mask, and prompt.
|
| 87 |
|
|
|
|
| 93 |
|
| 94 |
**Model:** `ltx23_inpaint_rank128_v1_02500steps.safetensors`
|
| 95 |
|
| 96 |
+
**Video:** `videos/sample_1_inpaint_2500.mp4`
|
| 97 |
+
|
| 98 |
+
<video src="videos/sample_1_inpaint_2500.mp4" controls></video>
|
| 99 |
|
| 100 |
**Prompt:**
|
| 101 |
|
| 102 |
---
|
| 103 |
|
| 104 |
+
## Examples — 10000 Steps
|
| 105 |
|
| 106 |
+
### Example 1
|
| 107 |
|
| 108 |
+
**Model:** `ltx23_inpaint_rank128_v1_10000steps.safetensors`
|
| 109 |
+
|
| 110 |
+
**Video:** `videos/sample_1_inpaint_10000.mp4`
|
| 111 |
+
|
| 112 |
+
<video src="videos/sample_1_inpaint_10000.mp4" controls></video>
|
| 113 |
|
| 114 |
**Prompt:**
|
| 115 |
|
| 116 |
---
|
| 117 |
|
| 118 |
+
## Examples — Reference Inpainting (R2V) — 3000 Steps
|
| 119 |
+
|
| 120 |
+
This LoRA can be used in two ways:
|
| 121 |
+
|
| 122 |
+
- **Reference-guided inpainting**, where the reference image actively guides the replacement
|
| 123 |
+
- **Text-only style inpainting**, by sending a **blank image** as the reference input
|
| 124 |
+
|
| 125 |
+
A practical issue to keep in mind is **identity leakage**. If the prompt is not specific enough, the model may copy identity traits or visual details from another character already present in the source scene instead of following the intended reference. This matters most for full-body references, so keep the prompt specific.
|
| 126 |
|
| 127 |
### Example 1
|
| 128 |
|
| 129 |
+
**Model:** `ltx23_inpaint_masked_r2v_rank32_v1_3000steps.safetensors`
|
| 130 |
+
|
| 131 |
+
**Video:** `videos/sample_1_masked_r2v_3000.mp4`
|
| 132 |
|
| 133 |
+
<video src="videos/sample_1_masked_r2v_3000.mp4" controls></video>
|
| 134 |
|
| 135 |
**Prompt:**
|
| 136 |
+
`A man resembling Donald Trump playing an electric guitar on stage, making energetic performance movements, with a confident pose, expressive body language, and dynamic rockstar attitude. He is holding the guitar dramatically while performing, with stage lighting, motion, and a lively concert atmosphere.`
|
| 137 |
+
|
| 138 |
+
**Note:** This example used a **full-body reference**. The prompt had to be very specific; otherwise the model leaked identity details from another character already present in the scene instead of following the intended reference. In practice, a specific prompt reduces **identity leakage**.
|
| 139 |
|
| 140 |
---
|
| 141 |
|
| 142 |
### Example 2
|
| 143 |
|
| 144 |
+
**Model:** `ltx23_inpaint_masked_r2v_rank32_v1_3000steps.safetensors`
|
| 145 |
+
|
| 146 |
+
**Video:** `videos/sample_2_masked_r2v_3000.mp4`
|
| 147 |
+
|
| 148 |
+
<video src="videos/sample_2_masked_r2v_3000.mp4" controls></video>
|
| 149 |
+
|
| 150 |
+
**Prompt:**
|
| 151 |
+
|
| 152 |
+
`A nighttime mountain road drift scene with a Tesla Cybertruck performing a dramatic high-speed drift around a sharp curve, sliding sideways across the asphalt with aggressive motion and strong driving energy. The Cybertruck has its iconic angular triangular wedge-shaped body, metallic panels, sharp geometric silhouette, bright headlights, and futuristic design clearly visible. Tire smoke and drift streaks trail behind the vehicle, matching the speed and intensity of the original action scene. Streetlights illuminate the road, the forest background remains dark and dense, and the camera perspective stays low and cinematic, emphasizing motion, speed, and control. The Cybertruck is fully integrated into the scene with realistic scale, lighting, shadows, reflections, and ground contact.`
|
| 153 |
+
|
| 154 |
+
---
|
| 155 |
+
|
| 156 |
+
### Example 3
|
| 157 |
+
|
| 158 |
+
**Model:** `ltx23_inpaint_masked_r2v_rank32_v1_3000steps.safetensors`
|
| 159 |
+
|
| 160 |
+
**Video:** `videos/sample_3_masked_r2v_3000.mp4`
|
| 161 |
+
|
| 162 |
+
<video src="videos/sample_3_masked_r2v_3000.mp4" controls></video>
|
| 163 |
+
|
| 164 |
+
**Prompt:**
|
| 165 |
+
|
| 166 |
+
`A nighttime mountain road drift scene with a classic Volkswagen Beetle performing a dramatic high-speed drift around a sharp curve, sliding sideways across the asphalt with strong motion and driving energy. The Beetle has its iconic rounded body, compact vintage shape, circular headlights, curved roofline, and unmistakable classic design clearly visible. Tire smoke and drift streaks trail behind the car, matching the speed and intensity of the original action scene. Streetlights illuminate the road, the forest background remains dark and dense, and the camera perspective stays low and cinematic, emphasizing motion, speed, and control. The Beetle is fully integrated into the scene with realistic scale, lighting, shadows, reflections, and ground contact.`
|
| 167 |
+
|
| 168 |
+
**Note:** This example shows that the **masked R2V model can also be used like a text-only inpainting model without a real reference**. To do that, simply send a **blank image** in place of the reference image.
|
| 169 |
+
|
| 170 |
+
---
|
| 171 |
+
|
| 172 |
+
### Example 4
|
| 173 |
+
|
| 174 |
+
**Model:** `ltx23_inpaint_masked_r2v_rank32_v1_3000steps.safetensors`
|
| 175 |
+
|
| 176 |
+
**Video:** `videos/sample_4_masked_r2v_3000.mp4`
|
| 177 |
|
| 178 |
+
<video src="videos/sample_4_masked_r2v_3000.mp4" controls></video>
|
| 179 |
|
| 180 |
**Prompt:**
|
| 181 |
+
`A man riding a tiger on a rural road in daylight, with the tiger rearing upward in a dramatic wheelie-like pose, matching the same dynamic action and position as the original motorcycle. The tiger is large, powerful, and realistic, with orange fur, black stripes, strong muscles, and natural anatomy clearly visible. The man is balanced on top of the tiger as if controlling it during the stunt, with believable body posture and strong motion energy. Keep the same road, camera angle, lighting, shadows, background, and overall composition unchanged. The scene should feel like a real action moment captured outdoors.`
|
ltx23_inpaint_masked_r2v_rank32_v1_3000steps.safetensors
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:0da59538eaa5737ad030e1459c9b22505696add818a413322cd04911798d220a
|
| 3 |
+
size 327287384
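
The three lines above are a Git LFS pointer: a spec version, a SHA-256 object ID, and the size in bytes. A downloaded file can be checked against such a pointer. The sketch below uses only the Python standard library; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of Git LFS or `huggingface_hub`.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """Return True if the file at `path` has the size and hash named in the pointer."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large checkpoints are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:0da59538eaa5737ad030e1459c9b22505696add818a413322cd04911798d220a\n"
    "size 327287384\n"
)
```

Calling `matches_pointer` with the local path of the downloaded `.safetensors` file and this `pointer` would confirm that the download is complete and unmodified.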
|
ltx23_inpaint_masked_t2v_rank128_v1_02500steps.safetensors
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:7e8d2082f79be715774026a4fbbddaa2d64154d4dc9b956ade5208cc4dd8adf8
|
| 3 |
+
size 1308756416
|
ltx23_inpaint_masked_t2v_rank128_v1_10000steps.safetensors
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:1f2089f5cd3cac56f93d3641c90695a7296514aa1292ec8c6f3a6ad369eda728
|
| 3 |
+
size 1308756416
|
videos/sample_1_inpaint_10000.mp4
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:167c53ffcc2243d44b7afa2f0c7234918d945f38dd36be6b6924bc9126ed3bb9
|
| 3 |
+
size 2188724
|
videos/sample_1_inpaint_2500.mp4
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:a65e4805d85a2a642ffcd93f6ec88ce748d786cdb596c0856bd1af8902ffd1cc
|
| 3 |
+
size 2194903
|
videos/sample_1_masked_r2v_3000.mp4
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:822cdf8d15875867b3d0638f368d998459ca7d678660166faa04b6cc2767db32
|
| 3 |
+
size 3485848
|
videos/sample_2_masked_r2v_3000.mp4
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:ff6fb69316e686d8e85c62a2f294b9596828adeb6fd7620003a7c0138e19c709
|
| 3 |
+
size 2929012
|
videos/sample_3_masked_r2v_3000.mp4
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:af20d824f50b257a5a19092fd709d6f35a1175e6b6fd0ef1effa5d3be50fbad9
|
| 3 |
+
size 3005551
|
videos/sample_4_masked_r2v_3000.mp4
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:0923c1ea4075016e4e4219b60adc6663865118de408d43885a1657974b6e389b
|
| 3 |
+
size 3561935
|