bubbliiiing committed · Commit bd36100 (verified) · 1 Parent(s): 3a50865

Update README.md

Files changed (1):
  1. README.md (+103, -102)
README.md CHANGED

The only change is the addition of `library_name: videox_fun` to the YAML front matter; the updated README follows in full.
---
license: apache-2.0
library_name: videox_fun
---

# Z-Image-Turbo-Fun-Controlnet-Union

[![Github](https://img.shields.io/badge/🎬%20Code-Github-blue)](https://github.com/aigc-apps/VideoX-Fun)

## Model Features
- This ControlNet is added to 6 blocks.
- The model was trained from scratch for 10,000 steps on a dataset of 1 million high-quality images covering both general and human-centric content. Training was performed at 1328 resolution using BFloat16 precision, with a batch size of 64, a learning rate of 2e-5, and a text dropout ratio of 0.10.
- It supports multiple control conditions, including Canny, HED, Depth, Pose, and MLSD, and can be used like a standard ControlNet.
- You can adjust `control_context_scale` for stronger control and better detail preservation; the optimal range is 0.65 to 0.80. For better stability, we highly recommend using a detailed prompt.

## TODO
- [ ] Train on more data and for more steps.
- [ ] Support inpaint mode.

## Results

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Pose</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/pose2.jpg" width="100%" /></td>
<td><img src="results/pose2.png" width="100%" /></td>
</tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Pose</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/pose.jpg" width="100%" /></td>
<td><img src="results/pose.png" width="100%" /></td>
</tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Canny</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/canny.jpg" width="100%" /></td>
<td><img src="results/canny.png" width="100%" /></td>
</tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>HED</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/hed.jpg" width="100%" /></td>
<td><img src="results/hed.png" width="100%" /></td>
</tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>Depth</td>
<td>Output</td>
</tr>
<tr>
<td><img src="asset/depth.jpg" width="100%" /></td>
<td><img src="results/depth.png" width="100%" /></td>
</tr>
</table>

## Inference
See the [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun) repository for full details.

First, clone the repository and create the required model directories:

```sh
# Clone the code
git clone https://github.com/aigc-apps/VideoX-Fun.git

# Enter VideoX-Fun's directory
cd VideoX-Fun

# Create model directories
mkdir -p models/Diffusion_Transformer
mkdir -p models/Personalized_Model
```

Then download the Z-Image-Turbo base model into `models/Diffusion_Transformer` and the ControlNet weights into `models/Personalized_Model`.

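One way to fetch the weights is with the Hugging Face CLI. This is only a sketch: the repository IDs below are placeholders for the Hub repos that actually host the Z-Image-Turbo base model and this ControlNet.

```sh
# Placeholder repo IDs: replace <org> with the organizations that actually host the weights.
pip install -U "huggingface_hub[cli]"

# Base model -> models/Diffusion_Transformer/Z-Image-Turbo
huggingface-cli download <org>/Z-Image-Turbo --local-dir models/Diffusion_Transformer/Z-Image-Turbo

# ControlNet weights -> models/Personalized_Model
huggingface-cli download <org>/Z-Image-Turbo-Fun-Controlnet-Union Z-Image-Turbo-Fun-Controlnet-Union.safetensors --local-dir models/Personalized_Model
```

However you obtain them, the final layout should look like this:
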
```
📦 models/
├── 📂 Diffusion_Transformer/
│   └── 📂 Z-Image-Turbo/
└── 📂 Personalized_Model/
    └── 📦 Z-Image-Turbo-Fun-Controlnet-Union.safetensors
```

Then run the file `examples/z_image_fun/predict_t2i_control.py`.
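
For example, from the root of the VideoX-Fun checkout (the prompt, control image, and model paths are presumably set inside the script or via its own arguments, so check the script before running):

```sh
python examples/z_image_fun/predict_t2i_control.py
```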