1 | 1 | # Run |
2 | 2 |
3 | 3 | ``` |
4 | | -usage: ./bin/sd [arguments] |
| 4 | +usage: ./bin/sd [options] |
5 | 5 |
6 | | -arguments: |
7 | | - -h, --help show this help message and exit |
8 | | - -M, --mode [MODE] run mode, one of: [img_gen, vid_gen, upscale, convert], default: img_gen |
9 | | - -t, --threads N number of threads to use during computation (default: -1) |
10 | | - If threads <= 0, then threads will be set to the number of CPU physical cores |
11 | | - --offload-to-cpu place the weights in RAM to save VRAM, and automatically load them into VRAM when needed |
12 | | - -m, --model [MODEL] path to full model |
13 | | - --diffusion-model path to the standalone diffusion model |
14 | | - --high-noise-diffusion-model path to the standalone high noise diffusion model |
15 | | - --clip_l path to the clip-l text encoder |
16 | | - --clip_g path to the clip-g text encoder |
17 | | - --clip_vision path to the clip-vision encoder |
18 | | - --t5xxl path to the t5xxl text encoder |
19 | | - --qwen2vl path to the qwen2vl text encoder |
20 | | - --qwen2vl_vision path to the qwen2vl vit |
21 | | - --vae [VAE] path to vae |
22 | | - --taesd [TAESD_PATH] path to taesd. Using Tiny AutoEncoder for fast decoding (low quality) |
23 | | - --control-net [CONTROL_PATH] path to control net model |
24 | | - --embd-dir [EMBEDDING_PATH] path to embeddings |
25 | | - --upscale-model [ESRGAN_PATH] path to esrgan model. For img_gen mode, upscale images after generate, just RealESRGAN_x4plus_anime_6B supported by now |
26 | | - --upscale-repeats Run the ESRGAN upscaler this many times (default 1) |
27 | | - --type [TYPE] weight type (examples: f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0, q2_K, q3_K, q4_K) |
28 | | - If not specified, the default is the type of the weight file |
29 | | - --tensor-type-rules [EXPRESSION] weight type per tensor pattern (example: "^vae\.=f16,model\.=q8_0") |
30 | | - --lora-model-dir [DIR] lora model directory |
31 | | - -i, --init-img [IMAGE] path to the init image, required by img2img |
32 | | - --mask [MASK] path to the mask image, required by img2img with mask |
33 | | - -i, --end-img [IMAGE] path to the end image, required by flf2v |
34 | | - --control-image [IMAGE] path to image condition, control net |
35 | | - -r, --ref-image [PATH] reference image for Flux Kontext models (can be used multiple times) |
36 | | - --control-video [PATH] path to control video frames, It must be a directory path. |
37 | | - The video frames inside should be stored as images in lexicographical (character) order |
38 | | - For example, if the control video path is `frames`, the directory contain images such as 00.png, 01.png, ... etc. |
39 | | - --increase-ref-index automatically increase the indices of references images based on the order they are listed (starting with 1). |
40 | | - --disable-auto-resize-ref-image disable auto resize of ref images |
41 | | - -o, --output OUTPUT path to write result image to (default: ./output.png) |
42 | | - -p, --prompt [PROMPT] the prompt to render |
43 | | - -n, --negative-prompt PROMPT the negative prompt (default: "") |
44 | | - --cfg-scale SCALE unconditional guidance scale: (default: 7.0) |
45 | | - --img-cfg-scale SCALE image guidance scale for inpaint or instruct-pix2pix models: (default: same as --cfg-scale) |
46 | | - --guidance SCALE distilled guidance scale for models with guidance input (default: 3.5) |
47 | | - --slg-scale SCALE skip layer guidance (SLG) scale, only for DiT models: (default: 0) |
48 | | - 0 means disabled, a value of 2.5 is nice for sd3.5 medium |
49 | | - --eta SCALE eta in DDIM, only for DDIM and TCD: (default: 0) |
50 | | - --skip-layers LAYERS Layers to skip for SLG steps: (default: [7,8,9]) |
51 | | - --skip-layer-start START SLG enabling point: (default: 0.01) |
52 | | - --skip-layer-end END SLG disabling point: (default: 0.2) |
53 | | - --scheduler {discrete, karras, exponential, ays, gits, smoothstep, sgm_uniform, simple} Denoiser sigma scheduler (default: discrete) |
54 | | - --sampling-method {euler, euler_a, heun, dpm2, dpm++2s_a, dpm++2m, dpm++2mv2, ipndm, ipndm_v, lcm, ddim_trailing, tcd} |
55 | | - sampling method (default: "euler" for Flux/SD3/Wan, "euler_a" otherwise) |
56 | | - --timestep-shift N shift timestep for NitroFusion models, default: 0, recommended N for NitroSD-Realism around 250 and 500 for NitroSD-Vibrant |
57 | | - --steps STEPS number of sample steps (default: 20) |
58 | | - --high-noise-cfg-scale SCALE (high noise) unconditional guidance scale: (default: 7.0) |
59 | | - --high-noise-img-cfg-scale SCALE (high noise) image guidance scale for inpaint or instruct-pix2pix models: (default: same as --cfg-scale) |
60 | | - --high-noise-guidance SCALE (high noise) distilled guidance scale for models with guidance input (default: 3.5) |
61 | | - --high-noise-slg-scale SCALE (high noise) skip layer guidance (SLG) scale, only for DiT models: (default: 0) |
62 | | - 0 means disabled, a value of 2.5 is nice for sd3.5 medium |
63 | | - --high-noise-eta SCALE (high noise) eta in DDIM, only for DDIM and TCD: (default: 0) |
64 | | - --high-noise-skip-layers LAYERS (high noise) Layers to skip for SLG steps: (default: [7,8,9]) |
65 | | - --high-noise-skip-layer-start (high noise) SLG enabling point: (default: 0.01) |
66 | | - --high-noise-skip-layer-end END (high noise) SLG disabling point: (default: 0.2) |
67 | | - --high-noise-scheduler {discrete, karras, exponential, ays, gits, smoothstep, sgm_uniform, simple} Denoiser sigma scheduler (default: discrete) |
68 | | - --high-noise-sampling-method {euler, euler_a, heun, dpm2, dpm++2s_a, dpm++2m, dpm++2mv2, ipndm, ipndm_v, lcm, ddim_trailing, tcd} |
69 | | - (high noise) sampling method (default: "euler_a") |
70 | | - --high-noise-steps STEPS (high noise) number of sample steps (default: -1 = auto) |
71 | | - SLG will be enabled at step int([STEPS]*[START]) and disabled at int([STEPS]*[END]) |
72 | | - --strength STRENGTH strength for noising/unnoising (default: 0.75) |
73 | | - --control-strength STRENGTH strength to apply Control Net (default: 0.9) |
74 | | - 1.0 corresponds to full destruction of information in init image |
75 | | - -H, --height H image height, in pixel space (default: 512) |
76 | | - -W, --width W image width, in pixel space (default: 512) |
77 | | - --rng {std_default, cuda} RNG (default: cuda) |
78 | | - -s SEED, --seed SEED RNG seed (default: 42, use random seed for < 0) |
79 | | - -b, --batch-count COUNT number of images to generate |
80 | | - --prediction {eps, v, edm_v, sd3_flow, flux_flow} Prediction type override |
81 | | - --clip-skip N ignore last layers of CLIP network; 1 ignores none, 2 ignores one layer (default: -1) |
82 | | - <= 0 represents unspecified, will be 1 for SD1.x, 2 for SD2.x |
83 | | - --vae-tiling process vae in tiles to reduce memory usage |
84 | | - --vae-tile-size [X]x[Y] tile size for vae tiling (default: 32x32) |
85 | | - --vae-relative-tile-size [X]x[Y] relative tile size for vae tiling, in fraction of image size if < 1, in number of tiles per dim if >=1 (overrides --vae-tile-size) |
86 | | - --vae-tile-overlap OVERLAP tile overlap for vae tiling, in fraction of tile size (default: 0.5) |
87 | | - --force-sdxl-vae-conv-scale force use of conv scale on sdxl vae |
88 | | - --vae-on-cpu keep vae in cpu (for low vram) |
89 | | - --clip-on-cpu keep clip in cpu (for low vram) |
90 | | - --diffusion-fa use flash attention in the diffusion model (for low vram) |
91 | | - Might lower quality, since it implies converting k and v to f16. |
92 | | - This might crash if it is not supported by the backend. |
93 | | - --diffusion-conv-direct use Conv2d direct in the diffusion model |
94 | | - This might crash if it is not supported by the backend. |
95 | | - --vae-conv-direct use Conv2d direct in the vae model (should improve the performance) |
96 | | - This might crash if it is not supported by the backend. |
97 | | - --control-net-cpu keep controlnet in cpu (for low vram) |
98 | | - --canny apply canny preprocessor (edge detection) |
99 | | - --color colors the logging tags according to level |
100 | | - --chroma-disable-dit-mask disable dit mask for chroma |
101 | | - --chroma-enable-t5-mask enable t5 mask for chroma |
102 | | - --chroma-t5-mask-pad PAD_SIZE t5 mask pad size of chroma |
103 | | - --video-frames video frames (default: 1) |
104 | | - --fps fps (default: 24) |
105 | | - --moe-boundary BOUNDARY timestep boundary for Wan2.2 MoE model. (default: 0.875) |
106 | | - only enabled if `--high-noise-steps` is set to -1 |
107 | | - --flow-shift SHIFT shift value for Flow models like SD3.x or WAN (default: auto) |
108 | | - --vace-strength wan vace strength |
109 | | - --photo-maker path to PHOTOMAKER model |
110 | | - --pm-id-images-dir [DIR] path to PHOTOMAKER input id images dir |
111 | | - --pm-id-embed-path [PATH] path to PHOTOMAKER v2 id embed |
112 | | - --pm-style-strength strength for keeping PHOTOMAKER input identity (default: 20) |
113 | | - -v, --verbose print extra info |
| 6 | +Options: |
| 7 | + -m, --model <string> path to full model |
| 8 | + --clip_l <string> path to the clip-l text encoder |
| 9 | + --clip_g <string> path to the clip-g text encoder |
| 10 | + --clip_vision <string> path to the clip-vision encoder |
| 11 | + --t5xxl <string> path to the t5xxl text encoder |
| 12 | + --qwen2vl <string> path to the qwen2vl text encoder |
| 13 | + --qwen2vl_vision <string> path to the qwen2vl vit |
| 14 | + --diffusion-model <string> path to the standalone diffusion model |
| 15 | + --high-noise-diffusion-model <string> path to the standalone high noise diffusion model |
| 16 | + --vae <string> path to standalone vae model |
| 17 | + --taesd <string> path to taesd. Using Tiny AutoEncoder for fast decoding (low quality) |
| 18 | + --control-net <string> path to control net model |
| 19 | + --embd-dir <string> embeddings directory |
| 20 | + --lora-model-dir <string> lora model directory |
| 21 | + -i, --init-img <string> path to the init image |
| 22 | + --end-img <string> path to the end image, required by flf2v |
| 23 | + --tensor-type-rules <string> weight type per tensor pattern (example: "^vae\.=f16,model\.=q8_0") |
| 24 | + --photo-maker <string> path to PHOTOMAKER model |
| 25 | + --pm-id-images-dir <string> path to PHOTOMAKER input id images dir |
| 26 | + --pm-id-embed-path <string> path to PHOTOMAKER v2 id embed |
| 27 | + --mask <string> path to the mask image |
| 28 | + --control-image <string> path to control image, control net |
| 29 | + --control-video <string> path to control video frames, It must be a directory path. The video frames inside should be stored as images in |
| 30 | + lexicographical (character) order. For example, if the control video path is |
| 31 | + `frames`, the directory contain images such as 00.png, 01.png, ... etc. |
| 32 | + -o, --output <string> path to write result image to (default: ./output.png) |
| 33 | + -p, --prompt <string> the prompt to render |
| 34 | + -n, --negative-prompt <string> the negative prompt (default: "") |
| 35 | + --upscale-model <string> path to esrgan model. |
| 36 | + -t, --threads <int> number of threads to use during computation (default: -1). If threads <= 0, then threads will be set to the number of |
| 37 | + CPU physical cores |
| 38 | + --upscale-repeats <int> Run the ESRGAN upscaler this many times (default: 1) |
| 39 | + -H, --height <int> image height, in pixel space (default: 512) |
| 40 | + -W, --width <int> image width, in pixel space (default: 512) |
| 41 | + --steps <int> number of sample steps (default: 20) |
| 42 | + --high-noise-steps <int> (high noise) number of sample steps (default: -1 = auto) |
| 43 | + --clip-skip <int> ignore last layers of CLIP network; 1 ignores none, 2 ignores one layer (default: -1). <= 0 represents unspecified, |
| 44 | + will be 1 for SD1.x, 2 for SD2.x |
| 45 | + -b, --batch-count <int> batch count |
| 46 | + --chroma-t5-mask-pad <int> t5 mask pad size of chroma |
| 47 | + --video-frames <int> video frames (default: 1) |
| 48 | + --fps <int> fps (default: 24) |
| 49 | + --timestep-shift <int> shift timestep for NitroFusion models (default: 0). recommended N for NitroSD-Realism around 250 and 500 for |
| 50 | + NitroSD-Vibrant |
| 51 | + --cfg-scale <float> unconditional guidance scale: (default: 7.0) |
| 52 | + --img-cfg-scale <float> image guidance scale for inpaint or instruct-pix2pix models: (default: same as --cfg-scale) |
| 53 | + --guidance <float> distilled guidance scale for models with guidance input (default: 3.5) |
| 54 | + --slg-scale <float> skip layer guidance (SLG) scale, only for DiT models: (default: 0). 0 means disabled, a value of 2.5 is nice for sd3.5 |
| 55 | + medium |
| 56 | + --skip-layer-start <float> SLG enabling point (default: 0.01) |
| 57 | + --skip-layer-end <float> SLG disabling point (default: 0.2) |
| 58 | + --eta <float> eta in DDIM, only for DDIM and TCD (default: 0) |
| 59 | + --high-noise-cfg-scale <float> (high noise) unconditional guidance scale: (default: 7.0) |
| 60 | + --high-noise-img-cfg-scale <float> (high noise) image guidance scale for inpaint or instruct-pix2pix models (default: same as --cfg-scale) |
| 61 | + --high-noise-guidance <float> (high noise) distilled guidance scale for models with guidance input (default: 3.5) |
| 62 | + --high-noise-slg-scale <float> (high noise) skip layer guidance (SLG) scale, only for DiT models: (default: 0) |
| 63 | + --high-noise-skip-layer-start <float> (high noise) SLG enabling point (default: 0.01) |
| 64 | + --high-noise-skip-layer-end <float> (high noise) SLG disabling point (default: 0.2) |
| 65 | + --high-noise-eta <float> (high noise) eta in DDIM, only for DDIM and TCD (default: 0) |
| 66 | + --strength <float> strength for noising/unnoising (default: 0.75) |
| 67 | + --pm-style-strength <float> |
| 68 | + --control-strength <float> strength to apply Control Net (default: 0.9). 1.0 corresponds to full destruction of information in init image |
| 69 | + --moe-boundary <float> timestep boundary for Wan2.2 MoE model. (default: 0.875). Only enabled if `--high-noise-steps` is set to -1 |
| 70 | + --flow-shift <float> shift value for Flow models like SD3.x or WAN (default: auto) |
| 71 | + --vace-strength <float> wan vace strength |
| 72 | + --vae-tile-overlap <float> tile overlap for vae tiling, in fraction of tile size (default: 0.5) |
| 73 | + --vae-tiling process vae in tiles to reduce memory usage |
| 74 | + --force-sdxl-vae-conv-scale force use of conv scale on sdxl vae |
| 75 | + --offload-to-cpu place the weights in RAM to save VRAM, and automatically load them into VRAM when needed |
| 76 | + --control-net-cpu keep controlnet in cpu (for low vram) |
| 77 | + --clip-on-cpu keep clip in cpu (for low vram) |
| 78 | + --vae-on-cpu keep vae in cpu (for low vram) |
| 79 | + --diffusion-fa use flash attention in the diffusion model |
| 80 | + --diffusion-conv-direct use ggml_conv2d_direct in the diffusion model |
| 81 | + --vae-conv-direct use ggml_conv2d_direct in the vae model |
| 82 | + --canny apply canny preprocessor (edge detection) |
| 83 | + -v, --verbose print extra info |
| 84 | + --color colors the logging tags according to level |
| 85 | + --chroma-disable-dit-mask disable dit mask for chroma |
| 86 | + --chroma-enable-t5-mask enable t5 mask for chroma |
| 87 | + --increase-ref-index automatically increase the indices of references images based on the order they are listed (starting with 1). |
| 88 | + --disable-auto-resize-ref-image disable auto resize of ref images |
| 89 | + -M, --mode run mode, one of [img_gen, vid_gen, upscale, convert], default: img_gen |
| 90 | + --type weight type (examples: f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0, q2_K, q3_K, q4_K). If not specified, the default is the |
| 91 | + type of the weight file |
| 92 | + --rng RNG, one of [std_default, cuda], default: cuda |
| 93 | + -s, --seed RNG seed (default: 42, use random seed for < 0) |
| 94 | + --sampling-method sampling method, one of [euler, euler_a, heun, dpm2, dpm++2s_a, dpm++2m, dpm++2mv2, ipndm, ipndm_v, lcm, ddim_trailing, |
| 95 | + tcd] (default: euler for Flux/SD3/Wan, euler_a otherwise) |
| 96 | + --prediction prediction type override, one of [eps, v, edm_v, sd3_flow, flux_flow] |
| 97 | + --scheduler denoiser sigma scheduler, one of [discrete, karras, exponential, ays, gits, smoothstep, sgm_uniform, simple], default: |
| 98 | + discrete |
| 99 | + --skip-layers layers to skip for SLG steps (default: [7,8,9]) |
| 100 | + --high-noise-sampling-method (high noise) sampling method, one of [euler, euler_a, heun, dpm2, dpm++2s_a, dpm++2m, dpm++2mv2, ipndm, ipndm_v, lcm, |
| 101 | + ddim_trailing, tcd] default: euler for Flux/SD3/Wan, euler_a otherwise |
| 102 | + --high-noise-scheduler (high noise) denoiser sigma scheduler, one of [discrete, karras, exponential, ays, gits, smoothstep, sgm_uniform, |
| 103 | + simple], default: discrete |
| 104 | + --high-noise-skip-layers (high noise) layers to skip for SLG steps (default: [7,8,9]) |
| 105 | + -r, --ref-image reference image for Flux Kontext models (can be used multiple times) |
| 106 | + -h, --help show this help message and exit |
| 107 | + --vae-tile-size tile size for vae tiling, format [X]x[Y] (default: 32x32) |
| 108 | + --vae-relative-tile-size relative tile size for vae tiling, format [X]x[Y], in fraction of image size if < 1, in number of tiles per dim if >=1 |
| 109 | + (overrides --vae-tile-size) |
114 | 110 | ``` |
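
As a rough illustration of how the options listed above combine, the sketch below assembles an argument list for a basic `img_gen` run. It uses only flags that appear in the help text (`-M`, `-m`, `-p`, `-o`, `--steps`, `--cfg-scale`, `-W`, `-H`, `-s`) with their documented defaults. The helper name `build_sd_command` and the model path `models/example.safetensors` are hypothetical, chosen for the example, and the flag ordering is an assumption rather than a requirement stated by the help output.

```python
import shlex


def build_sd_command(model, prompt, output="./output.png", steps=20,
                     cfg_scale=7.0, width=512, height=512, seed=42,
                     binary="./bin/sd"):
    """Assemble an argv list for an img_gen run.

    Hypothetical helper: only the flag names and default values are taken
    from the help text; everything else is illustrative.
    """
    return [
        binary,
        "-M", "img_gen",               # run mode (documented default)
        "-m", model,                   # path to full model
        "-p", prompt,                  # the prompt to render
        "-o", output,                  # where to write the result image
        "--steps", str(steps),         # number of sample steps
        "--cfg-scale", str(cfg_scale), # unconditional guidance scale
        "-W", str(width),              # image width, in pixel space
        "-H", str(height),             # image height, in pixel space
        "-s", str(seed),               # RNG seed; < 0 selects a random seed
    ]


# The model path below is a placeholder, not a file shipped with the project.
cmd = build_sd_command("models/example.safetensors", "a lovely cat")

# Print a shell-quoted form of the command for inspection; nothing is executed.
print(shlex.join(cmd))
```

Passing the same list to `subprocess.run(cmd, check=True)` would launch the binary, assuming `./bin/sd` has been built and the model file exists.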