Commit 0ebe6fe

refactor: simplify the logic of pm id image loading (#827)
1 parent 55c2e05 commit 0ebe6fe

11 files changed: +182 -515 lines changed

README.md

Lines changed: 5 additions & 4 deletions
@@ -299,9 +299,6 @@ arguments:
  --taesd [TAESD_PATH] path to taesd. Using Tiny AutoEncoder for fast decoding (low quality)
  --control-net [CONTROL_PATH] path to control net model
  --embd-dir [EMBEDDING_PATH] path to embeddings
- --stacked-id-embd-dir [DIR] path to PHOTOMAKER stacked id embeddings
- --input-id-images-dir [DIR] path to PHOTOMAKER input id images dir
- --normalize-input normalize PHOTOMAKER input id images
  --upscale-model [ESRGAN_PATH] path to esrgan model. Upscale images after generate, just RealESRGAN_x4plus_anime_6B supported by now
  --upscale-repeats Run the ESRGAN upscaler this many times (default 1)
  --type [TYPE] weight type (examples: f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0, q2_K, q3_K, q4_K)
@@ -348,7 +345,6 @@ arguments:
  --high-noise-steps STEPS (high noise) number of sample steps (default: -1 = auto)
  SLG will be enabled at step int([STEPS]*[START]) and disabled at int([STEPS]*[END])
  --strength STRENGTH strength for noising/unnoising (default: 0.75)
- --style-ratio STYLE-RATIO strength for keeping input identity (default: 20)
  --control-strength STRENGTH strength to apply Control Net (default: 0.9)
  1.0 corresponds to full destruction of information in init image
  -H, --height H image height, in pixel space (default: 512)
@@ -383,6 +379,11 @@ arguments:
  only enabled if `--high-noise-steps` is set to -1
  --flow-shift SHIFT shift value for Flow models like SD3.x or WAN (default: auto)
  --vace-strength wan vace strength
+ --photo-maker path to PHOTOMAKER model
+ --pm-id-images-dir [DIR] path to PHOTOMAKER input id images dir
+ --pm-id-embed-path [PATH] path to PHOTOMAKER v2 id embed
+ --pm-style-strength strength for keeping PHOTOMAKER input identity (default: 20)
+ --normalize-input normalize PHOTOMAKER input id images
  -v, --verbose print extra info
  ```

docs/photo_maker.md

Lines changed: 4 additions & 5 deletions
@@ -6,16 +6,15 @@ You can use [PhotoMaker](https://github.com/TencentARC/PhotoMaker) to personaliz

  Download PhotoMaker model file (in safetensor format) [here](https://huggingface.co/bssrdf/PhotoMaker). The official release of the model file (in .bin format) does not work with ```stablediffusion.cpp```.

- - Specify the PhotoMaker model path using the `--stacked-id-embd-dir PATH` parameter.
- - Specify the input images path using the `--input-id-images-dir PATH` parameter.
- - input images **must** have the same width and height for preprocessing (to be improved)
+ - Specify the PhotoMaker model path using the `--photo-maker PATH` parameter.
+ - Specify the input images path using the `--pm-id-images-dir PATH` parameter.

  In prompt, make sure you have a class word followed by the trigger word ```"img"``` (hard-coded for now). The class word could be one of ```"man, woman, girl, boy"```. If input ID images contain asian faces, add ```Asian``` before the class
  word.

  Another PhotoMaker specific parameter:

- - ```--style-ratio (0-100)%```: default is 20 and 10-20 typically gets good results. Lower ratio means more faithfully following input ID (not necessarily better quality).
+ - ```--pm-style-strength (0-100)%```: default is 20 and 10-20 typically gets good results. Lower ratio means more faithfully following input ID (not necessarily better quality).

  Other parameters recommended for running Photomaker:

@@ -28,7 +27,7 @@ If on low memory GPUs (<= 8GB), recommend running with ```--vae-on-cpu``` option
  Example:

  ```bash
- bin/sd -m ../models/sdxlUnstableDiffusers_v11.safetensors --vae ../models/sdxl_vae.safetensors --stacked-id-embd-dir ../models/photomaker-v1.safetensors --input-id-images-dir ../assets/photomaker_examples/scarletthead_woman -p "a girl img, retro futurism, retro game art style but extremely beautiful, intricate details, masterpiece, best quality, space-themed, cosmic, celestial, stars, galaxies, nebulas, planets, science fiction, highly detailed" -n "realistic, photo-realistic, worst quality, greyscale, bad anatomy, bad hands, error, text" --cfg-scale 5.0 --sampling-method euler -H 1024 -W 1024 --style-ratio 10 --vae-on-cpu -o output.png
+ bin/sd -m ../models/sdxlUnstableDiffusers_v11.safetensors --vae ../models/sdxl_vae.safetensors --photo-maker ../models/photomaker-v1.safetensors --pm-id-images-dir ../assets/photomaker_examples/scarletthead_woman -p "a girl img, retro futurism, retro game art style but extremely beautiful, intricate details, masterpiece, best quality, space-themed, cosmic, celestial, stars, galaxies, nebulas, planets, science fiction, highly detailed" -n "realistic, photo-realistic, worst quality, greyscale, bad anatomy, bad hands, error, text" --cfg-scale 5.0 --sampling-method euler -H 1024 -W 1024 --pm-style-strength 10 --vae-on-cpu --steps 50
  ```

  ## PhotoMaker Version 2
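
The README and docs hunks above rename three PhotoMaker flags. For scripts that still pass the old names, the mapping can be sketched as a small lookup. This is only an illustration: `rename_legacy_flag` is a hypothetical helper, not part of this commit, and the CLI itself does not appear to accept the old names after this change.

```cpp
#include <string>

// Hypothetical helper (not in the commit): maps a flag removed by this commit
// to its replacement, as listed in the README diff. Any other flag is returned
// unchanged.
static std::string rename_legacy_flag(const std::string& flag) {
    if (flag == "--stacked-id-embd-dir") return "--photo-maker";
    if (flag == "--input-id-images-dir") return "--pm-id-images-dir";
    if (flag == "--style-ratio") return "--pm-style-strength";
    return flag;
}
```

`--normalize-input` keeps its name; only its position in the help text changes.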

examples/cli/main.cpp

Lines changed: 104 additions & 62 deletions
@@ -66,8 +66,6 @@ struct SDParams {
      std::string esrgan_path;
      std::string control_net_path;
      std::string embedding_dir;
-     std::string stacked_id_embed_dir;
-     std::string input_id_images_path;
      sd_type_t wtype = SD_TYPE_COUNT;
      std::string tensor_type_rules;
      std::string lora_model_dir;
@@ -82,11 +80,10 @@ struct SDParams {

      std::string prompt;
      std::string negative_prompt;
-     float style_ratio = 20.f;
-     int clip_skip = -1; // <= 0 represents unspecified
-     int width = 512;
-     int height = 512;
-     int batch_count = 1;
+     int clip_skip = -1; // <= 0 represents unspecified
+     int width = 512;
+     int height = 512;
+     int batch_count = 1;

      std::vector<int> skip_layers = {7, 8, 9};
      sd_sample_params_t sample_params;
@@ -116,6 +113,12 @@ struct SDParams {
      bool color = false;
      int upscale_repeats = 1;

+     // Photo Maker
+     std::string photo_maker_path;
+     std::string pm_id_images_dir;
+     std::string pm_id_embed_path;
+     float pm_style_strength = 20.f;
+
      bool chroma_use_dit_mask = true;
      bool chroma_use_t5_mask = false;
      int chroma_t5_mask_pad = 1;
@@ -149,9 +152,10 @@ void print_params(SDParams params) {
      printf(" esrgan_path: %s\n", params.esrgan_path.c_str());
      printf(" control_net_path: %s\n", params.control_net_path.c_str());
      printf(" embedding_dir: %s\n", params.embedding_dir.c_str());
-     printf(" stacked_id_embed_dir: %s\n", params.stacked_id_embed_dir.c_str());
-     printf(" input_id_images_path: %s\n", params.input_id_images_path.c_str());
-     printf(" style ratio: %.2f\n", params.style_ratio);
+     printf(" photo_maker_path: %s\n", params.photo_maker_path.c_str());
+     printf(" pm_id_images_dir: %s\n", params.pm_id_images_dir.c_str());
+     printf(" pm_id_embed_path: %s\n", params.pm_id_embed_path.c_str());
+     printf(" pm_style_strength: %.2f\n", params.pm_style_strength);
      printf(" normalize input image: %s\n", params.normalize_input ? "true" : "false");
      printf(" output_path: %s\n", params.output_path.c_str());
      printf(" init_image_path: %s\n", params.init_image_path.c_str());
@@ -217,9 +221,6 @@ void print_usage(int argc, const char* argv[]) {
      printf(" --taesd [TAESD_PATH] path to taesd. Using Tiny AutoEncoder for fast decoding (low quality)\n");
      printf(" --control-net [CONTROL_PATH] path to control net model\n");
      printf(" --embd-dir [EMBEDDING_PATH] path to embeddings\n");
-     printf(" --stacked-id-embd-dir [DIR] path to PHOTOMAKER stacked id embeddings\n");
-     printf(" --input-id-images-dir [DIR] path to PHOTOMAKER input id images dir\n");
-     printf(" --normalize-input normalize PHOTOMAKER input id images\n");
      printf(" --upscale-model [ESRGAN_PATH] path to esrgan model. Upscale images after generate, just RealESRGAN_x4plus_anime_6B supported by now\n");
      printf(" --upscale-repeats Run the ESRGAN upscaler this many times (default 1)\n");
      printf(" --type [TYPE] weight type (examples: f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0, q2_K, q3_K, q4_K)\n");
@@ -266,7 +267,6 @@ void print_usage(int argc, const char* argv[]) {
      printf(" --high-noise-steps STEPS (high noise) number of sample steps (default: -1 = auto)\n");
      printf(" SLG will be enabled at step int([STEPS]*[START]) and disabled at int([STEPS]*[END])\n");
      printf(" --strength STRENGTH strength for noising/unnoising (default: 0.75)\n");
-     printf(" --style-ratio STYLE-RATIO strength for keeping input identity (default: 20)\n");
      printf(" --control-strength STRENGTH strength to apply Control Net (default: 0.9)\n");
      printf(" 1.0 corresponds to full destruction of information in init image\n");
      printf(" -H, --height H image height, in pixel space (default: 512)\n");
@@ -301,6 +301,11 @@ void print_usage(int argc, const char* argv[]) {
      printf(" only enabled if `--high-noise-steps` is set to -1\n");
      printf(" --flow-shift SHIFT shift value for Flow models like SD3.x or WAN (default: auto)\n");
      printf(" --vace-strength wan vace strength\n");
+     printf(" --photo-maker path to PHOTOMAKER model\n");
+     printf(" --pm-id-images-dir [DIR] path to PHOTOMAKER input id images dir\n");
+     printf(" --pm-id-embed-path [PATH] path to PHOTOMAKER v2 id embed\n");
+     printf(" --pm-style-strength strength for keeping PHOTOMAKER input identity (default: 20)\n");
+     printf(" --normalize-input normalize PHOTOMAKER input id images\n");
      printf(" -v, --verbose print extra info\n");
  }

@@ -487,12 +492,13 @@ void parse_args(int argc, const char** argv, SDParams& params) {
          {"", "--taesd", "", &params.taesd_path},
          {"", "--control-net", "", &params.control_net_path},
          {"", "--embd-dir", "", &params.embedding_dir},
-         {"", "--stacked-id-embd-dir", "", &params.stacked_id_embed_dir},
          {"", "--lora-model-dir", "", &params.lora_model_dir},
          {"-i", "--init-img", "", &params.init_image_path},
          {"", "--end-img", "", &params.end_image_path},
          {"", "--tensor-type-rules", "", &params.tensor_type_rules},
-         {"", "--input-id-images-dir", "", &params.input_id_images_path},
+         {"", "--photo-maker", "", &params.photo_maker_path},
+         {"", "--pm-id-images-dir", "", &params.pm_id_images_dir},
+         {"", "--pm-id-embed-path", "", &params.pm_id_embed_path},
          {"", "--mask", "", &params.mask_image_path},
          {"", "--control-image", "", &params.control_image_path},
          {"", "--control-video", "", &params.control_video_path},
@@ -532,7 +538,7 @@ void parse_args(int argc, const char** argv, SDParams& params) {
          {"", "--high-noise-skip-layer-end", "", &params.high_noise_sample_params.guidance.slg.layer_end},
          {"", "--high-noise-eta", "", &params.high_noise_sample_params.eta},
          {"", "--strength", "", &params.strength},
-         {"", "--style-ratio", "", &params.style_ratio},
+         {"", "--pm-style-strength", "", &params.pm_style_strength},
          {"", "--control-strength", "", &params.control_strength},
          {"", "--moe-boundary", "", &params.moe_boundary},
          {"", "--flow-shift", "", &params.flow_shift},
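
The two `parse_args` hunks register the new flags as rows of the form `{short name, long name, description, pointer to field}`. The diff does not show the row type or the loop that consumes these tables, so the following is only a minimal sketch of how such a table-driven parser for string options might work. `StringOption` and `parse_string_options` are hypothetical names, and the float table that holds `--pm-style-strength` is left out.

```cpp
#include <string>
#include <vector>

// Hypothetical option record mirroring the four-field rows in the diff:
// {short name, long name, description, pointer to the target field}.
struct StringOption {
    std::string short_name;
    std::string long_name;
    std::string desc;
    std::string* target;
};

// Walks argv and stores the value that follows each recognized flag.
// Returns false on an unknown flag or on a flag with no value after it.
static bool parse_string_options(int argc, const char** argv,
                                 const std::vector<StringOption>& options) {
    for (int i = 1; i < argc; ++i) {
        const std::string arg = argv[i];
        const StringOption* match = nullptr;
        for (const auto& opt : options) {
            if (arg == opt.long_name ||
                (!opt.short_name.empty() && arg == opt.short_name)) {
                match = &opt;
                break;
            }
        }
        if (match == nullptr || i + 1 >= argc) {
            return false;
        }
        *match->target = argv[++i];  // consume the flag's value
    }
    return true;
}
```

With a table like this, renaming a flag is a one-row change, which is what the hunks above do for the PhotoMaker options.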
@@ -1075,14 +1081,58 @@ uint8_t* load_image(const char* image_path, int& width, int& height, int expecte
                       STBIR_EDGE_CLAMP, STBIR_EDGE_CLAMP,
                       STBIR_FILTER_BOX, STBIR_FILTER_BOX,
                       STBIR_COLORSPACE_SRGB, nullptr);
-
-         // Save resized result
+         width = resized_width;
+         height = resized_height;
          free(image_buffer);
          image_buffer = resized_image_buffer;
      }
      return image_buffer;
  }

+ bool load_images_from_dir(const std::string dir,
+                           std::vector<sd_image_t>& images,
+                           int expected_width = 0,
+                           int expected_height = 0,
+                           int max_image_num = 0,
+                           bool verbose = false) {
+     if (!fs::exists(dir) || !fs::is_directory(dir)) {
+         fprintf(stderr, "'%s' is not a valid directory\n", dir.c_str());
+         return false;
+     }
+
+     for (const auto& entry : fs::directory_iterator(dir)) {
+         if (!entry.is_regular_file())
+             continue;
+
+         std::string path = entry.path().string();
+         std::string ext = entry.path().extension().string();
+         std::transform(ext.begin(), ext.end(), ext.begin(), ::tolower);
+
+         if (ext == ".jpg" || ext == ".jpeg" || ext == ".png" || ext == ".bmp") {
+             if (verbose) {
+                 printf("load image %zu from '%s'\n", images.size(), path.c_str());
+             }
+             int width = 0;
+             int height = 0;
+             uint8_t* image_buffer = load_image(path.c_str(), width, height, expected_width, expected_height);
+             if (image_buffer == NULL) {
+                 fprintf(stderr, "load image from '%s' failed\n", path.c_str());
+                 return false;
+             }
+
+             images.push_back({(uint32_t)width,
+                               (uint32_t)height,
+                               3,
+                               image_buffer});
+
+             if (max_image_num > 0 && images.size() >= max_image_num) {
+                 break;
+             }
+         }
+     }
+     return true;
+ }
+
  int main(int argc, const char* argv[]) {
      SDParams params;
      parse_args(argc, argv, params);
@@ -1122,21 +1172,27 @@ int main(int argc, const char* argv[]) {
      sd_image_t control_image = {(uint32_t)params.width, (uint32_t)params.height, 3, NULL};
      sd_image_t mask_image = {(uint32_t)params.width, (uint32_t)params.height, 1, NULL};
      std::vector<sd_image_t> ref_images;
+     std::vector<sd_image_t> pmid_images;
      std::vector<sd_image_t> control_frames;

      auto release_all_resources = [&]() {
          free(init_image.data);
          free(end_image.data);
          free(control_image.data);
          free(mask_image.data);
-         for (auto ref_image : ref_images) {
-             free(ref_image.data);
-             ref_image.data = NULL;
+         for (auto image : ref_images) {
+             free(image.data);
+             image.data = NULL;
          }
          ref_images.clear();
-         for (auto frame : control_frames) {
-             free(frame.data);
-             frame.data = NULL;
+         for (auto image : pmid_images) {
+             free(image.data);
+             image.data = NULL;
+         }
+         pmid_images.clear();
+         for (auto image : control_frames) {
+             free(image.data);
+             image.data = NULL;
          }
          control_frames.clear();
      };
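
The cleanup lambda above iterates with `for (auto image : ...)`, so each `image` is a copy and `image.data = NULL` resets only that copy. This is harmless here because every vector is cleared right after its loop, but the three loops could share one helper that iterates by reference. The following is a sketch under assumptions: `ImageSketch` is a stand-in whose field order is inferred from the brace initializers in the diff (`{width, height, 3, buffer}`), and `free_images` is a hypothetical name.

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// Stand-in for sd_image_t; the field names are assumptions based on the
// brace initializers in the diff.
struct ImageSketch {
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t channel;
    std::uint8_t* data;
};

// Frees every pixel buffer and empties the vector. Iterating by reference
// means the pointer reset applies to the stored element, not to a copy.
static void free_images(std::vector<ImageSketch>& images) {
    for (auto& image : images) {
        std::free(image.data);  // std::free(nullptr) is a no-op
        image.data = nullptr;
    }
    images.clear();
}
```

Calling the helper twice on the same vector is safe, since the second call sees an empty vector.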
@@ -1225,44 +1281,26 @@ int main(int argc, const char* argv[]) {
      }

      if (!params.control_video_path.empty()) {
-         std::string dir = params.control_video_path;
-
-         if (!fs::exists(dir) || !fs::is_directory(dir)) {
-             fprintf(stderr, "'%s' is not a valid directory\n", dir.c_str());
+         if (!load_images_from_dir(params.control_video_path,
+                                   control_frames,
+                                   params.width,
+                                   params.height,
+                                   params.video_frames,
+                                   params.verbose)) {
              release_all_resources();
              return 1;
          }
+     }

-         for (const auto& entry : fs::directory_iterator(dir)) {
-             if (!entry.is_regular_file())
-                 continue;
-
-             std::string path = entry.path().string();
-             std::string ext = entry.path().extension().string();
-             std::transform(ext.begin(), ext.end(), ext.begin(), ::tolower);
-
-             if (ext == ".jpg" || ext == ".jpeg" || ext == ".png" || ext == ".bmp") {
-                 if (params.verbose) {
-                     printf("load control frame %zu from '%s'\n", control_frames.size(), path.c_str());
-                 }
-                 int width = 0;
-                 int height = 0;
-                 uint8_t* image_buffer = load_image(path.c_str(), width, height, params.width, params.height);
-                 if (image_buffer == NULL) {
-                     fprintf(stderr, "load image from '%s' failed\n", path.c_str());
-                     release_all_resources();
-                     return 1;
-                 }
-
-                 control_frames.push_back({(uint32_t)params.width,
-                                           (uint32_t)params.height,
-                                           3,
-                                           image_buffer});
-
-                 if (control_frames.size() >= params.video_frames) {
-                     break;
-                 }
-             }
+     if (!params.pm_id_images_dir.empty()) {
+         if (!load_images_from_dir(params.pm_id_images_dir,
+                                   pmid_images,
+                                   0,
+                                   0,
+                                   0,
+                                   params.verbose)) {
+             release_all_resources();
+             return 1;
          }
      }

@@ -1283,7 +1321,7 @@ int main(int argc, const char* argv[]) {
          params.control_net_path.c_str(),
          params.lora_model_dir.c_str(),
          params.embedding_dir.c_str(),
-         params.stacked_id_embed_dir.c_str(),
+         params.photo_maker_path.c_str(),
          vae_decode_only,
          true,
          params.n_threads,
@@ -1334,9 +1372,13 @@ int main(int argc, const char* argv[]) {
              params.batch_count,
              control_image,
              params.control_strength,
-             params.style_ratio,
              params.normalize_input,
-             params.input_id_images_path.c_str(),
+             {
+                 pmid_images.data(),
+                 (int)pmid_images.size(),
+                 params.pm_id_embed_path.c_str(),
+                 params.pm_style_strength,
+             }, // pm_params
              params.vae_tiling_params,
          };
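
The last hunk replaces the ID image directory string with a nested aggregate built from `pmid_images.data()`, the image count, the v2 embed path, and the style strength. The library-side struct is not shown in this part of the diff, so the following is only a sketch of the pattern with hypothetical type and field names (`IdImageSketch`, `PmParamsSketch`). The point it illustrates is ownership: the aggregate borrows the vector's storage and the string's buffer, so both must stay alive and unmodified while the parameters are in use.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for sd_image_t (field names are assumptions).
struct IdImageSketch {
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t channel;
    std::uint8_t* data;
};

// Hypothetical mirror of the four values passed as "pm_params" in the diff.
struct PmParamsSketch {
    IdImageSketch* id_images;   // borrowed from the caller's vector
    int id_images_count;
    const char* id_embed_path;  // borrowed from the caller's std::string
    float style_strength;
};

// Builds a non-owning view. The result is valid only while `images` is not
// resized or destroyed and `id_embed_path` is not modified or destroyed.
static PmParamsSketch make_pm_params(std::vector<IdImageSketch>& images,
                                     const std::string& id_embed_path,
                                     float style_strength) {
    return {images.data(), static_cast<int>(images.size()),
            id_embed_path.c_str(), style_strength};
}
```

In the diff, `pmid_images` and `params` both live in `main` until `release_all_resources()` runs, which is consistent with this borrowing pattern.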
