@semjonsona (Collaborator) commented Jul 5, 2023

Status (2023-07-18): needs merging

This is expected to be semi-broken. My current plan is to refactor it further and only then do a thorough test to see what is broken (and fix it). I would still appreciate nudges in the right direction: what broke and needs fixing, what could be improved in the refactor, and what else is worth refactoring.

The changed line count is inflated by converting tabs to spaces. Some editors (e.g. PyCharm 2023.1) display whitespace-only edits differently from substantive edits. I also moved code to different files.

@semjonsona (Collaborator Author)

@graemeniedermayer @thygate I would kindly ask you to keep an eye on this, as it will create massive merge conflicts if the main branch receives significant changes while this is in the works.

@semjonsona (Collaborator Author) commented Jul 6, 2023

@thygate
(A)
I am now refactoring invert_depth. Did I get all this right?

| invert_depth is enabled with this option | Desired effect of invert_depth on the option |
| --- | --- |
| Clip and renormalize | Resulting image is inverted and clipped |
| Generate stereoscopic image(s) | Resulting depthmap is inverted; no effect on the stereo image |
| Generate simple 3D mesh | No effect; the mesh is exactly as if invert_depth were disabled |
| Generate 3D inpainted mesh | No effect; the mesh is exactly as if invert_depth were disabled |
| Remove background | Does not depend on the generated depthmap; no effect |
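
For reference, a minimal sketch of these semantics, assuming the depthmap is already normalized to [0, 1] (the function name is illustrative, not actual extension code):

```python
import numpy as np

def maybe_invert(depth: np.ndarray, invert_depth: bool) -> np.ndarray:
    """Invert a [0, 1] depthmap for raw output only.

    Stereo and mesh generation receive the uninverted values,
    matching the "no effect" rows in the table above.
    """
    return 1.0 - depth if invert_depth else depth
```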

Large refactor part, may be broken

Also indent everything with spaces, add/remove line breaks and add some TODOs
…_depthmap

Large refactor part, may be broken
Large refactor part, may be broken
Large refactor part, may be broken
@semjonsona (Collaborator Author) commented Jul 6, 2023

@thygate
(B)
Clipdepth (Clip and renormalize) is also quite tricky. Do you think these interactions are OK? Would you change something? These are just a plan at the moment; at this stage it is easy to change.

| clipdepth is enabled with this option | Desired effect of clipdepth on the option |
| --- | --- |
| Invert depth | Resulting image is inverted and clipped |
| Generate stereoscopic image(s) | Depending on whether it is clipped from below or above: the closest objects will appear to be at the same distance, and/or the furthest objects will appear to be at the same distance |
| Generate simple 3D mesh | Mesh will be distorted along the depth axis |
| Generate 3D inpainted mesh | Mesh will be distorted along the depth axis |
| Remove background | Does not depend on the generated depthmap; no effect |
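
A minimal sketch of clip-and-renormalize under these assumptions (normalized [0, 1] depth with far/near thresholds; the names are illustrative):

```python
import numpy as np

def clip_and_renormalize(depth: np.ndarray, far: float = 0.0, near: float = 1.0) -> np.ndarray:
    """Clip a [0, 1] depthmap to [far, near], then stretch back to [0, 1].

    Values beyond a threshold collapse onto it, which is why the closest
    and/or furthest objects end up at the same apparent distance.
    """
    clipped = np.clip(depth, far, near)
    return (clipped - far) / (near - far)
```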

@semjonsona (Collaborator Author) commented Jul 6, 2023

@thygate
(C)
(C) There is an option to show the depthmap, and an option to save the depthmap. Do you think it would make sense to always show everything (depthmaps, stereo images, etc.), but have an option called "save results" (or something like that) that would be enabled by default? This approach is marginally easier to implement and IMHO would be more user-friendly.

@thygate (Owner) commented Jul 6, 2023

(C) Yes! The show depthmap option is a leftover from the very first version; it serves no purpose. Always showing everything that was generated would be the most intuitive, I agree.

@thygate (Owner) commented Jul 6, 2023

(B) The clip depth (far/near) function was something I implemented quickly and never properly tested, to be honest.
It was requested at some point, but I'm not sure anyone is actually using it. I believe it was for raw depthmap outputs only; I can't think of a scenario where it would be useful for the other options, like generating stereo pairs or meshes.
So the suggested outcomes seem fine to me.

@thygate (Owner) commented Jul 6, 2023

(A) No effect for all seems like the correct approach to me; this option (invert depth) was meant for raw depthmap output only, since some external software requires it. It should have no effect on any of the options except clip and renormalize. So all seems good to me.

Large refactor part, may be broken

Also fix downloading models
Large refactor part, may be broken

Also restore "wrong mode" bug workaround
@semjonsona (Collaborator Author) commented Jul 7, 2023

@thygate Thank you!
(C) Implemented :)
(A) (B) Working on it... Done!

@semjonsona (Collaborator Author) commented Jul 8, 2023

(D) net_w and net_h are assigned but never used; net_width and net_height are actually used. I guess it should work differently... Do I get it right that every model has its original (suggested) net size, but sometimes it is beneficial to override it? What do you think: maybe the net size should be a dropdown with the values "default" (match model), "match image" and "custom", where custom shows two sliders, one for width and one for height? Or simply "match model size" and "match input size"?

@thygate (Owner) commented Jul 9, 2023

(D) Yes, correct: net_w and net_h were the initial model sizes. At the start the sliders would mirror this initial model size, but this changed at one point and I never removed them. A dropdown would be fine, but not all models support all sizes, and tweaking the size manually can be beneficial for tuning detail. I would prefer to keep the option to set the size manually.

Large refactor part, may be broken
Large refactor part, may be broken
@semjonsona force-pushed the major-refactor branch 2 times, most recently from f35d5ae to b8c72a9 (July 11, 2023 13:22)
@semjonsona (Collaborator Author) commented Jul 11, 2023

Alright! I am tempted to call it a ~~day~~ week! This MR substantially improves the overall state of the code. As with many other projects, adding features (especially in a short timeframe) usually contributes to tech debt - this MR deals with that debt, making it easier to add new features.

3556223 The two Gradio interfaces that we currently have (script and tab) were improved a bit and moved to a separate file. These interfaces no longer rely on positional arguments (I always got scared to touch them in any non-trivial way) - GradioComponentBundle now conveniently packs them and sends them in an easy package.
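
Roughly, the pattern looks like this (an illustrative sketch of the idea, not the actual GradioComponentBundle code):

```python
# Sketch: register components under names instead of relying on argument positions.
class ComponentBundle:
    def __init__(self):
        self.components = {}  # name -> Gradio component

    def add(self, name, component):
        self.components[name] = component
        return component

    def as_inputs(self):
        # Fixed ordering handed to the Gradio event handler.
        return list(self.components.values())

    def unpack(self, *values):
        # Re-key the positional values Gradio passes back to the handler.
        return dict(zip(self.components.keys(), values))
```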

eb82cd4 Supplying custom depthmaps had quirky logic, where run_depthmap would try to find the files itself. A more general logic was introduced: a depthmap may be provided for any image (so generating one is not required). This reduces the number of edge cases, which is always good. [More edge cases = harder to figure out and keep in mind all the different ways in which stuff could break; less entangled = less complicated = easier to navigate and edit.] Also, not trying to load/save files helps in case we want to add new interfaces in the future.

c414acd Tweaked code to reduce duplication (Don't-repeat-yourself). Funnily enough, the line count got bigger. Welp, at least it is now easier to change save parameters.

f65d8a7 Loading and storing models in memory is best done outside of the big function - it is quite a separate task from everything else the big function did. This approach makes the model-lifetime logic much clearer, allowing it to be modified with less fear. It would be cool to implement a timer (or a callback?) that would unload the depth and pix2pix models and reload the SD model when the Depth tab is unselected - this would help us win a couple of seconds on every consecutive run with the same model.
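
A rough sketch of the holder pattern being described, with method names loosely matching those visible in the tracebacks below (the details are illustrative):

```python
class ModelHolder:
    """Illustrative sketch; the real holder also manages boost/pix2pix."""

    def __init__(self):
        self.depth_model = None
        self.model_type = None
        self.device = None

    def ensure_models(self, model_type, device):
        # Reload only when the requested model differs from the cached one.
        if self.model_type != model_type:
            self.depth_model = self._load(model_type)
            self.model_type = model_type
        # Move back to the compute device if the model was offloaded to CPU.
        if self.device != device:
            self.depth_model.to(device)
            self.device = device

    def offload(self):
        # Keep the weights in RAM between runs instead of dropping them.
        self.depth_model.to('cpu')
        self.device = 'cpu'

    def _load(self, model_type):
        raise NotImplementedError  # loading depends on the model type
```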

There are still things to refactor (some of them are marked in the code itself), but I think it is a good idea to stop for now - endless refactoring starts to wear me down a bit. Improving the code further is still needed; this can be done while implementing new features. For example, before adding a CLI interface (a suggested feature), the core should be disentangled from the webui and gradio stuff.

I did some simple testing, but for such a behemoth of an MR... We can expect that every little thing that could break, broke. Even after the testing we will probably see a temporary influx of bugs. This is the price for having fewer bugs in the long run (especially the weird ones).

@thygate I would be very grateful if you would go through all the changed code and point out places that are important to improve before this is merged. If there is anything potentially scary, it'd be better fixed before the merge. I kindly ask you to give this MR a good test - it is highly likely that a regression or two is lurking within the changes, perhaps not so obvious to spot.

@thygate (Owner) commented Jul 11, 2023

For starters, thanks for all this work!!

I'm doing some quick tests to see if anything is broken ..

I can only generate once using leres; every attempt afterwards fails until I change the model.
Does not happen when computing on CPU.

edit: the same happens for all midas models; the only exception is the zoedepth models.

DepthMap v0.4.0 (88aa86f3)
device: cuda
Loading model(s) ..
Computing output(s) ..
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]
*** Error completing request
*** Arguments: (False, 'u2net', False, False, 0, 1, False, 1, 'GPU', False, None, '', '', True, <PIL.Image.Image image mode=RGB size=1365x2048 at 0x1F310B49E70>, '0', False, False, False, None, False, False, False, False, True, False, 0, 448, 448, True, False, False, True, False, 0, 2.5, 'polylines_sharp', ['left-right', 'red-cyan-anaglyph'], 0, '∯background_removal∯background_removal_model∯boost∯clipdepth∯clipthreshold_far∯clipthreshold_near∯combine_output∯combine_output_axis∯compute_device∯custom_depthmap∯custom_depthmap_img∯depthmap_batch_input_dir∯depthmap_batch_output_dir∯depthmap_batch_reuse∯depthmap_input_image∯depthmap_mode∯gen_mesh∯gen_normal∯gen_stereo∯image_batch∯inpaint∯inpaint_vids∯invert_depth∯match_size∯mesh_occlude∯mesh_spherical∯model_type∯net_height∯net_width∯output_depth∯pre_depth_background_removal∯save_background_removal_masks∯save_outputs∯show_heat∯stereo_balance∯stereo_divergence∯stereo_fill∯stereo_modes∯stereo_separation') {}
    Traceback (most recent call last):
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\modules\call_queue.py", line 55, in f
        res = list(func(*args, **kwargs))
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\modules\call_queue.py", line 35, in f
        res = func(*args, **kwargs)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\scripts\interface_webui.py", line 527, in run_generate
        outputs, mesh_fi, meshsimple_fi = core_generation_funnel(outpath, inputimages, inputdepthmaps, inputnames, inputs)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\scripts\core.py", line 305, in core_generation_funnel
        raise e
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\scripts\core.py", line 170, in core_generation_funnel
        model_holder.get_raw_prediction(inputimages[count], net_width, net_height)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\scripts\depthmap_generation.py", line 268, in get_raw_prediction
        raw_prediction = estimateleres(img, self.depth_model, net_width, net_height)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\scripts\depthmap_generation.py", line 290, in estimateleres
        prediction = model.depth_model(img_torch)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\lib\multi_depth_model_woauxi.py", line 31, in forward
        lateral_out = self.encoder_modules(x)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\lib\network_auxi.py", line 96, in forward
        x = self.encoder(x)  # 1/32, 1/16, 1/8, 1/4
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\lib\Resnext_torch.py", line 223, in forward
        return self._forward_impl(x)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\lib\Resnext_torch.py", line 199, in _forward_impl
        x = self.conv1(x)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\extensions-builtin\Lora\lora.py", line 415, in lora_Conv2d_forward
        return torch.nn.Conv2d_forward_before_lora(self, input)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
        return F.conv2d(input, weight, bias, self.stride,
    RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

---
Traceback (most recent call last):
  File "C:\Users\thyga\Desktop\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 422, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\Users\thyga\Desktop\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1326, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "C:\Users\thyga\Desktop\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1229, in postprocess_data
    self.validate_outputs(fn_index, predictions)  # type: ignore
  File "C:\Users\thyga\Desktop\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1204, in validate_outputs
    raise ValueError(
ValueError: An event handler (f) didn't receive enough output values (needed: 5, received: 3).
Wanted outputs:
    [gallery, textbox, model3d, html, html]
Received outputs:
    [None, "", "<div class='error'>RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same</div><div class='performance'><p class='time'>Time taken: <wbr>0.45s</p><p class='vram'>Torch active/reserved: 2742/2764 MiB, <wbr>Sys VRAM: 4863/8192 MiB (59.36%)</p></div>"]

@semjonsona (Collaborator Author) commented Jul 11, 2023

@thygate Oops, a logic error: I forgot to reload the models. It should be fixed now.

@semjonsona (Collaborator Author)

@graemeniedermayer Oh! This sounds like a good idea. #282

@thygate (Owner) commented Jul 11, 2023

When using "match net size" and leres, I get this error:

DepthMap v0.4.0 (cc55be2f)
device: cuda
Loading model(s) ..
Computing output(s) ..
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]
*** Error completing request
*** Arguments: (False, 'u2net', False, False, 0, 1, False, 1, 'GPU', False, None, '', '', True, <PIL.Image.Image image mode=RGB size=1489x2514 at 0x1DCA543B9D0>, '0', False, False, False, None, False, False, False, True, True, False, 0, 512, 512, True, False, False, True, False, 0, 2.5, 'polylines_sharp', ['left-right', 'red-cyan-anaglyph'], 0, '∯background_removal∯background_removal_model∯boost∯clipdepth∯clipthreshold_far∯clipthreshold_near∯combine_output∯combine_output_axis∯compute_device∯custom_depthmap∯custom_depthmap_img∯depthmap_batch_input_dir∯depthmap_batch_output_dir∯depthmap_batch_reuse∯depthmap_input_image∯depthmap_mode∯gen_mesh∯gen_normal∯gen_stereo∯image_batch∯inpaint∯inpaint_vids∯invert_depth∯match_size∯mesh_occlude∯mesh_spherical∯model_type∯net_height∯net_width∯output_depth∯pre_depth_background_removal∯save_background_removal_masks∯save_outputs∯show_heat∯stereo_balance∯stereo_divergence∯stereo_fill∯stereo_modes∯stereo_separation') {}
    Traceback (most recent call last):
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\modules\call_queue.py", line 55, in f
        res = list(func(*args, **kwargs))
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\modules\call_queue.py", line 35, in f
        res = func(*args, **kwargs)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\scripts\interface_webui.py", line 527, in run_generate
        outputs, mesh_fi, meshsimple_fi = core_generation_funnel(outpath, inputimages, inputdepthmaps, inputnames, inputs)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\scripts\core.py", line 305, in core_generation_funnel
        raise e
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\scripts\core.py", line 170, in core_generation_funnel
        model_holder.get_raw_prediction(inputimages[count], net_width, net_height)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\scripts\depthmap_generation.py", line 282, in get_raw_prediction
        raw_prediction = estimateleres(img, self.depth_model, net_width, net_height)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\scripts\depthmap_generation.py", line 304, in estimateleres
        prediction = model.depth_model(img_torch)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\lib\multi_depth_model_woauxi.py", line 32, in forward
        out_logit = self.decoder_modules(lateral_out)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\lib\network_auxi.py", line 58, in forward
        x_4 = self.ffm1(features[1], x_8)  # 1/4
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\lib\network_auxi.py", line 209, in forward
        x = x + high_x
    RuntimeError: The size of tensor a (187) must match the size of tensor b (188) at non-singleton dimension 3

---
Traceback (most recent call last):
  File "C:\Users\thyga\Desktop\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 422, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\Users\thyga\Desktop\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1326, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "C:\Users\thyga\Desktop\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1229, in postprocess_data
    self.validate_outputs(fn_index, predictions)  # type: ignore
  File "C:\Users\thyga\Desktop\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1204, in validate_outputs
    raise ValueError(
ValueError: An event handler (f) didn't receive enough output values (needed: 5, received: 3).
Wanted outputs:
    [gallery, textbox, model3d, html, html]
Received outputs:
    [None, "", "<div class='error'>RuntimeError: The size of tensor a (187) must match the size of tensor b (188) at non-singleton dimension 3</div><div class='performance'><p class='time'>Time taken: <wbr>1.43s</p><p class='vram'>Torch active/reserved: 5274/6220 MiB, <wbr>Sys VRAM: 8192/8192 MiB (100.0%)</p></div>"]

When using leres and boost (this seems to happen for most models with boost):


DepthMap v0.4.0 (cc55be2f)
device: cuda
Loading model(s) ..
Loading model weights from  ./models/leres/res101.pth
initialize network with normal
loading the model from ./models/pix2pix\latest_net_G.pth
*** Error completing request
*** Arguments: (False, 'u2net', True, False, 0, 1, False, 1, 'GPU', False, None, '', '', True, <PIL.Image.Image image mode=RGB size=1489x2514 at 0x1DCA9B9FFD0>, '0', False, False, False, None, False, False, False, False, True, False, 0, 448, 448, True, False, False, True, False, 0, 2.5, 'polylines_sharp', ['left-right', 'red-cyan-anaglyph'], 0, '∯background_removal∯background_removal_model∯boost∯clipdepth∯clipthreshold_far∯clipthreshold_near∯combine_output∯combine_output_axis∯compute_device∯custom_depthmap∯custom_depthmap_img∯depthmap_batch_input_dir∯depthmap_batch_output_dir∯depthmap_batch_reuse∯depthmap_input_image∯depthmap_mode∯gen_mesh∯gen_normal∯gen_stereo∯image_batch∯inpaint∯inpaint_vids∯invert_depth∯match_size∯mesh_occlude∯mesh_spherical∯model_type∯net_height∯net_width∯output_depth∯pre_depth_background_removal∯save_background_removal_masks∯save_outputs∯show_heat∯stereo_balance∯stereo_divergence∯stereo_fill∯stereo_modes∯stereo_separation') {}
    Traceback (most recent call last):
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\modules\call_queue.py", line 55, in f
        res = list(func(*args, **kwargs))
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\modules\call_queue.py", line 35, in f
        res = func(*args, **kwargs)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\scripts\interface_webui.py", line 527, in run_generate
        outputs, mesh_fi, meshsimple_fi = core_generation_funnel(outpath, inputimages, inputdepthmaps, inputnames, inputs)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\scripts\core.py", line 137, in core_generation_funnel
        model_holder.ensure_models(model_type, device, boost)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\scripts\depthmap_generation.py", line 60, in ensure_models
        self.reload()
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\scripts\depthmap_generation.py", line 250, in reload
        self.move_models_to(self.device)
      File "C:\Users\thyga\Desktop\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\scripts\depthmap_generation.py", line 257, in move_models_to
        self.pix2pix_model.to(device)
    AttributeError: 'Pix2Pix4DepthModel' object has no attribute 'to'

---
Traceback (most recent call last):
  File "C:\Users\thyga\Desktop\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 422, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\Users\thyga\Desktop\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1326, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "C:\Users\thyga\Desktop\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1229, in postprocess_data
    self.validate_outputs(fn_index, predictions)  # type: ignore
  File "C:\Users\thyga\Desktop\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1204, in validate_outputs
    raise ValueError(
ValueError: An event handler (f) didn't receive enough output values (needed: 5, received: 3).
Wanted outputs:
    [gallery, textbox, model3d, html, html]
Received outputs:
    [None, "", "<div class='error'>AttributeError: &#x27;Pix2Pix4DepthModel&#x27; object has no attribute &#x27;to&#x27;</div><div class='performance'><p class='time'>Time taken: <wbr>4.07s</p><p class='vram'>Torch active/reserved: 6385/6404 MiB, <wbr>Sys VRAM: 8192/8192 MiB (100.0%)</p></div>"]

@semjonsona (Collaborator Author)

@thygate Sorry, I forgot to test unloading 😬 The new version hopefully fixes it.

@semjonsona (Collaborator Author) commented Jul 11, 2023

@thygate I can reproduce the net size issue in the main branch, too. This happens if the image has a weird size. Do you think the code should simply round the net dimensions upwards to the next multiple of 32? (Quite trivial; I changed that already.)
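
The rounding itself is a one-liner; a quick sketch:

```python
def round_up(value: int, multiple: int = 32) -> int:
    """Round a net dimension up to the next multiple of 32 (e.g. 1489 -> 1504)."""
    return ((value + multiple - 1) // multiple) * multiple
```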

@thygate (Owner) commented Jul 11, 2023

Hehe, no worries, I'm very grateful for all the work you've put into this!

Ah, and yes, I see it now: (187) vs (188). My bad, I was a bit too quick with that report, sorry.

Rounding the dimensions upward to the next multiple of 32 is a great idea and will solve this problem too.

@semjonsona (Collaborator Author)

😊

@graemeniedermayer (Contributor) commented Jul 13, 2023

I'm testing it out! It's looking good. The only bug I found was also on the main branch.

Zoe depth and pre_depth_background_removal don't seem to work together. This stems from RGB vs RGBA. It's a little tricky. I think the best solution would be to explicitly convert to RGB in the batched_background_removal function, but the alpha channel is used later to create masks, so I'll need to think about this a little.
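
A minimal sketch of the split being described, keeping the alpha channel around for the later mask step (names are assumptions, not the actual function):

```python
import numpy as np

def split_rgba(image: np.ndarray):
    """Return (rgb, alpha) from an RGBA array, or (image, None) without alpha.

    The RGB part is safe to feed to models expecting 3 channels (e.g. ZoeDepth);
    the alpha channel stays available for building masks afterwards.
    """
    if image.ndim == 3 and image.shape[-1] == 4:
        return image[..., :3], image[..., 3]
    return image, None
```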

@semjonsona (Collaborator Author) commented Jul 15, 2023

Found a problem with the script mode. ~~I will fix it soon.~~ Fixed.

@semjonsona (Collaborator Author)

Remove background tested.

Does not work! Imports updated in the next commit.
* Reload model before generation, if it is offloaded to CPU
* Load model if boost got selected
* Do not try to offload pix2pix
* Net dimensions are a multiple of 32 regardless of match size
* Change the default net size to the default net size of the default model
* Fixed script mode
* UI fixes
@semjonsona marked this pull request as ready for review July 18, 2023 07:11
@semjonsona (Collaborator Author)

Both simple and inpainted meshes seem to generate fine. Videos, too.

@semjonsona (Collaborator Author)

I've taken some time off (lots of stuff), but now I am back. I would like to get this merged - @thygate, could you please do the last checks you have in mind and then "Create a merge commit"?

@semjonsona mentioned this pull request Jul 18, 2023
@semjonsona (Collaborator Author) commented Jul 19, 2023

Whoops, there has been an issue with this that I did not find until now: if model offloading is enabled, the model is offloaded only once and then never offloads again. Now fixed.

@thygate merged commit 88e11b8 into main Jul 19, 2023
@semjonsona deleted the major-refactor branch July 20, 2023 06:59
@semjonsona (Collaborator Author)

And now we wait... Issues are bound to come; hopefully nothing that will be hiding for months, like that one time I messed up a feature for 2+ months.

@thygate (Owner) commented Jul 20, 2023

I checked all features and everything seems to be working for me. If anything pops up, I assume it will be small stuff or problems with packages. I went over all your commits and read all your comments; thanks again for the thorough job of refactoring the hot mess that it was.

I did notice that just having some other extensions installed can change the torch hub checkpoint path, so the zoedepth models were redownloaded.

@graemeniedermayer (Contributor)

> And now we wait... Issues are bound to come; hopefully nothing that will be hiding for months, like that one time I messed up a feature for 2+ months.

It is much easier to work with! So if there are issues, they might be easier to find. Also, hopefully the update for SDXL 1.0 doesn't break anything.
