
Conversation

leejet (Owner) commented on Jan 1, 2024

This still needs some work. When generating images larger than 512x512, it produces invalid images. This might be an internal issue within ggml; I haven't had time to pinpoint the exact cause yet.

.\bin\Release\sd.exe -m ..\..\stable-diffusion-webui\models\Stable-diffusion\v2-1_768-nonema-pruned.safetensors -p "a lovely cat"  -H 768 -W 768 -v
.\bin\Release\sd.exe -m ..\..\stable-diffusion-webui\models\Stable-diffusion\sd_xl_turbo_1.0_fp16.safetensors --vae ..\..\stable-diffusion-webui\models\VAE\sdxl_vae-fp16-fix.safetensors -p "a lovely cat" -v   -H 768 -W 768  --cfg-scale 1 --steps 1

[output images]
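For comparison, a 512x512 run with the same flags should produce a valid image, since the problem only shows up above 512x512. This is an inferred control command based on the flags above, not one from the original report:

.\bin\Release\sd.exe -m ..\..\stable-diffusion-webui\models\Stable-diffusion\v2-1_768-nonema-pruned.safetensors -p "a lovely cat" -H 512 -W 512 -v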

leejet mentioned this pull request on Jan 3, 2024
FSSRepo (Contributor) commented on Jan 3, 2024

I will see if I can help with anything.

FSSRepo (Contributor) commented on Jan 4, 2024

@leejet Do you get these errors on CPU or CUDA? I ran tests on both backends and obtained correct results.

If you see errors with CUDA: in ggml-cuda.cu, in ggml_backend_cuda_buffer_set_tensor, move the cudaMemcpy so it comes before the device synchronization:

static void ggml_backend_cuda_buffer_set_tensor(ggml_backend_buffer_t buffer, ggml_tensor * tensor, const void * data, size_t offset, size_t size) {
    GGML_ASSERT(tensor->backend == GGML_BACKEND_GPU);

    ggml_backend_buffer_context_cuda * ctx = (ggml_backend_buffer_context_cuda *)buffer->context;

    ggml_cuda_set_device(ctx->device);
    // copy the host data into the device buffer first ...
    CUDA_CHECK(cudaMemcpy((char *)tensor->data + offset, data, size, cudaMemcpyHostToDevice));
    // ... then synchronize the device
    CUDA_CHECK(cudaDeviceSynchronize());
}
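For reference, this is a sketch of the ordering being replaced, assuming (as the suggestion implies) that the synchronization originally came before the copy; it is an illustration of the change, not the exact upstream code:

static void ggml_backend_cuda_buffer_set_tensor(ggml_backend_buffer_t buffer, ggml_tensor * tensor, const void * data, size_t offset, size_t size) {
    GGML_ASSERT(tensor->backend == GGML_BACKEND_GPU);

    ggml_backend_buffer_context_cuda * ctx = (ggml_backend_buffer_context_cuda *)buffer->context;

    ggml_cuda_set_device(ctx->device);
    // assumed original ordering: synchronize first, then copy the host data
    CUDA_CHECK(cudaDeviceSynchronize());
    CUDA_CHECK(cudaMemcpy((char *)tensor->data + offset, data, size, cudaMemcpyHostToDevice));
}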

leejet (Owner, Author) commented on Jan 5, 2024

> @leejet Do you get these errors on CPU or CUDA? I ran tests on both backends and obtained correct results.
>
> If you see errors with CUDA: in ggml-cuda.cu, in ggml_backend_cuda_buffer_set_tensor, move the cudaMemcpy so it comes before the device synchronization.

This issue only occurred on CUDA. Changing the cudaMemcpy order to before device synchronization fixed the problem.

leejet merged commit 2b6ec97 into master on Jan 5, 2024