@juancopi81
Contributor

@juancopi81 juancopi81 commented Oct 27, 2022

[Community Pipeline] Based on issue #871, I thought it would be nice to have a pipeline with multilingual support, and @patrickvonplaten said it was a good idea! :)

Followed the format of #897.

Code example (Also added in the PR):

from PIL import Image

import torch

from diffusers import DiffusionPipeline
from transformers import (
    pipeline,
    MBart50TokenizerFast,
    MBartForConditionalGeneration,
)
device = "cuda" if torch.cuda.is_available() else "cpu"
device_dict = {"cuda": 0, "cpu": -1}

# helper function taken from: https://huggingface.co/blog/stable_diffusion
def image_grid(imgs, rows, cols):
    """Paste rows*cols equally sized PIL images into a single grid image."""
    assert len(imgs) == rows * cols

    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))

    for i, img in enumerate(imgs):
        # Column index is i % cols, row index is i // cols.
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid

# Add language detection pipeline
language_detection_model_ckpt = "papluca/xlm-roberta-base-language-detection"
language_detection_pipeline = pipeline("text-classification",
                                       model=language_detection_model_ckpt,
                                       device=device_dict[device])

# Add model for language translation
trans_tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-one-mmt")
trans_model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-one-mmt").to(device)

diffuser_pipeline = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    custom_pipeline="multilingual_stable_diffusion",
    language_detection_pipeline=language_detection_pipeline,
    translation_model=trans_model,
    translation_tokenizer=trans_tokenizer,
    # Half-precision weights are intended for GPU; when running on CPU
    # you may need to remove the next two arguments.
    revision="fp16",
    torch_dtype=torch.float16,
)

diffuser_pipeline.enable_attention_slicing()
diffuser_pipeline = diffuser_pipeline.to(device)

prompt = ["a photograph of an astronaut riding a horse", 
          "Una casa en la playa",
          "Ein Hund, der Orange isst",
          "Un restaurant parisien"]

output = diffuser_pipeline(prompt)

images = output.images

grid = image_grid(images, rows=2, cols=2)

This example produces the following image:

img
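
For readers who want the idea without loading any models, here is a minimal sketch of the preprocessing step, assuming the pipeline detects the language of each prompt and translates non-English prompts to English before passing them to Stable Diffusion. The names `translate_prompts`, `toy_detect`, and `toy_translate` are hypothetical and only for illustration; they are not part of the PR, and the toy lookup tables stand in for the real language detection and translation models.

```python
from typing import Callable, List


def translate_prompts(
    prompts: List[str],
    detect_language: Callable[[str], str],
    translate_to_english: Callable[[str, str], str],
) -> List[str]:
    """Return the prompts with every non-English entry translated to English.

    Both callables are injected so the logic can be tried without any model:
    detect_language(text) returns a language code such as "en" or "es", and
    translate_to_english(text, lang) returns the English translation.
    """
    english_prompts = []
    for text in prompts:
        lang = detect_language(text)
        if lang == "en":
            # English prompts are passed through unchanged.
            english_prompts.append(text)
        else:
            english_prompts.append(translate_to_english(text, lang))
    return english_prompts


# Toy stand-ins (assumptions for illustration, not real model output).
_TOY_LANGS = {"Una casa en la playa": "es"}
_TOY_TRANSLATIONS = {"Una casa en la playa": "A house on the beach"}


def toy_detect(text: str) -> str:
    # Anything not in the lookup table is treated as English.
    return _TOY_LANGS.get(text, "en")


def toy_translate(text: str, lang: str) -> str:
    return _TOY_TRANSLATIONS[text]


result = translate_prompts(
    ["a photograph of an astronaut riding a horse", "Una casa en la playa"],
    toy_detect,
    toy_translate,
)
print(result)
```

In a real setup the two callables would wrap the text-classification pipeline and the MBart model and tokenizer from the example above.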

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

juancopi81 and others added 23 commits October 27, 2022 13:58
* fix `upsample_nearest_nhwc` for large bsz

* fix `upsample_nearest_nhwc` for large bsz
* improve tests

* up

* finish

* upload

* add init

* up

* finish vae

* finish

* reduce loading time with device_map

* remove device_map from CPU

* uP
* [Tests] Speed up slow tests

* Up

* up
* up

* up

* up

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py

* Apply suggestions from code review
* Update training and fine-tuning docs.

* Update examples README.

* Update README.

* Add Flax fine-tuning section.

* Accept suggestion

Co-authored-by: Anton Lozhkov <[email protected]>

* Accept suggestion

Co-authored-by: Anton Lozhkov <[email protected]>

Co-authored-by: Anton Lozhkov <[email protected]>
* add seed resizing to community examples

* actually add the file responsible for seed resizing
@juancopi81 juancopi81 closed this Oct 29, 2022
@juancopi81 juancopi81 deleted the multilingual_text_to_image_pipeline branch October 29, 2022 16:58
@juancopi81
Contributor Author

Sorry, I made some mistakes, so I am closing this pull request and will go back to fix the issues.

PhaneeshB pushed a commit to nod-ai/diffusers that referenced this pull request Mar 1, 2023
