LCM-LoRA speeds up text-to-image generation with Stable Diffusion models

LCM-LoRA is a universal Stable Diffusion acceleration module, based on Latent Consistency Models (LCMs), that speeds up text-to-image (T2I) generation while maintaining image quality.

This is achieved by applying Low-Rank Adaptation (LoRA) distillation to Stable Diffusion (SD) models such as SD-V1.5, SSD-1B, and SDXL. Remarkably, LCM-LoRA can be plugged directly into various fine-tuned SD models or SD LoRAs without additional training.

The LCM-LoRA module was developed by researchers from Tsinghua University and Hugging Face. The model is open-source and can be tried via the code repository.

The picture below demonstrates how LCM-LoRA produces high-resolution images in very few steps. The authors used LCM-LoRA-SD-V1.5 to generate 512×512 images, and LCM-LoRA-SDXL and LCM-LoRA-SSD-1B to generate 1024×1024 images, in only 4 sampling steps.

Images generated using Latent Consistency Models (LCMs) distilled from different pretrained diffusion models (source: paper)

LDMs & LCMs

Latent Diffusion Models (LDMs) are a type of generative model that creates realistic images by progressively removing noise from a latent representation (noise is added during training and removed, step by step, at generation time). These models are typically very large and complex, requiring significant time and computational resources to train.

Latent Consistency Models (LCMs) are derived from LDMs but are much faster: they can generate high-quality images from text in seconds while using less memory and compute. LCMs need only about 32 A100 GPU hours of training.

The picture below shows the inference time of DPM-Solver++ and LCM at 768×768 resolution on an A800 GPU. The classifier-free guidance (CFG) scale is set to 8 and the batch size to 4. You can see that LCM generates high-quality images in a much shorter inference time.

The inference time of DPM-Solver++ and LCM (source: code repository)

LCMs offer several benefits over existing image generation models, such as faster inference, higher quality, and lower memory consumption.

LCM-LoRA

The LCM-LoRA technique is based on distillation, which speeds up LDMs by reducing the number of required sampling steps while still producing high-quality images.

Distillation is a way of training a smaller and faster model from a larger and slower one by teaching the smaller model to reproduce the behavior of the larger model. The LCM distillation process transfers the knowledge of a large and complex model (such as SDXL or SSD-1B) into a smaller and simpler model that can generate images faster and with less memory.
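
As a rough illustration of the general idea only (the actual LCM distillation works on diffusion U-Nets with a consistency objective, not on this toy setup), a distillation step can be sketched in PyTorch as follows:

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for a large pretrained model (teacher) and the smaller,
# faster model being distilled (student).
teacher = torch.nn.Linear(128, 128).eval()   # frozen, already trained
student = torch.nn.Linear(128, 128)          # trainable

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)

for _ in range(100):                          # toy training loop
    x = torch.randn(16, 128)                  # stand-in for noisy latents
    with torch.no_grad():
        target = teacher(x)                   # teacher's prediction
    loss = F.mse_loss(student(x), target)     # student learns to match it
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```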

LoRA (Low-Rank Adaptation) was introduced for faster fine-tuning of LLMs: it adds a pair of small low-rank matrices whose product approximates the required weight update. Instead of modifying the entire weight matrix, LoRA trains only these small matrices, significantly reducing the number of parameters to be trained.
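
A minimal PyTorch sketch of this idea is shown below; the dimensions and the rank are illustrative and do not correspond to any particular Stable Diffusion layer:

```python
import torch

d_out, d_in, rank = 768, 768, 4   # illustrative sizes, not from a real SD layer

W = torch.randn(d_out, d_in)                       # frozen pretrained weight
A = torch.zeros(d_out, rank, requires_grad=True)   # low-rank factors: only these
B = torch.randn(rank, d_in, requires_grad=True)    # two small matrices are trained
# A starts at zero so that, before training, W + A @ B equals the original W.

def adapted_forward(x):
    # The effective weight is W + A @ B; gradients flow only into A and B.
    return x @ (W + A @ B).T

full_params = W.numel()                # 589,824
lora_params = A.numel() + B.numel()    # 6,144 -> about 1% of the full matrix
print(f"trainable fraction: {lora_params / full_params:.2%}")
```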

LCM-LoRA mimics the behavior of pre-trained models such as SDXL and SSD-1B, but requires far fewer sampling steps to achieve the same level of quality.

Overview of LCM-LoRA: a linear combination of the style and acceleration vectors (source: paper)

By incorporating LoRA into the LCM distillation process, the memory requirements for distillation are significantly reduced. This advancement allows for the training of larger models, such as SDXL and SSD-1B, even with limited resources.

There are two types of LoRA parameters that can be combined:

  • acceleration vector, obtained through LCM-LoRA training
  • style vector, obtained by fine-tuning on a dataset of a particular style

The model obtained by a linear combination of the acceleration vector and the style vector (a customized LCM) can generate images in a particular painting style with minimal sampling steps, without any additional training.
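
With the diffusers library, this linear combination can be sketched by loading both LoRAs as named adapters and weighting them (the multi-adapter API requires the peft integration). The SDXL base model and the latent-consistency/lcm-lora-sdxl weights are published checkpoints, while some-user/papercut-style-lora and the 1.0/0.8 weights are placeholders chosen for illustration:

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Acceleration vector: the published LCM-LoRA weights for SDXL.
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl", adapter_name="lcm")
# Style vector: a placeholder name for a LoRA fine-tuned on a particular style dataset.
pipe.load_lora_weights("some-user/papercut-style-lora", adapter_name="style")

# Linear combination of the two LoRA parameter sets.
pipe.set_adapters(["lcm", "style"], adapter_weights=[1.0, 0.8])

image = pipe(
    "papercut, a cute fox in a snowy forest",
    num_inference_steps=4,    # few-step sampling enabled by the acceleration LoRA
    guidance_scale=1.0,       # low or disabled classifier-free guidance
).images[0]
image.save("styled_fox.png")
```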

LCM-LoRA has far fewer trainable parameters than the original models (see the table below): a reduction of 93.1%, 91.9%, and 94.4% for SD-V1.5, SSD-1B, and SDXL, respectively.

Full and trainable (LoRA) parameter counts for SD-V1.5, SSD-1B, and SDXL (source: paper)

Advantages of LCM-LoRA

  • Faster image generation: LCM-LoRA generates high-quality images in fewer steps, which significantly speeds up the image generation process and makes it well suited for real-time applications.
  • Memory efficiency: LCM-LoRA’s compact nature significantly reduces memory consumption compared to traditional Stable-Diffusion models.
  • High quality: LCM-LoRA not only accelerates image generation but also maintains the high-quality output characteristic of LDMs.
  • Universal applicability: LCM-LoRA can work with any Stable-Diffusion (SD) model, such as SD-V1.5, SSD-1B, and SDXL, without needing any extra training (see the usage sketch after this list).
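
As a rough usage sketch of this plug-and-play behavior with the diffusers library: the fine-tuned SD-V1.5 checkpoint name, the prompt, and the sampler settings below are illustrative assumptions rather than values prescribed by the paper; the essential parts are swapping in the LCM scheduler and loading the acceleration LoRA.

```python
import torch
from diffusers import AutoPipelineForText2Image, LCMScheduler

# A community fine-tune of SD-V1.5 (repository name assumed for illustration).
pipe = AutoPipelineForText2Image.from_pretrained(
    "Lykon/dreamshaper-7", torch_dtype=torch.float16
).to("cuda")

# Swap in the LCM scheduler and plug the SD-V1.5 acceleration LoRA into the fine-tune.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

image = pipe(
    "a portrait of an astronaut, highly detailed",
    num_inference_steps=4,   # few-step sampling
    guidance_scale=1.0,      # LCM-LoRA is typically run with low or disabled CFG
).images[0]
image.save("astronaut.png")
```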

Conclusion

LCM-LoRA is a universal, training-free Stable-Diffusion acceleration module that works with any fine-tuned version of Stable Diffusion or with SD LoRAs. It enables fast text-to-image generation in very few steps without sacrificing image quality.

The model is open-source. You can try the demos and run the model locally using the code repository.
