Flash Diffusion: Accelerating Image Generation with Diffusion Models

The latest research from Jasper marks a significant milestone for AI image generation and outperforms contributions from major players, including MIT and Adobe.

In the ever-evolving world of GenAI, image editing remains one of the hottest topics. With businesses and marketers leaning more and more on visual content, creating and editing top-notch images at scale has never been more critical.

Today, Jasper's Paris Research Lab released new research on an innovative distillation method: Flash Diffusion. The approach speeds up image generation and editing, improves the user experience, and cuts computing costs while producing very high-quality outputs. It also outperforms models from MIT, Adobe, and ByteDance.

The research paper, “Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation,” addresses a key limitation of traditional distillation methods: they require extensive training time and computational resources. The method developed by Jasper Research reduces the number of trainable parameters by at least 2.5x and up to 64x, depending on the method, making training orders of magnitude faster.

The new method also makes inference roughly 5x faster, which cuts GPU inference costs by 5x and unlocks real-time applications such as text-to-image generation, inpainting, and image upscaling. While previous methods are mostly closed-source and difficult to reproduce, we've open-sourced the codebase to demonstrate reproducibility.

*These samples were generated with Flash Diffusion, which drastically reduces inference time and opens the door to real-time applications.*

Understanding Diffusion Models in Image Generation

Before we get into the specifics of Flash Diffusion, let's make sure we're on the same page about why diffusion models matter.

Diffusion models are a class of generative models that have shown remarkable success in creating and editing high-quality images. At their core, these models generate images iteratively: starting from random noise, they gradually remove it over a sequence of denoising steps, adding detail at each step until the final picture emerges.

Most diffusion-based algorithms require 30 to 50 steps to generate high-quality, realistic images. Since these algorithms were developed, researchers have focused on reducing the number of steps while preserving quality.

Enter Flash Diffusion.

Introducing Flash Diffusion

Flash Diffusion is an innovative approach that dramatically speeds up an existing diffusion model, the Teacher, without compromising output quality. It does so by training a new diffusion model, the Student, through a process known as distillation: the Student learns to perform the same tasks as the Teacher in significantly fewer steps. Training is constrained by three losses:

A distillation loss between the Teacher's and the Student's denoised samples
An adversarial (GAN) loss computed on the Student's denoised samples after they have been re-noised
A Distribution Matching (DM) loss computed on those same re-noised samples

We found that combining these three loss terms delivers stable training across a variety of tasks.
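The sketch below shows one way three such terms could be combined into a single weighted training objective. It is not the Flash Diffusion implementation: the specific forms are assumptions borrowed from common practice (MSE regression onto the Teacher, a non-saturating GAN generator loss, and a DMD-style distribution-matching surrogate), and the weights `w_distill`, `w_gan`, and `w_dm` are hypothetical hyperparameters. Plain Python lists stand in for tensors, so only loss values are computed, not gradients.

```python
import math
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def distillation_loss(student_x0, teacher_x0):
    # Assumed form: regress the Student's denoised sample onto the Teacher's.
    return mse(student_x0, teacher_x0)


def gan_loss(disc_logits):
    # Assumed form: non-saturating generator loss, -log(sigmoid(logit)) = softplus(-logit),
    # where the logits come from a discriminator applied to re-noised Student samples.
    return sum(softplus(-l) for l in disc_logits) / len(disc_logits)


def dm_loss(x_renoised, real_score, fake_score):
    # Assumed form (DMD-style surrogate): 0.5 * mean((x - stopgrad(x - g))^2) with
    # g = fake_score - real_score. Under autodiff its gradient w.r.t. x is proportional
    # to g; without autodiff we can only evaluate its value, 0.5 * mean(g^2).
    target = [x - (f - r) for x, f, r in zip(x_renoised, fake_score, real_score)]
    return 0.5 * mse(x_renoised, target)


def total_loss(student_x0, teacher_x0, disc_logits, x_renoised, real_score, fake_score,
               w_distill=1.0, w_gan=0.1, w_dm=0.1):
    # Hypothetical weights; in practice these are tuned training hyperparameters.
    return (w_distill * distillation_loss(student_x0, teacher_x0)
            + w_gan * gan_loss(disc_logits)
            + w_dm * dm_loss(x_renoised, real_score, fake_score))
```

Keeping each term as its own function makes it easy to log them separately, which is usually how one checks whether one loss is dominating or destabilizing training.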

Key Benefits:

Open Source: Unlike other research on the topic, this method is well documented and comes with an open-source codebase so that other researchers can reproduce the results.
Efficient Distillation Method: Recent methods can be time-consuming, resource-intensive, and unstable to train. This method requires only several hours of GPU training, compared to the weeks needed by competing methods, and its training is stable across tasks.
State-of-the-Art Performance: The method achieves state-of-the-art (SOTA) results on Fréchet Inception Distance (FID) and CLIP Score, the standard metrics for evaluating, respectively, the quality of generated images and their relevance to the text prompt. It does so with fewer trainable parameters, and the generated images remain very high quality despite the reduced number of sampling steps.
Versatility Across Tasks: Our research demonstrates the method's versatility through applications such as text-to-image generation, image inpainting, face swapping, and image upscaling. It supports different diffusion model backbones (SD1.5, SDXL, and PixArt-α) as well as adapters, while drastically reducing the number of sampling steps required.
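For reference, these are the standard definitions of the two metrics named above; the article does not specify which implementation or feature extractor produced the reported numbers. FID compares Gaussian fits to Inception-v3 features of real images ($\mu_r, \Sigma_r$) and generated images ($\mu_g, \Sigma_g$), so lower is better. CLIP Score is the clipped cosine similarity between the CLIP embeddings of an image ($E_I$) and its caption ($E_c$), so higher is better; the scale factor $w$ varies by convention (2.5 in the original CLIPScore paper, 100 in some libraries).

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)

\mathrm{CLIPScore}(I, c) = w \cdot \max\!\left( \cos(E_I, E_c),\ 0 \right)
```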

In practice, this means inference is roughly 5x faster, which lowers GPU inference costs and makes real-time applications possible. The method also trains more robustly and stably than previous approaches, and because it cuts the number of trainable parameters by 2.5x to 64x (depending on the method), training is much faster. Finally, where previous methods are mostly closed-source and hard to reproduce, Jasper has open-sourced the codebase to demonstrate reproducibility.

Real-World Applications: How to use Flash Diffusion

The versatility of this distillation method makes it useful across several practical applications:

Text-to-Image

Generating images from textual descriptions has vast potential in the marketing and creative industries. With this method, businesses can create tailored visuals from specific text prompts with unprecedented speed and at lower cost.

Inpainting

Inpainting fills in missing or masked parts of an image. For digital marketers, this means quickly repairing or enhancing photos without extensive manual editing. It is especially useful for resizing an image or changing its aspect ratio, since the newly exposed areas must be filled in.
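Changing an image's aspect ratio with inpainting amounts to placing the original on a larger canvas and asking the model to fill the new border. The helper below is a hypothetical sketch, not part of the Flash Diffusion codebase: it computes the padded canvas size, the offset of the original image, and the binary mask that an inpainting pipeline would typically take as input, assuming the convention that 1 marks pixels to generate and 0 marks pixels to keep.

```python
from typing import List, Tuple


def outpaint_canvas(width: int, height: int,
                    target_ratio: float) -> Tuple[int, int, int, int, List[List[int]]]:
    """Centre a width x height image on a canvas with aspect ratio target_ratio (w / h).

    Returns (new_w, new_h, x_offset, y_offset, mask), where mask[y][x] == 1 marks
    pixels the inpainting model must fill and 0 marks pixels of the original image.
    """
    if width <= 0 or height <= 0 or target_ratio <= 0:
        raise ValueError("dimensions and target ratio must be positive")
    if width / height < target_ratio:
        # Too narrow: widen the canvas, keep the height.
        new_w, new_h = max(round(height * target_ratio), width), height
    else:
        # Too wide (or already matching): make the canvas taller, keep the width.
        new_w, new_h = width, max(round(width / target_ratio), height)
    x_off, y_off = (new_w - width) // 2, (new_h - height) // 2
    mask = [[0 if (x_off <= x < x_off + width and y_off <= y < y_off + height) else 1
             for x in range(new_w)]
            for y in range(new_h)]
    return new_w, new_h, x_off, y_off, mask


# Turn a 4x4 square into a 2:1 banner: two new columns on each side need inpainting.
w, h, x0, y0, mask = outpaint_canvas(4, 4, 2.0)
print(w, h, x0, y0)  # 8 4 2 0
print(mask[0])       # [1, 1, 0, 0, 0, 0, 1, 1]
```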

Here's a comparison between our method and the baseline:

*Inpainting comparison: previous method (4 steps) vs. our method (4 steps).*

Image Upscaling

Upscaling low-resolution images to higher resolutions without losing quality is crucial for maintaining visual standards. With this method, such transformations are fast and preserve the integrity of the original image.

*Upscaling comparison: previous method (4 steps) vs. our method (4 steps).*

The Future of Image Generation

Looking ahead, more efficient diffusion models are likely to become common practice in image generation. The ability to create high-quality images quickly and with fewer resources opens up new possibilities for industries that rely on visual content.

By embracing these advancements, marketers can elevate their strategies, delivering more engaging and personalized experiences to their audiences. Establishing a robust visual presence is crucial in today’s competitive landscape, and leveraging cutting-edge AI technology like this innovative distillation method is a step in the right direction.

Who is the Jasper Research Lab?

Led by Damien Henry, the team behind the Paris-based Jasper Research Lab has been developing top-quality models for image editing since 2021. Their original work at Clipdrop produced algorithms for image editing tools including background removal, image relighting, background replacement, image outpainting, and more.

The Jasper Research Lab is currently hiring for several technical and scientific positions, including backend engineers and a product manager.

Interested in joining the team? Check out open roles here.