A new technique for exemplar-based image editing using diffusion models was announced on November 23rd, 2022 by a group of researchers from the University of Science and Technology of China and Microsoft Research Asia.
This approach is more precise than previous text-driven ones and allows users to edit a scene by painting with a “conditional image”: the reference image is automatically altered and merged into the source image, as the examples in the paper illustrate.

Text-driven image editing is a technique that uses natural language to manipulate and edit images. This includes tasks such as changing the color or brightness of an image, cropping or resizing it, or adding and removing elements. The goal of text-driven image editing is to let users alter images quickly and easily with natural-language commands, rather than having to learn complex editing software.
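To make this concrete, here is a minimal sketch of what a text-driven edit looks like in code. It assumes the Hugging Face `diffusers` library and the public `timbrooks/instruct-pix2pix` checkpoint, neither of which is part of the method announced here; the file name is a placeholder.

```python
# A minimal sketch of text-driven image editing, assuming the Hugging Face
# `diffusers` library and the public `timbrooks/instruct-pix2pix` checkpoint.
# It only illustrates editing "with natural-language commands"; it is not
# the exemplar-based method discussed in this article.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

source = Image.open("source.jpg").convert("RGB")  # placeholder input image

# The edit is expressed as a plain-language instruction rather than
# performed manually in an image editor.
edited = pipe(
    "make the sky look like a sunset",
    image=source,
    num_inference_steps=20,
).images[0]
edited.save("edited.jpg")
```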
The new approach, called exemplar-based image editing, allows semantic manipulation of an image's content guided by an exemplar image, which can be provided by the user or retrieved from a database. The authors train a diffusion model in a self-supervised way: objects in the input images are randomly cropped and used as reference images, and the model is taught to paste each reference back into the original image so that it blends in naturally. A simplified sketch of this pairing appears below.
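The following hypothetical sketch shows how such a self-supervised training pair might be built: crop a region from the source to act as the exemplar, mask that region out, and train the model to put it back. This is a simplification for illustration only; the actual pipeline uses detected object bounding boxes, heavy augmentation, and an information bottleneck to keep the model from learning a trivial copy-paste.

```python
# Hypothetical sketch of building one training pair for exemplar-based
# editing: the object crop plays the "reference image", and the masked
# source is what the diffusion model learns to complete.
import random
from PIL import Image

def make_training_pair(source: Image.Image, crop_size: int = 128):
    """Assumes the source image is at least crop_size in both dimensions."""
    w, h = source.size
    x = random.randint(0, w - crop_size)
    y = random.randint(0, h - crop_size)
    box = (x, y, x + crop_size, y + crop_size)

    # The cropped region stands in for the user-provided exemplar.
    reference = source.crop(box)

    # Black out the same region in the source; the model is trained to
    # regenerate it so that the reference merges back in naturally.
    masked_source = source.copy()
    masked_source.paste((0, 0, 0), box)
    return masked_source, reference, box
```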

The trained diffusion model can handle a wide variety of reference and source images, as demonstrated in the team's paper.
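If you want to try the released model yourself, the Hugging Face `diffusers` library ships a `PaintByExamplePipeline` that wraps the authors' checkpoint. The sketch below assumes that integration and the `Fantasy-Studio/Paint-by-Example` model id; the image files are placeholders.

```python
# Sketch of running the released exemplar-based editing model through the
# Hugging Face `diffusers` integration. File names are placeholders: the
# mask marks the region to edit (white = edit), and the example image is
# the exemplar to merge into the source.
import torch
from diffusers import PaintByExamplePipeline
from PIL import Image

pipe = PaintByExamplePipeline.from_pretrained(
    "Fantasy-Studio/Paint-by-Example", torch_dtype=torch.float16
).to("cuda")

source = Image.open("source.jpg").convert("RGB").resize((512, 512))
mask = Image.open("mask.png").convert("RGB").resize((512, 512))
exemplar = Image.open("reference.jpg").convert("RGB").resize((512, 512))

result = pipe(image=source, mask_image=mask, example_image=exemplar).images[0]
result.save("result.jpg")
```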
Check out the research paper and the GitHub repository.