DragGAN: edit images by simply dragging some points on them

DragGAN (Drag Your GAN) is an interactive method for editing GAN-generated images by simply dragging some points on the images and moving them to the desired positions. It can also edit real images after embedding them into the latent space of the GAN.

Users are able to control various spatial features such as pose, shape, expression, and layout. They only have to click on the image and define some pairs of handle points and target points. Optionally, they may draw a mask to indicate which region of the image is flexible, while keeping the rest of the image unchanged.

DragGAN can manipulate any GAN-generated image, as well as real images after embedding them into the latent space of StyleGAN. This is done through GAN inversion, a technique that maps a real image to a latent code of a pre-trained GAN model, from which the generator can accurately reconstruct the image.
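
To make that concrete, a common inversion strategy is to optimize a latent code until the generator reproduces the photo. The sketch below assumes a hypothetical pre-trained generator `G` that maps a latent code `w` to an image; the inversion pipeline actually used with DragGAN may differ (for example, by adding a perceptual loss).

```python
import torch
import torch.nn.functional as F

def invert_image(G, target_image, w_init, num_steps=500, lr=0.01):
    """Optimize a latent code w so that G(w) reconstructs target_image.

    G            -- hypothetical pre-trained generator, callable as G(w) -> image
    target_image -- real image tensor of shape (1, 3, H, W), values in [-1, 1]
    w_init       -- initial latent code, e.g. the generator's average latent
    """
    w = w_init.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)

    for _ in range(num_steps):
        optimizer.zero_grad()
        recon = G(w)                            # image generated from the current latent
        loss = F.mse_loss(recon, target_image)  # pixel reconstruction loss
        # (practical inversion pipelines usually add a perceptual/LPIPS term here)
        loss.backward()
        optimizer.step()

    return w.detach()  # inverted code: G(w) should now approximate the real image
```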

DragGAN was developed by a research team from the Max Planck Institute for Informatics, MIT CSAIL, the University of Pennsylvania, and Google AR/VR. Their paper was accepted to the SIGGRAPH 2023 conference on computer graphics.

DragGAN allows users to “drag” the content of any GAN-generated image. The model moves the handle points (red) to precisely reach their corresponding target points (blue).

The DragGAN algorithm is based on the StyleGAN2 architecture. It takes as input a GAN-generated image, a set of handle and target point pairs, and an optional mask marking the region that is allowed to change.

It then “drags” the image content toward the desired configuration by iteratively optimizing the latent code so that the handle points move toward their targets, updating the image at each step.
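
To make the inputs concrete, an edit request might be specified like this (the variable names and coordinates are illustrative, not the authors' API):

```python
import numpy as np

# Each edit is defined by pairs of points in (x, y) pixel coordinates.
handle_points = np.array([[120,  85], [240, 190]])   # points to drag (e.g. nose tip, jaw)
target_points = np.array([[150,  60], [240, 230]])   # where each handle should end up

# Optional binary mask: 1 = region allowed to change, 0 = keep fixed.
mask = np.zeros((512, 512), dtype=np.uint8)
mask[40:300, 60:320] = 1
```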

Generative Adversarial Networks (GANs) are a type of neural network that can learn to create realistic images from random noise.
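
As a toy illustration (a stand-in network, not StyleGAN2), a generator is simply a function from random noise to pixels:

```python
import torch
import torch.nn as nn

# Toy generator: maps a 64-dim noise vector to a 3x32x32 image.
# A real GAN generator is trained adversarially against a discriminator until its
# outputs look like samples from the training data.
toy_generator = nn.Sequential(
    nn.Linear(64, 256),
    nn.ReLU(),
    nn.Linear(256, 3 * 32 * 32),
    nn.Tanh(),                                   # pixel values in [-1, 1]
)

z = torch.randn(1, 64)                           # random noise
image = toy_generator(z).view(1, 3, 32, 32)      # "generated" image tensor
```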

Conventional techniques for control in GANs often depend on manually labeled training data or pre-existing 3D models, which can be imprecise and inflexible, limiting their applicability.

DragGAN addresses these drawbacks by allowing users to interactively manipulate images through a dragging action, enabling them to relocate any points on the image to their desired positions.

DragGAN pipeline: while moving the handle points to the target points, the model tracks the object in the image using motion supervision and point tracking

The user chooses the handle points, the target points, and an optional mask. The method follows two primary steps to move the handle points towards the target points and track the object within the image:

  1. The first step is motion supervision, which uses a loss function that pushes each handle point toward its target. Optimizing this loss updates the latent code (w => w’) and, through the generator, the image (I => I’): because the handle points lie on the object being manipulated, moving them deforms the object and changes the overall image.
  2. The second step is point tracking, which updates the positions of the handle points after each optimization step, so that they keep following the same parts of the object as the image changes (a hedged sketch of both steps follows below).
DragGAN method: in order to move the handle point 𝒑𝑖 to the target point 𝒕𝑖, the model moves a tiny patch near 𝒑𝑖 (red circle) towards 𝒕𝑖 (blue circle) 
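
Below is a hedged PyTorch sketch of both steps under these assumptions: `feat` is an intermediate generator feature map resized to image resolution, points are float tensors of (x, y) coordinates, and the radii, loss weight, and function names are illustrative rather than the official implementation.

```python
import torch
import torch.nn.functional as F

def bilinear_sample(feat, points):
    """Sample feature vectors at sub-pixel (x, y) locations via bilinear interpolation.

    feat   -- feature map of shape (1, C, H, W)
    points -- float tensor of shape (N, 2) holding (x, y) pixel coordinates
    returns a tensor of shape (N, C)
    """
    _, C, H, W = feat.shape
    grid = points.clone()
    grid[:, 0] = 2.0 * grid[:, 0] / (W - 1) - 1.0   # grid_sample expects coords in [-1, 1]
    grid[:, 1] = 2.0 * grid[:, 1] / (H - 1) - 1.0
    grid = grid.view(1, 1, -1, 2)
    sampled = F.grid_sample(feat, grid, align_corners=True)   # (1, C, 1, N)
    return sampled.view(C, -1).t()

def _patch_offsets(radius):
    """Integer (x, y) offsets of a square patch of the given radius."""
    r = torch.arange(-radius, radius + 1, dtype=torch.float32)
    return torch.stack(torch.meshgrid(r, r, indexing="xy"), dim=-1).reshape(-1, 2)

def motion_supervision_loss(feat, feat_init, handles, targets, mask=None,
                            lam=20.0, radius=3):
    """Loss that pushes a small patch around each handle point one step toward its target."""
    loss = 0.0
    offs = _patch_offsets(radius)
    for p, t in zip(handles, targets):
        d = (t - p) / ((t - p).norm() + 1e-8)            # unit step toward the target
        patch = p.unsqueeze(0) + offs                    # points q around the handle
        f_q = bilinear_sample(feat, patch).detach()      # stop gradient on the source
        f_q_shift = bilinear_sample(feat, patch + d)     # features one step toward target
        loss = loss + F.l1_loss(f_q_shift, f_q)
    if mask is not None:
        keep = (1.0 - mask.float()).view(1, 1, *mask.shape)  # penalize changes outside mask
        loss = loss + lam * F.l1_loss(feat * keep, feat_init.detach() * keep)
    return loss

def track_points(feat, feat_init, handles, handles_init, radius=6):
    """Move each handle to the nearby location whose current feature best matches
    the feature that point had in the original image (nearest neighbour in L1)."""
    offs = _patch_offsets(radius)
    new_handles = []
    for p, p0 in zip(handles, handles_init):
        f0 = bilinear_sample(feat_init, p0.unsqueeze(0))     # reference feature (1, C)
        window = p.unsqueeze(0) + offs                       # candidate locations
        dist = (bilinear_sample(feat, window) - f0).abs().sum(dim=1)
        new_handles.append(window[dist.argmin()])
    return torch.stack(new_handles)
```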

The optimization process is repeated until the handle points get to their target positions, usually after 30 to 200 iterations.
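
Tying the two steps together, here is a sketch of the outer loop, reusing the functions above; `G(w, return_features=True)` is a placeholder interface standing in for a StyleGAN2-like generator that returns both the image and an intermediate feature map.

```python
def drag_edit(G, w, handles, targets, mask=None, max_iter=200, lr=2e-3, tol=1.0):
    """Iteratively move handle points toward their targets by optimizing the latent code."""
    w = w.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)

    with torch.no_grad():
        _, feat_init = G(w, return_features=True)   # features of the unedited image
    handles = handles.clone()
    handles_init = handles.clone()

    for _ in range(max_iter):
        # 1) motion supervision: update the latent code
        optimizer.zero_grad()
        _, feat = G(w, return_features=True)
        loss = motion_supervision_loss(feat, feat_init, handles, targets, mask)
        loss.backward()
        optimizer.step()

        # 2) point tracking: find where the handle points landed in the updated image
        with torch.no_grad():
            _, feat = G(w, return_features=True)
            handles = track_points(feat, feat_init, handles, handles_init)

        # stop once every handle point is (approximately) at its target
        if (handles - targets.float()).norm(dim=1).max() < tol:
            break

    with torch.no_grad():
        edited_image, _ = G(w, return_features=True)
    return w.detach(), edited_image
```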

You have the flexibility to end the optimization whenever you want or to adjust the handle and target points for further image refinement until you achieve the desired results.

You can drag as many points as you want, but dragging too many points may cause unrealistic results, especially when the target positions are far away from the original positions or when they conflict with each other.

Video source: Project page. In this example, DragGAN is used to change the shape of an alga seen under a microscope.

Evaluation

The authors implemented their method in PyTorch with the Adam optimizer and evaluated it on several datasets, including FFHQ, AFHQCat, SHHQ, LSUN Car, and LSUN Cat.

The team performed qualitative and quantitative evaluations of their method on image manipulation and point tracking tasks and compared it with prior approaches.

The results show that DragGAN can manipulate images more accurately and naturally than UserControllableLT and can track the handle points better than PIPs and RAFT.

The authors demonstrate that the mask helps to better control the movable region. For example, with a mask over the dog’s head, only the head moves, while without the mask, the whole body moves.
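
With the hypothetical `drag_edit` sketch above, such an experiment would look roughly like this (the coordinates are made up):

```python
import torch

# Binary mask covering only the dog's head: 1 = free to move, 0 = keep unchanged.
mask = torch.zeros(512, 512)
mask[60:220, 180:360] = 1.0   # rough box around the head (illustrative coordinates)

# With the mask, only the head region may change; with mask=None, the whole body can move.
# w_edit, edited = drag_edit(G, w, handle_points, target_points, mask=mask)
```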

Conclusion and future work

DragGAN opens up new possibilities for creative and interactive image synthesis using GANs. It can be used for various applications such as photo editing, animation, gaming, art, education, and more.

DragGAN is also fun and easy to use. You can simply drag any points on the image and see how the image changes in real time. You can also undo or redo your actions if you are not satisfied with the result.

One advantage of DragGAN is that it operates within the learned generative image space of the GAN. This means that manipulations performed by the system tend to produce realistic results, even in challenging scenarios.

Challenges and limitations that could be addressed in future work include improving the quality and diversity of the pre-trained GAN and handling large deformations or transformations.
