FlowDrag improves geometric consistency by leveraging 3D mesh-guided deformation, ensuring accurate handle-to-target point alignment and preserving structural integrity during edits.
Current drag-based image editing methods suffer from geometric inconsistency because they primarily optimize user-defined points locally, neglecting the global geometric context. This often leads to unnatural deformations and structural artifacts in the edited images. Moreover, the absence of a reliable benchmark with ground-truth data makes it difficult to quantitatively evaluate the accuracy of such edits.
We propose FlowDrag, a geometry-aware, three-step drag-editing pipeline built on mesh-guided deformation vector flow fields.
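Below is a minimal sketch of the core idea behind a mesh-guided deformation vector flow field: handle-to-target displacements are propagated over mesh vertices and then interpolated into a dense per-pixel flow. The function name, the Gaussian weighting, and the nearest-vertex splatting are illustrative assumptions, not FlowDrag's actual implementation, which deforms a 3D mesh and uses the resulting flow to guide the edit.

```python
# Minimal sketch (not FlowDrag's code): propagate handle->target drags over
# 2D mesh vertices and interpolate them into a dense deformation flow field.
import numpy as np
from scipy.spatial import cKDTree

def deformation_flow_field(vertices, handles, targets, image_hw, sigma=40.0):
    """Return an (H, W, 2) flow field from K handle->target drag vectors.

    vertices: (V, 2) mesh vertex positions projected to image coordinates.
    handles, targets: (K, 2) user-specified drag endpoints.
    """
    vertices = np.asarray(vertices, dtype=float)
    handles = np.asarray(handles, dtype=float)
    targets = np.asarray(targets, dtype=float)
    disp = targets - handles                                    # (K, 2) drags

    # Spread each drag to every vertex with a Gaussian kernel so nearby
    # geometry moves coherently instead of only the handle pixels.
    d2 = ((vertices[:, None, :] - handles[None, :, :]) ** 2).sum(-1)   # (V, K)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True) + 1e-8
    vert_disp = w @ disp                                        # (V, 2)

    # Splat per-vertex displacements to pixels via nearest vertex; a real
    # implementation would rasterize mesh faces with barycentric weights.
    H, W = image_hw
    ys, xs = np.mgrid[0:H, 0:W]
    pixels = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    nearest = cKDTree(vertices).query(pixels)[1]
    return vert_disp[nearest].reshape(H, W, 2)
```

Such a flow field would then steer the edit so that content follows the mesh deformation rather than moving the handle points in isolation.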
Existing drag-based editing benchmarks lack ground-truth (GT) images, which makes objective evaluation of edit accuracy difficult. To overcome this limitation, we introduce the VFD (Video Frame Drag) dataset, comprising 250 carefully curated input and ground-truth image pairs extracted from consecutive video frames in the DAVIS and Pexels datasets. Human annotators precisely label matching points between frames to define accurate drag directions, providing reliable references for geometry-aware evaluation.
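As an illustration of how a VFD sample might be organized, the sketch below loads one input/GT pair with its annotated points; the file names, JSON keys, and helper names are assumptions for this example, not the benchmark's actual format.

```python
# Minimal sketch of a VFD (Video Frame Drag) sample; on-disk layout and field
# names are illustrative assumptions, not the benchmark's actual format.
from dataclasses import dataclass
from pathlib import Path
import json
import numpy as np
from PIL import Image

@dataclass
class VFDSample:
    input_image: np.ndarray    # source video frame, (H, W, 3) uint8
    gt_image: np.ndarray       # later frame used as editing ground truth
    handle_points: np.ndarray  # (K, 2) annotated points on the input frame
    target_points: np.ndarray  # (K, 2) matching points on the GT frame

def load_vfd_sample(sample_dir: Path) -> VFDSample:
    """Load one input/GT image pair plus its annotated drag points."""
    meta = json.loads((sample_dir / "points.json").read_text())
    return VFDSample(
        input_image=np.array(Image.open(sample_dir / "input.png").convert("RGB")),
        gt_image=np.array(Image.open(sample_dir / "gt.png").convert("RGB")),
        handle_points=np.array(meta["handle_points"], dtype=float),
        target_points=np.array(meta["target_points"], dtype=float),
    )
```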
FlowDrag achieves the best mean distance (MD) on DragBench. On the VFD benchmark, it outperforms all competing approaches on PSNR, 1-LPIPS, and MD, demonstrating superior accuracy and consistency in geometry-aware image editing.
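For reference, a sketch of how these metrics could be computed is shown below. It assumes uint8 RGB arrays and already-tracked final handle positions (locating where the handle content ends up typically requires a separate point-tracking or feature-matching step), and preprocessing details may differ from the official evaluation.

```python
# Minimal sketch of the evaluation metrics, assuming edited and GT images as
# (H, W, 3) uint8 RGB arrays; not the benchmark's official evaluation code.
import numpy as np
import torch
import lpips  # pip install lpips

_lpips_net = lpips.LPIPS(net="alex")  # perceptual distance network

def psnr(edited: np.ndarray, gt: np.ndarray) -> float:
    mse = np.mean((edited.astype(float) - gt.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

def one_minus_lpips(edited: np.ndarray, gt: np.ndarray) -> float:
    # Convert to NCHW tensors in [-1, 1], as expected by the lpips package.
    to_t = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() / 127.5 - 1.0
    with torch.no_grad():
        return 1.0 - _lpips_net(to_t(edited), to_t(gt)).item()

def mean_distance(final_handles: np.ndarray, targets: np.ndarray) -> float:
    """MD: average Euclidean distance between where the handle content ended
    up and where the user asked it to go (lower is better)."""
    return float(np.linalg.norm(final_handles - targets, axis=1).mean())
```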
@inproceedings{kooflowdrag,
  title={FlowDrag: 3D-aware Drag-based Image Editing with Mesh-guided Deformation Vector Flow Fields},
  author={Koo, Gwanhyeong and Yoon, Sunjae and Lee, Younghwan and Hong, Ji Woo and Yoo, Chang D},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025}
}