FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing

KAIST

ECCV 2024
[Paper]            [Code]            [Poster]



Comparative results of non-rigid and rigid edits using FlexiEdit (ours), MasaCtrl, and Prompt-to-Prompt (P2P). FlexiEdit outperforms the other methods on non-rigid edits, offering greater flexibility in altering layouts, and achieves more natural results on rigid edits.


Abstract

Current image editing methods primarily utilize DDIM Inversion, employing a two-branch diffusion approach to preserve the attributes and layout of the original image. However, these methods encounter challenges with non-rigid edits, which involve altering the image's layout or structure. Our comprehensive analysis reveals that the high-frequency components of the DDIM latent, crucial for retaining the original image's key features and layout, significantly contribute to these limitations. To address this, we introduce FlexiEdit, which enhances fidelity to input text prompts by refining the DDIM latent, specifically by reducing high-frequency components in targeted editing areas. FlexiEdit comprises two key components: (1) Latent Refinement, which modifies the DDIM latent to better accommodate layout adjustments, and (2) Edit Fidelity Enhancement via Re-inversion, which ensures the edits more accurately reflect the input text prompts. Our approach represents notable progress in image editing, particularly in performing complex non-rigid edits, as demonstrated through comparative experiments.


Method

(a) The pipeline of FlexiEdit. Our method uses the refined latent \( z'_T \) to obtain \( I_{mid} \), which significantly alters the original image's layout. After re-inversion over a duration of \( t_R \), features from the original image are injected during the resampling process, yielding the final edited image \( I_{tar} \). (b) The refinement process within the edited region of the latent reduces high-frequency components by a factor of \( \alpha \) while incorporating Gaussian noise proportional to \( (1 - \alpha) \).
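A minimal PyTorch sketch of the refinement step in (b) is given below. This is our reading of the caption, not the released implementation: the function name refine_latent, the binary edit mask, and the Gaussian filter parameters (kernel size, sigma) are illustrative assumptions.

import torch
import torchvision.transforms.functional as TF

def refine_latent(z_T, mask, alpha=0.6, kernel_size=9, sigma=2.0):
    """Sketch of FlexiEdit-style latent refinement (illustrative, not official).

    z_T:  (B, C, H, W) DDIM-inverted latent
    mask: (B, 1, H, W) binary mask of the targeted editing area
    """
    # Split z_T into frequency bands with a 2D Gaussian filter: the blurred
    # latent is the low-frequency part; the residual is the high-frequency part.
    low = TF.gaussian_blur(z_T, kernel_size=[kernel_size, kernel_size],
                           sigma=[sigma, sigma])
    high = z_T - low

    # Damp high frequencies by alpha and compensate with Gaussian noise
    # scaled by (1 - alpha), as described in caption (b).
    noise = torch.randn_like(z_T)
    refined = low + alpha * high + (1.0 - alpha) * noise

    # Apply the refinement only inside the edit region; keep z_T elsewhere.
    return mask * refined + (1.0 - mask) * z_T

Restricting the refinement to the masked region leaves the latent, and hence the reconstruction, untouched outside the targeted editing area.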


Frequency Components Analysis of DDIM Latent


(a), (b) show the PSNR and LPIPS results of reconstructing \( z^{H, \alpha}_T \) and \( z^{L, \alpha}_T \) in comparison to the original image.
(c) visualizes the reconstruction outcome across different \( \alpha \) values, indicating that high-frequency components play a more significant role in forming the object's layout than low-frequency components.

We hypothesized that current image editing models struggle to change the original image's layout because they rely on the DDIM latent. To explore this, we analyzed the frequency components of the DDIM latent \( z_T \). Using 2D Gaussian filters, we separated the low- and high-frequency components of \( z_T \). We then scaled these components by a scalar \( \alpha \in [0, 1] \), creating \( z^{L, \alpha}_T \) and \( z^{H, \alpha}_T \) for low- and high-frequency adjustments, respectively. Our experiments showed that reducing the high-frequency components (\( z^{H, \alpha}_T \)) significantly degrades the image structure, while reducing the low-frequency components (\( z^{L, \alpha}_T \)) has minimal impact. This indicates that high-frequency elements are crucial for maintaining the original image's attributes and layout.
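As a concrete illustration, the following PyTorch sketch shows one way the frequency-scaled latents \( z^{L, \alpha}_T \) and \( z^{H, \alpha}_T \) could be constructed; the helper names and Gaussian filter parameters are our assumptions, since the exact values are not listed here.

import torch
import torchvision.transforms.functional as TF

def split_frequencies(z_T, kernel_size=9, sigma=2.0):
    # 2D Gaussian low-pass filter on the (B, C, H, W) DDIM latent;
    # the residual after blurring is the high-frequency component.
    low = TF.gaussian_blur(z_T, kernel_size=[kernel_size, kernel_size],
                           sigma=[sigma, sigma])
    return low, z_T - low

def scale_frequency(z_T, alpha, component="high"):
    # Build z^{H,alpha}_T (attenuate high frequencies by alpha) or
    # z^{L,alpha}_T (attenuate low frequencies by alpha).
    low, high = split_frequencies(z_T)
    if component == "high":
        return low + alpha * high
    return alpha * low + high

Reconstructing \( z^{H, \alpha}_T \) through the DDIM sampler for decreasing \( \alpha \) is what produces the structural degradation measured by PSNR and LPIPS in (a) and (b).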


Non-rigid Editing Results


Rigid Editing Results

Paper


FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing

Gwanhyeong Koo, Sunjae Yoon, Ji Woo Hong, and Chang D. Yoo

European Conference on Computer Vision (ECCV) 2024

[arXiv version]
[BibTeX]
[Code]

Citation


If you find our project helpful, please consider leaving a star or citing our paper :)
@article{koo2024flexiedit,
    title={FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing},
    author={Koo, Gwanhyeong and Yoon, Sunjae and Hong, Ji Woo and Yoo, Chang D.},
    journal={arXiv preprint arXiv:2407.17850},
    year={2024}
}

Acknowledgements


This work was supported by the Institute for Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-01381, Development of Causal AI through Video Understanding and Reinforcement Learning, and Its Applications to Real Environments).