As online shopping continues to grow, the demand for Virtual Try-On (VTON) technology has surged, allowing customers to visualize products on themselves by overlaying product images onto their own photos. An essential yet challenging condition for effective VTON is pose control, which ensures accurate alignment of products with the user's body while supporting diverse orientations for a more immersive experience. However, incorporating pose conditions into VTON models presents several challenges, including selecting the optimal pose representation, integrating poses without additional parameters, and balancing pose preservation with flexible pose control.

In this work, we build upon CatVTON (titled "Concatenation is All you Need"), an efficient VTON model that concatenates the reference image condition without an external encoder, control network, or complex attention layers. We investigate methods to incorporate pose control into this pure-concatenation paradigm by spatially concatenating pose data, comparing performance using pose maps and skeletons. Our experiments reveal that pose stitching with pose maps yields the best results, enhancing both pose preservation and output realism. Additionally, we introduce a mixed-mask training strategy using fine-grained and bounding box masks, allowing the model to support flexible product integration across varied poses and conditions.

Our contributions are threefold: 1) we explore different configurations for integrating pose representations into VTON models, 2) we propose a lightweight, parameter-efficient approach for adding pose control, and 3) we enable flexible pose generation through mixed-mask training. Evaluations on the public datasets VITON-HD and DressCode demonstrate that our method improves pose preservation and outperforms state-of-the-art models with more complex conditioning frameworks, advancing VTON's adaptability and realism for diverse real-world applications.
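The two mechanisms named in the abstract, spatial concatenation of a pose condition and mixed-mask training, can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the function names (`stitch_conditions`, `bounding_box_mask`, `sample_mask`), the concatenation axis, the 50/50 sampling probability, and the choice to operate on plain arrays. The abstract does not say whether stitching happens on pixels or latents, nor how the two mask types are mixed, so this should not be read as the authors' implementation.

```python
import numpy as np


def stitch_conditions(person, garment, pose_map, axis=1):
    """Spatially concatenate same-shaped H x W x C arrays.

    axis=1 stitches along the width, axis=0 along the height.
    No extra encoder or parameters are involved: the pose map is
    simply placed next to the other inputs.
    """
    if not (person.shape == garment.shape == pose_map.shape):
        raise ValueError("all inputs must share the same shape")
    return np.concatenate([person, garment, pose_map], axis=axis)


def bounding_box_mask(fine_mask):
    """Return the tightest axis-aligned box covering a 2-D fine-grained mask."""
    box = np.zeros_like(fine_mask)
    rows = np.flatnonzero(fine_mask.any(axis=1))
    cols = np.flatnonzero(fine_mask.any(axis=0))
    if rows.size == 0:
        return box  # empty mask stays empty
    box[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1] = 1
    return box


def sample_mask(fine_mask, rng, p_box=0.5):
    """Hypothetical mixed-mask sampling: box mask with probability p_box."""
    if rng.random() < p_box:
        return bounding_box_mask(fine_mask)
    return fine_mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    person = np.zeros((4, 3, 3))
    garment = np.ones((4, 3, 3))
    pose_map = np.full((4, 3, 3), 0.5)
    print(stitch_conditions(person, garment, pose_map).shape)

    fine = np.zeros((5, 5), dtype=np.uint8)
    fine[1, 1] = 1
    fine[3, 2] = 1
    print(bounding_box_mask(fine))
    print(sample_mask(fine, rng).sum())
```

A fine-grained mask follows the garment outline closely, which helps preserve the original pose, while a bounding box mask hides more of the body and leaves the model room to generate a different pose; sampling between the two during training is one plausible way to get both behaviours from a single model.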
Illustration of the different model configurations for adding pose conditions.
@article{PoseConditioning,
author = {Shuwen Qiu and Qi Li and Amir Tavanaei and Julien Han and Kee Kiat Koo and Karim Bouyarmane},
title = {Is Concatenation Really All You Need: Efficient Concatenation-Based Pose Conditioning and Pose Control for Virtual Try On},
year = {2024},
}