Camera-Driven Representation Learning for Unsupervised Domain Adaptive Person Re-identification

Abstract

We present a novel unsupervised domain adaptation method for person re-identification (reID) that generalizes a model trained on a labeled source domain to an unlabeled target domain. We introduce a camera-driven curriculum learning (CaCL) framework that leverages camera labels of person images to transfer knowledge from source to target domains progressively. To this end, we divide the target domain dataset into multiple subsets based on the camera labels, and initially train our model with a single subset (i.e., images captured by a single camera). We then gradually exploit more subsets for training, according to a curriculum sequence obtained with a camera-driven scheduling rule. The scheduler considers the maximum mean discrepancy (MMD) between each subset and the source domain dataset, such that a subset closer to the source domain is exploited earlier within the curriculum. For each curriculum sequence, we generate pseudo labels of person images in the target domain to train a reID model in a supervised way. We have observed that the pseudo labels are highly biased toward cameras, suggesting that person images obtained from the same camera are likely to have the same pseudo labels, even for different IDs. To address the camera bias problem, we also introduce a camera-diversity (CD) loss that encourages person images sharing a pseudo label, but captured across various cameras, to contribute more to discriminative feature learning, providing person representations robust to inter-camera variations. Experimental results on standard benchmarks, including real-to-real and synthetic-to-real scenarios, demonstrate the effectiveness of our framework.
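The scheduling rule described above can be illustrated with a minimal sketch. It assumes features have already been extracted as plain lists of floats, uses a biased squared-MMD estimate with an RBF kernel, and sorts the camera subsets by increasing discrepancy from the source. The function names (`mmd_squared`, `camera_curriculum`), the kernel choice, the `gamma` value, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float) -> float:
    # Gaussian (RBF) kernel; the kernel choice is an assumption of this sketch.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector], gamma: float) -> float:
    # Average kernel value over all pairs (x, y).
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs: Sequence[Vector], ys: Sequence[Vector], gamma: float = 1.0) -> float:
    # Biased estimate of the squared maximum mean discrepancy.
    return (
        mean_kernel(xs, xs, gamma)
        + mean_kernel(ys, ys, gamma)
        - 2.0 * mean_kernel(xs, ys, gamma)
    )


def camera_curriculum(
    source_feats: Sequence[Vector],
    target_feats_by_camera: Dict[str, Sequence[Vector]],
    gamma: float = 1.0,
) -> List[str]:
    # Camera subsets closer to the source (smaller MMD) come first.
    scores = {
        cam: mmd_squared(source_feats, feats, gamma)
        for cam, feats in target_feats_by_camera.items()
    }
    return sorted(scores, key=scores.get)


# Toy 2-D features (hypothetical): cam_b lies near the source, cam_a far away.
source = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]]
target = {
    "cam_a": [[2.0, 2.0], [2.1, 1.9]],
    "cam_b": [[0.2, 0.1], [0.0, 0.2]],
    "cam_c": [[1.0, 1.0], [0.9, 1.1]],
}
order = camera_curriculum(source, target)
print(order)  # ['cam_b', 'cam_c', 'cam_a']
```

In practice the features would come from the reID backbone and the pairwise kernel sums would be vectorized; the pure-Python loops here only keep the example self-contained.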

Approach

An overview of our framework. We divide target images into multiple subsets based on camera labels. The camera-driven scheduler takes the subsets of the target domain, along with source images, as inputs to establish a curriculum sequence. We train our model progressively with the CD loss for the target domain, along with a cross-entropy term for the source domain.
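The progressive use of camera subsets can be sketched as a generator that grows the target training pool one camera at a time. This is a hedged illustration: the helper name `curriculum_stages`, the one-camera-per-stage schedule, and the toy file names are assumptions, and the pseudo-label generation and training steps are left as comments because the page does not specify them.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def curriculum_stages(
    order: Sequence[str],
    images_by_camera: Dict[str, Sequence[str]],
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (active cameras, cumulative target pool) for each stage."""
    active: List[str] = []
    pool: List[str] = []
    for cam in order:
        # Assumption: exactly one new camera subset is added per stage.
        active.append(cam)
        pool.extend(images_by_camera[cam])
        yield list(active), list(pool)


# Hypothetical curriculum order and image lists, for illustration only.
example_order = ["cam_b", "cam_c", "cam_a"]
example_images = {
    "cam_a": ["a_001.jpg"],
    "cam_b": ["b_001.jpg", "b_002.jpg"],
    "cam_c": ["c_001.jpg"],
}

stages = list(curriculum_stages(example_order, example_images))
for cameras, pool in stages:
    # At each stage one would generate pseudo labels for `pool` and train
    # on them together with the labeled source images (not shown here).
    print(cameras, len(pool))
```

Returning copies of the lists keeps each stage independent, so a caller can store or modify a stage without affecting later ones.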

Paper

G. Lee, S. Lee, D. Kim, Y. Shin, Y. Yoon, B. Ham
Camera-Driven Representation Learning for Unsupervised Domain Adaptive Person Re-identification
In International Conference on Computer Vision (ICCV), 2023

Acknowledgements

This work was partly supported by the IITP and NRF grants funded by the Korea government (MSIT) (No. RS-2022-00143524, Development of Fundamental Technology and Integrated Solution for Next Generation Automatic Artificial Intelligence System, No. 2023R1A2C2004306), and the Yonsei Signature Research Cluster Program of 2023 (2023-22-0008).