GrowTAS: Progressive Expansion from Small to Large Subnets for Efficient ViT Architecture Search

WACV 2026

Yonsei University
Top: We first train a small subnet and then evaluate 100 larger subnets derived from it without further training. To be specific, the large subnets are obtained by appending randomly initialized weights to those of the trained one. We can see that the large subnets achieve test accuracies comparable to the trained one (marked by the red dashed line) even with the random initialization. Bottom: We first train a large subnet and then evaluate 100 smaller subnets derived from it without further training. Specifically, we obtain the small variants by cropping a subset of weights from the trained subnet randomly. We can see that the small subnets fail to maintain the performance of the well-trained one.
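The expansion described at the top of the figure can be sketched as embedding the trained small weight matrix into a larger, randomly initialized one. A minimal NumPy sketch follows; the function name, top-left block placement, and Gaussian init scale are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def expand_weights(w_small, d_out, d_in, std=0.02, seed=0):
    """Embed a trained small weight matrix into a larger one.

    The trained weights occupy the top-left block; all remaining
    entries are filled with random values (hypothetical init).
    """
    rng = np.random.default_rng(seed)
    w_large = rng.normal(0.0, std, size=(d_out, d_in))
    s_out, s_in = w_small.shape
    w_large[:s_out, :s_in] = w_small  # trained weights are kept as-is
    return w_large

w_small = np.ones((4, 8))              # stand-in for trained weights
w_large = expand_weights(w_small, 6, 12)
assert w_large.shape == (6, 12)
assert np.allclose(w_large[:4, :8], w_small)  # trained block preserved
```

The reverse direction in the figure (cropping a trained large matrix down to a small one) would correspond to slicing `w_large[:s_out, :s_in]`, which discards weights the network was trained to use jointly; this is consistent with the caption's observation that cropped small subnets lose performance.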

Abstract

Transformer architecture search (TAS) aims to automatically discover efficient vision transformers (ViTs), reducing the need for manual design. Existing TAS methods typically train an over-parameterized network (i.e., a supernet) that encompasses all candidate architectures (i.e., subnets). However, subnets partially share weights within the supernet, leading to interference that severely degrades the smaller subnets. We have found that well-trained small subnets can serve as a good foundation for training larger ones. Motivated by this, we propose a progressive training framework, dubbed GrowTAS, that begins by training small subnets and gradually incorporates larger ones, reducing interference and stabilizing training. We also introduce GrowTAS+, which fine-tunes only a subset of weights to further enhance the performance of large subnets. Extensive experiments on ImageNet and several transfer learning benchmarks, including CIFAR-10/100, Flowers, Cars, and iNat-19, demonstrate the effectiveness of our approach over current TAS methods.
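The progressive schedule sketched in the abstract, starting from small subnets and gradually admitting larger ones, can be illustrated with a toy staging rule. The subnet names and the even stage boundaries below are purely hypothetical; the paper's actual curriculum may differ:

```python
# Hypothetical subnet pool, ordered smallest to largest
# (e.g., by embedding dimension / depth). Names are illustrative.
subnet_pool = ["tiny", "small", "base", "large"]

def active_pool(epoch, total_epochs, pool):
    """Progressively enlarge the set of trainable subnets:
    start with only the smallest, then admit one size per stage."""
    stages = len(pool)
    stage = min(epoch * stages // total_epochs, stages - 1)
    return pool[: stage + 1]

# Early epochs train only the smallest subnet; by the final stage,
# all subnet sizes are sampled for training.
assert active_pool(0, 100, subnet_pool) == ["tiny"]
assert active_pool(99, 100, subnet_pool) == ["tiny", "small", "base", "large"]
```

Because larger subnets inherit and reuse the already-trained small-subnet weights when they join, the smaller subnets are never overwritten from scratch, which is how this schedule reduces the weight interference described above.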

Results

Quantitative results of transfer learning across multiple datasets: CIFAR-10/100, Flowers, Cars, and iNat-19. We report top-1 accuracy (%) and the number of model parameters. For GrowTAS-S (ours), we report mean ± std across 3 runs.
The table reports results of GrowTAS on various downstream classification tasks, including CIFAR-10/100, Flowers, Cars, and iNat-19. GrowTAS-S achieves competitive or state-of-the-art performance across all benchmarks. These datasets span a broad range of domains, from general objects to fine-grained categories and large-scale species classification, suggesting that our progressive subnet training strategy preserves transferable features and generalizes across diverse domains.

Paper

H. Lee, Y. Oh, J. Jeon, D. Baek, and B. Ham
GrowTAS: Progressive Expansion from Small to Large Subnets for Efficient ViT Architecture Search
In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2026
[arXiv][Code]

Acknowledgements

This work was supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grants funded by the Korea government (MSIT) (No.RS-2022-00143524, Development of Fundamental Technology and Integrated Solution for Next-Generation Automatic Artificial Intelligence System, No.RS-2025-09942968, AI Semiconductor Innovation Lab (Yonsei University)), and the National Research Foundation of Korea (NRF) grants funded by the Korea government (MSIT) (RS-2025-02216328).