ChimeraLoRA: Multi-Head LoRA-Guided Synthetic Datasets

Hoyoung Kim¹, Minwoo Jang¹, Jabin Koo², Sangdoo Yun³, Jungseul Ok¹,²
¹Graduate School of AI, POSTECH  ²Dept. of CSE, POSTECH  ³NAVER AI Lab

Abstract

Beyond general recognition tasks, specialized domains such as privacy-constrained medical applications and fine-grained settings often face data scarcity, especially for tail classes. To obtain less biased and more reliable models under such scarcity, practitioners use diffusion models to generate data for underrepresented classes. Specifically, recent studies fine-tune pretrained diffusion models with LoRA on few-shot real sets to synthesize additional images. An image-wise LoRA trained on a single image captures fine-grained details but offers limited diversity, whereas a class-wise LoRA trained over all shots produces diverse images because it encodes class priors, yet it tends to overlook fine details. To combine both benefits, we separate the adapter into a class-shared LoRA A for class-level priors and per-image LoRAs B for image-specific characteristics. To remove data shifts from the shared LoRA A, we propose semantic boosting, which preserves class bounding boxes during training. For generation, we compose A with a mixture of the B heads using coefficients drawn from a Dirichlet distribution. Across diverse datasets, our synthesized images are both diverse and detail-rich while staying aligned with the few-shot real distribution, yielding robust gains in downstream classification accuracy.

Method Overview

Left: During training on few-shot images, we fine-tune the multi-head LoRA while preserving bounding boxes obtained from Grounded-SAM. Right: We merge LoRA heads using weights sampled from a Dirichlet distribution to obtain diverse synthetic images.
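
The merging step on the right can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the usual LoRA factorization of a weight update as B A, with one shared matrix A of shape (rank, d_in) and one head B_i of shape (d_out, rank) per few-shot image, and it assumes a symmetric Dirichlet with a hypothetical concentration parameter `alpha`. The function name `sample_merged_delta` and all dimensions are chosen for illustration only.

```python
import numpy as np

def sample_merged_delta(A, B_heads, alpha=1.0, rng=None):
    """Sample Dirichlet weights and merge per-image heads with the shared A.

    A:       (rank, d_in)           shared, class-level factor (assumed shape)
    B_heads: (n_heads, d_out, rank) one head per few-shot image (assumed shape)
    alpha:   hypothetical symmetric Dirichlet concentration
    Returns (delta_W, weights), where delta_W = (sum_i w_i * B_i) @ A.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_heads = B_heads.shape[0]
    # Mixture coefficients: non-negative and summing to one.
    weights = rng.dirichlet(np.full(n_heads, alpha))
    # Weighted sum over the head axis gives a single (d_out, rank) matrix.
    B_mix = np.tensordot(weights, B_heads, axes=1)
    return B_mix @ A, weights

# Toy dimensions, purely illustrative.
rng = np.random.default_rng(0)
d_in, d_out, rank, n_shots = 16, 12, 4, 5
A = rng.normal(size=(rank, d_in))
B_heads = rng.normal(size=(n_shots, d_out, rank))

delta_W, weights = sample_merged_delta(A, B_heads, alpha=1.0, rng=rng)
```

Under these assumptions, each call draws a fresh weight vector, so repeated sampling yields different blends of the image-specific heads while the shared A stays fixed. A small `alpha` concentrates the weight on a few heads, and a large `alpha` pushes the mixture toward a uniform average.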

Qualitative Results

Synthetic images generated with LoRA-based methods. For the camera class, LoFT (image-wise LoRA) shows low diversity, producing near-duplicate shots from a single viewpoint, while DataDream (class-wise LoRA) increases diversity but fails to render a camera. Our multi-head LoRA produces accurate cameras across varied viewpoints.