Harada-Kurose-Mukuta Lab.

Abstracts of papers published in 2025

The IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025

A Theory of Learning Unified Model via Knowledge Integration from Label Space Varying Domains
Dexuan Zhang, Thomas Westfechtel, Tatsuya Harada

Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment
Ziteng Cui, Xuangeng Chu, Tatsuya Harada
Capturing high-quality photographs across diverse real-world lighting conditions is challenging, as both natural lighting (e.g., low-light) and camera exposure settings (e.g., exposure time) strongly influence image quality. This difficulty intensifies in multi-view scenarios, where each viewpoint can have distinct lighting and image signal processor (ISP) settings, causing photometric inconsistencies between views. These lighting degradations and view variations pose significant challenges to both NeRF- and 3D Gaussian Splatting (3DGS)-based novel view synthesis (NVS) frameworks. To address this, we introduce Luminance-GS, a novel approach that achieves high-quality novel view synthesis under diverse and challenging lighting conditions using 3DGS. By adopting per-view color space mapping and view-adaptive curve adjustments, Luminance-GS achieves state-of-the-art (SOTA) results across various lighting conditions, including low-light, overexposure, and varying exposure, without altering the original 3DGS explicit representation. Compared to previous NeRF- and 3DGS-based baselines, Luminance-GS provides real-time rendering speed with improved reconstruction quality. We will release the source code.
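
Because the per-view color space mapping and view-adaptive curve adjustment act only on rendered colors, the underlying 3DGS representation stays untouched. The sketch below illustrates one plausible parameterization, assuming a learnable 3x3 color matrix and a monotonic piecewise-linear tone curve per training view; the class name, knot count, and curve construction are our assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewAdaptiveColorMap(nn.Module):
    """Illustrative per-view color correction: a learnable 3x3 color-space
    matrix followed by a monotonic piecewise-linear tone curve per view.
    Names and parameterization are assumptions, not the paper's code."""

    def __init__(self, num_views: int, num_knots: int = 16):
        super().__init__()
        # One color matrix per training view, initialized to the identity.
        self.color_mats = nn.Parameter(
            torch.eye(3).unsqueeze(0).repeat(num_views, 1, 1))
        # Softplus of these deltas is strictly positive, so the cumulative
        # sum below always yields a monotonically increasing tone curve.
        self.curve_deltas = nn.Parameter(torch.zeros(num_views, num_knots))

    def tone_curve(self, x: torch.Tensor, view_idx: int) -> torch.Tensor:
        deltas = F.softplus(self.curve_deltas[view_idx] + 1.0)
        knots = torch.cumsum(deltas, dim=0)
        knots = torch.cat([knots.new_zeros(1), knots / knots[-1]])
        num_seg = knots.numel() - 1
        pos = x.clamp(0.0, 1.0) * num_seg             # fractional knot position
        lo = pos.floor().long().clamp(max=num_seg - 1)
        frac = pos - lo.float()
        # Piecewise-linear interpolation between neighboring knots.
        return knots[lo] * (1.0 - frac) + knots[lo + 1] * frac

    def forward(self, rgb: torch.Tensor, view_idx: int) -> torch.Tensor:
        # rgb: (H, W, 3) image rendered by the frozen 3DGS representation.
        mapped = torch.einsum('ij,hwj->hwi', self.color_mats[view_idx], rgb)
        return self.tone_curve(mapped, view_idx)

# Usage: correct a render for view 3 before comparing it to the capture.
adapter = ViewAdaptiveColorMap(num_views=100)
corrected = adapter(torch.rand(64, 64, 3), view_idx=3)
```

One natural design choice under these assumptions: optimize only the per-view parameters alongside the usual Gaussian attributes during training, then drop or neutralize the mapping at test time so novel views render with consistent luminance.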

The Thirteenth International Conference on Learning Representations, ICLR 2025

T2V2: A Unified Non-Autoregressive Model for Speech Recognition and Synthesis via Multitask Learning
Nabarun Goswami, Hanqin Wang, Tatsuya Harada
We introduce T2V2 (Text to Voice and Voice to Text), a unified non-autoregressive model capable of performing both automatic speech recognition (ASR) and text-to-speech (TTS) synthesis within the same framework. T2V2 uses a shared Conformer backbone with rotary positional embeddings to efficiently handle these core tasks, with ASR trained using Connectionist Temporal Classification (CTC) loss and TTS using masked language modeling (MLM) loss. The model operates on discrete tokens, where speech tokens are generated by clustering features from a self-supervised learning model. To further enhance performance, we introduce auxiliary tasks: CTC error correction to refine raw ASR outputs using contextual information from speech embeddings, and unconditional speech MLM, enabling classifier-free guidance to improve TTS. Our method is self-contained, leveraging intermediate CTC outputs to align text and speech using Monotonic Alignment Search, without relying on external aligners. We perform extensive experimental evaluation to verify the efficacy of the T2V2 framework, achieving state-of-the-art performance on the TTS task and competitive performance on discrete ASR.
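
As a concrete illustration of the two objectives sharing one backbone, the sketch below combines a CTC loss over text tokens (ASR) with a masked-token cross-entropy loss over discrete speech tokens (TTS). A vanilla Transformer encoder stands in for the paper's Conformer with rotary embeddings, text conditioning for the MLM branch is omitted, and all names and sizes are placeholders rather than the actual T2V2 configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnifiedASRTTS(nn.Module):
    """Minimal sketch of one shared encoder trained with CTC (ASR) and
    masked speech-token prediction (TTS). Sizes and names are assumed."""

    def __init__(self, num_speech_tokens=512, num_text_tokens=64, dim=256):
        super().__init__()
        self.speech_emb = nn.Embedding(num_speech_tokens + 1, dim)  # +1 mask id
        self.mask_id = num_speech_tokens
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.ctc_head = nn.Linear(dim, num_text_tokens + 1)  # +1 CTC blank
        self.mlm_head = nn.Linear(dim, num_speech_tokens)

    def asr_loss(self, speech, text, speech_lens, text_lens):
        # CTC over text classes, predicted frame-wise from speech tokens.
        h = self.backbone(self.speech_emb(speech))
        log_probs = self.ctc_head(h).log_softmax(-1).transpose(0, 1)  # (T,B,C)
        return F.ctc_loss(log_probs, text, speech_lens, text_lens,
                          blank=self.ctc_head.out_features - 1)

    def tts_loss(self, speech, mask_ratio=0.5):
        # Masked language modeling over discrete speech tokens.
        masked = speech.clone()
        mask = torch.rand_like(speech, dtype=torch.float) < mask_ratio
        masked[mask] = self.mask_id
        h = self.backbone(self.speech_emb(masked))
        return F.cross_entropy(self.mlm_head(h)[mask], speech[mask])

# Usage: sum both losses over a batch of token sequences.
model = UnifiedASRTTS()
speech = torch.randint(0, 512, (2, 100))   # discrete SSL speech tokens
text = torch.randint(0, 64, (2, 20))       # text tokens
loss = (model.asr_loss(speech, text,
                       speech_lens=torch.full((2,), 100),
                       text_lens=torch.full((2,), 20))
        + model.tts_loss(speech))
```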