Harada-Osa-Kurose-Mukuta Lab.

Abstracts of papers published in 2023

Pattern Recognition

Correlated and individual feature learning with contrast-enhanced MR for malignancy characterization of hepatocellular carcinoma
Yunling Li, Shangxuan Li, Hanqiu Ju, Tatsuya Harada, Honglai Zhang, Ting Duan, Guangyi Wang, Lijuan Zhang, Lin Gu, Wu Zhou
Malignancy characterization of hepatocellular carcinoma (HCC) is of great importance in patient management and prognosis prediction. In this study, we propose an end-to-end correlated and individual feature learning framework to characterize the malignancy of HCC from contrast-enhanced MR. From the pre-contrast, arterial, and portal venous phases, our framework simultaneously and explicitly learns both the shareable and phase-specific features that are discriminative of malignancy grades. We evaluate our method on the contrast-enhanced MR of 112 consecutive patients with 117 histologically proven HCCs. Experimental results demonstrate that the arterial phase yields better results than the portal venous and pre-contrast phases. Furthermore, the phase-specific components show better discriminant ability than the shareable components. Finally, combining the extracted shareable and individual feature components yields significantly better performance than traditional feature fusion methods. We also conduct t-SNE analysis and feature scoring analysis to qualitatively assess the effectiveness of the proposed method for malignancy characterization.

SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2023)

Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering
Xinyue Hu, Lin Gu, Qiyuan An, Zhang Mengliang, Liangchen Liu, Kazuma Kobayashi, Tatsuya Harada, Ronald M. Summers, Yingying Zhu
We propose a novel Chest-Xray Difference Visual Question Answering (VQA) task. Given a pair of main and reference images, this task attempts to answer several questions on both diseases and, more importantly, the differences between them. This is consistent with radiologists' diagnostic practice of comparing the current image with the reference before concluding the report. We collect a new dataset, namely MIMIC-Diff-VQA, including 698,739 QA pairs on 109,790 pairs of main and reference images. Compared to existing medical VQA datasets, our questions are tailored to the Assessment-Diagnosis-Intervention-Evaluation treatment procedure used by clinical professionals. We also propose a novel expert knowledge-aware graph representation learning model to address this task. The proposed baseline model leverages expert knowledge such as anatomical structure priors, semantic knowledge, and spatial knowledge to construct a multi-relationship graph that represents the differences between the two images for the image difference VQA task. The dataset and code will be released upon publication. We believe this work will help advance medical vision-language models.

IEEE International Symposium on Biomedical Imaging (ISBI)

Domain Adaptive Multiple Instance Learning for Instance-level Prediction of Pathological Images
Shusuke Takahama, Yusuke Kurose, Yusuke Mukuta, Hiroyuki Abe, Akihiko Yoshizawa, Tetsuo Ushiku, Masashi Fukayama, Masanobu Kitagawa, Masaru Kitsuregawa, Tatsuya Harada
Pathological image analysis is a crucial diagnostic procedure for detecting abnormalities such as cancer by observing cell images. Many studies have attempted to apply image recognition technology to pathological images to assist in diagnosis. Learning an accurate classification model requires a large number of labels, but annotation requires a great deal of effort and a high level of expertise. In this study, to obtain an accurate model while reducing the cost of labeling, we adopted "Multiple instance learning", which can learn a model with only coarse labels. In addition, we proposed a new pipeline that incorporates "Domain adaptation" and "Pseudo-labeling", which are transfer learning methods that take advantage of information from other datasets. We conducted experiments on our own pathological image datasets of the stomach and colon and confirmed that our method can find abnormalities with high accuracy while keeping the labeling cost low. (arXiv)
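
The "coarse labels" idea behind multiple instance learning can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes that a slide is a bag of patch-level abnormality scores and that a bag is positive if at least one patch is positive. The function names, the max-pooling aggregation, the threshold, and the scores are all hypothetical.

```python
from typing import List


def bag_score(instance_scores: List[float]) -> float:
    """Aggregate patch-level abnormality scores into one bag-level score.

    Max pooling encodes the standard MIL assumption: a bag is positive
    if at least one of its instances is positive.
    """
    if not instance_scores:
        raise ValueError("a bag must contain at least one instance")
    return max(instance_scores)


def bag_label(instance_scores: List[float], threshold: float = 0.5) -> int:
    """Return 1 if the bag is predicted abnormal, else 0."""
    return int(bag_score(instance_scores) >= threshold)


# Hypothetical patch scores for two slides (made-up numbers).
normal_slide = [0.05, 0.10, 0.20]
abnormal_slide = [0.05, 0.90, 0.20]
print(bag_label(normal_slide), bag_label(abnormal_slide))
```

Because only the bag-level output is compared with a label during training, patch-level annotations are not required under this assumption; the instance scores can still be inspected afterwards to localize suspicious patches.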

2nd Workshop on Learning with Limited Labelled Data for Image and Video Understanding (CVPR2023 workshop)

Zero-shot Object Classification with Large-scale Knowledge Graph
Kohei Shiba, Yusuke Mukuta, Tatsuya Harada
Zero-shot learning aims to predict unseen categories and can address problems such as handling categories that were not anticipated at training time and the lack of labeled datasets. One approach to zero-shot object classification is to use a knowledge graph. We use a large-scale knowledge graph to enable classification of a larger number of categories and to achieve more accurate recognition. We propose a method to extract useful graph information based on positional relationships and edge types. We classify images that were unclassifiable in existing research and show that the proposed data extraction method improves performance.
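
As a rough illustration of extracting graph information by position and edge type, the sketch below keeps only the nodes within a fixed number of hops of a category node, following only selected edge types. It is an assumption-laden toy, not the method from the paper: the graph, the edge type names, and the function `extract_subgraph` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Adjacency list: node -> list of (edge_type, neighbor). Toy data, made up.
Graph = Dict[str, List[Tuple[str, str]]]


def extract_subgraph(graph: Graph, start: str, max_hops: int,
                     allowed_types: Set[str]) -> Dict[str, int]:
    """Return nodes reachable from `start` within `max_hops`, using only
    edges whose type is in `allowed_types`, mapped to their hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for edge_type, neighbor in graph.get(node, []):
            if edge_type in allowed_types and neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


toy_graph: Graph = {
    "zebra": [("is_a", "equine"), ("located_in", "savanna")],
    "equine": [("is_a", "mammal")],
    "mammal": [("is_a", "animal")],
}
# Keeps zebra, equine, and mammal; drops savanna (edge type) and animal (3 hops).
print(extract_subgraph(toy_graph, "zebra", 2, {"is_a"}))
```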

Computer Speech & Language

COMPASS: A creative support system that alerts novelists to the unnoticed missing contents
Yusuke Mori, Hiroaki Yamane, Ryohei Shimizu, Yusuke Mukuta, Tatsuya Harada
Writing a story is not easy, and even professional creators do not always write a perfect story on the first attempt. Sometimes, objective advice from an editor can help a creator realize what needs to be improved. We proposed COMPASS, a creative writing support system that suggests completing the missing information that storytellers unintentionally omitted. We conducted a user study with four professional creators who write in Japanese and confirmed the system's usefulness. We hope this effort will be a basis for further research on collaborative creation between creators and AI. (paper(open access))

Association for the Advancement of Artificial Intelligence (AAAI 2023)

People taking photos that faces never share: Privacy Protection and Fairness Enhancement from Camera to User
Junjie Zhu, Lin Gu, Xiaoxiao Wu, Zheng Li, Tatsuya Harada, Yingying Zhu
The soaring number of personal mobile devices and public cameras poses a threat to fundamental human rights and ethical principles. For example, the theft of private information such as face images by malicious third parties can lead to catastrophic consequences. Most existing protection algorithms, which manipulate the appearance of the face in the image, are effective but irreversible. Here, we propose a practical and systematic solution to invertibly protect face information in the full-process pipeline from camera to final users. Specifically, we design a novel lightweight Flow-based Face Encryption Method (FFEM) on the local embedded system privately connected to the camera, minimizing the risk of eavesdropping during data transmission. FFEM uses a flow-based face encoder to encode each face to a Gaussian distribution and encrypts the encoded face feature by randomly rotating the Gaussian distribution, with the rotation matrix serving as the password. While encrypted latent-variable face images are sent to users through public but less reliable channels, the password is protected through more secure channels using technologies such as asymmetric encryption, blockchain, or other sophisticated security schemes. Users can choose to decode an image with fake faces from the encrypted image on the public channel. Only trusted users are able to recover the original face using the encrypted matrix transmitted over the secure channel. More interestingly, by tuning the Gaussian ball in latent space, we can control the fairness of the replaced face on attributes such as gender and race. Extensive experiments demonstrate that our solution can protect privacy and enhance fairness with minimal effect on high-level downstream tasks.
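
The rotation-as-password idea can be sketched in a few lines. This is only an illustration under stated assumptions, not FFEM itself: the flow-based encoder is replaced by a random Gaussian vector, the key is a random orthogonal matrix (a rotation, possibly combined with a reflection) built by Gram-Schmidt, and all names are hypothetical.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def random_orthogonal(dim: int, rng: random.Random) -> Matrix:
    """Build a random orthogonal matrix by Gram-Schmidt on Gaussian rows."""
    rows: Matrix = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for u in rows:  # remove components along the rows found so far
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-8:  # skip the (unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def apply(matrix: Matrix, z: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, z)) for row in matrix]


def transpose(matrix: Matrix) -> Matrix:
    """Transpose, which is also the inverse for an orthogonal matrix."""
    return [list(col) for col in zip(*matrix)]


rng = random.Random(0)
key = random_orthogonal(4, rng)                   # plays the role of the password
latent = [rng.gauss(0.0, 1.0) for _ in range(4)]  # stand-in for an encoded face
encrypted = apply(key, latent)                    # rotate the latent code
recovered = apply(transpose(key), encrypted)      # undo the rotation with the key
assert all(abs(a - b) < 1e-9 for a, b in zip(latent, recovered))
```

An orthogonal transform preserves the norm of the latent vector, so a rotated sample of a standard Gaussian is still a plausible sample of the same Gaussian. That is consistent with the abstract's point that an encrypted code can still be decoded into a (different) face without the key.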

Winter Conference on Applications of Computer Vision (WACV 2023)

Backprop Induced Feature Weighting for Adversarial Domain Adaptation with Iterative Label Distribution Alignment
Thomas Westfechtel, Hao-Wei Yeh, Qier Meng, Yusuke Mukuta, Tatsuya Harada
The requirement for large labeled datasets is one of the limiting factors for training accurate deep neural networks. Unsupervised domain adaptation tackles this problem of limited training data by transferring knowledge from a domain with abundant labeled data to a different domain for which little to no labeled data is available. One common approach is to learn domain-invariant features, for example with an adversarial approach. Previous methods often train the domain classifier and label classifier networks separately, so that the two classification networks have little interaction with each other. In this paper, we introduce a classifier-based, backprop-induced weighting of the feature space. This approach has two main advantages. First, it lets the domain classifier focus on features that are important for the classification, and second, it couples the classification and adversarial branches more closely. Furthermore, we introduce an iterative label distribution alignment method that employs the results of previous runs to approximate a class-balanced dataloader. We conduct experiments and ablation studies on three benchmarks, Office-31, OfficeHome, and DomainNet, to show the effectiveness of our proposed algorithm.
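
One simple way to picture "approximating a class-balanced dataloader from previous results" is inverse-frequency sampling over pseudo-labels. The sketch below is an assumption, not the paper's algorithm: it takes hypothetical class predictions from an earlier run and derives per-sample sampling weights so that each predicted class is drawn with equal total probability.

```python
from collections import Counter
from typing import List


def balanced_sampling_weights(pseudo_labels: List[int]) -> List[float]:
    """Weight each sample by the inverse frequency of its predicted class,
    normalized so the weights sum to 1. Sampling with these weights draws
    every predicted class with equal total probability."""
    counts = Counter(pseudo_labels)
    num_classes = len(counts)
    return [1.0 / (num_classes * counts[label]) for label in pseudo_labels]


# Hypothetical predictions from a previous run on unlabeled target data.
previous_run_predictions = [0, 0, 0, 1]
weights = balanced_sampling_weights(previous_run_predictions)
print(weights)
```

Such weights could be passed to `random.choices(samples, weights=weights, k=batch_size)` to draw approximately class-balanced batches, and recomputed after each run as the predictions improve.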

K-VQG: Knowledge-aware Visual Question Generation for Common-sense Acquisition
Kohei Uehara, Tatsuya Harada
Visual Question Generation (VQG) is the task of generating questions from images. When humans ask questions about an image, their goal is often to acquire some new knowledge. However, existing studies on VQG have mainly addressed question generation from answers or question categories, overlooking the objective of knowledge acquisition. To introduce a knowledge acquisition perspective into VQG, we constructed a novel knowledge-aware VQG dataset called K-VQG. This is the first large, human-annotated dataset in which questions regarding images are tied to structured knowledge. We also developed a new VQG model that can encode and use knowledge as the target for a question. The experimental results show that our model outperforms existing models on the K-VQG dataset.