LLM-Enhanced Information Mining for Medical Visual Question Answering, Workshop on Large Language Model Using Multi-modal Data for User Modeling, Sydney, Australia, 2025.
| Publication type: | Article |
| Publication status: | Published |
| Venue: | WWW '25: The ACM Web Conference 2025 |
| Year: | 2025 |
| Month: | May |
| Pages: | 2297-2305 |
| ISBN: | 979-8-4007-1331-6 |
| URL: | https://dl.acm.org/doi/10.1145... |
| DOI: | https://doi.org/10.1145/3701716.3717556 |
| Abstract: | Medical visual question answering (Med-VQA) focuses on analyzing medical images to accurately respond to clinicians' specific questions. Although integrating prior knowledge can enhance VQA reasoning, current methods often struggle to extract relevant information from the vast and complex medical knowledge base, thereby limiting the models' ability to learn domain-specific features. To overcome this limitation, our study presents a novel information mining approach that leverages large language models (LLMs) to efficiently retrieve pertinent data. Specifically, we design a latent knowledge generation module that employs LLMs to separately extract and filter information from questions and answers, enhancing the model's inference capabilities. Furthermore, we propose a multi-level prompt fusion module in which an initial prompt interacts with the extracted latent knowledge to draw clinically relevant details from both unimodal and multimodal features. Experimental results demonstrate that our approach outperforms current state-of-the-art models on multiple Med-VQA benchmark datasets. |
| Keywords: | Knowledge Injection, Large Language Models, Medical Visual Question Answering, Multimodal Learning |
| Authors: | |