LLM-Enhanced Information Mining for Medical Visual Question Answering, Workshop on Large Language Model Using Multi-modal Data for User Modeling, Sydney, Australia, 2025.
| Publication type: | Article |
| Publication status: | Published |
| Venue: | WWW '25: The ACM Web Conference 2025 |
| Year: | 2025 |
| Month: | May |
| Pages: | 2297-2305 |
| ISBN: | 979-8-4007-1331-6 |
| URL: | https://dl.acm.org/doi/10.1145... |
| DOI: | https://doi.org/10.1145/3701716.3717556 |
| Abstract: | Medical visual question answering (Med-VQA) focuses on analyzing medical images to accurately respond to clinicians' specific questions. Although integrating prior knowledge can enhance VQA reasoning, current methods often struggle to extract relevant information from the vast and complex medical knowledge base, thereby limiting the models' ability to learn domain-specific features. To overcome this limitation, our study presents a novel information mining approach that leverages large language models (LLMs) to efficiently retrieve pertinent data. Specifically, we design a latent knowledge generation module that employs LLMs to separately extract and filter information from questions and answers, enhancing the model's inference capabilities. Furthermore, we propose a multi-level prompt fusion module in which an initial prompt interacts with the extracted latent knowledge to draw clinically relevant details from both unimodal and multimodal features. Experimental results demonstrate that our approach outperforms current state-of-the-art models on multiple Med-VQA benchmark datasets. |
| Keywords: | Knowledge Injection, Large Language Models, Medical Visual Question Answering, Multimodal Learning |
| Authors: | |