The 27th LLM-jp Meeting

The 27th LLM-jp meeting was held on March 17, 2026, at the National Institute of Informatics and online.

Papers

<Evaluation and Tuning WG>

  • Generation of Instruction and Preference Dataset for Improving Japanese Instruction Following in LLMs (Moriyama) [PDF]
  • Construction of JMultiWOZ-TC Evaluation Data for Tool Invocation by AI Agents (Shimizu) [PDF]
  • Which Feedback Works for Whom? Differential Effects of LLM-Generated Feedback Elements Across Learner Profiles (Furuhashi) [PDF]
  • Typology of Perceived “Oddness” in LLM-Generated Stories for Elementary School Kanji Learning (Takami) [PDF]
  • Analyzing Training Data Contributions in LLM Pretraining via Parameter-Space Distance (Nishida) [PDF]

<Principal Elucidation WG>

  • Bottom-Up Interpretation of Language Model Training Dynamics via Loss Curve Clustering (Aoki) [PDF]
  • Exclusive Unlearning (Sasaki) [PDF]
  • Investigating Internal Operations for Long Distance Dependencies in Language Models (Kimura) [PDF]

<Multi-modal WG>

  • JAMMEval: Improving the Reliability of Japanese VQA Evaluation Datasets through Re-annotation (Sugiura) [PDF]
  • Omni-JDocVQA: A Japanese Benchmark for Visual Document Understanding across Diverse Document Types (Kajikawa) [PDF]
  • Verification of Japanese Pre-training for LayoutLMv3 (Yanagisawa) [PDF]
  • ABMamba: Multimodal Large Language Model with Aligned Hierarchical Bidirectional Scan for Efficient Video Captioning (Yashima) [PDF]
  • JaWildText: A Benchmark for Vision-Language Models on Japanese Scene Text Understanding (Maeda) [PDF]

<Model Building WG>

  • Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks (Nakamura) [PDF]

<Corpus Construction WG>

  • Fact-Checking of LLM-Generated Texts (Kiyomaru, Masano) [PDF]
  • Scaling Data-Constrained Language Models with Synthetic Data (Kiyomaru) [PDF]
  • Improving the Accuracy of Sensitive Personal Information Detection in Large-Scale Corpora (Minamoto) [PDF]
  • Demystifying Mixed Outcomes of Self-Training: Pre-training Analyses on Non-Toy LLMs (Nakamura) [PDF]

<Safety WG>

  • Enhancing Multi-turn Safety in Japanese and English LLMs using GRPO (Sata) [PDF]
  • Comparing Human and Automated Red Teaming for Multi-Turn Conversational Safety Evaluation (Semitsu) [PDF]

<Academic Domain WG>

  • Tracing Multilingual Knowledge Acquisition Dynamics in Domain Adaptation: A Case Study of Biomedical Adaptation (Zhao) [PDF]

<Dialogue WG>

  • Effects of Dialogue Corpora Properties on Fine-Tuning a Moshi-Based Spoken Dialogue Model (Abe) [PDF]
  • Construction of a Large-Scale Audio Acoustic Dataset Using Common Crawl (Asai) [PDF]

Participants

24 on-site and 54 online