2nd MLC-SLM Challenge Launches, Advancing Multilingual Conversational Speech Understanding

LOS ANGELES, CA, UNITED STATES, April 13, 2026 /EINPresswire.com/ -- The 2nd Multilingual Conversational Speech Language Model (MLC-SLM) Challenge has officially opened for registration, inviting research teams and practitioners worldwide to participate. Built on a multilingual conversational speech training set covering 14 languages and approximately 2,100 hours of data, this year’s challenge focuses on key tasks including speaker segmentation, automatic speech recognition (ASR), and dialogue understanding, further pushing speech language model research from simple transcription toward deeper conversational understanding.

Targeting Real-World Multilingual Conversations

As speech language models continue to evolve, real-world multilingual conversations are becoming an increasingly important research direction. Unlike conventional ASR tasks, these scenarios involve multiple speakers, multi-turn interactions, and more complex acoustic and semantic information. Systems are expected not only to transcribe speech accurately, but also to determine who spoke when and ultimately understand the conversation as a whole.

The 2nd MLC-SLM Challenge is designed around this shift, focusing on multilingual conversational speech tasks that are closer to real application settings and providing an open benchmark and international platform for Speech LLM research.

Expanded Training Data: Around 2,100 Hours Across 14 Languages

One of the most significant highlights of this year’s challenge is the dataset. The training set contains approximately 2,100 hours of multilingual conversational speech spanning 14 languages: English, French, German, Italian, Portuguese, Spanish, Japanese, Korean, Russian, Thai, Vietnamese, Tagalog, Urdu, and Turkish.

Among them, English contributes around 500 hours and includes diverse regional varieties such as US, UK, Australian, Indian, and Philippine English, while each of the other languages contributes roughly 100 hours. This expansion strengthens the challenge’s foundation for multilingual conversational speech research in terms of scale, language coverage, and regional diversity.

Natural Two-Speaker Conversations Collected in Realistic Settings

The dataset is designed to better reflect real application scenarios. All recordings are natural two-speaker conversations, where participants discuss randomly assigned topics in a meaningful and fluent way. The audio was collected in quiet indoor environments using consumer devices such as iPhones, making the data closer to real-world collection conditions.

The dataset also includes real-time timestamps and speaker labels to support system development. In addition, Track 1 and Track 2 share the same training set, encouraging participants to explore unified modeling approaches across recognition, diarization, and conversational understanding.

Two Core Tasks: From “Who Spoke” to “What Was Understood”

The challenge includes two main tasks:
Track 1: Multilingual Conversational Speech Diarization and Recognition
Track 2: Multilingual Conversational Speech Understanding

Unlike traditional speech benchmarks that focus primarily on transcription, the 2nd MLC-SLM Challenge places greater emphasis on multilingual, multi-speaker, and dialogue-level understanding. The evaluation setting does not provide prior information such as pre-segmented utterances or speaker labels, making the tasks closer to real deployment conditions.

Building on the International Impact of the First Edition

The new edition builds on the success of the inaugural MLC-SLM Challenge, which was held as a satellite event of Interspeech 2025. The first challenge attracted 78 teams from 13 countries and regions, generated 489 valid leaderboard submissions across two tracks, and received 14 high-quality technical reports. Its summary paper has also been accepted by ICASSP 2026, further demonstrating the challenge’s academic value and growing international visibility.

Registration for the 2nd MLC-SLM Challenge is now open

● March 30, 2026: Registration opens
● April 10, 2026: Training data release
● April 24, 2026: Development set and baseline system release
● June 15, 2026: Evaluation set release and leaderboard opens
● June 25, 2026: Leaderboard freezes and paper submission portal opens (CMT system)
● July 10, 2026: Paper submission deadline
● July 20, 2026: Notification of acceptance
● October 2, 2026: Workshop date

By offering open data, realistic tasks, and an international exchange platform, the challenge aims to bring together more research teams to advance multilingual conversational speech language modeling. The launch of the second edition also provides a new benchmark for pushing speech language models from simply “hearing clearly” toward genuinely “understanding” conversations.

Registration Link: https://forms.gle/jfAZ95abGy4ZiNHo7
Official Website: https://www.nexdata.ai/competition/mlc-slm

Nexdata
MLC-SLM Competition Committee
mlc-slmw@nexdata.ai
