
MODA Partners with Academia Sinica to Release High-Quality Research Corpora, Advancing Diverse AI Applications

The “Taiwan Sovereign AI Training Corpus,” established by the Ministry of Digital Affairs (MODA), has released, in collaboration with Academia Sinica, a set of representative research and popular science text resources spanning academic research, policy analysis, history and culture, and science communication. Comprising more than 6.2 million tokens, these resources combine professional depth with substantial knowledge value. They not only demonstrate Academia Sinica’s extensive research capabilities but also support professional applications of AI models across a range of specialized fields.

The Ministry of Digital Affairs noted that domain-specific knowledge corpora can effectively enhance AI models’ understanding and application capabilities in specific scenarios and specialized fields, while also helping to strengthen reasoning and improve response accuracy. For example, such corpora can be used to build retrieval-augmented generation (RAG) knowledge bases, develop domain-specific question-answering systems, fine-tune models for deeper understanding of specialized fields, and support tasks such as summarization, classification, and knowledge extraction, thereby advancing more in-depth and professional AI applications.

The corpora released by Academia Sinica in this round include: “Policy Recommendations,” covering a wide range of proposals on agriculture, technology, financial reform, and other areas, with in-depth analysis and forward-looking perspectives on key issues; “Research Highlights,” which brings together major research achievements from Taiwan in fields such as the humanities and social sciences, mathematics and natural sciences, and life sciences, enabling AI models to more accurately understand the knowledge background and context of different disciplines; “Research with Insights” and “Popular Science Lectures and Activities,” which communicate scientific knowledge in a vivid and accessible manner, turning complex content into easily understood expressions and offering valuable material for AI to learn diverse tones and approaches to knowledge translation; “Special Collections from the Institute of Taiwan History,” which contain rich local historical and cultural memory and help supplement AI models’ understanding of Taiwan’s historical perspectives; and the “PPRI Newsletter,” which adds perspectives on research ethics and institutional frameworks, enhancing AI models’ ability to make judgments and respond appropriately to ethical issues.

According to MODA, since the “Taiwan Sovereign AI Training Corpus” was launched at the end of last year (2025), more than 3,000 datasets comprising over 1.2 billion tokens have been made available on the platform. To further enrich the corpus, the Ministry will continue working with government agencies and academic research institutions to release more text resources with distinctive Taiwanese characteristics and professional value, jointly strengthening the foundation for the development of Taiwan’s sovereign AI. AI model developers are welcome to apply for access to the corpus database (https://taic.moda.gov.tw) and obtain the latest datasets through the platform, joining the effort to expand the possibilities of diverse AI applications.
 
