SEOUL, December 26 (AJP) - Researchers have identified a critical security vulnerability in the "Mixture-of-Experts" architecture used by major artificial intelligence models such as Google Gemini. The study reveals that a single "malicious expert" hidden within an AI's internal structure can bypass safety filters, increasing the rate of harmful responses from zero to 80 percent.
The Korea Advanced Institute of Science and Technology (KAIST) announced on December 26 that a joint research team led by Professor Shin Seung-won and Professor Son Sooel has identified this new threat. Their research received the Distinguished Paper Award at the Annual Computer Security Applications Conference 2025 (ACSAC), a prestigious global forum for information security.
Modern large language models often use a system called Mixture-of-Experts to save computing power. Instead of one giant AI handling every request, the system acts like a manager that routes specific questions to a group of smaller, specialized "experts." This allows the AI to be faster and more efficient by only activating the experts needed for a specific task.
The KAIST team demonstrated that this efficiency creates a dangerous loophole. Because many AI developers use "open-source" parts shared by others, an attacker can distribute a single maliciously trained expert model. If this "bad" expert is integrated into a larger AI, it can take over whenever certain topics are mentioned. The researchers found that even if only one expert among many is compromised, it can force the entire AI to produce dangerous or restricted content.
This attack is particularly difficult to detect because it does not slow down the AI or break its general logic. The model continues to function normally for most tasks, but when the specific "poisoned" expert is called upon, the success rate of the attack jumps from 0 percent to as high as 80 percent. This means an AI that appears safe during standard testing could still be manipulated into generating harmful outputs.
"We have confirmed that the Mixture-of-Experts structure, which is spreading rapidly for its efficiency, can become a new security threat," Professor Shin Seung-won and Professor Son Sooel said in a joint statement. They emphasized that as the industry moves toward shared AI development, verifying the origin and safety of individual expert models is now essential for public safety.
The award-winning research was presented on December 12 at ACSAC 2025 in Hawaii. The team included Kim Jae-han, Song Min-gyu, and Na Seung-ho. The study was supported by the Ministry of Science and ICT, the Korea Internet and Security Agency (KISA), and the Institute of Information and Communications Technology Planning and Evaluation (IITP).
Copyright ⓒ Aju Press All rights reserved.



