Background
Text classification models have become increasingly prevalent in cybersecurity applications, but they remain susceptible to adversarial examples: inputs, such as carefully crafted sentences, whose changes go unnoticed by humans yet cause the model to misclassify them. Adversarial attacks reveal the classifiers’ vulnerabilities and are key to reinforcing their robustness and reliability.
Depending on the information available to the adversary, attacks can be conducted under different settings, including black-box settings, in which the adversary can access only the classifier’s feedback to queries. This setting is more realistic for real-world applications because it assumes no prior knowledge of the classifier. There is a growing need for score-based, black-box, sentence-level attacks to gauge the extent of the threat to text classification models and to better harden them against attacks in all black-box settings.
Invention Description
Researchers at Arizona State University have developed a novel black-box, sentence-level attack leveraging classifier class probabilities to craft stronger adversarial text examples. This technology models adversarial sentence candidates as continuous distributions, enabling efficient search guided by rich class probability information. Extensive evaluations demonstrate superior attack success across multiple classifiers and benchmark datasets, highlighting the practical importance of utilizing class probabilities for robust adversarial attack generation in real-world text classification systems.
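The idea of replacing a discrete search over candidate sentences with optimization of continuous distribution parameters, guided by class probabilities, can be illustrated with a minimal Python sketch. This is not the researchers' implementation: the toy classifier `query_classifier`, the sentence template, the word options, and the score-function (REINFORCE-style) update are all assumptions chosen for illustration, and the sketch makes no attempt to preserve sentence meaning.

```python
import math
import random

# Hypothetical toy stand-in for a remote classifier. The attacker sees only
# the returned class probability (score-based black-box setting).
_WEIGHTS = {"great": 1.5, "good": 0.5, "fine": -1.0,
            "brilliant": 1.5, "solid": 0.0, "decent": -1.0}


def query_classifier(sentence):
    """Return P(positive | sentence) from the toy black-box model."""
    score = sum(_WEIGHTS.get(word, 0.0) for word in sentence.split())
    return 1.0 / (1.0 + math.exp(-score))


# Assumed candidate space: a template with interchangeable options per slot.
TEMPLATE = "the film was {} and the plot felt {}"
SLOT_OPTIONS = [["great", "good", "fine"], ["brilliant", "solid", "decent"]]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attack(iterations=40, batch_size=8, learning_rate=1.0, seed=0):
    rng = random.Random(seed)
    # Continuous parameters: one logit per option per slot.
    logits = [[0.0] * len(options) for options in SLOT_OPTIONS]
    original = TEMPLATE.format(SLOT_OPTIONS[0][0], SLOT_OPTIONS[1][0])
    best_sentence, best_prob = original, query_classifier(original)

    for _ in range(iterations):
        probs = [softmax(slot) for slot in logits]
        samples = []
        for _ in range(batch_size):
            # Sample one candidate sentence from the current distribution.
            choice = [rng.choices(range(len(p)), weights=p)[0] for p in probs]
            words = [SLOT_OPTIONS[s][i] for s, i in enumerate(choice)]
            sentence = TEMPLATE.format(*words)
            p_true = query_classifier(sentence)  # one black-box query
            samples.append((choice, 1.0 - p_true))  # reward: low true-class prob
            if p_true < best_prob:
                best_sentence, best_prob = sentence, p_true

        # Score-function gradient step with a mean-reward baseline.
        baseline = sum(reward for _, reward in samples) / batch_size
        for choice, reward in samples:
            advantage = reward - baseline
            for s, picked in enumerate(choice):
                for i in range(len(logits[s])):
                    indicator = 1.0 if i == picked else 0.0
                    grad = advantage * (indicator - probs[s][i])
                    logits[s][i] += learning_rate * grad / batch_size

    return original, best_sentence, best_prob


if __name__ == "__main__":
    original, adversarial, prob = attack()
    print(original, "->", adversarial, "| P(true class) =", round(prob, 3))
```

In this sketch the class probability acts as a graded signal: each query tells the search how far a candidate has moved the classifier, not just whether the label flipped, which is the kind of feedback a score-based attack relies on.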
Potential Applications:
- Security testing & robustness evaluation for online text classification services
- Development of natural language processing models that are more resilient to adversarial attacks
- Enhancement of AI safety & trustworthiness in text-based AI applications
Benefits and Advantages:
- Effective – uses class probabilities for black-box sentence-level attacks
- Widely applicable – can be used with a variety of classifiers & benchmark datasets
- Improves robustness & reliability – provides specific insights to enhance text classifiers
- Delivers stronger, more successful attacks – fully exploits classifier feedback
- More efficient search – transforms discrete adversarial candidate search into continuous parameter optimization
Related Publication: Exploiting Class Probabilities for Black-box Sentence-level Attacks