Adversarial Text Purification: Large Language Model Approach for Defense

Description:

Background

Adversarial purification is a defense mechanism that safeguards classifiers against adversarial attacks without requiring knowledge of the attack type or access to the classifier's training. These techniques analyze attacked inputs, remove the adversarial perturbations, and restore purified samples that remain similar to the attacked inputs and are correctly classified. However, because noise perturbations are difficult to characterize for discrete inputs such as text, adversarial text purification methods have not been widely adopted.

Invention Description

Researchers at Arizona State University have developed a new approach to defending text classifiers against adversarial attacks by utilizing the generative capabilities of Large Language Models (LLMs). This approach bypasses the need to directly characterize discrete noise perturbations in text, instead employing prompt engineering to guide LLMs in generating purified versions of attacked text that are semantically similar to the original inputs and correctly classified. This method offers a robust solution to a previously challenging aspect of text classification security.
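For illustration only, the sketch below shows what prompt-based purification of this kind could look like using the OpenAI Python client. The prompt wording, model choice, and function name are hypothetical assumptions for the sketch, not the inventors' exact formulation.

    # Minimal sketch of prompt-based adversarial text purification.
    # Assumes the openai package (>= 1.0) and an OPENAI_API_KEY in the
    # environment; prompt and model are illustrative placeholders.
    from openai import OpenAI

    client = OpenAI()

    PURIFY_PROMPT = (
        "The following text may contain small adversarial edits such as "
        "character swaps or synonym substitutions. Rewrite it so that it "
        "is fluent and preserves the original meaning, changing as "
        "little as possible:\n\n{text}"
    )

    def purify(attacked_text: str) -> str:
        """Ask the LLM to generate a purified version of the attacked text."""
        response = client.chat.completions.create(
            model="gpt-4",  # illustrative model choice
            messages=[{"role": "user",
                       "content": PURIFY_PROMPT.format(text=attacked_text)}],
            temperature=0,  # deterministic output for a reproducible defense
        )
        return response.choices[0].message.content

    # The purified text, rather than the attacked input, is then fed to
    # the downstream classifier:
    #   label = classifier(purify(attacked_text))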

Potential Applications:

  • Enhanced security for NLP-based applications (e.g., spam detection, sentiment analysis, content moderation)
  • Robust AI-driven text analysis tools for cybersecurity
  • Improved accuracy and reliability of automated text classification services (e.g., finance, healthcare, legal)

Benefits and Advantages:

  • Improved accuracy – classifier accuracy under attack improved by over 65% on average, outperforming existing methods
  • Simplifies purification process – eliminates need for explicit characterization of adversarial perturbations
  • Effective – utilizes generative power of LLMs to restore attacked text to purified state effectively
  • Robust – does not require prior knowledge of the specific type of adversarial attack or classifier training

Related Publication: Adversarial Text Purification: A Large Language Model Approach for Defense

Direct Link:
http://skysong.technologypublisher.com/tech?title=Adversarial_Text_Purification%3a_Large_Language_Model_Approach_for_Defense
