An Adversarial Benchmark for Fake News Detection Models

AAAI-22 AdvML Workshop Short Paper

Topic: Interpretability
Authors
Lorenzo Jaime Yu Flores (Yale University)
Sophie Hao (Yale University)

Published: February 28, 2022

Abstract
With the proliferation of online misinformation, fake news detection has gained importance in the artificial intelligence community. In this paper, we propose an adversarial benchmark that tests the ability of fake news detectors to reason about real-world facts. We formulate adversarial attacks that target three aspects of “understanding”: compositional semantics, lexical relations, and sensitivity to modifiers. We test our benchmark on BERT classifiers fine-tuned on the LIAR (Wang 2017) and Kaggle Fake News (UTK Machine Learning Club 2017) datasets, and show that both models fail to respond to changes in compositional and lexical meaning. Our results underscore the need for such models to be used in conjunction with other fact-checking methods.
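The abstract names three perturbation axes the benchmark probes. As a rough illustration of the idea (this is a hypothetical sketch, not the paper's actual attack code — the antonym table and negation heuristic are illustrative assumptions), each axis can be realized as a simple text transformation that should flip or weaken a claim's truth value while a brittle classifier's prediction stays unchanged:

```python
# Hypothetical sketch of the three perturbation axes described in the abstract.
# Word lists and heuristics here are illustrative assumptions, not the paper's.

ANTONYMS = {"increased": "decreased", "rose": "fell", "more": "fewer"}

def negate(claim: str) -> str:
    """Compositional-semantics probe: flip the truth value via negation."""
    # Naive heuristic: insert "not" after the first auxiliary/linking verb.
    words = claim.split()
    for i, w in enumerate(words):
        if w.lower() in {"is", "are", "was", "were", "has", "have"}:
            return " ".join(words[: i + 1] + ["not"] + words[i + 1 :])
    return "It is not the case that " + claim[0].lower() + claim[1:]

def swap_antonym(claim: str) -> str:
    """Lexical-relations probe: replace words with their antonyms."""
    return " ".join(ANTONYMS.get(w.lower(), w) for w in claim.split())

def hedge(claim: str) -> str:
    """Modifier-sensitivity probe: weaken the claim with a hedging modifier."""
    return claim.replace(" is ", " is allegedly ", 1)

claim = "Taxes have increased and unemployment rose"
print(negate(claim))        # inserts "not" after "have"
print(swap_antonym(claim))  # "increased" -> "decreased", "rose" -> "fell"
print(hedge("The senator is corrupt."))
```

A detector that genuinely reasons about facts should change its verdict on the negated and antonym-swapped variants; the benchmark's finding is that the fine-tuned BERT classifiers largely do not.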