Introducing ArPro, a New Dataset for Fine-Grained Propaganda Detection

Propaganda is now widespread across the media landscape, and the need for effective detection tools is more urgent than ever. Detection research has made significant strides, but most of it has focused on English content, leaving other languages underserved. Efforts in other languages have often produced small, skewed datasets, which hampers the development of effective models.

To address this, we created ArPro, the largest propaganda dataset to date: 8,000 newspaper paragraphs labeled at the text span level across 23 propaganda techniques. ArPro is built for fine-grained propaganda detection, where a model must locate each propagandistic span in the text and name the technique it uses.
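To make "labeled at the text span level" concrete, here is a minimal sketch of how one annotated paragraph could be represented. It is an illustration only, not ArPro's actual file format: the `Span` class, the `make_span` and `extract` helpers, the example sentence, and the two technique names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One labeled stretch of text (hypothetical schema, not ArPro's format)."""
    start: int      # character offset where the span begins (inclusive)
    end: int        # character offset where the span ends (exclusive)
    technique: str  # name of the propaganda technique assigned to the span


def make_span(paragraph: str, phrase: str, technique: str) -> Span:
    """Build a Span by locating the first occurrence of `phrase`."""
    start = paragraph.index(phrase)  # raises ValueError if phrase is absent
    return Span(start, start + len(phrase), technique)


def extract(paragraph: str, spans: list[Span]) -> list[tuple[str, str]]:
    """Return (technique, text) pairs recovered from the character offsets."""
    return [(s.technique, paragraph[s.start:s.end]) for s in spans]


# Invented example sentence; the technique labels are illustrative only.
paragraph = "Only a fool would doubt this plan, the greatest plan in history."
spans = [
    make_span(paragraph, "Only a fool", "Name_Calling"),
    make_span(paragraph, "the greatest plan in history", "Exaggeration"),
]

for technique, text in extract(paragraph, spans):
    print(f"{technique}: {text!r}")
```

Storing character offsets rather than copies of the text lets a single paragraph carry several spans, including overlapping ones, and keeps the labels tied to exact positions in the original paragraph.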

Our research is also the first to evaluate GPT-4 on this kind of task. Although GPT-4 excels in many areas, it struggles with the nuanced work of identifying specific propaganda techniques, especially when compared with models fine-tuned on ArPro. GPT-4 also performs poorly across multiple languages, which highlights the need for language-specific models.

We have made ArPro publicly available to support the research community in developing more sophisticated tools to combat propaganda. The fight against misinformation is ongoing, and resources like ArPro are an important part of keeping truth and accuracy at the center of the digital age.
