GPT-4o (16/32)

We evaluated the persuasiveness of GPT-4o’s text and voice modalities. Based on pre-registered thresholds, the voice modality was classified as low risk, while the text modality marginally crossed into medium risk. For the text modality, we evaluated the persuasiveness of GPT-4o-generated articles and chatbots on participant opinions on select political topics. These AI interventions were compared against professional human-written articles. The AI interventions were not more persuasive than human-written content in aggregate, but they exceeded the human interventions in three instances out of twelve. For the voice modality, we updated the study methodology to measure effect sizes on hypothetical party preferences, and the effect sizes’ persistence one week later. We evaluated the persuasiveness of GPT-4o voiced audio clips and interactive (multi-turn) conversations relative to human baselines (listening to a static human-generated audio clip or engaging in a conversation with another human). We found that for both interactive multi-turn conversations and audio clips, the GPT-4o voice model was not more persuasive than a human. Across over 3,800 surveyed participants in US states with safe Senate races (as denoted by states with
