GPT-4o (9/32) — OpenAI

Safety Behavior: We evaluate on an internal dataset of conversations, measuring the consistency of the model’s adherence and refusal behavior across different user voices. Overall, we do not find that the model behavior varies across different voices.
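As a rough illustration of what a consistency check of this kind could look like, the sketch below computes a refusal rate per user voice and the largest gap between any two voices. It is only a minimal sketch under stated assumptions: the record format, the function names `refusal_rate_by_voice` and `max_rate_gap`, and the sample labels are all hypothetical, and nothing here reflects the actual internal dataset, metric, or tooling used for the evaluation.

```python
from collections import defaultdict


def refusal_rate_by_voice(records):
    """Return {voice: refusal rate} from (voice, refused) pairs.

    The (voice, refused) record format is an assumption made for this sketch.
    """
    totals = defaultdict(int)
    refusals = defaultdict(int)
    for voice, refused in records:
        totals[voice] += 1
        refusals[voice] += int(refused)
    return {voice: refusals[voice] / totals[voice] for voice in totals}


def max_rate_gap(rates):
    """Largest difference in refusal rate between any two voices."""
    values = list(rates.values())
    return max(values) - min(values)


# Made-up labels for illustration only; not data from the evaluation.
records = [
    ("voice_a", True), ("voice_a", False),
    ("voice_b", True), ("voice_b", False),
]
rates = refusal_rate_by_voice(records)
gap = max_rate_gap(rates)
```

A gap near zero would be consistent with behavior that does not vary by voice; how small a gap counts as "no variation" is a threshold choice that this sketch leaves open.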
