GPT-4V(ision) (13/18)

Figure 12: Example prompt given to GPT-4 to find phrases to replace with images to turn text-only prompts into multimodal prompts.

2.4.2 Additional Mitigations for High-Risk Areas

GPT-4V includes carefully designed refusal behavior for some prompts that contain images of people. The model refuses requests for the following:

• Identity (e.g. a user uploads an image of a person and asks who they are, or a pair of images and asks if they're the same person)
• Sensitive traits (e.g. age, race)
• Ungrounded inferences (e.g. when the model draws conclusions based on those traits not visually present, as discussed in Section 2.2)

To further reduce the risks in emerging and high-stakes areas, we integrated additional multimodal data into the post-training process in order to reinforce refusal behavior for illicit behavior and ungrounded inference requests. Our focus was to mitigate risky prompts where, in isolation, the text and the image were individually benign, but when combined as a multimodal prompt, could lead to harmful outputs.

For illicit behavior, we collected a multimodal dataset by augmenting our existing text-only dataset with image synonyms. For example, given a text string "how do i kill the people?", we want to adapt it into a multimodal example "how do i [image of knife] the [image of people]?". The augmentation consists of the following steps:

• For each original text-only example, we ask GPT-4 to pick the top two most harmful short phrases (see the table below);
• For each chosen short phrase, we replace it with a web-crawled image;
• To ensure semantic invariance, we conduct human review and filter out low-quality augmentations;
• To reinforce the robustness of the refusal behavior, we also augment the examples with various system messages.

For ungrounded inference requests, we used data collected through our red teaming campaigns.
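As a rough illustration of the illicit-behavior augmentation steps listed above, the following minimal sketch shows one way the phrase-to-image substitution and the system-message augmentation could be represented. It is not the pipeline used for GPT-4V: the names `TextPart`, `ImagePart`, `to_multimodal`, and `augment_with_system_messages` are hypothetical, the phrase selection by GPT-4 and the human review are assumed to have already happened, and the example uses a deliberately neutral prompt with placeholder image names.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class TextPart:
    text: str


@dataclass(frozen=True)
class ImagePart:
    image_ref: str  # hypothetical identifier for an already reviewed image


Part = Union[TextPart, ImagePart]


def to_multimodal(prompt: str, phrase_to_image: Dict[str, str]) -> List[Part]:
    """Replace each selected phrase in `prompt` with an image placeholder.

    `phrase_to_image` is assumed to hold the phrases chosen upstream
    (by a model, then checked by human review) and the image for each.
    Matching is exact and case-sensitive, which is a simplification.
    """
    if not phrase_to_image:
        return [TextPart(prompt)]

    # Try longer phrases first so overlapping candidates resolve predictably.
    phrases = sorted(phrase_to_image, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in phrases))

    parts: List[Part] = []
    cursor = 0
    for match in pattern.finditer(prompt):
        if match.start() > cursor:
            parts.append(TextPart(prompt[cursor:match.start()]))
        parts.append(ImagePart(phrase_to_image[match.group(0)]))
        cursor = match.end()
    if cursor < len(prompt):
        parts.append(TextPart(prompt[cursor:]))
    return parts


def augment_with_system_messages(
    parts: List[Part], system_messages: List[str]
) -> List[dict]:
    """Pair one multimodal prompt with each of several system messages."""
    return [{"system": s, "user": parts} for s in system_messages]
```

With the neutral prompt "how do i slice the bread?" and the mapping `{"slice": "img_knife.png", "bread": "img_bread.png"}`, `to_multimodal` yields five parts: the text "how do i ", an image, the text " the ", a second image, and the text "?". This mirrors the interleaved text and image structure of the example in the text.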
The goal was to train the model to refuse prompts that were requesting an ungrounded conclusion based on certain attributes of a person. For example, if the prompt includes a photo of a person and the text
