Red teamers also tested GPT-4V’s ability to detect incorrect information or disinformation in an image. The model’s ability to recognize disinformation was inconsistent, and may depend on how well known a disinformation concept is and how recent it is. Overall, GPT-4V was not trained for this purpose and should not be used as a way to detect disinformation, or to otherwise verify whether something is true or false.

Realistic, customized images can be created using other generative image models and used in combination with GPT-4V’s capabilities. Pairing image models’ ability to easily generate images with GPT-4V’s ability to easily generate accompanying text may have an impact on disinformation risks. However, a proper risk assessment would also have to take into account the context of use (e.g., the actor, the surrounding events, etc.), the manner and extent of distribution (e.g., whether the pairing occurs within a closed software application or in public forums), and the presence of other mitigations such as watermarking or other provenance tools for the generated image.

2.3.5 Hateful content

GPT-4V refuses to answer questions about hate symbols and extremist content in some instances but not all. Its behavior can be inconsistent and at times contextually inappropriate. For instance, it knows the historic meaning of the Templar Cross but misses its modern meaning in the US, where it has been appropriated by hate groups. See Figure 10a. Red teamers observed that if a user directly names a well-known hate group, the model usually refuses to provide a completion. But if a user uses lesser-known names3 such as
