safe usage.

• Build evaluations, mitigations, and approach deployment with real-world usage in mind: Context of use, such as who the users are, what the specific use case is, and where the model is being deployed, is critical to mitigating the actual harms associated with language models and ensuring their deployment is as beneficial as possible. It is particularly important to account for real-world vulnerabilities, humans' roles in the deployment context, and adversarial attempts. We especially encourage the development of high-quality evaluations and testing of model mitigations on datasets in multiple languages.

• Ensure that safety assessments cover emergent risks: As models become more capable, we should be prepared for emergent capabilities and complex interactions to pose novel safety issues. It is important to develop evaluation methods that can be targeted at advanced capabilities that could be particularly dangerous if they emerged in future models, while also being open-ended enough to detect unforeseen risks.

• Be cognizant of, and plan for, capability jumps "in the wild": Methods like fine-tuning and chain-of-thought prompting could lead to capability jumps in the same base model. This should be accounted for explicitly in internal safety testing procedures and evaluations. A precautionary principle should also be applied: above a safety-critical threshold, assurance of sufficient safety is required.

The increase in the capabilities and adoption of these models has made the challenges outlined in this card, and their consequences, imminent. As a result, we especially encourage more research into:

• Economic impacts of AI and increased automation, and the structures needed to make the transition smoother for society

• Structures that allow broader public participation in decisions regarding what is considered the "optimal" behavior for these models

• Evaluations for risky emergent behaviors, such as situational awareness, persuasion, and long-horizon planning

• Interpretability, explainability, and calibration, to address the current "black-box" nature of AI models. We also encourage research into effective means of promoting AI literacy to aid appropriate scrutiny of model outputs.

As we see above, both improved language model capabilities and their limitations can pose significant challenges to the responsible and safe societal adoption of these models. To ensure that we are all well-prepared for the pace of progress, we need more research emphasis on areas such as AI literacy, economic and social resilience, and anticipatory governance.[11] It is very important that OpenAI, other labs, and academia further develop effective evaluation tools and technical improvements in model safety. Progress has been made in the last few years, and more investment in safety will likely produce more gains.

We encourage readers interested in this topic to read our work on language model impacts in areas such as disinformation, misuse, education, and the economy and labor market.
