How truthful is GPT-3? A benchmark for language models – AI Alignment Forum

This is an edited excerpt of a new ML paper (pdf, code) by Stephanie Lin (FHI Oxford), Jacob Hilton (OpenAI) and Owain Evans (FHI Oxford). The paper is under review at NeurIPS. TITLE: TRUTHFULQA: MEASURING HOW MODELS MIMIC HUMAN FALSEHOODS ABSTRACT We propose a benchmark to measure whether a language model is truthful in generatingContinue reading “How truthful is GPT-3? A benchmark for language models – AI Alignment Forum”