May 31, 2023
A new AI lie detector reveals their “inner thoughts”
Posted by Shailesh Prasad in category: robotics/AI
“Wish I had this to cite,” lamented Jacob Andreas, a professor at MIT, who had just published a paper exploring the extent to which language models mirror the internal motivations of human communicators.
Jan Leike, the head of alignment at OpenAI, who is chiefly responsible for guiding new models like GPT-4 to help, rather than harm, human progress, responded to the paper by offering Burns a job. Burns initially declined, but a personal appeal from Sam Altman, the cofounder and CEO of OpenAI, changed his mind.