Paper page — Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences

Apr 8, 2024

Posted by Cecile G. Tamura in category: education

From Microsoft: "Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences."

This paper studies post-training large language models (LLMs) using preference feedback from a powerful oracle to help a model iteratively improve over…
