René's URL Explorer Experiment


Title: GitHub - uclaml/SPPO: The official implementation of Self-Play Preference Optimization (SPPO)

Description: The official implementation of Self-Play Preference Optimization (SPPO) - uclaml/SPPO

Opengraph URL: https://github.com/uclaml/SPPO

Domain: patch-diff.githubusercontent.com

Links:

uclaml https://patch-diff.githubusercontent.com/uclaml
SPPO https://patch-diff.githubusercontent.com/uclaml/SPPO
uclaml.github.io/SPPO/ https://uclaml.github.io/SPPO/
Apache-2.0 license https://patch-diff.githubusercontent.com/uclaml/SPPO/blob/main/LICENSE
582 stars https://patch-diff.githubusercontent.com/uclaml/SPPO/stargazers
47 forks https://patch-diff.githubusercontent.com/uclaml/SPPO/forks
Branches https://patch-diff.githubusercontent.com/uclaml/SPPO/branches
Tags https://patch-diff.githubusercontent.com/uclaml/SPPO/tags
Activity https://patch-diff.githubusercontent.com/uclaml/SPPO/activity
Code https://patch-diff.githubusercontent.com/uclaml/SPPO
Issues 14 https://patch-diff.githubusercontent.com/uclaml/SPPO/issues
Pull requests 1 https://patch-diff.githubusercontent.com/uclaml/SPPO/pulls
Actions https://patch-diff.githubusercontent.com/uclaml/SPPO/actions
Projects 0 https://patch-diff.githubusercontent.com/uclaml/SPPO/projects
Security 0 https://patch-diff.githubusercontent.com/uclaml/SPPO/security
Insights https://patch-diff.githubusercontent.com/uclaml/SPPO/pulse
28 Commits https://patch-diff.githubusercontent.com/uclaml/SPPO/commits/main/
images https://patch-diff.githubusercontent.com/uclaml/SPPO/tree/main/images
models_configs https://patch-diff.githubusercontent.com/uclaml/SPPO/tree/main/models_configs
recipes https://patch-diff.githubusercontent.com/uclaml/SPPO/tree/main/recipes
scripts https://patch-diff.githubusercontent.com/uclaml/SPPO/tree/main/scripts
sppo https://patch-diff.githubusercontent.com/uclaml/SPPO/tree/main/sppo
.gitignore https://patch-diff.githubusercontent.com/uclaml/SPPO/blob/main/.gitignore
.pre-commit-config.yaml https://patch-diff.githubusercontent.com/uclaml/SPPO/blob/main/.pre-commit-config.yaml
LICENSE https://patch-diff.githubusercontent.com/uclaml/SPPO/blob/main/LICENSE
README.md https://patch-diff.githubusercontent.com/uclaml/SPPO/blob/main/README.md
run_sppo_gemma-2-27b.sh https://patch-diff.githubusercontent.com/uclaml/SPPO/blob/main/run_sppo_gemma-2-27b.sh
run_sppo_gemma-2.sh https://patch-diff.githubusercontent.com/uclaml/SPPO/blob/main/run_sppo_gemma-2.sh
run_sppo_llama-3.sh https://patch-diff.githubusercontent.com/uclaml/SPPO/blob/main/run_sppo_llama-3.sh
run_sppo_mistral.sh https://patch-diff.githubusercontent.com/uclaml/SPPO/blob/main/run_sppo_mistral.sh
setup.cfg https://patch-diff.githubusercontent.com/uclaml/SPPO/blob/main/setup.cfg
setup.py https://patch-diff.githubusercontent.com/uclaml/SPPO/blob/main/setup.py
https://patch-diff.githubusercontent.com/uclaml/SPPO#sppo-self-play-preference-optimization-for-language-model-alignment
Self-Play Preference Optimization for Language Model Alignment https://arxiv.org/abs/2405.00675
Yue Wu https://yuewu.us/
Zhiqing Sun https://www.cs.cmu.edu/~zhiqings/
Huizhuo Yuan https://scholar.google.com/citations?user=8foZzX4AAAAJ
Kaixuan Ji https://scholar.google.com/citations?user=FOoKDukAAAAJ
Yiming Yang https://www.cs.cmu.edu/~yiming/
Quanquan Gu https://web.cs.ucla.edu/~qgu/
Webpage https://uclaml.github.io/SPPO/
Huggingface https://huggingface.co/papers/2405.00675
Paper https://arxiv.org/abs/2405.00675
https://patch-diff.githubusercontent.com/uclaml/SPPO#-news
Gemma-2-9B-It-SPPO-Iter3 https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3
gemma-2-9b-it https://huggingface.co/google/gemma-2-9b-it
https://arxiv.org/abs/2405.00675
https://patch-diff.githubusercontent.com/uclaml/SPPO#table-of-content
About SPPO https://patch-diff.githubusercontent.com/uclaml/SPPO#about-sppo
Released Models https://patch-diff.githubusercontent.com/uclaml/SPPO#released-models
Environment Setup https://patch-diff.githubusercontent.com/uclaml/SPPO#environment-setup
Training Scripts https://patch-diff.githubusercontent.com/uclaml/SPPO#training-scripts
Evaluation https://patch-diff.githubusercontent.com/uclaml/SPPO#evaluation
Troubleshoot https://patch-diff.githubusercontent.com/uclaml/SPPO#troubleshoot
Citation https://patch-diff.githubusercontent.com/uclaml/SPPO#citation
Acknowledgements https://patch-diff.githubusercontent.com/uclaml/SPPO#acknowledgements
https://patch-diff.githubusercontent.com/uclaml/SPPO#about-sppo
https://patch-diff.githubusercontent.com/uclaml/SPPO/blob/main/images/table.png
here https://arxiv.org/abs/2405.00675
https://patch-diff.githubusercontent.com/uclaml/SPPO#base-models-and-released-models
Mistral-7B-Instruct-v0.2 https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
Mistral-7B-SPPO Iter1 https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO-Iter1
Mistral-7B-SPPO Iter2 https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO-Iter2
Mistral-7B-SPPO Iter3 https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO-Iter3
Llama-3-8B-Instruct https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
Llama-3-8B-SPPO Iter1 https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter1
Llama-3-8B-SPPO Iter2 https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter2
Llama-3-8B-SPPO Iter3 https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3
Gemma-2-9B-It https://huggingface.co/google/gemma-2-9b-it
Gemma-2-9B-SPPO Iter1 https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter1
Gemma-2-9B-SPPO Iter2 https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2
Gemma-2-9B-SPPO Iter3 https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3
https://patch-diff.githubusercontent.com/uclaml/SPPO#environment-setup
https://patch-diff.githubusercontent.com/uclaml/SPPO#training-scripts
https://patch-diff.githubusercontent.com/uclaml/SPPO#breakdown-of-scripts
https://patch-diff.githubusercontent.com/uclaml/SPPO#evaluation
AlpacaEval 2 https://github.com/tatsu-lab/alpaca_eval
MT-Bench https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge
HuggingFace Open LLM Leaderboard https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
https://patch-diff.githubusercontent.com/uclaml/SPPO#troubleshoot
https://patch-diff.githubusercontent.com/uclaml/SPPO#star-history
https://star-history.com/#uclaml/SPPO&Date
https://patch-diff.githubusercontent.com/uclaml/SPPO#citation
https://patch-diff.githubusercontent.com/uclaml/SPPO#acknowledgements
The Alignment Handbook https://github.com/huggingface/alignment-handbook
PairRM https://github.com/yuchenlin/LLM-Blender
vllm https://github.com/vllm-project/vllm
deep-learning https://patch-diff.githubusercontent.com/topics/deep-learning
fine-tuning https://patch-diff.githubusercontent.com/topics/fine-tuning
self-play https://patch-diff.githubusercontent.com/topics/self-play
large-language-models https://patch-diff.githubusercontent.com/topics/large-language-models
rlhf https://patch-diff.githubusercontent.com/topics/rlhf
Readme https://patch-diff.githubusercontent.com/uclaml/SPPO#readme-ov-file
Apache-2.0 license https://patch-diff.githubusercontent.com/uclaml/SPPO#Apache-2.0-1-ov-file
28 watching https://patch-diff.githubusercontent.com/uclaml/SPPO/watchers
Releases https://patch-diff.githubusercontent.com/uclaml/SPPO/releases
Packages 0 https://patch-diff.githubusercontent.com/users/uclaml/packages?repo_name=SPPO
Contributors 24 https://patch-diff.githubusercontent.com/uclaml/SPPO/graphs/contributors
Python 91.8% https://patch-diff.githubusercontent.com/uclaml/SPPO/search?l=python
Shell 8.2% https://patch-diff.githubusercontent.com/uclaml/SPPO/search?l=shell
