René's URL Explorer Experiment


Title: reward-model · GitHub Topics · GitHub

Open Graph Title: Build software better, together

X Title: GitHub

Description: GitHub is where people build software. More than 150 million people use GitHub to discover, fork, and contribute to over 420 million projects.

Opengraph URL: https://github.com

X: github

Domain: patch-diff.githubusercontent.com

Links:

All 32 https://github.com/topics/reward-model
Python 22 https://github.com/topics/reward-model?l=python
Jupyter Notebook 5 https://github.com/topics/reward-model?l=jupyter+notebook
agentscope-aihttps://patch-diff.githubusercontent.com/agentscope-ai
OpenJudgehttps://patch-diff.githubusercontent.com/agentscope-ai/OpenJudge
Star 384
Code https://patch-diff.githubusercontent.com/agentscope-ai/OpenJudge
Issues https://patch-diff.githubusercontent.com/agentscope-ai/OpenJudge/issues
Pull requests https://patch-diff.githubusercontent.com/agentscope-ai/OpenJudge/pulls
Discussions https://patch-diff.githubusercontent.com/agentscope-ai/OpenJudge/discussions
agenthttps://patch-diff.githubusercontent.com/topics/agent
evaluationhttps://patch-diff.githubusercontent.com/topics/evaluation
alignmenthttps://patch-diff.githubusercontent.com/topics/alignment
graderhttps://patch-diff.githubusercontent.com/topics/grader
rewardhttps://patch-diff.githubusercontent.com/topics/reward
llmhttps://patch-diff.githubusercontent.com/topics/llm
rlhfhttps://patch-diff.githubusercontent.com/topics/rlhf
llmopshttps://patch-diff.githubusercontent.com/topics/llmops
reward-modelhttps://patch-diff.githubusercontent.com/topics/reward-model
wendell0218https://patch-diff.githubusercontent.com/wendell0218
Awesome-RL-for-Video-Generationhttps://patch-diff.githubusercontent.com/wendell0218/Awesome-RL-for-Video-Generation
Star 342
Code https://patch-diff.githubusercontent.com/wendell0218/Awesome-RL-for-Video-Generation
Issues https://patch-diff.githubusercontent.com/wendell0218/Awesome-RL-for-Video-Generation/issues
Pull requests https://patch-diff.githubusercontent.com/wendell0218/Awesome-RL-for-Video-Generation/pulls
reinforcement-learninghttps://patch-diff.githubusercontent.com/topics/reinforcement-learning
ppohttps://patch-diff.githubusercontent.com/topics/ppo
video-generationhttps://patch-diff.githubusercontent.com/topics/video-generation
dpohttps://patch-diff.githubusercontent.com/topics/dpo
reward-modelhttps://patch-diff.githubusercontent.com/topics/reward-model
grpohttps://patch-diff.githubusercontent.com/topics/grpo
opendilabhttps://patch-diff.githubusercontent.com/opendilab
LightRFThttps://patch-diff.githubusercontent.com/opendilab/LightRFT
Star 157
Code https://patch-diff.githubusercontent.com/opendilab/LightRFT
Issues https://patch-diff.githubusercontent.com/opendilab/LightRFT/issues
Pull requests https://patch-diff.githubusercontent.com/opendilab/LightRFT/pulls
reinforcement-learninghttps://patch-diff.githubusercontent.com/topics/reinforcement-learning
multi-modalhttps://patch-diff.githubusercontent.com/topics/multi-modal
vlmhttps://patch-diff.githubusercontent.com/topics/vlm
rfthttps://patch-diff.githubusercontent.com/topics/rft
llmhttps://patch-diff.githubusercontent.com/topics/llm
reward-modelhttps://patch-diff.githubusercontent.com/topics/reward-model
llm-traininghttps://patch-diff.githubusercontent.com/topics/llm-training
grpohttps://patch-diff.githubusercontent.com/topics/grpo
dapohttps://patch-diff.githubusercontent.com/topics/dapo
VectorInstitutehttps://patch-diff.githubusercontent.com/VectorInstitute
vector-inferencehttps://patch-diff.githubusercontent.com/VectorInstitute/vector-inference
Star 91
Code https://patch-diff.githubusercontent.com/VectorInstitute/vector-inference
Issues https://patch-diff.githubusercontent.com/VectorInstitute/vector-inference/issues
Pull requests https://patch-diff.githubusercontent.com/VectorInstitute/vector-inference/pulls
inferencehttps://patch-diff.githubusercontent.com/topics/inference
speech-to-texthttps://patch-diff.githubusercontent.com/topics/speech-to-text
vlmhttps://patch-diff.githubusercontent.com/topics/vlm
text-embeddinghttps://patch-diff.githubusercontent.com/topics/text-embedding
multimodalhttps://patch-diff.githubusercontent.com/topics/multimodal
audio-transcriptionhttps://patch-diff.githubusercontent.com/topics/audio-transcription
llmhttps://patch-diff.githubusercontent.com/topics/llm
vllmhttps://patch-diff.githubusercontent.com/topics/vllm
reward-modelhttps://patch-diff.githubusercontent.com/topics/reward-model
llm-infernecehttps://patch-diff.githubusercontent.com/topics/llm-infernece
sglanghttps://patch-diff.githubusercontent.com/topics/sglang
llm-infrastructurehttps://patch-diff.githubusercontent.com/topics/llm-infrastructure
Westlake-AIhttps://patch-diff.githubusercontent.com/Westlake-AI
SemiRewardhttps://patch-diff.githubusercontent.com/Westlake-AI/SemiReward
Star 77
Code https://patch-diff.githubusercontent.com/Westlake-AI/SemiReward
Issues https://patch-diff.githubusercontent.com/Westlake-AI/SemiReward/issues
Pull requests https://patch-diff.githubusercontent.com/Westlake-AI/SemiReward/pulls
machine-learninghttps://patch-diff.githubusercontent.com/topics/machine-learning
natural-language-processinghttps://patch-diff.githubusercontent.com/topics/natural-language-processing
computer-visionhttps://patch-diff.githubusercontent.com/topics/computer-vision
regressionhttps://patch-diff.githubusercontent.com/topics/regression
transformerhttps://patch-diff.githubusercontent.com/topics/transformer
semi-supervised-learninghttps://patch-diff.githubusercontent.com/topics/semi-supervised-learning
audio-classificationhttps://patch-diff.githubusercontent.com/topics/audio-classification
weakly-supervised-learninghttps://patch-diff.githubusercontent.com/topics/weakly-supervised-learning
yahoo-answershttps://patch-diff.githubusercontent.com/topics/yahoo-answers
cifar-100https://patch-diff.githubusercontent.com/topics/cifar-100
label-noisehttps://patch-diff.githubusercontent.com/topics/label-noise
esc-50https://patch-diff.githubusercontent.com/topics/esc-50
vision-transformerhttps://patch-diff.githubusercontent.com/topics/vision-transformer
reward-modelhttps://patch-diff.githubusercontent.com/topics/reward-model
bobxwuhttps://patch-diff.githubusercontent.com/bobxwu
learning-from-rewards-llm-papershttps://patch-diff.githubusercontent.com/bobxwu/learning-from-rewards-llm-papers
Star 63
Code https://patch-diff.githubusercontent.com/bobxwu/learning-from-rewards-llm-papers
Issues https://patch-diff.githubusercontent.com/bobxwu/learning-from-rewards-llm-papers/issues
Pull requests https://patch-diff.githubusercontent.com/bobxwu/learning-from-rewards-llm-papers/pulls
reinforcement-learninghttps://patch-diff.githubusercontent.com/topics/reinforcement-learning
post-traininghttps://patch-diff.githubusercontent.com/topics/post-training
self-correctionhttps://patch-diff.githubusercontent.com/topics/self-correction
reward-learninghttps://patch-diff.githubusercontent.com/topics/reward-learning
large-language-modelshttps://patch-diff.githubusercontent.com/topics/large-language-models
llmhttps://patch-diff.githubusercontent.com/topics/llm
llmshttps://patch-diff.githubusercontent.com/topics/llms
reward-modelshttps://patch-diff.githubusercontent.com/topics/reward-models
reward-modelhttps://patch-diff.githubusercontent.com/topics/reward-model
reward-modelinghttps://patch-diff.githubusercontent.com/topics/reward-modeling
guided-decodinghttps://patch-diff.githubusercontent.com/topics/guided-decoding
test-time-scalinghttps://patch-diff.githubusercontent.com/topics/test-time-scaling
tongjingqihttps://patch-diff.githubusercontent.com/tongjingqi
Awesome-Agent-RLhttps://patch-diff.githubusercontent.com/tongjingqi/Awesome-Agent-RL
Star 55
Code https://patch-diff.githubusercontent.com/tongjingqi/Awesome-Agent-RL
Issues https://patch-diff.githubusercontent.com/tongjingqi/Awesome-Agent-RL/issues
Pull requests https://patch-diff.githubusercontent.com/tongjingqi/Awesome-Agent-RL/pulls
agenthttps://patch-diff.githubusercontent.com/topics/agent
awesomehttps://patch-diff.githubusercontent.com/topics/awesome
reinforcement-learninghttps://patch-diff.githubusercontent.com/topics/reinforcement-learning
rlhttps://patch-diff.githubusercontent.com/topics/rl
awesome-listhttps://patch-diff.githubusercontent.com/topics/awesome-list
llmhttps://patch-diff.githubusercontent.com/topics/llm
reward-modelhttps://patch-diff.githubusercontent.com/topics/reward-model
agentic-aihttps://patch-diff.githubusercontent.com/topics/agentic-ai
rlvrhttps://patch-diff.githubusercontent.com/topics/rlvr
agent-traininghttps://patch-diff.githubusercontent.com/topics/agent-training
Amirhosein-gh98https://patch-diff.githubusercontent.com/Amirhosein-gh98
Gnosishttps://patch-diff.githubusercontent.com/Amirhosein-gh98/Gnosis
Star 29
Code https://patch-diff.githubusercontent.com/Amirhosein-gh98/Gnosis
Issues https://patch-diff.githubusercontent.com/Amirhosein-gh98/Gnosis/issues
Pull requests https://patch-diff.githubusercontent.com/Amirhosein-gh98/Gnosis/pulls
internalhttps://patch-diff.githubusercontent.com/topics/internal
ormhttps://patch-diff.githubusercontent.com/topics/orm
efficienthttps://patch-diff.githubusercontent.com/topics/efficient
circuitshttps://patch-diff.githubusercontent.com/topics/circuits
attentionhttps://patch-diff.githubusercontent.com/topics/attention
reasoninghttps://patch-diff.githubusercontent.com/topics/reasoning
error-detectionhttps://patch-diff.githubusercontent.com/topics/error-detection
hallucinationhttps://patch-diff.githubusercontent.com/topics/hallucination
self-awarenesshttps://patch-diff.githubusercontent.com/topics/self-awareness
latent-representationshttps://patch-diff.githubusercontent.com/topics/latent-representations
gnosis-safehttps://patch-diff.githubusercontent.com/topics/gnosis-safe
llmhttps://patch-diff.githubusercontent.com/topics/llm
large-language-modelhttps://patch-diff.githubusercontent.com/topics/large-language-model
reward-modelhttps://patch-diff.githubusercontent.com/topics/reward-model
hallucination-detectionhttps://patch-diff.githubusercontent.com/topics/hallucination-detection
InternLMhttps://patch-diff.githubusercontent.com/InternLM
Sparkhttps://patch-diff.githubusercontent.com/InternLM/Spark
Star 25
Code https://patch-diff.githubusercontent.com/InternLM/Spark
Issues https://patch-diff.githubusercontent.com/InternLM/Spark/issues
Pull requests https://patch-diff.githubusercontent.com/InternLM/Spark/pulls
self-improvementhttps://patch-diff.githubusercontent.com/topics/self-improvement
multi-modalhttps://patch-diff.githubusercontent.com/topics/multi-modal
large-language-modelshttps://patch-diff.githubusercontent.com/topics/large-language-models
vision-language-modelhttps://patch-diff.githubusercontent.com/topics/vision-language-model
reward-modelhttps://patch-diff.githubusercontent.com/topics/reward-model
large-vision-language-modelshttps://patch-diff.githubusercontent.com/topics/large-vision-language-models
self-rewardinghttps://patch-diff.githubusercontent.com/topics/self-rewarding
math-reasoninghttps://patch-diff.githubusercontent.com/topics/math-reasoning
yeyimilkhttps://patch-diff.githubusercontent.com/yeyimilk
CrowdVLM-R1https://patch-diff.githubusercontent.com/yeyimilk/CrowdVLM-R1
Star 21
Code https://patch-diff.githubusercontent.com/yeyimilk/CrowdVLM-R1
Issues https://patch-diff.githubusercontent.com/yeyimilk/CrowdVLM-R1/issues
Pull requests https://patch-diff.githubusercontent.com/yeyimilk/CrowdVLM-R1/pulls
reinforcement-learninghttps://patch-diff.githubusercontent.com/topics/reinforcement-learning
vlmhttps://patch-diff.githubusercontent.com/topics/vlm
crowdcountinghttps://patch-diff.githubusercontent.com/topics/crowdcounting
llmhttps://patch-diff.githubusercontent.com/topics/llm
reward-modelhttps://patch-diff.githubusercontent.com/topics/reward-model
r1-zerohttps://patch-diff.githubusercontent.com/topics/r1-zero
vlm-r1https://patch-diff.githubusercontent.com/topics/vlm-r1
multimodal-r1https://patch-diff.githubusercontent.com/topics/multimodal-r1
NiuTranshttps://patch-diff.githubusercontent.com/NiuTrans
GRAMhttps://patch-diff.githubusercontent.com/NiuTrans/GRAM
Star 17
Code https://patch-diff.githubusercontent.com/NiuTrans/GRAM
Issues https://patch-diff.githubusercontent.com/NiuTrans/GRAM/issues
Pull requests https://patch-diff.githubusercontent.com/NiuTrans/GRAM/pulls
generativehttps://patch-diff.githubusercontent.com/topics/generative
generalizationhttps://patch-diff.githubusercontent.com/topics/generalization
rlhfhttps://patch-diff.githubusercontent.com/topics/rlhf
reward-modelhttps://patch-diff.githubusercontent.com/topics/reward-model
rochitasundarhttps://patch-diff.githubusercontent.com/rochitasundar
Generative-AI-with-Large-Language-Modelshttps://patch-diff.githubusercontent.com/rochitasundar/Generative-AI-with-Large-Language-Models
Star 16
Code https://patch-diff.githubusercontent.com/rochitasundar/Generative-AI-with-Large-Language-Models
Issues https://patch-diff.githubusercontent.com/rochitasundar/Generative-AI-with-Large-Language-Models/issues
Pull requests https://patch-diff.githubusercontent.com/rochitasundar/Generative-AI-with-Large-Language-Models/pulls
reinforcement-learninghttps://patch-diff.githubusercontent.com/topics/reinforcement-learning
transformerhttps://patch-diff.githubusercontent.com/topics/transformer
kl-divergencehttps://patch-diff.githubusercontent.com/topics/kl-divergence
proximal-policy-optimizationhttps://patch-diff.githubusercontent.com/topics/proximal-policy-optimization
large-language-modelshttps://patch-diff.githubusercontent.com/topics/large-language-models
prompt-engineeringhttps://patch-diff.githubusercontent.com/topics/prompt-engineering
flan-t5https://patch-diff.githubusercontent.com/topics/flan-t5
instruction-finetuninghttps://patch-diff.githubusercontent.com/topics/instruction-finetuning
low-rank-adaptationhttps://patch-diff.githubusercontent.com/topics/low-rank-adaptation
reward-modelhttps://patch-diff.githubusercontent.com/topics/reward-model
parameter-efficient-fine-tuninghttps://patch-diff.githubusercontent.com/topics/parameter-efficient-fine-tuning
llm-evaluationhttps://patch-diff.githubusercontent.com/topics/llm-evaluation
itaychachyhttps://patch-diff.githubusercontent.com/itaychachy
RewardSDShttps://patch-diff.githubusercontent.com/itaychachy/RewardSDS
Star 12
Code https://patch-diff.githubusercontent.com/itaychachy/RewardSDS
Issues https://patch-diff.githubusercontent.com/itaychachy/RewardSDS/issues
Pull requests https://patch-diff.githubusercontent.com/itaychachy/RewardSDS/pulls
aihttps://patch-diff.githubusercontent.com/topics/ai
computer-visionhttps://patch-diff.githubusercontent.com/topics/computer-vision
3d-generationhttps://patch-diff.githubusercontent.com/topics/3d-generation
mechine-learninghttps://patch-diff.githubusercontent.com/topics/mechine-learning
reward-modelhttps://patch-diff.githubusercontent.com/topics/reward-model
Junpliuhttps://patch-diff.githubusercontent.com/Junpliu
DocRewardhttps://patch-diff.githubusercontent.com/Junpliu/DocReward
Star 10
Code https://patch-diff.githubusercontent.com/Junpliu/DocReward
Issues https://patch-diff.githubusercontent.com/Junpliu/DocReward/issues
Pull requests https://patch-diff.githubusercontent.com/Junpliu/DocReward/pulls
reinforcement-learninghttps://patch-diff.githubusercontent.com/topics/reinforcement-learning
structurehttps://patch-diff.githubusercontent.com/topics/structure
stylehttps://patch-diff.githubusercontent.com/topics/style
documenthttps://patch-diff.githubusercontent.com/topics/document
reward-modelhttps://patch-diff.githubusercontent.com/topics/reward-model
AlignRMhttps://patch-diff.githubusercontent.com/AlignRM
CheemsRMhttps://patch-diff.githubusercontent.com/AlignRM/CheemsRM
Star 10
Code https://patch-diff.githubusercontent.com/AlignRM/CheemsRM
Issues https://patch-diff.githubusercontent.com/AlignRM/CheemsRM/issues
Pull requests https://patch-diff.githubusercontent.com/AlignRM/CheemsRM/pulls
reinforcement-learninghttps://patch-diff.githubusercontent.com/topics/reinforcement-learning
large-language-modelhttps://patch-diff.githubusercontent.com/topics/large-language-model
reward-modelhttps://patch-diff.githubusercontent.com/topics/reward-model
hlp-aihttps://patch-diff.githubusercontent.com/hlp-ai
miniChatGPThttps://patch-diff.githubusercontent.com/hlp-ai/miniChatGPT
Star 6
Code https://patch-diff.githubusercontent.com/hlp-ai/miniChatGPT
Issues https://patch-diff.githubusercontent.com/hlp-ai/miniChatGPT/issues
Pull requests https://patch-diff.githubusercontent.com/hlp-ai/miniChatGPT/pulls
pytorchhttps://patch-diff.githubusercontent.com/topics/pytorch
ppohttps://patch-diff.githubusercontent.com/topics/ppo
sfthttps://patch-diff.githubusercontent.com/topics/sft
gpt2https://patch-diff.githubusercontent.com/topics/gpt2
chatgpthttps://patch-diff.githubusercontent.com/topics/chatgpt
instructgpthttps://patch-diff.githubusercontent.com/topics/instructgpt
reward-modelhttps://patch-diff.githubusercontent.com/topics/reward-model
taishan1994https://patch-diff.githubusercontent.com/taishan1994
Reward-Model-Finetuninghttps://patch-diff.githubusercontent.com/taishan1994/Reward-Model-Finetuning
Star 4
Code https://patch-diff.githubusercontent.com/taishan1994/Reward-Model-Finetuning
Issues https://patch-diff.githubusercontent.com/taishan1994/Reward-Model-Finetuning/issues
Pull requests https://patch-diff.githubusercontent.com/taishan1994/Reward-Model-Finetuning/pulls
reward-modelhttps://patch-diff.githubusercontent.com/topics/reward-model
qwen2https://patch-diff.githubusercontent.com/topics/qwen2
kaicheng001https://patch-diff.githubusercontent.com/kaicheng001
Awesome-R1https://patch-diff.githubusercontent.com/kaicheng001/Awesome-R1
Star 2
Code https://patch-diff.githubusercontent.com/kaicheng001/Awesome-R1
Issues https://patch-diff.githubusercontent.com/kaicheng001/Awesome-R1/issues
Pull requests https://patch-diff.githubusercontent.com/kaicheng001/Awesome-R1/pulls
awesomehttps://patch-diff.githubusercontent.com/topics/awesome
thinkinghttps://patch-diff.githubusercontent.com/topics/thinking
r1https://patch-diff.githubusercontent.com/topics/r1
vlmhttps://patch-diff.githubusercontent.com/topics/vlm
lmmhttps://patch-diff.githubusercontent.com/topics/lmm
llmhttps://patch-diff.githubusercontent.com/topics/llm
mllmhttps://patch-diff.githubusercontent.com/topics/mllm
reward-modelhttps://patch-diff.githubusercontent.com/topics/reward-model
reasoning-modelshttps://patch-diff.githubusercontent.com/topics/reasoning-models
deepseek-r1https://patch-diff.githubusercontent.com/topics/deepseek-r1
techandy42https://patch-diff.githubusercontent.com/techandy42
LLM_Reward_Modelhttps://patch-diff.githubusercontent.com/techandy42/LLM_Reward_Model
Star 2
Code https://patch-diff.githubusercontent.com/techandy42/LLM_Reward_Model
Issues https://patch-diff.githubusercontent.com/techandy42/LLM_Reward_Model/issues
Pull requests https://patch-diff.githubusercontent.com/techandy42/LLM_Reward_Model/pulls
language-modelhttps://patch-diff.githubusercontent.com/topics/language-model
reward-modelhttps://patch-diff.githubusercontent.com/topics/reward-model
hfrlhttps://patch-diff.githubusercontent.com/topics/hfrl
m-serioushttps://patch-diff.githubusercontent.com/m-serious
module-reward-modelshttps://patch-diff.githubusercontent.com/m-serious/module-reward-models
Star 1
Code https://patch-diff.githubusercontent.com/m-serious/module-reward-models
Issues https://patch-diff.githubusercontent.com/m-serious/module-reward-models/issues
Pull requests https://patch-diff.githubusercontent.com/m-serious/module-reward-models/pulls
reinforcement-learning-agenthttps://patch-diff.githubusercontent.com/topics/reinforcement-learning-agent
reward-modelhttps://patch-diff.githubusercontent.com/topics/reward-model
