René's URL Explorer Experiment

Title: reward-model · GitHub Topics · GitHub

Open Graph Title: Build software better, together

X Title: GitHub

Description: GitHub is where people build software. More than 150 million people use GitHub to discover, fork, and contribute to over 420 million projects.

Opengraph URL: https://github.com

X: github

direct link

Domain: patch-diff.githubusercontent.com

route-pattern	/topics/:topic_name(.:format)
route-controller	topics
route-action	show
github-keyboard-shortcuts	copilot
google-site-verification	Apib7-x98H0j5cPqHWwSMm6dNU4GmODRoqxLiDzdx9I
octolytics-url	https://collector.github.com/github/collect
fb:app_id	1401488693436528
apple-itunes-app	app-id=1477376905, app-argument=https://github.com/topics/reward-model
og:site_name	GitHub
og:image	https://github.githubassets.com/assets/github-octocat-13c86b8b336d.png
og:image:type	image/png
og:image:width	1200
og:image:height	620
twitter:site:id	13334762
twitter:creator	github
twitter:creator:id	13334762
twitter:card	summary_large_image
twitter:image	https://github.githubassets.com/assets/github-logo-55c5b9a1fe52.png
twitter:image:width	1200
twitter:image:height	1200
hostname	github.com
expected-hostname	github.com
turbo-cache-control	no-preview
turbo-body-classes	logged-out env-production page-responsive
disable-turbo	false
browser-stats-url	https://api.github.com/_private/browser/stats
browser-errors-url	https://api.github.com/_private/browser/errors
release	3d444f0a47beeeac94cddbb51c91ab408befe8d4
ui-target	full
theme-color	#1e2327
color-scheme	light dark

Links:

All 32	https://github.com/topics/reward-model
Python 22	https://github.com/topics/reward-model?l=python
Jupyter Notebook 5	https://github.com/topics/reward-model?l=jupyter+notebook
agentscope-ai	https://patch-diff.githubusercontent.com/agentscope-ai
OpenJudge	https://patch-diff.githubusercontent.com/agentscope-ai/OpenJudge
Star 384	https://patch-diff.githubusercontent.com/login?return_to=%2Fagentscope-ai%2FOpenJudge
Code	https://patch-diff.githubusercontent.com/agentscope-ai/OpenJudge
Issues	https://patch-diff.githubusercontent.com/agentscope-ai/OpenJudge/issues
Pull requests	https://patch-diff.githubusercontent.com/agentscope-ai/OpenJudge/pulls
Discussions	https://patch-diff.githubusercontent.com/agentscope-ai/OpenJudge/discussions
agent	https://patch-diff.githubusercontent.com/topics/agent
evaluation	https://patch-diff.githubusercontent.com/topics/evaluation
alignment	https://patch-diff.githubusercontent.com/topics/alignment
grader	https://patch-diff.githubusercontent.com/topics/grader
reward	https://patch-diff.githubusercontent.com/topics/reward
llm	https://patch-diff.githubusercontent.com/topics/llm
rlhf	https://patch-diff.githubusercontent.com/topics/rlhf
llmops	https://patch-diff.githubusercontent.com/topics/llmops
reward-model	https://patch-diff.githubusercontent.com/topics/reward-model
wendell0218	https://patch-diff.githubusercontent.com/wendell0218
Awesome-RL-for-Video-Generation	https://patch-diff.githubusercontent.com/wendell0218/Awesome-RL-for-Video-Generation
Star 342	https://patch-diff.githubusercontent.com/login?return_to=%2Fwendell0218%2FAwesome-RL-for-Video-Generation
Code	https://patch-diff.githubusercontent.com/wendell0218/Awesome-RL-for-Video-Generation
Issues	https://patch-diff.githubusercontent.com/wendell0218/Awesome-RL-for-Video-Generation/issues
Pull requests	https://patch-diff.githubusercontent.com/wendell0218/Awesome-RL-for-Video-Generation/pulls
reinforcement-learning	https://patch-diff.githubusercontent.com/topics/reinforcement-learning
ppo	https://patch-diff.githubusercontent.com/topics/ppo
video-generation	https://patch-diff.githubusercontent.com/topics/video-generation
dpo	https://patch-diff.githubusercontent.com/topics/dpo
reward-model	https://patch-diff.githubusercontent.com/topics/reward-model
grpo	https://patch-diff.githubusercontent.com/topics/grpo
opendilab	https://patch-diff.githubusercontent.com/opendilab
LightRFT	https://patch-diff.githubusercontent.com/opendilab/LightRFT
Star 157	https://patch-diff.githubusercontent.com/login?return_to=%2Fopendilab%2FLightRFT
Code	https://patch-diff.githubusercontent.com/opendilab/LightRFT
Issues	https://patch-diff.githubusercontent.com/opendilab/LightRFT/issues
Pull requests	https://patch-diff.githubusercontent.com/opendilab/LightRFT/pulls
reinforcement-learning	https://patch-diff.githubusercontent.com/topics/reinforcement-learning
multi-modal	https://patch-diff.githubusercontent.com/topics/multi-modal
vlm	https://patch-diff.githubusercontent.com/topics/vlm
rft	https://patch-diff.githubusercontent.com/topics/rft
llm	https://patch-diff.githubusercontent.com/topics/llm
reward-model	https://patch-diff.githubusercontent.com/topics/reward-model
llm-training	https://patch-diff.githubusercontent.com/topics/llm-training
grpo	https://patch-diff.githubusercontent.com/topics/grpo
dapo	https://patch-diff.githubusercontent.com/topics/dapo
VectorInstitute	https://patch-diff.githubusercontent.com/VectorInstitute
vector-inference	https://patch-diff.githubusercontent.com/VectorInstitute/vector-inference
Star 91	https://patch-diff.githubusercontent.com/login?return_to=%2FVectorInstitute%2Fvector-inference
Code	https://patch-diff.githubusercontent.com/VectorInstitute/vector-inference
Issues	https://patch-diff.githubusercontent.com/VectorInstitute/vector-inference/issues
Pull requests	https://patch-diff.githubusercontent.com/VectorInstitute/vector-inference/pulls
inference	https://patch-diff.githubusercontent.com/topics/inference
speech-to-text	https://patch-diff.githubusercontent.com/topics/speech-to-text
vlm	https://patch-diff.githubusercontent.com/topics/vlm
text-embedding	https://patch-diff.githubusercontent.com/topics/text-embedding
multimodal	https://patch-diff.githubusercontent.com/topics/multimodal
audio-transcription	https://patch-diff.githubusercontent.com/topics/audio-transcription
llm	https://patch-diff.githubusercontent.com/topics/llm
vllm	https://patch-diff.githubusercontent.com/topics/vllm
reward-model	https://patch-diff.githubusercontent.com/topics/reward-model
llm-infernece	https://patch-diff.githubusercontent.com/topics/llm-infernece
sglang	https://patch-diff.githubusercontent.com/topics/sglang
llm-infrastructure	https://patch-diff.githubusercontent.com/topics/llm-infrastructure
Westlake-AI	https://patch-diff.githubusercontent.com/Westlake-AI
SemiReward	https://patch-diff.githubusercontent.com/Westlake-AI/SemiReward
Star 77	https://patch-diff.githubusercontent.com/login?return_to=%2FWestlake-AI%2FSemiReward
Code	https://patch-diff.githubusercontent.com/Westlake-AI/SemiReward
Issues	https://patch-diff.githubusercontent.com/Westlake-AI/SemiReward/issues
Pull requests	https://patch-diff.githubusercontent.com/Westlake-AI/SemiReward/pulls
machine-learning	https://patch-diff.githubusercontent.com/topics/machine-learning
natural-language-processing	https://patch-diff.githubusercontent.com/topics/natural-language-processing
computer-vision	https://patch-diff.githubusercontent.com/topics/computer-vision
regression	https://patch-diff.githubusercontent.com/topics/regression
transformer	https://patch-diff.githubusercontent.com/topics/transformer
semi-supervised-learning	https://patch-diff.githubusercontent.com/topics/semi-supervised-learning
audio-classification	https://patch-diff.githubusercontent.com/topics/audio-classification
weakly-supervised-learning	https://patch-diff.githubusercontent.com/topics/weakly-supervised-learning
yahoo-answers	https://patch-diff.githubusercontent.com/topics/yahoo-answers
cifar-100	https://patch-diff.githubusercontent.com/topics/cifar-100
label-noise	https://patch-diff.githubusercontent.com/topics/label-noise
esc-50	https://patch-diff.githubusercontent.com/topics/esc-50
vision-transformer	https://patch-diff.githubusercontent.com/topics/vision-transformer
reward-model	https://patch-diff.githubusercontent.com/topics/reward-model
bobxwu	https://patch-diff.githubusercontent.com/bobxwu
learning-from-rewards-llm-papers	https://patch-diff.githubusercontent.com/bobxwu/learning-from-rewards-llm-papers
Star 63	https://patch-diff.githubusercontent.com/login?return_to=%2Fbobxwu%2Flearning-from-rewards-llm-papers
Code	https://patch-diff.githubusercontent.com/bobxwu/learning-from-rewards-llm-papers
Issues	https://patch-diff.githubusercontent.com/bobxwu/learning-from-rewards-llm-papers/issues
Pull requests	https://patch-diff.githubusercontent.com/bobxwu/learning-from-rewards-llm-papers/pulls
reinforcement-learning	https://patch-diff.githubusercontent.com/topics/reinforcement-learning
post-training	https://patch-diff.githubusercontent.com/topics/post-training
self-correction	https://patch-diff.githubusercontent.com/topics/self-correction
reward-learning	https://patch-diff.githubusercontent.com/topics/reward-learning
large-language-models	https://patch-diff.githubusercontent.com/topics/large-language-models
llm	https://patch-diff.githubusercontent.com/topics/llm
llms	https://patch-diff.githubusercontent.com/topics/llms
reward-models	https://patch-diff.githubusercontent.com/topics/reward-models
reward-model	https://patch-diff.githubusercontent.com/topics/reward-model
reward-modeling	https://patch-diff.githubusercontent.com/topics/reward-modeling
guided-decoding	https://patch-diff.githubusercontent.com/topics/guided-decoding
test-time-scaling	https://patch-diff.githubusercontent.com/topics/test-time-scaling
tongjingqi	https://patch-diff.githubusercontent.com/tongjingqi
Awesome-Agent-RL	https://patch-diff.githubusercontent.com/tongjingqi/Awesome-Agent-RL
Star 55	https://patch-diff.githubusercontent.com/login?return_to=%2Ftongjingqi%2FAwesome-Agent-RL
Code	https://patch-diff.githubusercontent.com/tongjingqi/Awesome-Agent-RL
Issues	https://patch-diff.githubusercontent.com/tongjingqi/Awesome-Agent-RL/issues
Pull requests	https://patch-diff.githubusercontent.com/tongjingqi/Awesome-Agent-RL/pulls
agent	https://patch-diff.githubusercontent.com/topics/agent
awesome	https://patch-diff.githubusercontent.com/topics/awesome
reinforcement-learning	https://patch-diff.githubusercontent.com/topics/reinforcement-learning
rl	https://patch-diff.githubusercontent.com/topics/rl
awesome-list	https://patch-diff.githubusercontent.com/topics/awesome-list
llm	https://patch-diff.githubusercontent.com/topics/llm
reward-model	https://patch-diff.githubusercontent.com/topics/reward-model
agentic-ai	https://patch-diff.githubusercontent.com/topics/agentic-ai
rlvr	https://patch-diff.githubusercontent.com/topics/rlvr
agent-training	https://patch-diff.githubusercontent.com/topics/agent-training
Amirhosein-gh98	https://patch-diff.githubusercontent.com/Amirhosein-gh98
Gnosis	https://patch-diff.githubusercontent.com/Amirhosein-gh98/Gnosis
Star 29	https://patch-diff.githubusercontent.com/login?return_to=%2FAmirhosein-gh98%2FGnosis
Code	https://patch-diff.githubusercontent.com/Amirhosein-gh98/Gnosis
Issues	https://patch-diff.githubusercontent.com/Amirhosein-gh98/Gnosis/issues
Pull requests	https://patch-diff.githubusercontent.com/Amirhosein-gh98/Gnosis/pulls
internal	https://patch-diff.githubusercontent.com/topics/internal
orm	https://patch-diff.githubusercontent.com/topics/orm
efficient	https://patch-diff.githubusercontent.com/topics/efficient
circuits	https://patch-diff.githubusercontent.com/topics/circuits
attention	https://patch-diff.githubusercontent.com/topics/attention
reasoning	https://patch-diff.githubusercontent.com/topics/reasoning
error-detection	https://patch-diff.githubusercontent.com/topics/error-detection
hallucination	https://patch-diff.githubusercontent.com/topics/hallucination
self-awareness	https://patch-diff.githubusercontent.com/topics/self-awareness
latent-representations	https://patch-diff.githubusercontent.com/topics/latent-representations
gnosis-safe	https://patch-diff.githubusercontent.com/topics/gnosis-safe
llm	https://patch-diff.githubusercontent.com/topics/llm
large-language-model	https://patch-diff.githubusercontent.com/topics/large-language-model
reward-model	https://patch-diff.githubusercontent.com/topics/reward-model
hallucination-detection	https://patch-diff.githubusercontent.com/topics/hallucination-detection
InternLM	https://patch-diff.githubusercontent.com/InternLM
Spark	https://patch-diff.githubusercontent.com/InternLM/Spark
Star 25	https://patch-diff.githubusercontent.com/login?return_to=%2FInternLM%2FSpark
Code	https://patch-diff.githubusercontent.com/InternLM/Spark
Issues	https://patch-diff.githubusercontent.com/InternLM/Spark/issues
Pull requests	https://patch-diff.githubusercontent.com/InternLM/Spark/pulls
self-improvement	https://patch-diff.githubusercontent.com/topics/self-improvement
multi-modal	https://patch-diff.githubusercontent.com/topics/multi-modal
large-language-models	https://patch-diff.githubusercontent.com/topics/large-language-models
vision-language-model	https://patch-diff.githubusercontent.com/topics/vision-language-model
reward-model	https://patch-diff.githubusercontent.com/topics/reward-model
large-vision-language-models	https://patch-diff.githubusercontent.com/topics/large-vision-language-models
self-rewarding	https://patch-diff.githubusercontent.com/topics/self-rewarding
math-reasoning	https://patch-diff.githubusercontent.com/topics/math-reasoning
yeyimilk	https://patch-diff.githubusercontent.com/yeyimilk
CrowdVLM-R1	https://patch-diff.githubusercontent.com/yeyimilk/CrowdVLM-R1
Star 21	https://patch-diff.githubusercontent.com/login?return_to=%2Fyeyimilk%2FCrowdVLM-R1
Code	https://patch-diff.githubusercontent.com/yeyimilk/CrowdVLM-R1
Issues	https://patch-diff.githubusercontent.com/yeyimilk/CrowdVLM-R1/issues
Pull requests	https://patch-diff.githubusercontent.com/yeyimilk/CrowdVLM-R1/pulls
reinforcement-learning	https://patch-diff.githubusercontent.com/topics/reinforcement-learning
vlm	https://patch-diff.githubusercontent.com/topics/vlm
crowdcounting	https://patch-diff.githubusercontent.com/topics/crowdcounting
llm	https://patch-diff.githubusercontent.com/topics/llm
reward-model	https://patch-diff.githubusercontent.com/topics/reward-model
r1-zero	https://patch-diff.githubusercontent.com/topics/r1-zero
vlm-r1	https://patch-diff.githubusercontent.com/topics/vlm-r1
multimodal-r1	https://patch-diff.githubusercontent.com/topics/multimodal-r1
NiuTrans	https://patch-diff.githubusercontent.com/NiuTrans
GRAM	https://patch-diff.githubusercontent.com/NiuTrans/GRAM
Star 17	https://patch-diff.githubusercontent.com/login?return_to=%2FNiuTrans%2FGRAM
Code	https://patch-diff.githubusercontent.com/NiuTrans/GRAM
Issues	https://patch-diff.githubusercontent.com/NiuTrans/GRAM/issues
Pull requests	https://patch-diff.githubusercontent.com/NiuTrans/GRAM/pulls
generative	https://patch-diff.githubusercontent.com/topics/generative
generalization	https://patch-diff.githubusercontent.com/topics/generalization
rlhf	https://patch-diff.githubusercontent.com/topics/rlhf
reward-model	https://patch-diff.githubusercontent.com/topics/reward-model
rochitasundar	https://patch-diff.githubusercontent.com/rochitasundar
Generative-AI-with-Large-Language-Models	https://patch-diff.githubusercontent.com/rochitasundar/Generative-AI-with-Large-Language-Models
Star 16	https://patch-diff.githubusercontent.com/login?return_to=%2Frochitasundar%2FGenerative-AI-with-Large-Language-Models
Code	https://patch-diff.githubusercontent.com/rochitasundar/Generative-AI-with-Large-Language-Models
Issues	https://patch-diff.githubusercontent.com/rochitasundar/Generative-AI-with-Large-Language-Models/issues
Pull requests	https://patch-diff.githubusercontent.com/rochitasundar/Generative-AI-with-Large-Language-Models/pulls
reinforcement-learning	https://patch-diff.githubusercontent.com/topics/reinforcement-learning
transformer	https://patch-diff.githubusercontent.com/topics/transformer
kl-divergence	https://patch-diff.githubusercontent.com/topics/kl-divergence
proximal-policy-optimization	https://patch-diff.githubusercontent.com/topics/proximal-policy-optimization
large-language-models	https://patch-diff.githubusercontent.com/topics/large-language-models
prompt-engineering	https://patch-diff.githubusercontent.com/topics/prompt-engineering
flan-t5	https://patch-diff.githubusercontent.com/topics/flan-t5
instruction-finetuning	https://patch-diff.githubusercontent.com/topics/instruction-finetuning
low-rank-adaptation	https://patch-diff.githubusercontent.com/topics/low-rank-adaptation
reward-model	https://patch-diff.githubusercontent.com/topics/reward-model
parameter-efficient-fine-tuning	https://patch-diff.githubusercontent.com/topics/parameter-efficient-fine-tuning
llm-evaluation	https://patch-diff.githubusercontent.com/topics/llm-evaluation
itaychachy	https://patch-diff.githubusercontent.com/itaychachy
RewardSDS	https://patch-diff.githubusercontent.com/itaychachy/RewardSDS
Star 12	https://patch-diff.githubusercontent.com/login?return_to=%2Fitaychachy%2FRewardSDS
Code	https://patch-diff.githubusercontent.com/itaychachy/RewardSDS
Issues	https://patch-diff.githubusercontent.com/itaychachy/RewardSDS/issues
Pull requests	https://patch-diff.githubusercontent.com/itaychachy/RewardSDS/pulls
ai	https://patch-diff.githubusercontent.com/topics/ai
computer-vision	https://patch-diff.githubusercontent.com/topics/computer-vision
3d-generation	https://patch-diff.githubusercontent.com/topics/3d-generation
mechine-learning	https://patch-diff.githubusercontent.com/topics/mechine-learning
reward-model	https://patch-diff.githubusercontent.com/topics/reward-model
Junpliu	https://patch-diff.githubusercontent.com/Junpliu
DocReward	https://patch-diff.githubusercontent.com/Junpliu/DocReward
Star 10	https://patch-diff.githubusercontent.com/login?return_to=%2FJunpliu%2FDocReward
Code	https://patch-diff.githubusercontent.com/Junpliu/DocReward
Issues	https://patch-diff.githubusercontent.com/Junpliu/DocReward/issues
Pull requests	https://patch-diff.githubusercontent.com/Junpliu/DocReward/pulls
reinforcement-learning	https://patch-diff.githubusercontent.com/topics/reinforcement-learning
structure	https://patch-diff.githubusercontent.com/topics/structure
style	https://patch-diff.githubusercontent.com/topics/style
document	https://patch-diff.githubusercontent.com/topics/document
reward-model	https://patch-diff.githubusercontent.com/topics/reward-model
AlignRM	https://patch-diff.githubusercontent.com/AlignRM
CheemsRM	https://patch-diff.githubusercontent.com/AlignRM/CheemsRM
Star 10	https://patch-diff.githubusercontent.com/login?return_to=%2FAlignRM%2FCheemsRM
Code	https://patch-diff.githubusercontent.com/AlignRM/CheemsRM
Issues	https://patch-diff.githubusercontent.com/AlignRM/CheemsRM/issues
Pull requests	https://patch-diff.githubusercontent.com/AlignRM/CheemsRM/pulls
reinforcement-learning	https://patch-diff.githubusercontent.com/topics/reinforcement-learning
large-language-model	https://patch-diff.githubusercontent.com/topics/large-language-model
reward-model	https://patch-diff.githubusercontent.com/topics/reward-model
hlp-ai	https://patch-diff.githubusercontent.com/hlp-ai
miniChatGPT	https://patch-diff.githubusercontent.com/hlp-ai/miniChatGPT
Star 6	https://patch-diff.githubusercontent.com/login?return_to=%2Fhlp-ai%2FminiChatGPT
Code	https://patch-diff.githubusercontent.com/hlp-ai/miniChatGPT
Issues	https://patch-diff.githubusercontent.com/hlp-ai/miniChatGPT/issues
Pull requests	https://patch-diff.githubusercontent.com/hlp-ai/miniChatGPT/pulls
pytorch	https://patch-diff.githubusercontent.com/topics/pytorch
ppo	https://patch-diff.githubusercontent.com/topics/ppo
sft	https://patch-diff.githubusercontent.com/topics/sft
gpt2	https://patch-diff.githubusercontent.com/topics/gpt2
chatgpt	https://patch-diff.githubusercontent.com/topics/chatgpt
instructgpt	https://patch-diff.githubusercontent.com/topics/instructgpt
reward-model	https://patch-diff.githubusercontent.com/topics/reward-model
taishan1994	https://patch-diff.githubusercontent.com/taishan1994
Reward-Model-Finetuning	https://patch-diff.githubusercontent.com/taishan1994/Reward-Model-Finetuning
Star 4	https://patch-diff.githubusercontent.com/login?return_to=%2Ftaishan1994%2FReward-Model-Finetuning
Code	https://patch-diff.githubusercontent.com/taishan1994/Reward-Model-Finetuning
Issues	https://patch-diff.githubusercontent.com/taishan1994/Reward-Model-Finetuning/issues
Pull requests	https://patch-diff.githubusercontent.com/taishan1994/Reward-Model-Finetuning/pulls
reward-model	https://patch-diff.githubusercontent.com/topics/reward-model
qwen2	https://patch-diff.githubusercontent.com/topics/qwen2
kaicheng001	https://patch-diff.githubusercontent.com/kaicheng001
Awesome-R1	https://patch-diff.githubusercontent.com/kaicheng001/Awesome-R1
Star 2	https://patch-diff.githubusercontent.com/login?return_to=%2Fkaicheng001%2FAwesome-R1
Code	https://patch-diff.githubusercontent.com/kaicheng001/Awesome-R1
Issues	https://patch-diff.githubusercontent.com/kaicheng001/Awesome-R1/issues
Pull requests	https://patch-diff.githubusercontent.com/kaicheng001/Awesome-R1/pulls
awesome	https://patch-diff.githubusercontent.com/topics/awesome
thinking	https://patch-diff.githubusercontent.com/topics/thinking
r1	https://patch-diff.githubusercontent.com/topics/r1
vlm	https://patch-diff.githubusercontent.com/topics/vlm
lmm	https://patch-diff.githubusercontent.com/topics/lmm
llm	https://patch-diff.githubusercontent.com/topics/llm
mllm	https://patch-diff.githubusercontent.com/topics/mllm
reward-model	https://patch-diff.githubusercontent.com/topics/reward-model
reasoning-models	https://patch-diff.githubusercontent.com/topics/reasoning-models
deepseek-r1	https://patch-diff.githubusercontent.com/topics/deepseek-r1
techandy42	https://patch-diff.githubusercontent.com/techandy42
LLM_Reward_Model	https://patch-diff.githubusercontent.com/techandy42/LLM_Reward_Model
Star 2	https://patch-diff.githubusercontent.com/login?return_to=%2Ftechandy42%2FLLM_Reward_Model
Code	https://patch-diff.githubusercontent.com/techandy42/LLM_Reward_Model
Issues	https://patch-diff.githubusercontent.com/techandy42/LLM_Reward_Model/issues
Pull requests	https://patch-diff.githubusercontent.com/techandy42/LLM_Reward_Model/pulls
language-model	https://patch-diff.githubusercontent.com/topics/language-model
reward-model	https://patch-diff.githubusercontent.com/topics/reward-model
hfrl	https://patch-diff.githubusercontent.com/topics/hfrl
m-serious	https://patch-diff.githubusercontent.com/m-serious
module-reward-models	https://patch-diff.githubusercontent.com/m-serious/module-reward-models
Star 1	https://patch-diff.githubusercontent.com/login?return_to=%2Fm-serious%2Fmodule-reward-models
Code	https://patch-diff.githubusercontent.com/m-serious/module-reward-models
Issues	https://patch-diff.githubusercontent.com/m-serious/module-reward-models/issues
Pull requests	https://patch-diff.githubusercontent.com/m-serious/module-reward-models/pulls
reinforcement-learning-agent	https://patch-diff.githubusercontent.com/topics/reinforcement-learning-agent
reward-model	https://patch-diff.githubusercontent.com/topics/reward-model

Viewport: width=device-width
