René's URL Explorer Experiment


Title: GitHub - voidful/TextRL: Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in huggingface's transformer (blommz-176B/bloom/gpt/bart/T5/MetaICL)

Description: Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in huggingface's transformer (blommz-176B/bloom/gpt/bart/T5/MetaICL) - voidful/TextRL

Opengraph URL: https://github.com/voidful/TextRL

X: @github

Domain: patch-diff.githubusercontent.com

Links:

voidful https://patch-diff.githubusercontent.com/voidful
TextRL https://patch-diff.githubusercontent.com/voidful/TextRL
MIT license https://patch-diff.githubusercontent.com/voidful/TextRL/blob/main/LICENSE
565 stars https://patch-diff.githubusercontent.com/voidful/TextRL/stargazers
61 forks https://patch-diff.githubusercontent.com/voidful/TextRL/forks
Branches https://patch-diff.githubusercontent.com/voidful/TextRL/branches
Tags https://patch-diff.githubusercontent.com/voidful/TextRL/tags
Activity https://patch-diff.githubusercontent.com/voidful/TextRL/activity
Code https://patch-diff.githubusercontent.com/voidful/TextRL
Issues 3 https://patch-diff.githubusercontent.com/voidful/TextRL/issues
Pull requests 1 https://patch-diff.githubusercontent.com/voidful/TextRL/pulls
Actions https://patch-diff.githubusercontent.com/voidful/TextRL/actions
Projects 0 https://patch-diff.githubusercontent.com/voidful/TextRL/projects
Security 0 https://patch-diff.githubusercontent.com/voidful/TextRL/security
Insights https://patch-diff.githubusercontent.com/voidful/TextRL/pulse
70 Commits https://patch-diff.githubusercontent.com/voidful/TextRL/commits/main/
example https://patch-diff.githubusercontent.com/voidful/TextRL/tree/main/example
img https://patch-diff.githubusercontent.com/voidful/TextRL/tree/main/img
textrl https://patch-diff.githubusercontent.com/voidful/TextRL/tree/main/textrl
.gitignore https://patch-diff.githubusercontent.com/voidful/TextRL/blob/main/.gitignore
LICENSE https://patch-diff.githubusercontent.com/voidful/TextRL/blob/main/LICENSE
README.md https://patch-diff.githubusercontent.com/voidful/TextRL/blob/main/README.md
requirement.txt https://patch-diff.githubusercontent.com/voidful/TextRL/blob/main/requirement.txt
setup.py https://patch-diff.githubusercontent.com/voidful/TextRL/blob/main/setup.py
https://patch-diff.githubusercontent.com/voidful/TextRL#textrl-text-generation-with-reinforcement-learning
https://pypi.org/project/textrl/
https://github.com/voidful/tfkit
https://www.codefactor.io/repository/github/voidful/textrl
https://github.com/voidful/textrl
https://github.com/voidful/TextRL/raw/main/img/Designer.png
https://patch-diff.githubusercontent.com/voidful/TextRL#table-of-contents
Introduction https://patch-diff.githubusercontent.com/voidful/TextRL#introduction
Examples https://patch-diff.githubusercontent.com/voidful/TextRL#examples
GPT-2 Example https://patch-diff.githubusercontent.com/voidful/TextRL#gpt-2-example
FLAN-T5 Example https://patch-diff.githubusercontent.com/voidful/TextRL#flan-t5-example
Bigscience/BLOOMZ-7B1-MT Example https://patch-diff.githubusercontent.com/voidful/TextRL#bigsciencebloomz-7b1-mt-example
176B BLOOM Example https://patch-diff.githubusercontent.com/voidful/TextRL#176b-bloom-example
Controllable Generation via RL Example https://patch-diff.githubusercontent.com/voidful/TextRL#controllable-generation-via-rl-example
Installation https://patch-diff.githubusercontent.com/voidful/TextRL#installation
Pip Install https://patch-diff.githubusercontent.com/voidful/TextRL#pip-install
Build from Source https://patch-diff.githubusercontent.com/voidful/TextRL#build-from-source
Usage https://patch-diff.githubusercontent.com/voidful/TextRL#usage
Initialize Agent and Environment https://patch-diff.githubusercontent.com/voidful/TextRL#initialize-agent-and-environment
Setup Reward Function for Environment https://patch-diff.githubusercontent.com/voidful/TextRL#setup-reward-function-for-environment
Prepare for Training https://patch-diff.githubusercontent.com/voidful/TextRL#prepare-for-training
Training https://patch-diff.githubusercontent.com/voidful/TextRL#training
Dump Model https://patch-diff.githubusercontent.com/voidful/TextRL#dump-trained-model-to-huggingfaces-model
Key Parameters for RL Training https://patch-diff.githubusercontent.com/voidful/TextRL#key-parameters-for-rl-training
https://patch-diff.githubusercontent.com/voidful/TextRL#introduction
Hugging Face's Transformers https://github.com/huggingface/transformers
PFRL https://github.com/pfnet/pfrl
OpenAI GYM https://gym.openai.com
https://patch-diff.githubusercontent.com/voidful/TextRL#example---gpt2
https://patch-diff.githubusercontent.com/voidful/TextRL#gpt2-example
https://patch-diff.githubusercontent.com/voidful/TextRL#example---flan-t5
https://patch-diff.githubusercontent.com/voidful/TextRL#example-code
google/flan-t5-base https://colab.research.google.com/drive/1DYHt0mi6cyl8ZTMJEkMNpsSZCCvR4jM1?usp=sharing
https://patch-diff.githubusercontent.com/voidful/TextRL#example---bigsciencebloomz-7b1-mt
https://patch-diff.githubusercontent.com/voidful/TextRL#bloomz-7b1-mt-example
https://patch-diff.githubusercontent.com/voidful/TextRL#example---176b-bloom
https://patch-diff.githubusercontent.com/voidful/TextRL#bloomz-176b-example
https://github.com/bigscience-workshop/petals
https://patch-diff.githubusercontent.com/voidful/TextRL#example---controllable-generation-via-rl-to-let-elon-musk-speak-ill-of-doge
https://github.com/voidful/TextRL/blob/main/example/2022-12-10-textrl-elon-musk.ipynb
bigscience/bloom-560m https://colab.research.google.com/drive/1ThSHtkfzC2dDc6JOdeCTthuDovTCheRf?usp=sharing
huggingtweets/elonmusk https://colab.research.google.com/drive/149MG6uxu7CjMU1pXnYXfSvJ6HEdwcOFt?usp=sharing
https://patch-diff.githubusercontent.com/voidful/TextRL#installation
https://patch-diff.githubusercontent.com/voidful/TextRL#pip-install
https://patch-diff.githubusercontent.com/voidful/TextRL#build-from-source
https://patch-diff.githubusercontent.com/voidful/TextRL#usage
https://patch-diff.githubusercontent.com/voidful/TextRL#initialize-agent-and-environment
https://patch-diff.githubusercontent.com/voidful/TextRL#set-up-reward-function-for-environment
https://patch-diff.githubusercontent.com/voidful/TextRL#prepare-for-training
https://patch-diff.githubusercontent.com/voidful/TextRL#train
https://patch-diff.githubusercontent.com/voidful/TextRL#prediction
https://patch-diff.githubusercontent.com/voidful/TextRL#dump-trained-model-to-huggingfaces-model
https://patch-diff.githubusercontent.com/voidful/TextRL#key-parameters-for-rl-training
nlp https://patch-diff.githubusercontent.com/topics/nlp
reinforcement-learning https://patch-diff.githubusercontent.com/topics/reinforcement-learning
pytorch https://patch-diff.githubusercontent.com/topics/pytorch
nlg https://patch-diff.githubusercontent.com/topics/nlg
language-model https://patch-diff.githubusercontent.com/topics/language-model
gpt-2 https://patch-diff.githubusercontent.com/topics/gpt-2
gpt-3 https://patch-diff.githubusercontent.com/topics/gpt-3
controlled-nlg https://patch-diff.githubusercontent.com/topics/controlled-nlg
chatgpt https://patch-diff.githubusercontent.com/topics/chatgpt
rlhf https://patch-diff.githubusercontent.com/topics/rlhf
Readme https://patch-diff.githubusercontent.com/voidful/TextRL#readme-ov-file
MIT license https://patch-diff.githubusercontent.com/voidful/TextRL#MIT-1-ov-file
9 watching https://patch-diff.githubusercontent.com/voidful/TextRL/watchers
Releases https://patch-diff.githubusercontent.com/voidful/TextRL/releases
Packages 0 https://patch-diff.githubusercontent.com/users/voidful/packages?repo_name=TextRL
Contributors 3 https://patch-diff.githubusercontent.com/voidful/TextRL/graphs/contributors
Python 87.4% https://patch-diff.githubusercontent.com/voidful/TextRL/search?l=python
Jupyter Notebook 12.6% https://patch-diff.githubusercontent.com/voidful/TextRL/search?l=jupyter-notebook
