|
S4Plus
| https://patch-diff.githubusercontent.com/S4Plus |
| transformers | https://patch-diff.githubusercontent.com/S4Plus/transformers |
| huggingface/transformers | https://patch-diff.githubusercontent.com/huggingface/transformers |
|
Notifications
| https://patch-diff.githubusercontent.com/login?return_to=%2FS4Plus%2Ftransformers |
|
Fork
0
| https://patch-diff.githubusercontent.com/login?return_to=%2FS4Plus%2Ftransformers |
|
Star
0
| https://patch-diff.githubusercontent.com/login?return_to=%2FS4Plus%2Ftransformers |
| huggingface.co/transformers | https://huggingface.co/transformers |
|
Apache-2.0 license
| https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/LICENSE |
|
0
stars
| https://patch-diff.githubusercontent.com/S4Plus/transformers/stargazers |
|
31.9k
forks
| https://patch-diff.githubusercontent.com/S4Plus/transformers/forks |
|
Branches
| https://patch-diff.githubusercontent.com/S4Plus/transformers/branches |
|
Tags
| https://patch-diff.githubusercontent.com/S4Plus/transformers/tags |
|
Activity
| https://patch-diff.githubusercontent.com/S4Plus/transformers/activity |
|
Star
| https://patch-diff.githubusercontent.com/login?return_to=%2FS4Plus%2Ftransformers |
|
Notifications
| https://patch-diff.githubusercontent.com/login?return_to=%2FS4Plus%2Ftransformers |
|
Code
| https://patch-diff.githubusercontent.com/S4Plus/transformers |
|
Pull requests
0
| https://patch-diff.githubusercontent.com/S4Plus/transformers/pulls |
|
Actions
| https://patch-diff.githubusercontent.com/S4Plus/transformers/actions |
|
Projects
0
| https://patch-diff.githubusercontent.com/S4Plus/transformers/projects |
|
Security
0
| https://patch-diff.githubusercontent.com/S4Plus/transformers/security |
|
Insights
| https://patch-diff.githubusercontent.com/S4Plus/transformers/pulse |
|
Code
| https://patch-diff.githubusercontent.com/S4Plus/transformers |
|
Pull requests
| https://patch-diff.githubusercontent.com/S4Plus/transformers/pulls |
|
Actions
| https://patch-diff.githubusercontent.com/S4Plus/transformers/actions |
|
Projects
| https://patch-diff.githubusercontent.com/S4Plus/transformers/projects |
|
Security
| https://patch-diff.githubusercontent.com/S4Plus/transformers/security |
|
Insights
| https://patch-diff.githubusercontent.com/S4Plus/transformers/pulse |
| Branches | https://patch-diff.githubusercontent.com/S4Plus/transformers/branches |
| Tags | https://patch-diff.githubusercontent.com/S4Plus/transformers/tags |
| https://patch-diff.githubusercontent.com/S4Plus/transformers/branches |
| https://patch-diff.githubusercontent.com/S4Plus/transformers/tags |
[15,280 Commits](https://patch-diff.githubusercontent.com/S4Plus/transformers/commits/main/)

Repository contents:

- [.circleci](https://patch-diff.githubusercontent.com/S4Plus/transformers/tree/main/.circleci)
- [.github](https://patch-diff.githubusercontent.com/S4Plus/transformers/tree/main/.github)
- [docker](https://patch-diff.githubusercontent.com/S4Plus/transformers/tree/main/docker)
- [docs](https://patch-diff.githubusercontent.com/S4Plus/transformers/tree/main/docs)
- [examples](https://patch-diff.githubusercontent.com/S4Plus/transformers/tree/main/examples)
- [model_cards](https://patch-diff.githubusercontent.com/S4Plus/transformers/tree/main/model_cards)
- [notebooks](https://patch-diff.githubusercontent.com/S4Plus/transformers/tree/main/notebooks)
- [scripts](https://patch-diff.githubusercontent.com/S4Plus/transformers/tree/main/scripts)
- [src/transformers](https://patch-diff.githubusercontent.com/S4Plus/transformers/tree/main/src/transformers)
- [templates](https://patch-diff.githubusercontent.com/S4Plus/transformers/tree/main/templates)
- [tests](https://patch-diff.githubusercontent.com/S4Plus/transformers/tree/main/tests)
- [utils](https://patch-diff.githubusercontent.com/S4Plus/transformers/tree/main/utils)
- [.coveragerc](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/.coveragerc)
- [.gitattributes](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/.gitattributes)
- [.gitignore](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/.gitignore)
- [CITATION.cff](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/CITATION.cff)
- [CODE_OF_CONDUCT.md](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/CODE_OF_CONDUCT.md)
- [CONTRIBUTING.md](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/CONTRIBUTING.md)
- [ISSUES.md](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/ISSUES.md)
- [LICENSE](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/LICENSE)
- [Makefile](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/Makefile)
- [README.md](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/README.md)
- [README_de.md](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/README_de.md)
- [README_es.md](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/README_es.md)
- [README_fr.md](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/README_fr.md)
- [README_hd.md](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/README_hd.md)
- [README_ja.md](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/README_ja.md)
- [README_ko.md](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/README_ko.md)
- [README_pt-br.md](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/README_pt-br.md)
- [README_ru.md](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/README_ru.md)
- [README_te.md](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/README_te.md)
- [README_vi.md](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/README_vi.md)
- [README_zh-hans.md](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/README_zh-hans.md)
- [README_zh-hant.md](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/README_zh-hant.md)
- [SECURITY.md](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/SECURITY.md)
- [awesome-transformers.md](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/awesome-transformers.md)
- [conftest.py](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/conftest.py)
- [hubconf.py](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/hubconf.py)
- [pyproject.toml](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/pyproject.toml)
- [setup.py](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/setup.py)
Badges: [Build](https://circleci.com/gh/huggingface/transformers) · [License](https://github.com/huggingface/transformers/blob/main/LICENSE) · [Documentation](https://huggingface.co/docs/transformers/index) · [Release](https://github.com/huggingface/transformers/releases) · [Contributor Covenant](https://github.com/huggingface/transformers/blob/main/CODE_OF_CONDUCT.md) · [DOI](https://zenodo.org/badge/latestdoi/155220641)
This README is also available in: [简体中文](https://github.com/huggingface/transformers/blob/main/README_zh-hans.md) · [繁體中文](https://github.com/huggingface/transformers/blob/main/README_zh-hant.md) · [한국어](https://github.com/huggingface/transformers/blob/main/README_ko.md) · [Español](https://github.com/huggingface/transformers/blob/main/README_es.md) · [日本語](https://github.com/huggingface/transformers/blob/main/README_ja.md) · [हिन्दी](https://github.com/huggingface/transformers/blob/main/README_hd.md) · [Русский](https://github.com/huggingface/transformers/blob/main/README_ru.md) · [Português](https://github.com/huggingface/transformers/blob/main/README_pt-br.md) · [తెలుగు](https://github.com/huggingface/transformers/blob/main/README_te.md) · [Français](https://github.com/huggingface/transformers/blob/main/README_fr.md) · [Deutsch](https://github.com/huggingface/transformers/blob/main/README_de.md) · [Tiếng Việt](https://github.com/huggingface/transformers/blob/main/README_vi.md)
## State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow

Transformers provides pretrained models that you can download from the [model hub](https://huggingface.co/models) and use with [Jax](https://jax.readthedocs.io/en/latest/), [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/). To learn more, see the free [Hugging Face course](https://hf.co/course).
## Online demos

You can test most of our models directly on their pages from the [model hub](https://huggingface.co/models). We also offer [private model hosting, versioning, & an inference API](https://huggingface.co/pricing).

In Natural Language Processing:

- [Masked word completion with BERT](https://huggingface.co/google-bert/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France)
- [Named Entity Recognition with Electra](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city)
- [Text generation with Mistral](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
- [Natural Language Inference with RoBERTa](https://huggingface.co/FacebookAI/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal)
- [Summarization with BART](https://huggingface.co/facebook/bart-large-cnn?text=The+tower+is+324+metres+%281%2C063+ft%29+tall%2C+about+the+same+height+as+an+81-storey+building%2C+and+the+tallest+structure+in+Paris.+Its+base+is+square%2C+measuring+125+metres+%28410+ft%29+on+each+side.+During+its+construction%2C+the+Eiffel+Tower+surpassed+the+Washington+Monument+to+become+the+tallest+man-made+structure+in+the+world%2C+a+title+it+held+for+41+years+until+the+Chrysler+Building+in+New+York+City+was+finished+in+1930.+It+was+the+first+structure+to+reach+a+height+of+300+metres.+Due+to+the+addition+of+a+broadcasting+aerial+at+the+top+of+the+tower+in+1957%2C+it+is+now+taller+than+the+Chrysler+Building+by+5.2+metres+%2817+ft%29.+Excluding+transmitters%2C+the+Eiffel+Tower+is+the+second+tallest+free-standing+structure+in+France+after+the+Millau+Viaduct)
- [Question answering with DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species)
- [Translation with T5](https://huggingface.co/google-t5/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin)

In Computer Vision:

- [Image classification with ViT](https://huggingface.co/google/vit-base-patch16-224)
- [Object Detection with DETR](https://huggingface.co/facebook/detr-resnet-50)
- [Semantic Segmentation with SegFormer](https://huggingface.co/nvidia/segformer-b0-finetuned-ade-512-512)
- [Panoptic Segmentation with Mask2Former](https://huggingface.co/facebook/mask2former-swin-large-coco-panoptic)
- [Depth Estimation with Depth Anything](https://huggingface.co/docs/transformers/main/model_doc/depth_anything)
- [Video Classification with VideoMAE](https://huggingface.co/docs/transformers/model_doc/videomae)
- [Universal Segmentation with OneFormer](https://huggingface.co/shi-labs/oneformer_ade20k_dinat_large)

In Audio:

- [Automatic Speech Recognition with Whisper](https://huggingface.co/openai/whisper-large-v3)
- [Keyword Spotting with Wav2Vec2](https://huggingface.co/superb/wav2vec2-base-superb-ks)
- [Audio Classification with Audio Spectrogram Transformer](https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593)

In Multimodal tasks:

- [Table Question Answering with TAPAS](https://huggingface.co/google/tapas-base-finetuned-wtq)
- [Visual Question Answering with ViLT](https://huggingface.co/dandelin/vilt-b32-finetuned-vqa)
- [Image captioning with LLaVa](https://huggingface.co/llava-hf/llava-1.5-7b-hf)
- [Zero-shot Image Classification with SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384)
- [Document Question Answering with LayoutLM](https://huggingface.co/impira/layoutlm-document-qa)
- [Zero-shot Video Classification with X-CLIP](https://huggingface.co/docs/transformers/model_doc/xclip)
- [Zero-shot Object Detection with OWLv2](https://huggingface.co/docs/transformers/en/model_doc/owlv2)
- [Zero-shot Image Segmentation with CLIPSeg](https://huggingface.co/docs/transformers/model_doc/clipseg)
- [Automatic Mask Generation with SAM](https://huggingface.co/docs/transformers/model_doc/sam)
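Each hosted demo above corresponds to a pipeline task that can also be run locally. A minimal sketch of the masked-word demo, assuming `transformers` and a backend such as PyTorch are installed; the checkpoint name comes from the BERT demo link above, and the model is downloaded on first use:

```python
from transformers import pipeline

# Masked word completion, mirroring the hosted BERT demo.
unmasker = pipeline("fill-mask", model="google-bert/bert-base-uncased")
predictions = unmasker("Paris is the [MASK] of France.")

# Each prediction carries the filled-in token and a confidence score.
for p in predictions[:3]:
    print(p["token_str"], round(p["score"], 3))
```

The same `pipeline(task, model=...)` pattern applies to the other demos, with the task string and checkpoint taken from the corresponding link.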
## 100 projects using Transformers

See the [awesome-transformers](https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/awesome-transformers.md) page for a list of incredible projects built with the library.
## If you are looking for custom support from the Hugging Face team

Visit [huggingface.co/support](https://huggingface.co/support).
## Quick tour

To immediately use a model on a given input (text, image, audio, ...), we provide the pipeline API. (Figure: an object-detection example, showing the input image coco_sample.png and the annotated output coco_sample_post_processed.png.) You can learn more about the tasks supported by the pipeline API in [this tutorial](https://huggingface.co/docs/transformers/task_summary).

The models themselves are regular [Pytorch nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) or [TensorFlow tf.keras.Model](https://www.tensorflow.org/api_docs/python/tf/keras/Model) instances, so you can use them in your usual training loops. [This tutorial](https://huggingface.co/docs/transformers/training) explains how to integrate such a model into a classic training loop, or how to fine-tune it with the Trainer API.
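The two levels of the API referenced above can be sketched as follows. This is standard Transformers usage rather than code from this repository; the sentiment-analysis pipeline and the `distilbert-base-uncased-finetuned-sst-2-english` checkpoint are its documented defaults for that task:

```python
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

# High level: a pipeline bundles tokenizer, model, and post-processing.
classifier = pipeline("sentiment-analysis")
result = classifier("We are very happy to show you the Transformers library.")
print(result)  # a list of {'label': ..., 'score': ...} dicts

# Lower level: the same kind of checkpoint as a regular PyTorch nn.Module.
name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
inputs = tokenizer("Hello world!", return_tensors="pt")
outputs = model(**inputs)  # outputs.logits has shape (batch, num_labels)
```

Because `model` is an ordinary `nn.Module`, it can be dropped into any PyTorch training loop, which is what the fine-tuning tutorial linked above builds on.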
## Why should I use transformers?

Easy-to-use state-of-the-art models with a unified API across frameworks, and lower compute costs through shared pretrained checkpoints.

## Why shouldn't I use transformers?

The library is not a modular toolbox of building blocks for neural nets, and the training API is optimized for the models the library provides; for generic machine learning loops, you should use another library such as [Accelerate](https://huggingface.co/docs/accelerate). The scripts in the [examples folder](https://github.com/huggingface/transformers/tree/main/examples) are just that: examples, which you will likely need to adapt to your specific problem.
## Installation

### With pip

You should install Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html); if you're unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/). First, install at least one backend: see the [TensorFlow installation page](https://www.tensorflow.org/install/), the [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally), and the [Flax](https://github.com/google/flax#quick-install) and [Jax](https://github.com/google/jax#installation) installation pages for the command specific to your platform. Once one of those backends is installed, Transformers can be installed with `pip install transformers`. If you'd like to play with the examples or need the bleeding edge of the code, you must [install the library from source](https://huggingface.co/docs/transformers/installation#installing-from-source).
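The pip steps above amount to the following commands; the `.env` directory name and the choice of PyTorch as the backend are illustrative:

```shell
# Create and activate a virtual environment (see the venv guide linked above)
python -m venv .env
source .env/bin/activate

# Install a backend first (PyTorch shown; use the linked installation pages
# for TensorFlow or Flax/Jax instead), then install Transformers itself
pip install torch
pip install transformers
```

On Windows, replace the `source` line with `.env\Scripts\activate`.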
### With conda

Transformers can also be installed via conda from the conda-forge channel; installing from the huggingface channel is deprecated. Note: on Windows, you may be prompted to activate Developer Mode in order to benefit from caching. If this is not an option for you, please let us know in [this issue](https://github.com/huggingface/huggingface_hub/issues/1062).
## Model architectures

[All the model checkpoints](https://huggingface.co/models) provided by Transformers are seamlessly integrated from the huggingface.co [model hub](https://huggingface.co/models), where they are uploaded directly by [users](https://huggingface.co/users) and [organizations](https://huggingface.co/organizations).

Transformers currently provides the following architectures (see [here](https://huggingface.co/docs/transformers/model_summary) for a high-level summary of each of them):
- **[ALBERT](https://huggingface.co/docs/transformers/model_doc/albert)**, released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942)
- **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)**, released with the paper [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918)
- **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)**, released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679)
- **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)**, released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778)
- **[Autoformer](https://huggingface.co/docs/transformers/model_doc/autoformer)**, released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008)
- **[Bark](https://huggingface.co/docs/transformers/model_doc/bark)**, released in the repository [suno-ai/bark](https://github.com/suno-ai/bark)
- **[BART](https://huggingface.co/docs/transformers/model_doc/bart)**, released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461)
- **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)**, released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321)
- **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)**, released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701)
- **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)**, released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254)
- **[BERT](https://huggingface.co/docs/transformers/model_doc/bert)**, released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)
- **[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bert-generation)**, released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461)
- **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)**, released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/)
- **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)**, released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062)
- **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)**, released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062)
- **[BioGpt](https://huggingface.co/docs/transformers/model_doc/biogpt)**, released with the paper [BioGPT: generative pre-trained transformer for biomedical text generation and mining](https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbac409/6713511?guestAccessKey=a66d9b5d-4f83-4017-bb52-405815c907b9)
- **[BiT](https://huggingface.co/docs/transformers/model_doc/bit)**, released with the paper [Big Transfer (BiT): General Visual Representation Learning](https://arxiv.org/abs/1912.11370)
- **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)**, released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637)
- **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)**, released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637)
- **[BLIP](https://huggingface.co/docs/transformers/model_doc/blip)**, released with the paper [BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation](https://arxiv.org/abs/2201.12086)
- **[BLIP-2](https://huggingface.co/docs/transformers/model_doc/blip-2)**, released with the paper [BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://arxiv.org/abs/2301.12597)
- **[BLOOM](https://huggingface.co/docs/transformers/model_doc/bloom)**, released by the [BigScience Workshop](https://bigscience.huggingface.co/)
- **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)**, released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499)
- **[BridgeTower](https://huggingface.co/docs/transformers/model_doc/bridgetower)**, released with the paper [BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning](https://arxiv.org/abs/2206.08657)
- **[BROS](https://huggingface.co/docs/transformers/model_doc/bros)**, released with the paper [BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents](https://arxiv.org/abs/2108.04539)
- **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)**, released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626)
- **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)**, released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894)
- **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)**, released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874)
- **[Chinese-CLIP](https://huggingface.co/docs/transformers/model_doc/chinese_clip)**, released with the paper [Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese](https://arxiv.org/abs/2211.01335)
- **[CLAP](https://huggingface.co/docs/transformers/model_doc/clap)**, released with the paper [Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation](https://arxiv.org/abs/2211.06687)
- **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)**, released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020)
- **[CLIPSeg](https://huggingface.co/docs/transformers/model_doc/clipseg)**, released with the paper [Image Segmentation Using Text and Image Prompts](https://arxiv.org/abs/2112.10003)
- **[CLVP](https://huggingface.co/docs/transformers/model_doc/clvp)**, released with the paper [Better speech synthesis through scaling](https://arxiv.org/abs/2305.07243)
- **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)**, released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474)
- **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)**, released with the paper [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/)
- **[Conditional DETR](https://huggingface.co/docs/transformers/model_doc/conditional_detr)**, released with the paper [Conditional DETR for Fast Training Convergence](https://arxiv.org/abs/2108.06152)
- **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)**, released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496)
- **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)**, released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545)
- **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)**, released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808)
- **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)**, released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413)
- **[CPM-Ant](https://huggingface.co/docs/transformers/model_doc/cpmant)**, released by [OpenBMB](https://www.openbmb.org/)
- **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)**
| CTRL: A Conditional Transformer Language Model for Controllable Generation | https://arxiv.org/abs/1909.05858 |
| CvT | https://huggingface.co/docs/transformers/model_doc/cvt |
| CvT: Introducing Convolutions to Vision Transformers | https://arxiv.org/abs/2103.15808 |
| Data2Vec | https://huggingface.co/docs/transformers/model_doc/data2vec |
| Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language | https://arxiv.org/abs/2202.03555 |
| DeBERTa | https://huggingface.co/docs/transformers/model_doc/deberta |
| DeBERTa: Decoding-enhanced BERT with Disentangled Attention | https://arxiv.org/abs/2006.03654 |
| DeBERTa-v2 | https://huggingface.co/docs/transformers/model_doc/deberta-v2 |
| DeBERTa: Decoding-enhanced BERT with Disentangled Attention | https://arxiv.org/abs/2006.03654 |
| Decision Transformer | https://huggingface.co/docs/transformers/model_doc/decision_transformer |
| Decision Transformer: Reinforcement Learning via Sequence Modeling | https://arxiv.org/abs/2106.01345 |
| Deformable DETR | https://huggingface.co/docs/transformers/model_doc/deformable_detr |
| Deformable DETR: Deformable Transformers for End-to-End Object Detection | https://arxiv.org/abs/2010.04159 |
| DeiT | https://huggingface.co/docs/transformers/model_doc/deit |
| Training data-efficient image transformers & distillation through attention | https://arxiv.org/abs/2012.12877 |
| DePlot | https://huggingface.co/docs/transformers/model_doc/deplot |
| DePlot: One-shot visual language reasoning by plot-to-table translation | https://arxiv.org/abs/2212.10505 |
| Depth Anything | https://huggingface.co/docs/transformers/model_doc/depth_anything |
| Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data | https://arxiv.org/abs/2401.10891 |
| DETA | https://huggingface.co/docs/transformers/model_doc/deta |
| NMS Strikes Back | https://arxiv.org/abs/2212.06137 |
| DETR | https://huggingface.co/docs/transformers/model_doc/detr |
| End-to-End Object Detection with Transformers | https://arxiv.org/abs/2005.12872 |
| DialoGPT | https://huggingface.co/docs/transformers/model_doc/dialogpt |
| DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation | https://arxiv.org/abs/1911.00536 |
| DiNAT | https://huggingface.co/docs/transformers/model_doc/dinat |
| Dilated Neighborhood Attention Transformer | https://arxiv.org/abs/2209.15001 |
| DINOv2 | https://huggingface.co/docs/transformers/model_doc/dinov2 |
| DINOv2: Learning Robust Visual Features without Supervision | https://arxiv.org/abs/2304.07193 |
| DistilBERT | https://huggingface.co/docs/transformers/model_doc/distilbert |
| DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | https://arxiv.org/abs/1910.01108 |
| DistilGPT2 | https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation |
| DistilRoBERTa | https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation |
| DistilmBERT | https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation |
| DiT | https://huggingface.co/docs/transformers/model_doc/dit |
| DiT: Self-supervised Pre-training for Document Image Transformer | https://arxiv.org/abs/2203.02378 |
| Donut | https://huggingface.co/docs/transformers/model_doc/donut |
| OCR-free Document Understanding Transformer | https://arxiv.org/abs/2111.15664 |
| DPR | https://huggingface.co/docs/transformers/model_doc/dpr |
| Dense Passage Retrieval for Open-Domain Question Answering | https://arxiv.org/abs/2004.04906 |
| DPT | https://huggingface.co/docs/transformers/master/model_doc/dpt |
| Vision Transformers for Dense Prediction | https://arxiv.org/abs/2103.13413 |
| EfficientFormer | https://huggingface.co/docs/transformers/model_doc/efficientformer |
| EfficientFormer: Vision Transformers at MobileNet Speed | https://arxiv.org/abs/2206.01191 |
| EfficientNet | https://huggingface.co/docs/transformers/model_doc/efficientnet |
| EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks | https://arxiv.org/abs/1905.11946 |
| ELECTRA | https://huggingface.co/docs/transformers/model_doc/electra |
| ELECTRA: Pre-training text encoders as discriminators rather than generators | https://arxiv.org/abs/2003.10555 |
| EnCodec | https://huggingface.co/docs/transformers/model_doc/encodec |
| High Fidelity Neural Audio Compression | https://arxiv.org/abs/2210.13438 |
| EncoderDecoder | https://huggingface.co/docs/transformers/model_doc/encoder-decoder |
| Leveraging Pre-trained Checkpoints for Sequence Generation Tasks | https://arxiv.org/abs/1907.12461 |
| ERNIE | https://huggingface.co/docs/transformers/model_doc/ernie |
| ERNIE: Enhanced Representation through Knowledge Integration | https://arxiv.org/abs/1904.09223 |
| ErnieM | https://huggingface.co/docs/transformers/model_doc/ernie_m |
| ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora | https://arxiv.org/abs/2012.15674 |
| ESM | https://huggingface.co/docs/transformers/model_doc/esm |
| Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences | https://www.pnas.org/content/118/15/e2016239118 |
| Language models enable zero-shot prediction of the effects of mutations on protein function | https://doi.org/10.1101/2021.07.09.450648 |
| Language models of protein sequences at the scale of evolution enable accurate structure prediction | https://doi.org/10.1101/2022.07.20.500902 |
| Falcon | https://huggingface.co/docs/transformers/model_doc/falcon |
| FastSpeech2Conformer | https://huggingface.co/docs/transformers/model_doc/fastspeech2_conformer |
| Recent Developments On Espnet Toolkit Boosted By Conformer | https://arxiv.org/abs/2010.13956 |
| FLAN-T5 | https://huggingface.co/docs/transformers/model_doc/flan-t5 |
| google-research/t5x | https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints |
| FLAN-UL2 | https://huggingface.co/docs/transformers/model_doc/flan-ul2 |
| google-research/t5x | https://github.com/google-research/t5x/blob/main/docs/models.md#flan-ul2-checkpoints |
| FlauBERT | https://huggingface.co/docs/transformers/model_doc/flaubert |
| FlauBERT: Unsupervised Language Model Pre-training for French | https://arxiv.org/abs/1912.05372 |
| FLAVA | https://huggingface.co/docs/transformers/model_doc/flava |
| FLAVA: A Foundational Language And Vision Alignment Model | https://arxiv.org/abs/2112.04482 |
| FNet | https://huggingface.co/docs/transformers/model_doc/fnet |
| FNet: Mixing Tokens with Fourier Transforms | https://arxiv.org/abs/2105.03824 |
| FocalNet | https://huggingface.co/docs/transformers/model_doc/focalnet |
| Focal Modulation Networks | https://arxiv.org/abs/2203.11926 |
| Funnel Transformer | https://huggingface.co/docs/transformers/model_doc/funnel |
| Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing | https://arxiv.org/abs/2006.03236 |
| Fuyu | https://huggingface.co/docs/transformers/model_doc/fuyu |
| blog post | https://www.adept.ai/blog/fuyu-8b |
| Gemma | https://huggingface.co/docs/transformers/main/model_doc/gemma |
| Gemma: Open Models Based on Gemini Technology and Research | https://blog.google/technology/developers/gemma-open-models/ |
| GIT | https://huggingface.co/docs/transformers/model_doc/git |
| GIT: A Generative Image-to-text Transformer for Vision and Language | https://arxiv.org/abs/2205.14100 |
| GLPN | https://huggingface.co/docs/transformers/model_doc/glpn |
| Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth | https://arxiv.org/abs/2201.07436 |
| GPT | https://huggingface.co/docs/transformers/model_doc/openai-gpt |
| Improving Language Understanding by Generative Pre-Training | https://openai.com/research/language-unsupervised/ |
| GPT Neo | https://huggingface.co/docs/transformers/model_doc/gpt_neo |
| EleutherAI/gpt-neo | https://github.com/EleutherAI/gpt-neo |
| GPT NeoX | https://huggingface.co/docs/transformers/model_doc/gpt_neox |
| GPT-NeoX-20B: An Open-Source Autoregressive Language Model | https://arxiv.org/abs/2204.06745 |
| GPT NeoX Japanese | https://huggingface.co/docs/transformers/model_doc/gpt_neox_japanese |
| GPT-2 | https://huggingface.co/docs/transformers/model_doc/gpt2 |
| Language Models are Unsupervised Multitask Learners | https://openai.com/research/better-language-models/ |
| GPT-J | https://huggingface.co/docs/transformers/model_doc/gptj |
| kingoflolz/mesh-transformer-jax | https://github.com/kingoflolz/mesh-transformer-jax/ |
| GPT-Sw3 | https://huggingface.co/docs/transformers/model_doc/gpt-sw3 |
| Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish | http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.376.pdf |
| GPTBigCode | https://huggingface.co/docs/transformers/model_doc/gpt_bigcode |
| SantaCoder: don't reach for the stars! | https://arxiv.org/abs/2301.03988 |
| GPTSAN-japanese | https://huggingface.co/docs/transformers/model_doc/gptsan-japanese |
| tanreinama/GPTSAN | https://github.com/tanreinama/GPTSAN/blob/main/report/model.md |
| Graphormer | https://huggingface.co/docs/transformers/model_doc/graphormer |
| Do Transformers Really Perform Bad for Graph Representation? | https://arxiv.org/abs/2106.05234 |
| GroupViT | https://huggingface.co/docs/transformers/model_doc/groupvit |
| GroupViT: Semantic Segmentation Emerges from Text Supervision | https://arxiv.org/abs/2202.11094 |
| HerBERT | https://huggingface.co/docs/transformers/model_doc/herbert |
| KLEJ: Comprehensive Benchmark for Polish Language Understanding | https://www.aclweb.org/anthology/2020.acl-main.111.pdf |
| Hubert | https://huggingface.co/docs/transformers/model_doc/hubert |
| HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units | https://arxiv.org/abs/2106.07447 |
| I-BERT | https://huggingface.co/docs/transformers/model_doc/ibert |
| I-BERT: Integer-only BERT Quantization | https://arxiv.org/abs/2101.01321 |
| IDEFICS | https://huggingface.co/docs/transformers/model_doc/idefics |
| OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents | https://huggingface.co/papers/2306.16527 |
| ImageGPT | https://huggingface.co/docs/transformers/model_doc/imagegpt |
| Generative Pretraining from Pixels | https://openai.com/blog/image-gpt/ |
| Informer | https://huggingface.co/docs/transformers/model_doc/informer |
| Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting | https://arxiv.org/abs/2012.07436 |
| InstructBLIP | https://huggingface.co/docs/transformers/model_doc/instructblip |
| InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | https://arxiv.org/abs/2305.06500 |
| Jukebox | https://huggingface.co/docs/transformers/model_doc/jukebox |
| Jukebox: A Generative Model for Music | https://arxiv.org/pdf/2005.00341.pdf |
| KOSMOS-2 | https://huggingface.co/docs/transformers/model_doc/kosmos-2 |
| Kosmos-2: Grounding Multimodal Large Language Models to the World | https://arxiv.org/abs/2306.14824 |
| LayoutLM | https://huggingface.co/docs/transformers/model_doc/layoutlm |
| LayoutLM: Pre-training of Text and Layout for Document Image Understanding | https://arxiv.org/abs/1912.13318 |
| LayoutLMv2 | https://huggingface.co/docs/transformers/model_doc/layoutlmv2 |
| LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | https://arxiv.org/abs/2012.14740 |
| LayoutLMv3 | https://huggingface.co/docs/transformers/model_doc/layoutlmv3 |
| LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | https://arxiv.org/abs/2204.08387 |
| LayoutXLM | https://huggingface.co/docs/transformers/model_doc/layoutxlm |
| LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding | https://arxiv.org/abs/2104.08836 |
| LED | https://huggingface.co/docs/transformers/model_doc/led |
| Longformer: The Long-Document Transformer | https://arxiv.org/abs/2004.05150 |
| LeViT | https://huggingface.co/docs/transformers/model_doc/levit |
| LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference | https://arxiv.org/abs/2104.01136 |
| LiLT | https://huggingface.co/docs/transformers/model_doc/lilt |
| LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding | https://arxiv.org/abs/2202.13669 |
| LLaMA | https://huggingface.co/docs/transformers/model_doc/llama |
| LLaMA: Open and Efficient Foundation Language Models | https://arxiv.org/abs/2302.13971 |
| Llama2 | https://huggingface.co/docs/transformers/model_doc/llama2 |
| Llama2: Open Foundation and Fine-Tuned Chat Models | https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/ |
| LLaVa | https://huggingface.co/docs/transformers/model_doc/llava |
| Visual Instruction Tuning | https://arxiv.org/abs/2304.08485 |
| Longformer | https://huggingface.co/docs/transformers/model_doc/longformer |
| Longformer: The Long-Document Transformer | https://arxiv.org/abs/2004.05150 |
| LongT5 | https://huggingface.co/docs/transformers/model_doc/longt5 |
| LongT5: Efficient Text-To-Text Transformer for Long Sequences | https://arxiv.org/abs/2112.07916 |
| LUKE | https://huggingface.co/docs/transformers/model_doc/luke |
| LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention | https://arxiv.org/abs/2010.01057 |
| LXMERT | https://huggingface.co/docs/transformers/model_doc/lxmert |
| LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering | https://arxiv.org/abs/1908.07490 |
| M-CTC-T | https://huggingface.co/docs/transformers/model_doc/mctct |
| Pseudo-Labeling For Massively Multilingual Speech Recognition | https://arxiv.org/abs/2111.00161 |
| M2M100 | https://huggingface.co/docs/transformers/model_doc/m2m_100 |
| Beyond English-Centric Multilingual Machine Translation | https://arxiv.org/abs/2010.11125 |
| MADLAD-400 | https://huggingface.co/docs/transformers/model_doc/madlad-400 |
| MADLAD-400: A Multilingual And Document-Level Large Audited Dataset | https://arxiv.org/abs/2309.04662 |
| Mamba | https://huggingface.co/docs/transformers/main/model_doc/mamba |
| Mamba: Linear-Time Sequence Modeling with Selective State Spaces | https://arxiv.org/abs/2312.00752 |
| MarianMT | https://huggingface.co/docs/transformers/model_doc/marian |
| OPUS | http://opus.nlpl.eu/ |
| Marian Framework | https://marian-nmt.github.io/ |
| MarkupLM | https://huggingface.co/docs/transformers/model_doc/markuplm |
| MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding | https://arxiv.org/abs/2110.08518 |
| Mask2Former | https://huggingface.co/docs/transformers/model_doc/mask2former |
| Masked-attention Mask Transformer for Universal Image Segmentation | https://arxiv.org/abs/2112.01527 |
| MaskFormer | https://huggingface.co/docs/transformers/model_doc/maskformer |
| Per-Pixel Classification is Not All You Need for Semantic Segmentation | https://arxiv.org/abs/2107.06278 |
| MatCha | https://huggingface.co/docs/transformers/model_doc/matcha |
| MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering | https://arxiv.org/abs/2212.09662 |
| mBART | https://huggingface.co/docs/transformers/model_doc/mbart |
| Multilingual Denoising Pre-training for Neural Machine Translation | https://arxiv.org/abs/2001.08210 |
| mBART-50 | https://huggingface.co/docs/transformers/model_doc/mbart |
| Multilingual Translation with Extensible Multilingual Pretraining and Finetuning | https://arxiv.org/abs/2008.00401 |
| MEGA | https://huggingface.co/docs/transformers/model_doc/mega |
| Mega: Moving Average Equipped Gated Attention | https://arxiv.org/abs/2209.10655 |
| Megatron-BERT | https://huggingface.co/docs/transformers/model_doc/megatron-bert |
| Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | https://arxiv.org/abs/1909.08053 |
| Megatron-GPT2 | https://huggingface.co/docs/transformers/model_doc/megatron_gpt2 |
| Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | https://arxiv.org/abs/1909.08053 |
| MGP-STR | https://huggingface.co/docs/transformers/model_doc/mgp-str |
| Multi-Granularity Prediction for Scene Text Recognition | https://arxiv.org/abs/2209.03592 |
| Mistral | https://huggingface.co/docs/transformers/model_doc/mistral |
| Mistral AI | https://mistral.ai |
| Mixtral | https://huggingface.co/docs/transformers/model_doc/mixtral |
| Mistral AI | https://mistral.ai |
| mLUKE | https://huggingface.co/docs/transformers/model_doc/mluke |
| mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models | https://arxiv.org/abs/2110.08151 |
| MMS | https://huggingface.co/docs/transformers/model_doc/mms |
| Scaling Speech Technology to 1,000+ Languages | https://arxiv.org/abs/2305.13516 |
| MobileBERT | https://huggingface.co/docs/transformers/model_doc/mobilebert |
| MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices | https://arxiv.org/abs/2004.02984 |
| MobileNetV1 | https://huggingface.co/docs/transformers/model_doc/mobilenet_v1 |
| MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications | https://arxiv.org/abs/1704.04861 |
| MobileNetV2 | https://huggingface.co/docs/transformers/model_doc/mobilenet_v2 |
| MobileNetV2: Inverted Residuals and Linear Bottlenecks | https://arxiv.org/abs/1801.04381 |
| MobileViT | https://huggingface.co/docs/transformers/model_doc/mobilevit |
| MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer | https://arxiv.org/abs/2110.02178 |
| MobileViTV2 | https://huggingface.co/docs/transformers/model_doc/mobilevitv2 |
| Separable Self-attention for Mobile Vision Transformers | https://arxiv.org/abs/2206.02680 |
| MPNet | https://huggingface.co/docs/transformers/model_doc/mpnet |
| MPNet: Masked and Permuted Pre-training for Language Understanding | https://arxiv.org/abs/2004.09297 |
| MPT | https://huggingface.co/docs/transformers/model_doc/mpt |
| llm-foundry | https://github.com/mosaicml/llm-foundry/ |
| MRA | https://huggingface.co/docs/transformers/model_doc/mra |
| Multi Resolution Analysis (MRA) for Approximate Self-Attention | https://arxiv.org/abs/2207.10284 |
| MT5 | https://huggingface.co/docs/transformers/model_doc/mt5 |
| mT5: A massively multilingual pre-trained text-to-text transformer | https://arxiv.org/abs/2010.11934 |
| MusicGen | https://huggingface.co/docs/transformers/model_doc/musicgen |
| Simple and Controllable Music Generation | https://arxiv.org/abs/2306.05284 |
| MVP | https://huggingface.co/docs/transformers/model_doc/mvp |
| MVP: Multi-task Supervised Pre-training for Natural Language Generation | https://arxiv.org/abs/2206.12131 |
| NAT | https://huggingface.co/docs/transformers/model_doc/nat |
| Neighborhood Attention Transformer | https://arxiv.org/abs/2204.07143 |
| Nezha | https://huggingface.co/docs/transformers/model_doc/nezha |
| NEZHA: Neural Contextualized Representation for Chinese Language Understanding | https://arxiv.org/abs/1909.00204 |
| NLLB | https://huggingface.co/docs/transformers/model_doc/nllb |
| No Language Left Behind: Scaling Human-Centered Machine Translation | https://arxiv.org/abs/2207.04672 |
| NLLB-MOE | https://huggingface.co/docs/transformers/model_doc/nllb-moe |
| No Language Left Behind: Scaling Human-Centered Machine Translation | https://arxiv.org/abs/2207.04672 |
| Nougat | https://huggingface.co/docs/transformers/model_doc/nougat |
| Nougat: Neural Optical Understanding for Academic Documents | https://arxiv.org/abs/2308.13418 |
| Nyströmformer | https://huggingface.co/docs/transformers/model_doc/nystromformer |
| Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention | https://arxiv.org/abs/2102.03902 |
| OneFormer | https://huggingface.co/docs/transformers/model_doc/oneformer |
| OneFormer: One Transformer to Rule Universal Image Segmentation | https://arxiv.org/abs/2211.06220 |
| OpenLlama | https://huggingface.co/docs/transformers/model_doc/open-llama |
| s-JoL | https://huggingface.co/s-JoL |
| OPT | https://huggingface.co/docs/transformers/master/model_doc/opt |
| OPT: Open Pre-trained Transformer Language Models | https://arxiv.org/abs/2205.01068 |
| OWL-ViT | https://huggingface.co/docs/transformers/model_doc/owlvit |
| Simple Open-Vocabulary Object Detection with Vision Transformers | https://arxiv.org/abs/2205.06230 |
| OWLv2 | https://huggingface.co/docs/transformers/model_doc/owlv2 |
| Scaling Open-Vocabulary Object Detection | https://arxiv.org/abs/2306.09683 |
| PatchTSMixer | https://huggingface.co/docs/transformers/model_doc/patchtsmixer |
| TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting | https://arxiv.org/pdf/2306.09364.pdf |
| PatchTST | https://huggingface.co/docs/transformers/model_doc/patchtst |
| A Time Series is Worth 64 Words: Long-term Forecasting with Transformers | https://arxiv.org/abs/2211.14730 |
| Pegasus | https://huggingface.co/docs/transformers/model_doc/pegasus |
| PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization | https://arxiv.org/abs/1912.08777 |
| PEGASUS-X | https://huggingface.co/docs/transformers/model_doc/pegasus_x |
| Investigating Efficiently Extending Transformers for Long Input Summarization | https://arxiv.org/abs/2208.04347 |
| Perceiver IO | https://huggingface.co/docs/transformers/model_doc/perceiver |
| Perceiver IO: A General Architecture for Structured Inputs & Outputs | https://arxiv.org/abs/2107.14795 |
| Persimmon | https://huggingface.co/docs/transformers/model_doc/persimmon |
| blog post | https://www.adept.ai/blog/persimmon-8b |
| Phi | https://huggingface.co/docs/transformers/model_doc/phi |
| Textbooks Are All You Need | https://arxiv.org/abs/2306.11644 |
| Textbooks Are All You Need II: phi-1.5 technical report | https://arxiv.org/abs/2309.05463 |
| PhoBERT | https://huggingface.co/docs/transformers/model_doc/phobert |
| PhoBERT: Pre-trained language models for Vietnamese | https://www.aclweb.org/anthology/2020.findings-emnlp.92/ |
| Pix2Struct | https://huggingface.co/docs/transformers/model_doc/pix2struct |
| Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding | https://arxiv.org/abs/2210.03347 |
| PLBart | https://huggingface.co/docs/transformers/model_doc/plbart |
| Unified Pre-training for Program Understanding and Generation | https://arxiv.org/abs/2103.06333 |
| PoolFormer | https://huggingface.co/docs/transformers/model_doc/poolformer |
| MetaFormer is Actually What You Need for Vision | https://arxiv.org/abs/2111.11418 |
| Pop2Piano | https://huggingface.co/docs/transformers/model_doc/pop2piano |
| Pop2Piano : Pop Audio-based Piano Cover Generation | https://arxiv.org/abs/2211.00895 |
| ProphetNet | https://huggingface.co/docs/transformers/model_doc/prophetnet |
| ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training | https://arxiv.org/abs/2001.04063 |
| PVT | https://huggingface.co/docs/transformers/model_doc/pvt |
| Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions | https://arxiv.org/pdf/2102.12122.pdf |
| QDQBert | https://huggingface.co/docs/transformers/model_doc/qdqbert |
| Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation | https://arxiv.org/abs/2004.09602 |
| Qwen2 | https://huggingface.co/docs/transformers/model_doc/qwen2 |
| Qwen Technical Report | https://arxiv.org/abs/2309.16609 |
| RAG | https://huggingface.co/docs/transformers/model_doc/rag |
| Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | https://arxiv.org/abs/2005.11401 |
| REALM | https://huggingface.co/docs/transformers/model_doc/realm.html |
| REALM: Retrieval-Augmented Language Model Pre-Training | https://arxiv.org/abs/2002.08909 |
| Reformer | https://huggingface.co/docs/transformers/model_doc/reformer |
| Reformer: The Efficient Transformer | https://arxiv.org/abs/2001.04451 |
| RegNet | https://huggingface.co/docs/transformers/model_doc/regnet |
| Designing Network Design Spaces | https://arxiv.org/abs/2003.13678 |
| RemBERT | https://huggingface.co/docs/transformers/model_doc/rembert |
| Rethinking embedding coupling in pre-trained language models | https://arxiv.org/abs/2010.12821 |
| ResNet | https://huggingface.co/docs/transformers/model_doc/resnet |
| Deep Residual Learning for Image Recognition | https://arxiv.org/abs/1512.03385 |
| RoBERTa | https://huggingface.co/docs/transformers/model_doc/roberta |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | https://arxiv.org/abs/1907.11692 |
| RoBERTa-PreLayerNorm | https://huggingface.co/docs/transformers/model_doc/roberta-prelayernorm |
| fairseq: A Fast, Extensible Toolkit for Sequence Modeling | https://arxiv.org/abs/1904.01038 |
| RoCBert | https://huggingface.co/docs/transformers/model_doc/roc_bert |
| RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining | https://aclanthology.org/2022.acl-long.65.pdf |
| RoFormer | https://huggingface.co/docs/transformers/model_doc/roformer |
| RoFormer: Enhanced Transformer with Rotary Position Embedding | https://arxiv.org/abs/2104.09864 |
| RWKV | https://huggingface.co/docs/transformers/model_doc/rwkv |
| this repo | https://github.com/BlinkDL/RWKV-LM |
| SeamlessM4T | https://huggingface.co/docs/transformers/model_doc/seamless_m4t |
| SeamlessM4T — Massively Multilingual & Multimodal Machine Translation | https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf |
| SeamlessM4Tv2 | https://huggingface.co/docs/transformers/model_doc/seamless_m4t_v2 |
| Seamless: Multilingual Expressive and Streaming Speech Translation | https://ai.meta.com/research/publications/seamless-multilingual-expressive-and-streaming-speech-translation/ |
| SegFormer | https://huggingface.co/docs/transformers/model_doc/segformer |
| SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers | https://arxiv.org/abs/2105.15203 |
| SegGPT | https://huggingface.co/docs/transformers/main/model_doc/seggpt |
| SegGPT: Segmenting Everything In Context | https://arxiv.org/abs/2304.03284 |
| Segment Anything | https://huggingface.co/docs/transformers/model_doc/sam |
| Segment Anything | https://arxiv.org/pdf/2304.02643v1.pdf |
| SEW | https://huggingface.co/docs/transformers/model_doc/sew |
| Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition | https://arxiv.org/abs/2109.06870 |
| SEW-D | https://huggingface.co/docs/transformers/model_doc/sew_d |
| Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition | https://arxiv.org/abs/2109.06870 |
| SigLIP | https://huggingface.co/docs/transformers/model_doc/siglip |
| Sigmoid Loss for Language Image Pre-Training | https://arxiv.org/abs/2303.15343 |
| SpeechT5 | https://huggingface.co/docs/transformers/model_doc/speecht5 |
| SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing | https://arxiv.org/abs/2110.07205 |
| SpeechToTextTransformer | https://huggingface.co/docs/transformers/model_doc/speech_to_text |
| fairseq S2T: Fast Speech-to-Text Modeling with fairseq | https://arxiv.org/abs/2010.05171 |
| SpeechToTextTransformer2 | https://huggingface.co/docs/transformers/model_doc/speech_to_text_2 |
| Large-Scale Self- and Semi-Supervised Learning for Speech Translation | https://arxiv.org/abs/2104.06678 |
| Splinter | https://huggingface.co/docs/transformers/model_doc/splinter |
| Few-Shot Question Answering by Pretraining Span Selection | https://arxiv.org/abs/2101.00438 |
| SqueezeBERT | https://huggingface.co/docs/transformers/model_doc/squeezebert |
| SqueezeBERT: What can computer vision teach NLP about efficient neural networks? | https://arxiv.org/abs/2006.11316 |
| StableLm | https://huggingface.co/docs/transformers/model_doc/stablelm |
| StableLM 3B 4E1T (Technical Report) | https://stability.wandb.io/stability-llm/stable-lm/reports/StableLM-3B-4E1T--VmlldzoyMjU4?accessToken=u3zujipenkx5g7rtcj9qojjgxpconyjktjkli2po09nffrffdhhchq045vp0wyfo |
| Starcoder2 | https://huggingface.co/docs/transformers/main/model_doc/starcoder2 |
| StarCoder 2 and The Stack v2: The Next Generation | https://arxiv.org/abs/2402.19173 |
| SwiftFormer | https://huggingface.co/docs/transformers/model_doc/swiftformer |
| SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications | https://arxiv.org/abs/2303.15446 |
| Swin Transformer | https://huggingface.co/docs/transformers/model_doc/swin |
| Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | https://arxiv.org/abs/2103.14030 |
| Swin Transformer V2 | https://huggingface.co/docs/transformers/model_doc/swinv2 |
| Swin Transformer V2: Scaling Up Capacity and Resolution | https://arxiv.org/abs/2111.09883 |
| Swin2SR | https://huggingface.co/docs/transformers/model_doc/swin2sr |
| Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration | https://arxiv.org/abs/2209.11345 |
| SwitchTransformers | https://huggingface.co/docs/transformers/model_doc/switch_transformers |
| Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | https://arxiv.org/abs/2101.03961 |
| T5 | https://huggingface.co/docs/transformers/model_doc/t5 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | https://arxiv.org/abs/1910.10683 |
| T5v1.1 | https://huggingface.co/docs/transformers/model_doc/t5v1.1 |
| google-research/text-to-text-transfer-transformer | https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511 |
| Table Transformer | https://huggingface.co/docs/transformers/model_doc/table-transformer |
| PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents | https://arxiv.org/abs/2110.00061 |
| TAPAS | https://huggingface.co/docs/transformers/model_doc/tapas |
| TAPAS: Weakly Supervised Table Parsing via Pre-training | https://arxiv.org/abs/2004.02349 |
| TAPEX | https://huggingface.co/docs/transformers/model_doc/tapex |
| TAPEX: Table Pre-training via Learning a Neural SQL Executor | https://arxiv.org/abs/2107.07653 |
| Time Series Transformer | https://huggingface.co/docs/transformers/model_doc/time_series_transformer |
| TimeSformer | https://huggingface.co/docs/transformers/model_doc/timesformer |
| Is Space-Time Attention All You Need for Video Understanding? | https://arxiv.org/abs/2102.05095 |
| Trajectory Transformer | https://huggingface.co/docs/transformers/model_doc/trajectory_transformers |
| Offline Reinforcement Learning as One Big Sequence Modeling Problem | https://arxiv.org/abs/2106.02039 |
| Transformer-XL | https://huggingface.co/docs/transformers/model_doc/transfo-xl |
| Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | https://arxiv.org/abs/1901.02860 |
| TrOCR | https://huggingface.co/docs/transformers/model_doc/trocr |
| TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models | https://arxiv.org/abs/2109.10282 |
| TVLT | https://huggingface.co/docs/transformers/model_doc/tvlt |
| TVLT: Textless Vision-Language Transformer | https://arxiv.org/abs/2209.14156 |
| TVP | https://huggingface.co/docs/transformers/model_doc/tvp |
| Text-Visual Prompting for Efficient 2D Temporal Video Grounding | https://arxiv.org/abs/2303.04995 |
| UDOP | https://huggingface.co/docs/transformers/main/model_doc/udop |
| Unifying Vision, Text, and Layout for Universal Document Processing | https://arxiv.org/abs/2212.02623 |
| UL2 | https://huggingface.co/docs/transformers/model_doc/ul2 |
| Unifying Language Learning Paradigms | https://arxiv.org/abs/2205.05131v1 |
| UMT5 | https://huggingface.co/docs/transformers/model_doc/umt5 |
| UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining | https://openreview.net/forum?id=kXwdL1cWOAi |
| UniSpeech | https://huggingface.co/docs/transformers/model_doc/unispeech |
| UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data | https://arxiv.org/abs/2101.07597 |
| UniSpeechSat | https://huggingface.co/docs/transformers/model_doc/unispeech-sat |
| UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training | https://arxiv.org/abs/2110.05752 |
| UnivNet | https://huggingface.co/docs/transformers/model_doc/univnet |
| UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation | https://arxiv.org/abs/2106.07889 |
| UPerNet | https://huggingface.co/docs/transformers/model_doc/upernet |
| Unified Perceptual Parsing for Scene Understanding | https://arxiv.org/abs/1807.10221 |
| VAN | https://huggingface.co/docs/transformers/model_doc/van |
| Visual Attention Network | https://arxiv.org/abs/2202.09741 |
| VideoMAE | https://huggingface.co/docs/transformers/model_doc/videomae |
| VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | https://arxiv.org/abs/2203.12602 |
| ViLT | https://huggingface.co/docs/transformers/model_doc/vilt |
| ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision | https://arxiv.org/abs/2102.03334 |
| VipLlava | https://huggingface.co/docs/transformers/model_doc/vipllava |
| Making Large Multimodal Models Understand Arbitrary Visual Prompts | https://arxiv.org/abs/2312.00784 |
| Vision Transformer (ViT) | https://huggingface.co/docs/transformers/model_doc/vit |
| An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | https://arxiv.org/abs/2010.11929 |
| VisualBERT | https://huggingface.co/docs/transformers/model_doc/visual_bert |
| VisualBERT: A Simple and Performant Baseline for Vision and Language | https://arxiv.org/abs/1908.03557 |
| ViT Hybrid | https://huggingface.co/docs/transformers/model_doc/vit_hybrid |
| An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | https://arxiv.org/abs/2010.11929 |
| VitDet | https://huggingface.co/docs/transformers/model_doc/vitdet |
| Exploring Plain Vision Transformer Backbones for Object Detection | https://arxiv.org/abs/2203.16527 |
| ViTMAE | https://huggingface.co/docs/transformers/model_doc/vit_mae |
| Masked Autoencoders Are Scalable Vision Learners | https://arxiv.org/abs/2111.06377 |
| ViTMatte | https://huggingface.co/docs/transformers/model_doc/vitmatte |
| ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers | https://arxiv.org/abs/2305.15272 |
| ViTMSN | https://huggingface.co/docs/transformers/model_doc/vit_msn |
| Masked Siamese Networks for Label-Efficient Learning | https://arxiv.org/abs/2204.07141 |
| VITS | https://huggingface.co/docs/transformers/model_doc/vits |
| Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech | https://arxiv.org/abs/2106.06103 |
| ViViT | https://huggingface.co/docs/transformers/model_doc/vivit |
| ViViT: A Video Vision Transformer | https://arxiv.org/abs/2103.15691 |
| Wav2Vec2 | https://huggingface.co/docs/transformers/model_doc/wav2vec2 |
| wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations | https://arxiv.org/abs/2006.11477 |
| Wav2Vec2-BERT | https://huggingface.co/docs/transformers/model_doc/wav2vec2-bert |
| Seamless: Multilingual Expressive and Streaming Speech Translation | https://ai.meta.com/research/publications/seamless-multilingual-expressive-and-streaming-speech-translation/ |
| Wav2Vec2-Conformer | https://huggingface.co/docs/transformers/model_doc/wav2vec2-conformer |
| FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ | https://arxiv.org/abs/2010.05171 |
| Wav2Vec2Phoneme | https://huggingface.co/docs/transformers/model_doc/wav2vec2_phoneme |
| Simple and Effective Zero-shot Cross-lingual Phoneme Recognition | https://arxiv.org/abs/2109.11680 |
| WavLM | https://huggingface.co/docs/transformers/model_doc/wavlm |
| WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing | https://arxiv.org/abs/2110.13900 |
| Whisper | https://huggingface.co/docs/transformers/model_doc/whisper |
| Robust Speech Recognition via Large-Scale Weak Supervision | https://cdn.openai.com/papers/whisper.pdf |
| X-CLIP | https://huggingface.co/docs/transformers/model_doc/xclip |
| Expanding Language-Image Pretrained Models for General Video Recognition | https://arxiv.org/abs/2208.02816 |
| X-MOD | https://huggingface.co/docs/transformers/model_doc/xmod |
| Lifting the Curse of Multilinguality by Pre-training Modular Transformers | http://dx.doi.org/10.18653/v1/2022.naacl-main.255 |
| XGLM | https://huggingface.co/docs/transformers/model_doc/xglm |
| Few-shot Learning with Multilingual Language Models | https://arxiv.org/abs/2112.10668 |
| XLM | https://huggingface.co/docs/transformers/model_doc/xlm |
| Cross-lingual Language Model Pretraining | https://arxiv.org/abs/1901.07291 |
| XLM-ProphetNet | https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet |
| ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training | https://arxiv.org/abs/2001.04063 |
| XLM-RoBERTa | https://huggingface.co/docs/transformers/model_doc/xlm-roberta |
| Unsupervised Cross-lingual Representation Learning at Scale | https://arxiv.org/abs/1911.02116 |
| XLM-RoBERTa-XL | https://huggingface.co/docs/transformers/model_doc/xlm-roberta-xl |
| Larger-Scale Transformers for Multilingual Masked Language Modeling | https://arxiv.org/abs/2105.00572 |
| XLM-V | https://huggingface.co/docs/transformers/model_doc/xlm-v |
| XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models | https://arxiv.org/abs/2301.10472 |
| XLNet | https://huggingface.co/docs/transformers/model_doc/xlnet |
| XLNet: Generalized Autoregressive Pretraining for Language Understanding | https://arxiv.org/abs/1906.08237 |
| XLS-R | https://huggingface.co/docs/transformers/model_doc/xls_r |
| XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale | https://arxiv.org/abs/2111.09296 |
| XLSR-Wav2Vec2 | https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2 |
| Unsupervised Cross-Lingual Representation Learning For Speech Recognition | https://arxiv.org/abs/2006.13979 |
| YOLOS | https://huggingface.co/docs/transformers/model_doc/yolos |
| You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection | https://arxiv.org/abs/2106.00666 |
| YOSO | https://huggingface.co/docs/transformers/model_doc/yoso |
| You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling | https://arxiv.org/abs/2111.09714 |
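Every architecture in the index above is exposed through the same `transformers` API, so the doc pages linked here share a common loading pattern. As a minimal sketch (assuming the `transformers` and `torch` packages are installed; `ViTConfig`/`ViTModel` correspond to the Vision Transformer (ViT) entry above), a model can be instantiated offline from its configuration class, with no checkpoint download:

```python
# Sketch: instantiating one listed model (ViT) from its config class.
# Assumes `transformers` and `torch` are installed; weights are randomly
# initialized, so no network access or checkpoint download is needed.
import torch
from transformers import ViTConfig, ViTModel

# A deliberately tiny configuration so the example runs quickly.
config = ViTConfig(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    image_size=32,
    patch_size=8,
)
model = ViTModel(config)

# One fake 3x32x32 image; (32/8)^2 = 16 patches plus a [CLS] token
# gives a sequence length of 17.
pixel_values = torch.randn(1, 3, 32, 32)
with torch.no_grad():
    outputs = model(pixel_values=pixel_values)

print(outputs.last_hidden_state.shape)  # torch.Size([1, 17, 32])
```

The same pattern applies to the other entries: swap `ViTConfig`/`ViTModel` for the model's own config and model classes, or use `from_pretrained` with a Hub checkpoint name to load trained weights.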
| templates | https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/templates |
| contributing guidelines | https://patch-diff.githubusercontent.com/S4Plus/transformers/blob/main/CONTRIBUTING.md |
| this table | https://huggingface.co/docs/transformers/index#supported-frameworks |
| documentation | https://github.com/huggingface/transformers/tree/main/examples |
| Learn more | https://patch-diff.githubusercontent.com/S4Plus/transformers#learn-more |
| Documentation | https://huggingface.co/docs/transformers/ |
| Task summary | https://huggingface.co/docs/transformers/task_summary |
| Preprocessing tutorial | https://huggingface.co/docs/transformers/preprocessing |
| Training and fine-tuning | https://huggingface.co/docs/transformers/training |
| Quick tour: Fine-tuning/usage scripts | https://github.com/huggingface/transformers/tree/main/examples |
| Model sharing and uploading | https://huggingface.co/docs/transformers/model_sharing |
| Citation | https://patch-diff.githubusercontent.com/S4Plus/transformers#citation |
| paper | https://www.aclweb.org/anthology/2020.emnlp-demos.6/ |
| huggingface.co/transformers | https://huggingface.co/transformers |
| Apache-2.0 license | https://patch-diff.githubusercontent.com/S4Plus/transformers#Apache-2.0-1-ov-file |