René's URL Explorer Experiment


Title: GitHub - NVIDIA/Model-Optimizer: A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed.

Open Graph Title / X Title: identical to the page title above.

Description: A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed. - NVIDIA/Model-Optimizer

Open Graph Description / X Description: identical to the description above, truncated after "downstream deployment frameworks ...".

Open Graph URL: https://github.com/NVIDIA/Model-Optimizer

X: @github

Domain: patch-diff.githubusercontent.com
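Card metadata like the fields above can be harvested with a few lines of stdlib Python. A minimal sketch (not the explorer's actual code) using `html.parser`, run here on a hypothetical sample fragment standing in for the fetched page:

```python
from html.parser import HTMLParser

class MetaCardParser(HTMLParser):
    """Collect Open Graph / Twitter card <meta> tags and the <title> text."""
    def __init__(self):
        super().__init__()
        self.cards = {}        # e.g. {"og:site_name": "GitHub"}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            key = a.get("property") or a.get("name")
            if key and (key.startswith("og:") or key.startswith("twitter:")):
                self.cards[key] = a.get("content", "")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# Hypothetical sample standing in for the fetched GitHub HTML:
sample = """<html><head>
<title>GitHub - NVIDIA/Model-Optimizer</title>
<meta property="og:site_name" content="GitHub">
<meta name="twitter:card" content="summary_large_image">
</head><body></body></html>"""

p = MetaCardParser()
p.feed(sample)
print(p.title)                  # GitHub - NVIDIA/Model-Optimizer
print(p.cards["twitter:card"])  # summary_large_image
```

In real use the `sample` string would be replaced by the response body fetched from the page; the parser itself does not depend on where the HTML came from.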

Meta tags:

route-pattern: /:user_id/:repository
route-controller: files
route-action: disambiguate
fetch-nonce: v2:cf014cc6-9f3e-ddc6-f4b4-8907cee83666
current-catalog-service-hash: f3abb0cc802f3d7b95fc8762b94bdcb13bf39634c40c357301c4aa1d67a256fb
request-id: C408:1BFE61:127C26B:17E4D25:6992568C
html-safe-nonce: 522a95c33a0a19f55717fdfe12c13d4c2fd7ff29a59138713491bf911f408356
visitor-payload: eyJyZWZlcnJlciI6IiIsInJlcXVlc3RfaWQiOiJDNDA4OjFCRkU2MToxMjdDMjZCOjE3RTREMjU6Njk5MjU2OEMiLCJ2aXNpdG9yX2lkIjoiNDA4MzE0MTIwMjk0OTU5MjcxNyIsInJlZ2lvbl9lZGdlIjoiaWFkIiwicmVnaW9uX3JlbmRlciI6ImlhZCJ9
visitor-hmac: 2f8ad7e1204e6259f86fb9b37d8fa43b0ae3f7849686331c1d7025acfca3c50f
hovercard-subject-tag: repository:790916393
github-keyboard-shortcuts: repository,copilot
google-site-verification: Apib7-x98H0j5cPqHWwSMm6dNU4GmODRoqxLiDzdx9I
octolytics-url: https://collector.github.com/github/collect
analytics-location: //
fb:app_id: 1401488693436528
apple-itunes-app: app-id=1477376905, app-argument=https://github.com/NVIDIA/Model-Optimizer
twitter:image: https://opengraph.githubassets.com/6d2e6a182c803c2c222b54f6b25fcb6d91cdf5cbeedcc03b220204962bfa0340/NVIDIA/Model-Optimizer
twitter:card: summary_large_image
og:image: https://opengraph.githubassets.com/6d2e6a182c803c2c222b54f6b25fcb6d91cdf5cbeedcc03b220204962bfa0340/NVIDIA/Model-Optimizer
og:image:alt: A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks ...
og:image:width: 1200
og:image:height: 600
og:site_name: GitHub
og:type: object
hostname: github.com
expected-hostname: github.com
None: 42c603b9d642c4a9065a51770f75e5e27132fef0e858607f5c9cb7e422831a7b
turbo-cache-control: no-preview
go-import: github.com/NVIDIA/Model-Optimizer git https://github.com/NVIDIA/Model-Optimizer.git
octolytics-dimension-user_id: 1728152
octolytics-dimension-user_login: NVIDIA
octolytics-dimension-repository_id: 790916393
octolytics-dimension-repository_nwo: NVIDIA/Model-Optimizer
octolytics-dimension-repository_public: true
octolytics-dimension-repository_is_fork: false
octolytics-dimension-repository_network_root_id: 790916393
octolytics-dimension-repository_network_root_nwo: NVIDIA/Model-Optimizer
turbo-body-classes: logged-out env-production page-responsive
disable-turbo: false
browser-stats-url: https://api.github.com/_private/browser/stats
browser-errors-url: https://api.github.com/_private/browser/errors
release: 848bc6032dcc93a9a7301dcc3f379a72ba13b96e
ui-target: full
theme-color: #1e2327
color-scheme: light dark
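The visitor-payload meta value is just base64-encoded JSON; decoding it recovers the same request id seen in the request-id meta tag. A quick stdlib sketch:

```python
import base64
import json

# The visitor-payload value captured above, split for readability.
payload = (
    "eyJyZWZlcnJlciI6IiIsInJlcXVlc3RfaWQiOiJDNDA4OjFCRkU2MToxMjdDMjZCOjE3RTREMjU6"
    "Njk5MjU2OEMiLCJ2aXNpdG9yX2lkIjoiNDA4MzE0MTIwMjk0OTU5MjcxNyIsInJlZ2lvbl9lZGdl"
    "IjoiaWFkIiwicmVnaW9uX3JlbmRlciI6ImlhZCJ9"
)

# Pad to a multiple of 4 in case the value is emitted without '=' padding.
padded = payload + "=" * (-len(payload) % 4)
visitor = json.loads(base64.b64decode(padded))
print(visitor["request_id"])   # C408:1BFE61:127C26B:17E4D25:6992568C
print(visitor["region_edge"])  # iad
```

The decoded object also carries the visitor id and the edge/render regions, which is how the payload ties this capture to a specific request.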

Links:

NVIDIA https://patch-diff.githubusercontent.com/NVIDIA
Model-Optimizer https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer
nvidia.github.io/Model-Optimizer/ https://nvidia.github.io/Model-Optimizer/
Apache-2.0 license https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/LICENSE
2k stars https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/stargazers
273 forks https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/forks
Branches https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/branches
Tags https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/tags
Activity https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/activity
Code https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer
Issues 67 https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/issues
Pull requests 87 https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/pulls
Actions https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/actions
Security 0 https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/security
Insights https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/pulse
461 Commits https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/commits/main/
.github https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/tree/main/.github
.gitlab https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/tree/main/.gitlab
.vscode https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/tree/main/.vscode
docs/source https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/tree/main/docs/source
examples https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/tree/main/examples
experimental https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/tree/main/experimental
modelopt https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/tree/main/modelopt
tests https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/tree/main/tests
.coderabbit.yaml https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/.coderabbit.yaml
.dockerignore https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/.dockerignore
.gitignore https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/.gitignore
.markdownlint-cli2.yaml https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/.markdownlint-cli2.yaml
.pre-commit-config.yaml https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/.pre-commit-config.yaml
CHANGELOG-Windows.rst https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG-Windows.rst
CHANGELOG.rst https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst
CODE_OF_CONDUCT.md https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/CODE_OF_CONDUCT.md
CONTRIBUTING.md https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md
LICENSE https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/LICENSE
LICENSE_HEADER https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/LICENSE_HEADER
README.md https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/README.md
SECURITY.md https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md
pyproject.toml https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/pyproject.toml
setup.py https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/setup.py
tox.ini https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/tox.ini
https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/docs/source/assets/model-optimizer-banner.png
https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#nvidia-model-optimizer
https://nvidia.github.io/Model-Optimizer
https://pypi.org/project/nvidia-modelopt/
https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/LICENSE
Documentationhttps://nvidia.github.io/Model-Optimizer
Roadmaphttps://github.com/NVIDIA/Model-Optimizer/issues/146
techniqueshttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#techniques
Hugging Facehttps://huggingface.co/
PyTorchhttps://github.com/pytorch/pytorch
ONNXhttps://github.com/onnx/onnx
NVIDIA Megatron-Bridgehttps://github.com/NVIDIA-NeMo/Megatron-Bridge
Megatron-LMhttps://github.com/NVIDIA/Megatron-LM
Hugging Face Acceleratehttps://github.com/huggingface/accelerate
SGLanghttps://github.com/sgl-project/sglang
TensorRT-LLMhttps://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/quantization
TensorRThttps://github.com/NVIDIA/TensorRT
vLLMhttps://github.com/vllm-project/vllm
https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#latest-news
BLOG: Top 5 AI Model Optimization Techniques for Faster, Smarter Inferencehttps://developer.nvidia.com/blog/top-5-ai-model-optimization-techniques-for-faster-smarter-inference/
BLOG: Pruning and Distilling LLMs Using NVIDIA Model Optimizerhttps://developer.nvidia.com/blog/pruning-and-distilling-llms-using-nvidia-tensorrt-model-optimizer/
BLOG: An Introduction to Speculative Decoding for Reducing Latency in AI Inferencehttps://developer.nvidia.com/blog/an-introduction-to-speculative-decoding-for-reducing-latency-in-ai-inference/
BLOG: How Quantization Aware Training Enables Low-Precision Accuracy Recoveryhttps://developer.nvidia.com/blog/how-quantization-aware-training-enables-low-precision-accuracy-recovery/
BLOG: Fine-Tuning gpt-oss for Accuracy and Performance with Quantization Aware Traininghttps://developer.nvidia.com/blog/fine-tuning-gpt-oss-for-accuracy-and-performance-with-quantization-aware-training/
BLOG: Optimizing LLMs for Performance and Accuracy with Post-Training Quantizationhttps://developer.nvidia.com/blog/optimizing-llms-for-performance-and-accuracy-with-post-training-quantization/
BLOG: Introducing NVFP4 for Efficient and Accurate Low-Precision Inferencehttps://developer.nvidia.com/blog/introducing-nvfp4-for-efficient-and-accurate-low-precision-inference/
NVIDIA TensorRT Unlocks FP4 Image Generation for NVIDIA Blackwell GeForce RTX 50 Series GPUshttps://developer.nvidia.com/blog/nvidia-tensorrt-unlocks-fp4-image-generation-for-nvidia-blackwell-geforce-rtx-50-series-gpus/
Adobe optimized deployment using Model-Optimizer + TensorRT leading to a 60% reduction in diffusion latency, a 40% reduction in total cost of ownershiphttps://developer.nvidia.com/blog/optimizing-transformer-based-diffusion-models-for-video-generation-with-nvidia-tensorrt/
NVIDIA Accelerates Inference on Meta Llama 4 Scout and Maverickhttps://developer.nvidia.com/blog/nvidia-accelerates-inference-on-meta-llama-4-scout-and-maverick/
herehttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_ptq/README.md#llama-4
World's Fastest DeepSeek-R1 Inference with Blackwell FP4 & Increasing Image Generation Efficiency on Blackwellhttps://developer.nvidia.com/blog/nvidia-blackwell-delivers-world-record-deepseek-r1-inference-performance/
DeepSeek-R1-FP4https://huggingface.co/nvidia/DeepSeek-R1-FP4
Llama-3.3-70B-Instruct-FP4https://huggingface.co/nvidia/Llama-3.3-70B-Instruct-FP4
Llama-3.1-405B-Instruct-FP4https://huggingface.co/nvidia/Llama-3.1-405B-Instruct-FP4
herehttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_ptq/README.md#model-quantization-and-trt-llm-conversion
8Bhttps://huggingface.co/nvidia/Llama-3.1-8B-Instruct-FP8
70Bhttps://huggingface.co/nvidia/Llama-3.1-70B-Instruct-FP8
405Bhttps://huggingface.co/nvidia/Llama-3.1-405B-Instruct-FP8
Post-Training Quantization of LLMs with NVIDIA NeMo and Model Optimizerhttps://developer.nvidia.com/blog/post-training-quantization-of-llms-with-nvidia-nemo-and-nvidia-tensorrt-model-optimizer/
Boosting Llama 3.1 405B Performance up to 44% with Model Optimizer on NVIDIA H200 GPUshttps://developer.nvidia.com/blog/boosting-llama-3-1-405b-performance-by-up-to-44-with-nvidia-tensorrt-model-optimizer-on-nvidia-h200-gpus/
Up to 1.9X Higher Llama 3.1 Performance with Medusahttps://developer.nvidia.com/blog/low-latency-inference-chapter-1-up-to-1-9x-higher-llama-3-1-performance-with-medusa-on-nvidia-hgx-h200-with-nvlink-switch/
Cache Diffusionhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/diffusers/cache_diffusion
QLoRA workflow with NVIDIA NeMohttps://docs.nvidia.com/nemo-framework/user-guide/24.09/sft_peft/qlora.html
our bloghttps://developer.nvidia.com/blog/nvidia-tensorrt-model-optimizer-v0-15-boosts-inference-performance-and-expands-model-support/
herehttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_ptq/README.md#deploy-fp8-quantized-model-using-vllm
Announcement: Model Optimizer Now Formally Available to Further Accelerate GenAI Inference Performancehttps://developer.nvidia.com/blog/accelerate-generative-ai-inference-performance-with-nvidia-tensorrt-model-optimizer-now-publicly-available/
Model Optimizer supercharges TensorRT-LLM to set MLPerf LLM inference recordshttps://developer.nvidia.com/blog/nvidia-h200-tensor-core-gpus-and-nvidia-tensorrt-llm-set-mlperf-llm-inference-records/
GTC Session: Optimize Generative AI Inference with Quantization in TensorRT-LLM and TensorRThttps://www.nvidia.com/en-us/on-demand/session/gtc24-s63213/
Model Optimizer's 8-bit Post-Training Quantization enables TensorRT to accelerate Stable Diffusion to nearly 2x fasterhttps://developer.nvidia.com/blog/tensorrt-accelerates-stable-diffusion-nearly-2x-faster-with-8-bit-post-training-quantization/
Speed up inference with Model Optimizer quantization techniques in TRT-LLMhttps://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/blogs/quantization-in-TRT-LLM.md
https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#install
PyPIhttps://pypi.org/project/nvidia-modelopt/
TensorRT-LLM docker imageshttps://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release/tags
installation guidehttps://nvidia.github.io/Model-Optimizer/getting_started/2_installation.html
https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#techniques
LLMshttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_ptq
diffusershttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/diffusers
VLMshttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/vlm_ptq
onnxhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/onnx_ptq
windowshttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/windows
docshttps://nvidia.github.io/Model-Optimizer/guides/1_quantization.html
NeMohttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_qat#nemo-qatqad-simplified-flow-example
Hugging Facehttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_qat
docshttps://nvidia.github.io/Model-Optimizer/guides/1_quantization.html
PyTorchhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/pruning
docshttps://nvidia.github.io/Model-Optimizer/guides/3_pruning.html
NeMohttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_distill#knowledge-distillation-kd-for-nvidia-nemo-models
Hugging Facehttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_distill
docshttps://nvidia.github.io/Model-Optimizer/guides/4_distillation.html
Megatronhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/speculative_decoding#mlm-example
Hugging Facehttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/speculative_decoding
docshttps://nvidia.github.io/Model-Optimizer/guides/5_speculative_decoding.html
PyTorchhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_sparsity
docshttps://nvidia.github.io/Model-Optimizer/guides/6_sparsity.html
https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#pre-quantized-checkpoints
🤗 Hugging Face - Nvidia Model Optimizer Collectionhttps://huggingface.co/collections/nvidia/inference-optimized-checkpoints-with-model-optimizer
TensorRT-LLMhttps://github.com/NVIDIA/TensorRT-LLM
vLLMhttps://github.com/vllm-project/vllm
SGLanghttps://github.com/sgl-project/sglang
https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#resources
Roadmaphttps://github.com/NVIDIA/Model-Optimizer/issues/146
Documentationhttps://nvidia.github.io/Model-Optimizer
Benchmarkshttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/benchmark.md
Release Noteshttps://nvidia.github.io/Model-Optimizer/reference/0_changelog.html
File a bughttps://github.com/NVIDIA/Model-Optimizer/issues/new?template=1_bug_report.md
File a Feature Requesthttps://github.com/NVIDIA/Model-Optimizer/issues/new?template=2_feature_request.md
https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#model-support-matrix
View Support Matrixhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_ptq/README.md#support-matrix
View Support Matrixhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/diffusers/README.md#support-matrix
View Support Matrixhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/vlm_ptq/README.md#support-matrix
View Support Matrixhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/torch_onnx/README.md#onnx-export-supported-llm-models
View Support Matrixhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/windows/README.md#support-matrix
View Support Matrixhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_qat/README.md#support-matrix
View Support Matrixhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/pruning/README.md#support-matrix
View Support Matrixhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_distill/README.md#support-matrix
View Support Matrixhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/speculative_decoding/README.md#support-matrix
https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#contributing
Contributinghttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md
https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#top-contributors
https://github.com/NVIDIA/Model-Optimizer/graphs/contributors
nvidia.github.io/Model-Optimizer/https://nvidia.github.io/Model-Optimizer/
Readme https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#readme-ov-file
Apache-2.0 license https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#Apache-2.0-1-ov-file
Code of conduct https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#coc-ov-file
Contributing https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#contributing-ov-file
Security policy https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#security-ov-file
Activity https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/activity
Custom properties https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/custom-properties
2k stars https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/stargazers
25 watching https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/watchers
273 forks https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/forks
Report repository https://patch-diff.githubusercontent.com/contact/report-content?content_url=https%3A%2F%2Fgithub.com%2FNVIDIA%2FModel-Optimizer&report=NVIDIA+%28user%29
Releases 20 https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/releases
ModelOpt 0.41.0 Release Latest Jan 20, 2026 https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/releases/tag/0.41.0
+ 19 releases https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/releases
Used by 216 https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/network/dependents
+ 208 https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/network/dependents
Contributors 53 https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/graphs/contributors
+ 39 contributors https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/graphs/contributors
Python 98.6% https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/search?l=python
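The link list above pairs each anchor's text with its href. A minimal, hypothetical reproduction of that extraction with stdlib `html.parser` (the explorer's real code is unknown), exercised on a one-line sample fragment:

```python
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collect (anchor text, href) pairs, mirroring the list above."""
    def __init__(self):
        super().__init__()
        self.links = []     # finished (text, href) pairs
        self._href = None   # href of the currently open <a>, if any
        self._text = []     # text fragments seen inside it

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href", "")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None

# Hypothetical fragment standing in for the fetched repository page:
sample = '<p><a href="https://github.com/NVIDIA/Model-Optimizer">Model-Optimizer</a></p>'
p = LinkParser()
p.feed(sample)
print(p.links)  # [('Model-Optimizer', 'https://github.com/NVIDIA/Model-Optimizer')]
```

Joining the text with no separator between it and the href would reproduce the fused `textURL` lines seen in parts of this dump; emitting them with a space, as here, keeps the pairs readable.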

Viewport: width=device-width


URLs of crawlers that visited me.