René's URL Explorer Experiment


Title: GitHub - NVIDIA/Model-Optimizer: A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed.

Open Graph Title / X Title: identical to the page title above.

Description: A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed. - NVIDIA/Model-Optimizer

Open Graph Description / X Description: identical to the description above, truncated after "downstream deployment frameworks ...".

Open Graph URL: https://github.com/NVIDIA/Model-Optimizer

X: @github

Domain: patch-diff.githubusercontent.com
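Card metadata like the fields above can be harvested with a few lines of stdlib Python. A minimal sketch (not the explorer's actual code) using `html.parser`, run here on a hypothetical sample fragment standing in for the fetched page:

```python
from html.parser import HTMLParser

class MetaCardParser(HTMLParser):
    """Collect Open Graph / Twitter card <meta> tags and the <title> text."""
    def __init__(self):
        super().__init__()
        self.cards = {}        # e.g. {"og:site_name": "GitHub"}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            key = a.get("property") or a.get("name")
            if key and (key.startswith("og:") or key.startswith("twitter:")):
                self.cards[key] = a.get("content", "")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# Hypothetical sample standing in for the fetched GitHub HTML:
sample = """<html><head>
<title>GitHub - NVIDIA/Model-Optimizer</title>
<meta property="og:site_name" content="GitHub">
<meta name="twitter:card" content="summary_large_image">
</head><body></body></html>"""

p = MetaCardParser()
p.feed(sample)
print(p.title)                  # GitHub - NVIDIA/Model-Optimizer
print(p.cards["twitter:card"])  # summary_large_image
```

In real use the `sample` string would be replaced by the response body fetched from the page; the parser itself does not depend on where the HTML came from.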

Meta tags:

route-pattern: /:user_id/:repository
route-controller: files
route-action: disambiguate
fetch-nonce: v2:cf014cc6-9f3e-ddc6-f4b4-8907cee83666
current-catalog-service-hash: f3abb0cc802f3d7b95fc8762b94bdcb13bf39634c40c357301c4aa1d67a256fb
request-id: C408:1BFE61:127C26B:17E4D25:6992568C
html-safe-nonce: 522a95c33a0a19f55717fdfe12c13d4c2fd7ff29a59138713491bf911f408356
visitor-payload: eyJyZWZlcnJlciI6IiIsInJlcXVlc3RfaWQiOiJDNDA4OjFCRkU2MToxMjdDMjZCOjE3RTREMjU6Njk5MjU2OEMiLCJ2aXNpdG9yX2lkIjoiNDA4MzE0MTIwMjk0OTU5MjcxNyIsInJlZ2lvbl9lZGdlIjoiaWFkIiwicmVnaW9uX3JlbmRlciI6ImlhZCJ9
visitor-hmac: 2f8ad7e1204e6259f86fb9b37d8fa43b0ae3f7849686331c1d7025acfca3c50f
hovercard-subject-tag: repository:790916393
github-keyboard-shortcuts: repository,copilot
google-site-verification: Apib7-x98H0j5cPqHWwSMm6dNU4GmODRoqxLiDzdx9I
octolytics-url: https://collector.github.com/github/collect
analytics-location: //
fb:app_id: 1401488693436528
apple-itunes-app: app-id=1477376905, app-argument=https://github.com/NVIDIA/Model-Optimizer
twitter:image: https://opengraph.githubassets.com/6d2e6a182c803c2c222b54f6b25fcb6d91cdf5cbeedcc03b220204962bfa0340/NVIDIA/Model-Optimizer
twitter:card: summary_large_image
og:image: https://opengraph.githubassets.com/6d2e6a182c803c2c222b54f6b25fcb6d91cdf5cbeedcc03b220204962bfa0340/NVIDIA/Model-Optimizer
og:image:alt: A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks ...
og:image:width: 1200
og:image:height: 600
og:site_name: GitHub
og:type: object
hostname: github.com
expected-hostname: github.com
None: 42c603b9d642c4a9065a51770f75e5e27132fef0e858607f5c9cb7e422831a7b
turbo-cache-control: no-preview
go-import: github.com/NVIDIA/Model-Optimizer git https://github.com/NVIDIA/Model-Optimizer.git
octolytics-dimension-user_id: 1728152
octolytics-dimension-user_login: NVIDIA
octolytics-dimension-repository_id: 790916393
octolytics-dimension-repository_nwo: NVIDIA/Model-Optimizer
octolytics-dimension-repository_public: true
octolytics-dimension-repository_is_fork: false
octolytics-dimension-repository_network_root_id: 790916393
octolytics-dimension-repository_network_root_nwo: NVIDIA/Model-Optimizer
turbo-body-classes: logged-out env-production page-responsive
disable-turbo: false
browser-stats-url: https://api.github.com/_private/browser/stats
browser-errors-url: https://api.github.com/_private/browser/errors
release: 848bc6032dcc93a9a7301dcc3f379a72ba13b96e
ui-target: full
theme-color: #1e2327
color-scheme: light dark
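The visitor-payload meta value is just base64-encoded JSON; decoding it recovers the same request id seen in the request-id meta tag. A quick stdlib sketch:

```python
import base64
import json

# The visitor-payload value captured above, split for readability.
payload = (
    "eyJyZWZlcnJlciI6IiIsInJlcXVlc3RfaWQiOiJDNDA4OjFCRkU2MToxMjdDMjZCOjE3RTREMjU6"
    "Njk5MjU2OEMiLCJ2aXNpdG9yX2lkIjoiNDA4MzE0MTIwMjk0OTU5MjcxNyIsInJlZ2lvbl9lZGdl"
    "IjoiaWFkIiwicmVnaW9uX3JlbmRlciI6ImlhZCJ9"
)

# Pad to a multiple of 4 in case the value is emitted without '=' padding.
padded = payload + "=" * (-len(payload) % 4)
visitor = json.loads(base64.b64decode(padded))
print(visitor["request_id"])   # C408:1BFE61:127C26B:17E4D25:6992568C
print(visitor["region_edge"])  # iad
```

The decoded object also carries the visitor id and the edge/render regions, which is how the payload ties this capture to a specific request.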

Links:

NVIDIA https://patch-diff.githubusercontent.com/NVIDIA
Model-Optimizer https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer
nvidia.github.io/Model-Optimizer/ https://nvidia.github.io/Model-Optimizer/
Apache-2.0 license https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/LICENSE
2k stars https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/stargazers
273 forks https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/forks
Branches https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/branches
Tags https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/tags
Activity https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/activity
Code https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer
Issues 67 https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/issues
Pull requests 87 https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/pulls
Actions https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/actions
Security 0 https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/security
Insights https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/pulse
461 Commits https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/commits/main/
.github https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/tree/main/.github
.gitlab https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/tree/main/.gitlab
.vscode https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/tree/main/.vscode
docs/source https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/tree/main/docs/source
examples https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/tree/main/examples
experimental https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/tree/main/experimental
modelopt https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/tree/main/modelopt
tests https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/tree/main/tests
.coderabbit.yaml https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/.coderabbit.yaml
.dockerignore https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/.dockerignore
.gitignore https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/.gitignore
.markdownlint-cli2.yaml https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/.markdownlint-cli2.yaml
.pre-commit-config.yaml https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/.pre-commit-config.yaml
CHANGELOG-Windows.rst https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG-Windows.rst
CHANGELOG.rst https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst
CODE_OF_CONDUCT.md https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/CODE_OF_CONDUCT.md
CONTRIBUTING.md https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md
LICENSE https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/LICENSE
LICENSE_HEADER https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/LICENSE_HEADER
README.md https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/README.md
SECURITY.md https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md
pyproject.toml https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/pyproject.toml
setup.py https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/setup.py
tox.ini https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/tox.ini
https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/docs/source/assets/model-optimizer-banner.png
https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#nvidia-model-optimizer
https://nvidia.github.io/Model-Optimizer
https://pypi.org/project/nvidia-modelopt/
https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/LICENSE
Documentationhttps://nvidia.github.io/Model-Optimizer
Roadmaphttps://github.com/NVIDIA/Model-Optimizer/issues/146
techniqueshttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#techniques
Hugging Facehttps://huggingface.co/
PyTorchhttps://github.com/pytorch/pytorch
ONNXhttps://github.com/onnx/onnx
NVIDIA Megatron-Bridgehttps://github.com/NVIDIA-NeMo/Megatron-Bridge
Megatron-LMhttps://github.com/NVIDIA/Megatron-LM
Hugging Face Acceleratehttps://github.com/huggingface/accelerate
SGLanghttps://github.com/sgl-project/sglang
TensorRT-LLMhttps://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/quantization
TensorRThttps://github.com/NVIDIA/TensorRT
vLLMhttps://github.com/vllm-project/vllm
https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#latest-news
BLOG: Top 5 AI Model Optimization Techniques for Faster, Smarter Inferencehttps://developer.nvidia.com/blog/top-5-ai-model-optimization-techniques-for-faster-smarter-inference/
BLOG: Pruning and Distilling LLMs Using NVIDIA Model Optimizerhttps://developer.nvidia.com/blog/pruning-and-distilling-llms-using-nvidia-tensorrt-model-optimizer/
BLOG: An Introduction to Speculative Decoding for Reducing Latency in AI Inferencehttps://developer.nvidia.com/blog/an-introduction-to-speculative-decoding-for-reducing-latency-in-ai-inference/
BLOG: How Quantization Aware Training Enables Low-Precision Accuracy Recoveryhttps://developer.nvidia.com/blog/how-quantization-aware-training-enables-low-precision-accuracy-recovery/
BLOG: Fine-Tuning gpt-oss for Accuracy and Performance with Quantization Aware Traininghttps://developer.nvidia.com/blog/fine-tuning-gpt-oss-for-accuracy-and-performance-with-quantization-aware-training/
BLOG: Optimizing LLMs for Performance and Accuracy with Post-Training Quantizationhttps://developer.nvidia.com/blog/optimizing-llms-for-performance-and-accuracy-with-post-training-quantization/
BLOG: Introducing NVFP4 for Efficient and Accurate Low-Precision Inferencehttps://developer.nvidia.com/blog/introducing-nvfp4-for-efficient-and-accurate-low-precision-inference/
NVIDIA TensorRT Unlocks FP4 Image Generation for NVIDIA Blackwell GeForce RTX 50 Series GPUshttps://developer.nvidia.com/blog/nvidia-tensorrt-unlocks-fp4-image-generation-for-nvidia-blackwell-geforce-rtx-50-series-gpus/
Adobe optimized deployment using Model-Optimizer + TensorRT leading to a 60% reduction in diffusion latency, a 40% reduction in total cost of ownershiphttps://developer.nvidia.com/blog/optimizing-transformer-based-diffusion-models-for-video-generation-with-nvidia-tensorrt/
NVIDIA Accelerates Inference on Meta Llama 4 Scout and Maverickhttps://developer.nvidia.com/blog/nvidia-accelerates-inference-on-meta-llama-4-scout-and-maverick/
herehttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_ptq/README.md#llama-4
World's Fastest DeepSeek-R1 Inference with Blackwell FP4 & Increasing Image Generation Efficiency on Blackwellhttps://developer.nvidia.com/blog/nvidia-blackwell-delivers-world-record-deepseek-r1-inference-performance/
DeepSeek-R1-FP4https://huggingface.co/nvidia/DeepSeek-R1-FP4
Llama-3.3-70B-Instruct-FP4https://huggingface.co/nvidia/Llama-3.3-70B-Instruct-FP4
Llama-3.1-405B-Instruct-FP4https://huggingface.co/nvidia/Llama-3.1-405B-Instruct-FP4
herehttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_ptq/README.md#model-quantization-and-trt-llm-conversion
8Bhttps://huggingface.co/nvidia/Llama-3.1-8B-Instruct-FP8
70Bhttps://huggingface.co/nvidia/Llama-3.1-70B-Instruct-FP8
405Bhttps://huggingface.co/nvidia/Llama-3.1-405B-Instruct-FP8
Post-Training Quantization of LLMs with NVIDIA NeMo and Model Optimizerhttps://developer.nvidia.com/blog/post-training-quantization-of-llms-with-nvidia-nemo-and-nvidia-tensorrt-model-optimizer/
Boosting Llama 3.1 405B Performance up to 44% with Model Optimizer on NVIDIA H200 GPUshttps://developer.nvidia.com/blog/boosting-llama-3-1-405b-performance-by-up-to-44-with-nvidia-tensorrt-model-optimizer-on-nvidia-h200-gpus/
Up to 1.9X Higher Llama 3.1 Performance with Medusahttps://developer.nvidia.com/blog/low-latency-inference-chapter-1-up-to-1-9x-higher-llama-3-1-performance-with-medusa-on-nvidia-hgx-h200-with-nvlink-switch/
Cache Diffusionhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/diffusers/cache_diffusion
QLoRA workflow with NVIDIA NeMohttps://docs.nvidia.com/nemo-framework/user-guide/24.09/sft_peft/qlora.html
our bloghttps://developer.nvidia.com/blog/nvidia-tensorrt-model-optimizer-v0-15-boosts-inference-performance-and-expands-model-support/
herehttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_ptq/README.md#deploy-fp8-quantized-model-using-vllm
Announcement: Model Optimizer Now Formally Available to Further Accelerate GenAI Inference Performancehttps://developer.nvidia.com/blog/accelerate-generative-ai-inference-performance-with-nvidia-tensorrt-model-optimizer-now-publicly-available/
Model Optimizer supercharges TensorRT-LLM to set MLPerf LLM inference recordshttps://developer.nvidia.com/blog/nvidia-h200-tensor-core-gpus-and-nvidia-tensorrt-llm-set-mlperf-llm-inference-records/
GTC Session: Optimize Generative AI Inference with Quantization in TensorRT-LLM and TensorRThttps://www.nvidia.com/en-us/on-demand/session/gtc24-s63213/
Model Optimizer's 8-bit Post-Training Quantization enables TensorRT to accelerate Stable Diffusion to nearly 2x fasterhttps://developer.nvidia.com/blog/tensorrt-accelerates-stable-diffusion-nearly-2x-faster-with-8-bit-post-training-quantization/
Speed up inference with Model Optimizer quantization techniques in TRT-LLMhttps://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/blogs/quantization-in-TRT-LLM.md
https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#install
PyPIhttps://pypi.org/project/nvidia-modelopt/
TensorRT-LLM docker imageshttps://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release/tags
installation guidehttps://nvidia.github.io/Model-Optimizer/getting_started/2_installation.html
https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#techniques
LLMshttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_ptq
diffusershttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/diffusers
VLMshttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/vlm_ptq
onnxhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/onnx_ptq
windowshttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/windows
docshttps://nvidia.github.io/Model-Optimizer/guides/1_quantization.html
NeMohttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_qat#nemo-qatqad-simplified-flow-example
Hugging Facehttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_qat
docshttps://nvidia.github.io/Model-Optimizer/guides/1_quantization.html
PyTorchhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/pruning
docshttps://nvidia.github.io/Model-Optimizer/guides/3_pruning.html
NeMohttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_distill#knowledge-distillation-kd-for-nvidia-nemo-models
Hugging Facehttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_distill
docshttps://nvidia.github.io/Model-Optimizer/guides/4_distillation.html
Megatronhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/speculative_decoding#mlm-example
Hugging Facehttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/speculative_decoding
docshttps://nvidia.github.io/Model-Optimizer/guides/5_speculative_decoding.html
PyTorchhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_sparsity
docshttps://nvidia.github.io/Model-Optimizer/guides/6_sparsity.html
https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#pre-quantized-checkpoints
🤗 Hugging Face - Nvidia Model Optimizer Collectionhttps://huggingface.co/collections/nvidia/inference-optimized-checkpoints-with-model-optimizer
TensorRT-LLMhttps://github.com/NVIDIA/TensorRT-LLM
vLLMhttps://github.com/vllm-project/vllm
SGLanghttps://github.com/sgl-project/sglang
https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#resources
Roadmaphttps://github.com/NVIDIA/Model-Optimizer/issues/146
Documentationhttps://nvidia.github.io/Model-Optimizer
Benchmarkshttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/benchmark.md
Release Noteshttps://nvidia.github.io/Model-Optimizer/reference/0_changelog.html
File a bughttps://github.com/NVIDIA/Model-Optimizer/issues/new?template=1_bug_report.md
File a Feature Requesthttps://github.com/NVIDIA/Model-Optimizer/issues/new?template=2_feature_request.md
https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#model-support-matrix
View Support Matrixhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_ptq/README.md#support-matrix
View Support Matrixhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/diffusers/README.md#support-matrix
View Support Matrixhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/vlm_ptq/README.md#support-matrix
View Support Matrixhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/torch_onnx/README.md#onnx-export-supported-llm-models
View Support Matrixhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/windows/README.md#support-matrix
View Support Matrixhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_qat/README.md#support-matrix
View Support Matrixhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/pruning/README.md#support-matrix
View Support Matrixhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_distill/README.md#support-matrix
View Support Matrixhttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/examples/speculative_decoding/README.md#support-matrix
https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#contributing
Contributinghttps://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md
https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#top-contributors
https://github.com/NVIDIA/Model-Optimizer/graphs/contributors
nvidia.github.io/Model-Optimizer/https://nvidia.github.io/Model-Optimizer/
Readme https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#readme-ov-file
Apache-2.0 license https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#Apache-2.0-1-ov-file
Code of conduct https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#coc-ov-file
Contributing https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#contributing-ov-file
Security policy https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer#security-ov-file
Activity https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/activity
Custom properties https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/custom-properties
2k stars https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/stargazers
25 watching https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/watchers
273 forks https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/forks
Report repository https://patch-diff.githubusercontent.com/contact/report-content?content_url=https%3A%2F%2Fgithub.com%2FNVIDIA%2FModel-Optimizer&report=NVIDIA+%28user%29
Releases 20 https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/releases
ModelOpt 0.41.0 Release Latest Jan 20, 2026 https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/releases/tag/0.41.0
+ 19 releases https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/releases
Used by 216 https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/network/dependents
+ 208 https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/network/dependents
Contributors 53 https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/graphs/contributors
+ 39 contributors https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/graphs/contributors
Python 98.6% https://patch-diff.githubusercontent.com/NVIDIA/Model-Optimizer/search?l=python
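The link list above pairs each anchor's text with its href. A minimal, hypothetical reproduction of that extraction with stdlib `html.parser` (the explorer's real code is unknown), exercised on a one-line sample fragment:

```python
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collect (anchor text, href) pairs, mirroring the list above."""
    def __init__(self):
        super().__init__()
        self.links = []     # finished (text, href) pairs
        self._href = None   # href of the currently open <a>, if any
        self._text = []     # text fragments seen inside it

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href", "")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None

# Hypothetical fragment standing in for the fetched repository page:
sample = '<p><a href="https://github.com/NVIDIA/Model-Optimizer">Model-Optimizer</a></p>'
p = LinkParser()
p.feed(sample)
print(p.links)  # [('Model-Optimizer', 'https://github.com/NVIDIA/Model-Optimizer')]
```

Joining the text with no separator between it and the href would reproduce the fused `textURL` lines seen in parts of this dump; emitting them with a space, as here, keeps the pairs readable.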

Viewport: width=device-width


URLs of crawlers that visited me.