René's URL Explorer Experiment


Title: vLLM

Generator: mkdocs-1.6.1, mkdocs-material-9.7.1


Domain: docs.vllm.ai

readthedocs-project-slug: vllm
readthedocs-version-slug: latest
readthedocs-resolver-filename: /
readthedocs-http-status: 200

Links:

Click here https://docs.vllm.ai/en/stable/
https://docs.vllm.ai/en/latest/
GitHub https://github.com/vllm-project/vllm
Home https://docs.vllm.ai/en/latest/
User Guide https://docs.vllm.ai/en/latest/usage/
Developer Guide https://docs.vllm.ai/en/latest/contributing/
Benchmarking https://docs.vllm.ai/en/latest/benchmarking/
API Reference https://docs.vllm.ai/en/latest/api/
CLI Reference https://docs.vllm.ai/en/latest/cli/
Community https://docs.vllm.ai/en/latest/community/contact_us/
Quickstart https://docs.vllm.ai/en/latest/getting_started/quickstart/
Installation https://docs.vllm.ai/en/latest/getting_started/installation/
GPU https://docs.vllm.ai/en/latest/getting_started/installation/gpu/
CPU https://docs.vllm.ai/en/latest/getting_started/installation/cpu/
TPU https://docs.vllm.ai/projects/tpu/en/latest/getting_started/installation/
Examples https://docs.vllm.ai/en/latest/examples/
Async LLM Streaming https://docs.vllm.ai/en/latest/examples/offline_inference/async_llm_streaming/
Audio Language https://docs.vllm.ai/en/latest/examples/offline_inference/audio_language/
Automatic Prefix Caching https://docs.vllm.ai/en/latest/examples/offline_inference/automatic_prefix_caching/
Basic https://docs.vllm.ai/en/latest/examples/offline_inference/basic/
Batch LLM Inference https://docs.vllm.ai/en/latest/examples/offline_inference/batch_llm_inference/
Chat With Tools https://docs.vllm.ai/en/latest/examples/offline_inference/chat_with_tools/
Context Extension https://docs.vllm.ai/en/latest/examples/offline_inference/context_extension/
Data Parallel https://docs.vllm.ai/en/latest/examples/offline_inference/data_parallel/
Disaggregated Prefill V1 https://docs.vllm.ai/en/latest/examples/offline_inference/disaggregated-prefill-v1/
Disaggregated Prefill https://docs.vllm.ai/en/latest/examples/offline_inference/disaggregated_prefill/
Encoder Decoder Multimodal https://docs.vllm.ai/en/latest/examples/offline_inference/encoder_decoder_multimodal/
KV Load Failure Recovery Test https://docs.vllm.ai/en/latest/examples/offline_inference/kv_load_failure_recovery/
LLM Engine Example https://docs.vllm.ai/en/latest/examples/offline_inference/llm_engine_example/
LLM Engine Reset Kv https://docs.vllm.ai/en/latest/examples/offline_inference/llm_engine_reset_kv/
Load Sharded State https://docs.vllm.ai/en/latest/examples/offline_inference/load_sharded_state/
Logits Processor https://docs.vllm.ai/en/latest/examples/offline_inference/logits_processor/
LoRA With Quantization Inference https://docs.vllm.ai/en/latest/examples/offline_inference/lora_with_quantization_inference/
Metrics https://docs.vllm.ai/en/latest/examples/offline_inference/metrics/
Mistral-Small https://docs.vllm.ai/en/latest/examples/offline_inference/mistral-small/
MLPSpeculator https://docs.vllm.ai/en/latest/examples/offline_inference/mlpspeculator/
MultiLoRA Inference https://docs.vllm.ai/en/latest/examples/offline_inference/multilora_inference/
New Weight Syncing https://docs.vllm.ai/en/latest/examples/offline_inference/new_weight_syncing/
Offline Inference with the OpenAI Batch file format https://docs.vllm.ai/en/latest/examples/offline_inference/openai_batch/
Pause Resume https://docs.vllm.ai/en/latest/examples/offline_inference/pause_resume/
Prefix Caching https://docs.vllm.ai/en/latest/examples/offline_inference/prefix_caching/
Prompt Embed Inference https://docs.vllm.ai/en/latest/examples/offline_inference/prompt_embed_inference/
Qwen2.5-Omni Offline Inference Examples https://docs.vllm.ai/en/latest/examples/offline_inference/qwen2_5_omni/
Qwen3 Omni https://docs.vllm.ai/en/latest/examples/offline_inference/qwen3_omni/
Qwen 1M https://docs.vllm.ai/en/latest/examples/offline_inference/qwen_1m/
Reproducibility https://docs.vllm.ai/en/latest/examples/offline_inference/reproducibility/
RLHF https://docs.vllm.ai/en/latest/examples/offline_inference/rlhf/
RLHF Colocate https://docs.vllm.ai/en/latest/examples/offline_inference/rlhf_colocate/
RLHF Online Quant https://docs.vllm.ai/en/latest/examples/offline_inference/rlhf_online_quant/
RLHF Utils https://docs.vllm.ai/en/latest/examples/offline_inference/rlhf_utils/
Run One Batch https://docs.vllm.ai/en/latest/examples/offline_inference/run_one_batch/
Save Sharded State https://docs.vllm.ai/en/latest/examples/offline_inference/save_sharded_state/
Simple Profiling https://docs.vllm.ai/en/latest/examples/offline_inference/simple_profiling/
Skip Loading Weights In Engine Init https://docs.vllm.ai/en/latest/examples/offline_inference/skip_loading_weights_in_engine_init/
Spec Decode https://docs.vllm.ai/en/latest/examples/offline_inference/spec_decode/
Structured Outputs https://docs.vllm.ai/en/latest/examples/offline_inference/structured_outputs/
Torchrun Dp Example https://docs.vllm.ai/en/latest/examples/offline_inference/torchrun_dp_example/
Torchrun Example https://docs.vllm.ai/en/latest/examples/offline_inference/torchrun_example/
Vision Language https://docs.vllm.ai/en/latest/examples/offline_inference/vision_language/
Vision Language Multi Image https://docs.vllm.ai/en/latest/examples/offline_inference/vision_language_multi_image/
API Client https://docs.vllm.ai/en/latest/examples/online_serving/api_client/
Helm Charts https://docs.vllm.ai/en/latest/examples/online_serving/chart-helm/
Monitoring Dashboards https://docs.vllm.ai/en/latest/examples/online_serving/dashboards/
Data Parallel Pause Resume https://docs.vllm.ai/en/latest/examples/online_serving/data_parallel_pause_resume/
Disaggregated Encoder https://docs.vllm.ai/en/latest/examples/online_serving/disaggregated_encoder/
Disaggregated Prefill https://docs.vllm.ai/en/latest/examples/online_serving/disaggregated_prefill/
Disaggregated Serving https://docs.vllm.ai/en/latest/examples/online_serving/disaggregated_serving/
Disaggregated Serving P2P Nccl Xpyd https://docs.vllm.ai/en/latest/examples/online_serving/disaggregated_serving_p2p_nccl_xpyd/
Elastic Ep https://docs.vllm.ai/en/latest/examples/online_serving/elastic_ep/
Gradio OpenAI Chatbot Webserver https://docs.vllm.ai/en/latest/examples/online_serving/gradio_openai_chatbot_webserver/
Gradio Webserver https://docs.vllm.ai/en/latest/examples/online_serving/gradio_webserver/
Kv Events Subscriber https://docs.vllm.ai/en/latest/examples/online_serving/kv_events_subscriber/
Multi-Node-Serving https://docs.vllm.ai/en/latest/examples/online_serving/multi-node-serving/
Multi Instance Data Parallel https://docs.vllm.ai/en/latest/examples/online_serving/multi_instance_data_parallel/
OpenAI Chat Completion Client https://docs.vllm.ai/en/latest/examples/online_serving/openai_chat_completion_client/
OpenAI Chat Completion Client For Multimodal https://docs.vllm.ai/en/latest/examples/online_serving/openai_chat_completion_client_for_multimodal/
OpenAI Chat Completion Client With Tools https://docs.vllm.ai/en/latest/examples/online_serving/openai_chat_completion_client_with_tools/
OpenAI Chat Completion Client With Tools Required https://docs.vllm.ai/en/latest/examples/online_serving/openai_chat_completion_client_with_tools_required/
OpenAI Chat Completion Client With Tools Xlam https://docs.vllm.ai/en/latest/examples/online_serving/openai_chat_completion_client_with_tools_xlam/
OpenAI Chat Completion Client With Tools Xlam Streaming https://docs.vllm.ai/en/latest/examples/online_serving/openai_chat_completion_client_with_tools_xlam_streaming/
OpenAI Chat Completion Tool Calls With Reasoning https://docs.vllm.ai/en/latest/examples/online_serving/openai_chat_completion_tool_calls_with_reasoning/
OpenAI Chat Completion With Reasoning https://docs.vllm.ai/en/latest/examples/online_serving/openai_chat_completion_with_reasoning/
OpenAI Chat Completion With Reasoning Streaming https://docs.vllm.ai/en/latest/examples/online_serving/openai_chat_completion_with_reasoning_streaming/
OpenAI Completion Client https://docs.vllm.ai/en/latest/examples/online_serving/openai_completion_client/
OpenAI Realtime Client https://docs.vllm.ai/en/latest/examples/online_serving/openai_realtime_client/
OpenAI Realtime Microphone Client https://docs.vllm.ai/en/latest/examples/online_serving/openai_realtime_microphone_client/
OpenAI Responses Client https://docs.vllm.ai/en/latest/examples/online_serving/openai_responses_client/
OpenAI Responses Client With Mcp Tools https://docs.vllm.ai/en/latest/examples/online_serving/openai_responses_client_with_mcp_tools/
OpenAI Responses Client With Tools https://docs.vllm.ai/en/latest/examples/online_serving/openai_responses_client_with_tools/
OpenAI Transcription Client https://docs.vllm.ai/en/latest/examples/online_serving/openai_transcription_client/
OpenAI Translation Client https://docs.vllm.ai/en/latest/examples/online_serving/openai_translation_client/
Setup OpenTelemetry POC https://docs.vllm.ai/en/latest/examples/online_serving/opentelemetry/
Prometheus and Grafana https://docs.vllm.ai/en/latest/examples/online_serving/prometheus_grafana/
Prompt Embed Inference With OpenAI Client https://docs.vllm.ai/en/latest/examples/online_serving/prompt_embed_inference_with_openai_client/
Ray Serve Deepseek https://docs.vllm.ai/en/latest/examples/online_serving/ray_serve_deepseek/
Retrieval Augmented Generation With Langchain https://docs.vllm.ai/en/latest/examples/online_serving/retrieval_augmented_generation_with_langchain/
Retrieval Augmented Generation With Llamaindex https://docs.vllm.ai/en/latest/examples/online_serving/retrieval_augmented_generation_with_llamaindex/
RLHF Http https://docs.vllm.ai/en/latest/examples/online_serving/rlhf_http/
Run Cluster https://docs.vllm.ai/en/latest/examples/online_serving/run_cluster/
Sagemaker-Entrypoint https://docs.vllm.ai/en/latest/examples/online_serving/sagemaker-entrypoint/
Streamlit OpenAI Chatbot Webserver https://docs.vllm.ai/en/latest/examples/online_serving/streamlit_openai_chatbot_webserver/
Structured Outputs https://docs.vllm.ai/en/latest/examples/online_serving/structured_outputs/
Token Generation Client https://docs.vllm.ai/en/latest/examples/online_serving/token_generation_client/
Utils https://docs.vllm.ai/en/latest/examples/online_serving/utils/
LMCache Examples https://docs.vllm.ai/en/latest/examples/others/lmcache/
Logging Configuration https://docs.vllm.ai/en/latest/examples/others/logging_configuration/
Tensorize vLLM Model https://docs.vllm.ai/en/latest/examples/others/tensorize_vllm_model/
Classify https://docs.vllm.ai/en/latest/examples/pooling/classify/
Embed https://docs.vllm.ai/en/latest/examples/pooling/embed/
Plugin https://docs.vllm.ai/en/latest/examples/pooling/plugin/
Pooling https://docs.vllm.ai/en/latest/examples/pooling/pooling/
Score https://docs.vllm.ai/en/latest/examples/pooling/score/
Token Classify https://docs.vllm.ai/en/latest/examples/pooling/token_classify/
Token Embed https://docs.vllm.ai/en/latest/examples/pooling/token_embed/
vLLM V1 https://docs.vllm.ai/en/latest/usage/v1_guide/
Frequently Asked Questions https://docs.vllm.ai/en/latest/usage/faq/
Production Metrics https://docs.vllm.ai/en/latest/usage/metrics/
Reproducibility https://docs.vllm.ai/en/latest/usage/reproducibility/
Security https://docs.vllm.ai/en/latest/usage/security/
Troubleshooting https://docs.vllm.ai/en/latest/usage/troubleshooting/
Usage Stats Collection https://docs.vllm.ai/en/latest/usage/usage_stats/
Offline Inference https://docs.vllm.ai/en/latest/serving/offline_inference/
OpenAI-Compatible Server https://docs.vllm.ai/en/latest/serving/openai_compatible_server/
Context Parallel Deployment https://docs.vllm.ai/en/latest/serving/context_parallel_deployment/
Data Parallel Deployment https://docs.vllm.ai/en/latest/serving/data_parallel_deployment/
Troubleshooting distributed deployments https://docs.vllm.ai/en/latest/serving/distributed_troubleshooting/
Expert Parallel Deployment https://docs.vllm.ai/en/latest/serving/expert_parallel_deployment/
Parallelism and Scaling https://docs.vllm.ai/en/latest/serving/parallelism_scaling/
Claude Code https://docs.vllm.ai/en/latest/serving/integrations/claude_code/
LangChain https://docs.vllm.ai/en/latest/serving/integrations/langchain/
LlamaIndex https://docs.vllm.ai/en/latest/serving/integrations/llamaindex/
Using Docker https://docs.vllm.ai/en/latest/deployment/docker/
Using Kubernetes https://docs.vllm.ai/en/latest/deployment/k8s/
Using Nginx https://docs.vllm.ai/en/latest/deployment/nginx/
Anyscale https://docs.vllm.ai/en/latest/deployment/frameworks/anyscale/
AnythingLLM https://docs.vllm.ai/en/latest/deployment/frameworks/anything-llm/
AutoGen https://docs.vllm.ai/en/latest/deployment/frameworks/autogen/
BentoML https://docs.vllm.ai/en/latest/deployment/frameworks/bentoml/
Cerebrium https://docs.vllm.ai/en/latest/deployment/frameworks/cerebrium/
Chatbox https://docs.vllm.ai/en/latest/deployment/frameworks/chatbox/
Dify https://docs.vllm.ai/en/latest/deployment/frameworks/dify/
dstack https://docs.vllm.ai/en/latest/deployment/frameworks/dstack/
Haystack https://docs.vllm.ai/en/latest/deployment/frameworks/haystack/
Helm https://docs.vllm.ai/en/latest/deployment/frameworks/helm/
Hugging Face Inference Endpoints https://docs.vllm.ai/en/latest/deployment/frameworks/hf_inference_endpoints/
LiteLLM https://docs.vllm.ai/en/latest/deployment/frameworks/litellm/
Lobe Chat https://docs.vllm.ai/en/latest/deployment/frameworks/lobe-chat/
LWS https://docs.vllm.ai/en/latest/deployment/frameworks/lws/
Modal https://docs.vllm.ai/en/latest/deployment/frameworks/modal/
Open WebUI https://docs.vllm.ai/en/latest/deployment/frameworks/open-webui/
Retrieval-Augmented Generation https://docs.vllm.ai/en/latest/deployment/frameworks/retrieval_augmented_generation/
SkyPilot https://docs.vllm.ai/en/latest/deployment/frameworks/skypilot/
Streamlit https://docs.vllm.ai/en/latest/deployment/frameworks/streamlit/
NVIDIA Triton https://docs.vllm.ai/en/latest/deployment/frameworks/triton/
KAITO https://docs.vllm.ai/en/latest/deployment/integrations/kaito/
KServe https://docs.vllm.ai/en/latest/deployment/integrations/kserve/
Kthena https://docs.vllm.ai/en/latest/deployment/integrations/kthena/
KubeAI https://docs.vllm.ai/en/latest/deployment/integrations/kubeai/
KubeRay https://docs.vllm.ai/en/latest/deployment/integrations/kuberay/
Llama Stack https://docs.vllm.ai/en/latest/deployment/integrations/llamastack/
llm-d https://docs.vllm.ai/en/latest/deployment/integrations/llm-d/
llmaz https://docs.vllm.ai/en/latest/deployment/integrations/llmaz/
Production stack https://docs.vllm.ai/en/latest/deployment/integrations/production-stack/
Reinforcement Learning from Human Feedback https://docs.vllm.ai/en/latest/training/rlhf/
Transformers Reinforcement Learning https://docs.vllm.ai/en/latest/training/trl/
Configuration https://docs.vllm.ai/en/latest/configuration/
Conserving Memory https://docs.vllm.ai/en/latest/configuration/conserving_memory/
Engine Arguments https://docs.vllm.ai/en/latest/configuration/engine_args/
Environment Variables https://docs.vllm.ai/en/latest/configuration/env_vars/
Model Resolution https://docs.vllm.ai/en/latest/configuration/model_resolution/
Optimization and Tuning https://docs.vllm.ai/en/latest/configuration/optimization/
Server Arguments https://docs.vllm.ai/en/latest/configuration/serve_args/
TPU https://docs.vllm.ai/projects/tpu/en/latest/
Supported Models https://docs.vllm.ai/en/latest/models/supported_models/
Generative Models https://docs.vllm.ai/en/latest/models/generative_models/
Pooling Models https://docs.vllm.ai/en/latest/models/pooling_models/
Loading model weights with fastsafetensors https://docs.vllm.ai/en/latest/models/extensions/fastsafetensor/
Loading models with Run:ai Model Streamer https://docs.vllm.ai/en/latest/models/extensions/runai_model_streamer/
Loading models with CoreWeave's Tensorizer https://docs.vllm.ai/en/latest/models/extensions/tensorizer/
CPU - Intel® Xeon® https://docs.vllm.ai/en/latest/models/hardware_supported_models/cpu/
XPU - Intel® GPUs https://docs.vllm.ai/en/latest/models/hardware_supported_models/xpu/
TPU https://docs.vllm.ai/projects/tpu/en/latest/recommended_models_features/
Features https://docs.vllm.ai/en/latest/features/
Automatic Prefix Caching https://docs.vllm.ai/en/latest/features/automatic_prefix_caching/
Batch Invariance https://docs.vllm.ai/en/latest/features/batch_invariance/
Custom Arguments https://docs.vllm.ai/en/latest/features/custom_arguments/
Custom Logits Processors https://docs.vllm.ai/en/latest/features/custom_logitsprocs/
Disaggregated Encoder https://docs.vllm.ai/en/latest/features/disagg_encoder/
Disaggregated Prefilling (experimental) https://docs.vllm.ai/en/latest/features/disagg_prefill/
Interleaved Thinking https://docs.vllm.ai/en/latest/features/interleaved_thinking/
LoRA Adapters https://docs.vllm.ai/en/latest/features/lora/
MooncakeConnector Usage Guide https://docs.vllm.ai/en/latest/features/mooncake_connector_usage/
Multimodal Inputs https://docs.vllm.ai/en/latest/features/multimodal_inputs/
NixlConnector Usage Guide https://docs.vllm.ai/en/latest/features/nixl_connector_usage/
Prompt Embedding Inputs https://docs.vllm.ai/en/latest/features/prompt_embeds/
Reasoning Outputs https://docs.vllm.ai/en/latest/features/reasoning_outputs/
Sleep Mode https://docs.vllm.ai/en/latest/features/sleep_mode/
Structured Outputs https://docs.vllm.ai/en/latest/features/structured_outputs/
Tool Calling https://docs.vllm.ai/en/latest/features/tool_calling/
Quantization https://docs.vllm.ai/en/latest/features/quantization/
AutoAWQ https://docs.vllm.ai/en/latest/features/quantization/auto_awq/
BitsAndBytes https://docs.vllm.ai/en/latest/features/quantization/bnb/
FP8 W8A8 https://docs.vllm.ai/en/latest/features/quantization/fp8/
GGUF https://docs.vllm.ai/en/latest/features/quantization/gguf/
GPTQModel https://docs.vllm.ai/en/latest/features/quantization/gptqmodel/
Intel Quantization Support https://docs.vllm.ai/en/latest/features/quantization/inc/
INT4 W4A16 https://docs.vllm.ai/en/latest/features/quantization/int4/
INT8 W8A8 https://docs.vllm.ai/en/latest/features/quantization/int8/
LLM Compressor https://docs.vllm.ai/en/latest/features/quantization/llm_compressor/
NVIDIA Model Optimizer https://docs.vllm.ai/en/latest/features/quantization/modelopt/
Quantized KV Cache https://docs.vllm.ai/en/latest/features/quantization/quantized_kvcache/
AMD Quark https://docs.vllm.ai/en/latest/features/quantization/quark/
TorchAO https://docs.vllm.ai/en/latest/features/quantization/torchao/
Spec decode https://docs.vllm.ai/en/latest/features/spec_decode/
Speculators https://docs.vllm.ai/en/latest/features/spec_decode/speculators/
Developer Guide https://docs.vllm.ai/en/latest/contributing/
Deprecation Policy https://docs.vllm.ai/en/latest/contributing/deprecation_policy/
Dockerfile https://docs.vllm.ai/en/latest/contributing/dockerfile/dockerfile/
Incremental Compilation Workflow https://docs.vllm.ai/en/latest/contributing/incremental_build/
Profiling vLLM https://docs.vllm.ai/en/latest/contributing/profiling/
Vulnerability Management https://docs.vllm.ai/en/latest/contributing/vulnerability_management/
Model Implementation https://docs.vllm.ai/en/latest/contributing/model/
Basic Model https://docs.vllm.ai/en/latest/contributing/model/basic/
Registering a Model https://docs.vllm.ai/en/latest/contributing/model/registration/
Unit Testing https://docs.vllm.ai/en/latest/contributing/model/tests/
Multi-Modal Support https://docs.vllm.ai/en/latest/contributing/model/multimodal/
Speech-to-Text (Transcription/Translation) Support https://docs.vllm.ai/en/latest/contributing/model/transcription/
CI Failures https://docs.vllm.ai/en/latest/contributing/ci/failures/
Nightly Builds of vLLM Wheels https://docs.vllm.ai/en/latest/contributing/ci/nightly_builds/
Update PyTorch version on vLLM OSS CI/CD https://docs.vllm.ai/en/latest/contributing/ci/update_pytorch_version/
IO Processor Plugins https://docs.vllm.ai/en/latest/design/io_processor_plugins/
LoRA Resolver Plugins https://docs.vllm.ai/en/latest/design/lora_resolver_plugins/
Plugin System https://docs.vllm.ai/en/latest/design/plugin_system/
Architecture Overview https://docs.vllm.ai/en/latest/design/arch_overview/
Attention Backend Feature Support https://docs.vllm.ai/en/latest/design/attention_backends/
CUDA Graphs https://docs.vllm.ai/en/latest/design/cuda_graphs/
CustomOp https://docs.vllm.ai/en/latest/design/custom_op/
Dual Batch Overlap https://docs.vllm.ai/en/latest/design/dbo/
How to debug the vLLM-torch.compile integration https://docs.vllm.ai/en/latest/design/debug_vllm_compile/
Fused MoE Modular Kernel https://docs.vllm.ai/en/latest/design/fused_moe_modular_kernel/
Integration with Hugging Face https://docs.vllm.ai/en/latest/design/huggingface_integration/
Hybrid KV Cache Manager https://docs.vllm.ai/en/latest/design/hybrid_kv_cache_manager/
Logits Processors https://docs.vllm.ai/en/latest/design/logits_processors/
Metrics https://docs.vllm.ai/en/latest/design/metrics/
Multi-Modal Data Processing https://docs.vllm.ai/en/latest/design/mm_processing/
Fused MoE Kernel Features https://docs.vllm.ai/en/latest/design/moe_kernel_features/
Python Multiprocessing https://docs.vllm.ai/en/latest/design/multiprocessing/
Optimization levels https://docs.vllm.ai/en/latest/design/optimization_levels/
P2P NCCL Connector https://docs.vllm.ai/en/latest/design/p2p_nccl_connector/
Paged Attention https://docs.vllm.ai/en/latest/design/paged_attention/
Automatic Prefix Caching https://docs.vllm.ai/en/latest/design/prefix_caching/
torch.compile integration https://docs.vllm.ai/en/latest/design/torch_compile/
torch.compile with Multimodal Encoders https://docs.vllm.ai/en/latest/design/torch_compile_multimodal/
Benchmarking https://docs.vllm.ai/en/latest/benchmarking/
Benchmark CLI https://docs.vllm.ai/en/latest/benchmarking/cli/
Parameter Sweeps https://docs.vllm.ai/en/latest/benchmarking/sweeps/
Performance Dashboard https://docs.vllm.ai/en/latest/benchmarking/dashboard/
API Reference https://docs.vllm.ai/en/latest/api/
vllm https://docs.vllm.ai/en/latest/api/vllm/
beam_search https://docs.vllm.ai/en/latest/api/vllm/beam_search/
collect_env https://docs.vllm.ai/en/latest/api/vllm/collect_env/
connections https://docs.vllm.ai/en/latest/api/vllm/connections/
env_override https://docs.vllm.ai/en/latest/api/vllm/env_override/
envs https://docs.vllm.ai/en/latest/api/vllm/envs/
exceptions https://docs.vllm.ai/en/latest/api/vllm/exceptions/
forward_context https://docs.vllm.ai/en/latest/api/vllm/forward_context/
logger https://docs.vllm.ai/en/latest/api/vllm/logger/
logits_process https://docs.vllm.ai/en/latest/api/vllm/logits_process/
logprobs https://docs.vllm.ai/en/latest/api/vllm/logprobs/
model_inspection https://docs.vllm.ai/en/latest/api/vllm/model_inspection/
outputs https://docs.vllm.ai/en/latest/api/vllm/outputs/
pooling_params https://docs.vllm.ai/en/latest/api/vllm/pooling_params/
sampling_params https://docs.vllm.ai/en/latest/api/vllm/sampling_params/
scalar_type https://docs.vllm.ai/en/latest/api/vllm/scalar_type/
scripts https://docs.vllm.ai/en/latest/api/vllm/scripts/
sequence https://docs.vllm.ai/en/latest/api/vllm/sequence/
tasks https://docs.vllm.ai/en/latest/api/vllm/tasks/
version https://docs.vllm.ai/en/latest/api/vllm/version/
assets https://docs.vllm.ai/en/latest/api/vllm/assets/
audio https://docs.vllm.ai/en/latest/api/vllm/assets/audio/
base https://docs.vllm.ai/en/latest/api/vllm/assets/base/
image https://docs.vllm.ai/en/latest/api/vllm/assets/image/
video https://docs.vllm.ai/en/latest/api/vllm/assets/video/
benchmarks https://docs.vllm.ai/en/latest/api/vllm/benchmarks/
datasets https://docs.vllm.ai/en/latest/api/vllm/benchmarks/datasets/
latency https://docs.vllm.ai/en/latest/api/vllm/benchmarks/latency/
mm_processor https://docs.vllm.ai/en/latest/api/vllm/benchmarks/mm_processor/
serve https://docs.vllm.ai/en/latest/api/vllm/benchmarks/serve/
startup https://docs.vllm.ai/en/latest/api/vllm/benchmarks/startup/
throughput https://docs.vllm.ai/en/latest/api/vllm/benchmarks/throughput/
lib https://docs.vllm.ai/en/latest/api/vllm/benchmarks/lib/
endpoint_request_func https://docs.vllm.ai/en/latest/api/vllm/benchmarks/lib/endpoint_request_func/
ready_checker https://docs.vllm.ai/en/latest/api/vllm/benchmarks/lib/ready_checker/
utils https://docs.vllm.ai/en/latest/api/vllm/benchmarks/lib/utils/
sweep https://docs.vllm.ai/en/latest/api/vllm/benchmarks/sweep/
cli https://docs.vllm.ai/en/latest/api/vllm/benchmarks/sweep/cli/
param_sweep https://docs.vllm.ai/en/latest/api/vllm/benchmarks/sweep/param_sweep/
plot https://docs.vllm.ai/en/latest/api/vllm/benchmarks/sweep/plot/
plot_pareto https://docs.vllm.ai/en/latest/api/vllm/benchmarks/sweep/plot_pareto/
serve https://docs.vllm.ai/en/latest/api/vllm/benchmarks/sweep/serve/
serve_sla https://docs.vllm.ai/en/latest/api/vllm/benchmarks/sweep/serve_sla/
server https://docs.vllm.ai/en/latest/api/vllm/benchmarks/sweep/server/
sla_sweep https://docs.vllm.ai/en/latest/api/vllm/benchmarks/sweep/sla_sweep/
startup https://docs.vllm.ai/en/latest/api/vllm/benchmarks/sweep/startup/
utils https://docs.vllm.ai/en/latest/api/vllm/benchmarks/sweep/utils/
compilation https://docs.vllm.ai/en/latest/api/vllm/compilation/
backends https://docs.vllm.ai/en/latest/api/vllm/compilation/backends/
base_static_graph https://docs.vllm.ai/en/latest/api/vllm/compilation/base_static_graph/
caching https://docs.vllm.ai/en/latest/api/vllm/compilation/caching/
compiler_interface https://docs.vllm.ai/en/latest/api/vllm/compilation/compiler_interface/
counter https://docs.vllm.ai/en/latest/api/vllm/compilation/counter/
cuda_graph https://docs.vllm.ai/en/latest/api/vllm/compilation/cuda_graph/
decorators https://docs.vllm.ai/en/latest/api/vllm/compilation/decorators/
monitor https://docs.vllm.ai/en/latest/api/vllm/compilation/monitor/
partition_rules https://docs.vllm.ai/en/latest/api/vllm/compilation/partition_rules/
piecewise_backend https://docs.vllm.ai/en/latest/api/vllm/compilation/piecewise_backend/
wrapper https://docs.vllm.ai/en/latest/api/vllm/compilation/wrapper/
passes https://docs.vllm.ai/en/latest/api/vllm/compilation/passes/
fx_utils https://docs.vllm.ai/en/latest/api/vllm/compilation/passes/fx_utils/
inductor_pass https://docs.vllm.ai/en/latest/api/vllm/compilation/passes/inductor_pass/
pass_manager https://docs.vllm.ai/en/latest/api/vllm/compilation/passes/pass_manager/
vllm_inductor_pass https://docs.vllm.ai/en/latest/api/vllm/compilation/passes/vllm_inductor_pass/
fusion https://docs.vllm.ai/en/latest/api/vllm/compilation/passes/fusion/
act_quant_fusion https://docs.vllm.ai/en/latest/api/vllm/compilation/passes/fusion/act_quant_fusion/
allreduce_rms_fusion https://docs.vllm.ai/en/latest/api/vllm/compilation/passes/fusion/allreduce_rms_fusion/
attn_quant_fusion https://docs.vllm.ai/en/latest/api/vllm/compilation/passes/fusion/attn_quant_fusion/
collective_fusion https://docs.vllm.ai/en/latest/api/vllm/compilation/passes/fusion/collective_fusion/
matcher_utils https://docs.vllm.ai/en/latest/api/vllm/compilation/passes/fusion/matcher_utils/
qk_norm_rope_fusion https://docs.vllm.ai/en/latest/api/vllm/compilation/passes/fusion/qk_norm_rope_fusion/
rms_quant_fusion https://docs.vllm.ai/en/latest/api/vllm/compilation/passes/fusion/rms_quant_fusion/
rocm_aiter_fusion https://docs.vllm.ai/en/latest/api/vllm/compilation/passes/fusion/rocm_aiter_fusion/
sequence_parallelism https://docs.vllm.ai/en/latest/api/vllm/compilation/passes/fusion/sequence_parallelism/
utility https://docs.vllm.ai/en/latest/api/vllm/compilation/passes/utility/
fix_functionalization https://docs.vllm.ai/en/latest/api/vllm/compilation/passes/utility/fix_functionalization/
noop_elimination https://docs.vllm.ai/en/latest/api/vllm/compilation/passes/utility/noop_elimination/
post_cleanup https://docs.vllm.ai/en/latest/api/vllm/compilation/passes/utility/post_cleanup/
split_coalescing https://docs.vllm.ai/en/latest/api/vllm/compilation/passes/utility/split_coalescing/
config https://docs.vllm.ai/en/latest/api/vllm/config/
attention https://docs.vllm.ai/en/latest/api/vllm/config/attention/
cache https://docs.vllm.ai/en/latest/api/vllm/config/cache/
compilation https://docs.vllm.ai/en/latest/api/vllm/config/compilation/
device https://docs.vllm.ai/en/latest/api/vllm/config/device/
ec_transfer https://docs.vllm.ai/en/latest/api/vllm/config/ec_transfer/
kernel https://docs.vllm.ai/en/latest/api/vllm/config/kernel/
kv_events https://docs.vllm.ai/en/latest/api/vllm/config/kv_events/
kv_transfer https://docs.vllm.ai/en/latest/api/vllm/config/kv_transfer/
load https://docs.vllm.ai/en/latest/api/vllm/config/load/
lora https://docs.vllm.ai/en/latest/api/vllm/config/lora/
model https://docs.vllm.ai/en/latest/api/vllm/config/model/
model_arch https://docs.vllm.ai/en/latest/api/vllm/config/model_arch/
multimodal https://docs.vllm.ai/en/latest/api/vllm/config/multimodal/
observability https://docs.vllm.ai/en/latest/api/vllm/config/observability/
parallel https://docs.vllm.ai/en/latest/api/vllm/config/parallel/
pooler https://docs.vllm.ai/en/latest/api/vllm/config/pooler/
profiler https://docs.vllm.ai/en/latest/api/vllm/config/profiler/
scheduler https://docs.vllm.ai/en/latest/api/vllm/config/scheduler/
speculative https://docs.vllm.ai/en/latest/api/vllm/config/speculative/
speech_to_text https://docs.vllm.ai/en/latest/api/vllm/config/speech_to_text/
structured_outputs https://docs.vllm.ai/en/latest/api/vllm/config/structured_outputs/
utils https://docs.vllm.ai/en/latest/api/vllm/config/utils/
vllm https://docs.vllm.ai/en/latest/api/vllm/config/vllm/
weight_transfer https://docs.vllm.ai/en/latest/api/vllm/config/weight_transfer/
device_allocator https://docs.vllm.ai/en/latest/api/vllm/device_allocator/
cumem https://docs.vllm.ai/en/latest/api/vllm/device_allocator/cumem/
distributed https://docs.vllm.ai/en/latest/api/vllm/distributed/
communication_op https://docs.vllm.ai/en/latest/api/vllm/distributed/communication_op/
kv_events https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_events/
parallel_state https://docs.vllm.ai/en/latest/api/vllm/distributed/parallel_state/
utils https://docs.vllm.ai/en/latest/api/vllm/distributed/utils/
device_communicators https://docs.vllm.ai/en/latest/api/vllm/distributed/device_communicators/
all2all https://docs.vllm.ai/en/latest/api/vllm/distributed/device_communicators/all2all/
all_reduce_utils https://docs.vllm.ai/en/latest/api/vllm/distributed/device_communicators/all_reduce_utils/
base_device_communicator https://docs.vllm.ai/en/latest/api/vllm/distributed/device_communicators/base_device_communicator/
cpu_communicator https://docs.vllm.ai/en/latest/api/vllm/distributed/device_communicators/cpu_communicator/
cuda_communicator https://docs.vllm.ai/en/latest/api/vllm/distributed/device_communicators/cuda_communicator/
cuda_wrapper https://docs.vllm.ai/en/latest/api/vllm/distributed/device_communicators/cuda_wrapper/
custom_all_reduce https://docs.vllm.ai/en/latest/api/vllm/distributed/device_communicators/custom_all_reduce/
mnnvl_compat https://docs.vllm.ai/en/latest/api/vllm/distributed/device_communicators/mnnvl_compat/
pynccl https://docs.vllm.ai/en/latest/api/vllm/distributed/device_communicators/pynccl/
pynccl_allocator https://docs.vllm.ai/en/latest/api/vllm/distributed/device_communicators/pynccl_allocator/
pynccl_wrapper https://docs.vllm.ai/en/latest/api/vllm/distributed/device_communicators/pynccl_wrapper/
quick_all_reduce https://docs.vllm.ai/en/latest/api/vllm/distributed/device_communicators/quick_all_reduce/
ray_communicator https://docs.vllm.ai/en/latest/api/vllm/distributed/device_communicators/ray_communicator/
shm_broadcast https://docs.vllm.ai/en/latest/api/vllm/distributed/device_communicators/shm_broadcast/
shm_object_storage https://docs.vllm.ai/en/latest/api/vllm/distributed/device_communicators/shm_object_storage/
symm_mem https://docs.vllm.ai/en/latest/api/vllm/distributed/device_communicators/symm_mem/
xpu_communicator https://docs.vllm.ai/en/latest/api/vllm/distributed/device_communicators/xpu_communicator/
ec_transfer https://docs.vllm.ai/en/latest/api/vllm/distributed/ec_transfer/
ec_transfer_state https://docs.vllm.ai/en/latest/api/vllm/distributed/ec_transfer/ec_transfer_state/
ec_connector https://docs.vllm.ai/en/latest/api/vllm/distributed/ec_transfer/ec_connector/
base https://docs.vllm.ai/en/latest/api/vllm/distributed/ec_transfer/ec_connector/base/
example_connector https://docs.vllm.ai/en/latest/api/vllm/distributed/ec_transfer/ec_connector/example_connector/
factory https://docs.vllm.ai/en/latest/api/vllm/distributed/ec_transfer/ec_connector/factory/
eplb https://docs.vllm.ai/en/latest/api/vllm/distributed/eplb/
async_worker https://docs.vllm.ai/en/latest/api/vllm/distributed/eplb/async_worker/
eplb_state https://docs.vllm.ai/en/latest/api/vllm/distributed/eplb/eplb_state/
eplb_utils https://docs.vllm.ai/en/latest/api/vllm/distributed/eplb/eplb_utils/
rebalance_execute https://docs.vllm.ai/en/latest/api/vllm/distributed/eplb/rebalance_execute/
policy https://docs.vllm.ai/en/latest/api/vllm/distributed/eplb/policy/
abstract https://docs.vllm.ai/en/latest/api/vllm/distributed/eplb/policy/abstract/
default https://docs.vllm.ai/en/latest/api/vllm/distributed/eplb/policy/default/
kv_transfer https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/
kv_transfer_state https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_transfer_state/
kv_connector https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/
base https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/base/
factory https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/factory/
utils https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/utils/
v1 https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/
base https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/base/
decode_bench_connector https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/decode_bench_connector/
example_connector https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/example_connector/
lmcache_connector https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector/
lmcache_mp_connector https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/lmcache_mp_connector/
metrics https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/metrics/
multi_connector https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/multi_connector/
nixl_connector https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector/
offloading_connector https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/offloading_connector/
lmcache_integration https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/lmcache_integration/
multi_process_adapter https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/lmcache_integration/multi_process_adapter/
utils https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/lmcache_integration/utils/
vllm_v1_adapter https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/lmcache_integration/vllm_v1_adapter/
mooncake https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/mooncake/
mooncake_connector https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/mooncake/mooncake_connector/
mooncake_utils https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/mooncake/mooncake_utils/
moriio https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/moriio/
moriio_common https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/moriio/moriio_common/
moriio_connector https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/moriio/moriio_connector/
moriio_engine https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/moriio/moriio_engine/
p2p https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/p2p/
p2p_nccl_connector https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/p2p/p2p_nccl_connector/
p2p_nccl_engine https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/p2p/p2p_nccl_engine/
tensor_memory_pool https://docs.vllm.ai/en/latest/api/vllm/distributed/kv_transfer/kv_connector/v1/p2p/tensor_memory_pool/
weight_transfer https://docs.vllm.ai/en/latest/api/vllm/distributed/weight_transfer/
base https://docs.vllm.ai/en/latest/api/vllm/distributed/weight_transfer/base/
factory https://docs.vllm.ai/en/latest/api/vllm/distributed/weight_transfer/factory/
nccl_engine https://docs.vllm.ai/en/latest/api/vllm/distributed/weight_transfer/nccl_engine/
packed_tensor https://docs.vllm.ai/en/latest/api/vllm/distributed/weight_transfer/packed_tensor/
engine https://docs.vllm.ai/en/latest/api/vllm/engine/
arg_utils https://docs.vllm.ai/en/latest/api/vllm/engine/arg_utils/
async_llm_engine https://docs.vllm.ai/en/latest/api/vllm/engine/async_llm_engine/
llm_engine https://docs.vllm.ai/en/latest/api/vllm/engine/llm_engine/
protocol https://docs.vllm.ai/en/latest/api/vllm/engine/protocol/
entrypoints https://docs.vllm.ai/en/latest/api/vllm/entrypoints/
api_server https://docs.vllm.ai/en/latest/api/vllm/entrypoints/api_server/
chat_utils https://docs.vllm.ai/en/latest/api/vllm/entrypoints/chat_utils/
constants https://docs.vllm.ai/en/latest/api/vllm/entrypoints/constants/
grpc_server https://docs.vllm.ai/en/latest/api/vllm/entrypoints/grpc_server/
launcher https://docs.vllm.ai/en/latest/api/vllm/entrypoints/launcher/
llm https://docs.vllm.ai/en/latest/api/vllm/entrypoints/llm/
logger https://docs.vllm.ai/en/latest/api/vllm/entrypoints/logger/
ssl https://docs.vllm.ai/en/latest/api/vllm/entrypoints/ssl/
utils https://docs.vllm.ai/en/latest/api/vllm/entrypoints/utils/
anthropic https://docs.vllm.ai/en/latest/api/vllm/entrypoints/anthropic/
api_router https://docs.vllm.ai/en/latest/api/vllm/entrypoints/anthropic/api_router/
protocol https://docs.vllm.ai/en/latest/api/vllm/entrypoints/anthropic/protocol/
serving https://docs.vllm.ai/en/latest/api/vllm/entrypoints/anthropic/serving/
cli https://docs.vllm.ai/en/latest/api/vllm/entrypoints/cli/
collect_env https://docs.vllm.ai/en/latest/api/vllm/entrypoints/cli/collect_env/
main https://docs.vllm.ai/en/latest/api/vllm/entrypoints/cli/main/
openai https://docs.vllm.ai/en/latest/api/vllm/entrypoints/cli/openai/
run_batch https://docs.vllm.ai/en/latest/api/vllm/entrypoints/cli/run_batch/
serve https://docs.vllm.ai/en/latest/api/vllm/entrypoints/cli/serve/
types https://docs.vllm.ai/en/latest/api/vllm/entrypoints/cli/types/
benchmark https://docs.vllm.ai/en/latest/api/vllm/entrypoints/cli/benchmark/
base https://docs.vllm.ai/en/latest/api/vllm/entrypoints/cli/benchmark/base/
latency https://docs.vllm.ai/en/latest/api/vllm/entrypoints/cli/benchmark/latency/
main https://docs.vllm.ai/en/latest/api/vllm/entrypoints/cli/benchmark/main/
mm_processor https://docs.vllm.ai/en/latest/api/vllm/entrypoints/cli/benchmark/mm_processor/
serve https://docs.vllm.ai/en/latest/api/vllm/entrypoints/cli/benchmark/serve/
startup https://docs.vllm.ai/en/latest/api/vllm/entrypoints/cli/benchmark/startup/
sweep https://docs.vllm.ai/en/latest/api/vllm/entrypoints/cli/benchmark/sweep/
throughput https://docs.vllm.ai/en/latest/api/vllm/entrypoints/cli/benchmark/throughput/
mcp https://docs.vllm.ai/en/latest/api/vllm/entrypoints/mcp/
tool https://docs.vllm.ai/en/latest/api/vllm/entrypoints/mcp/tool/
tool_server https://docs.vllm.ai/en/latest/api/vllm/entrypoints/mcp/tool_server/
openai https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/
api_server https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/api_server/
cli_args https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/cli_args/
orca_metrics https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/orca_metrics/
run_batch https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/run_batch/
server_utils https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/server_utils/
utils https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/utils/
chat_completion https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/chat_completion/
api_router https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/chat_completion/api_router/
protocol https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/chat_completion/protocol/
serving https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/chat_completion/serving/
stream_harmony https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/chat_completion/stream_harmony/
completion https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/completion/
api_router https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/completion/api_router/
protocol https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/completion/protocol/
serving https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/completion/serving/
engine https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/engine/
protocol https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/engine/protocol/
serving https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/engine/serving/
generate https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/generate/
api_router https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/generate/api_router/
models https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/models/
api_router https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/models/api_router/
protocol https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/models/protocol/
serving https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/models/serving/
parser https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/parser/
harmony_utils https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/parser/harmony_utils/
responses_parser https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/parser/responses_parser/
realtime https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/realtime/
api_router https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/realtime/api_router/
connection https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/realtime/connection/
protocol https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/realtime/protocol/
serving https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/realtime/serving/
responses https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/responses/
api_router https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/responses/api_router/
context https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/responses/context/
protocol https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/responses/protocol/
serving https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/responses/serving/
utils https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/responses/utils/
speech_to_text https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/speech_to_text/
api_router https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/speech_to_text/api_router/
protocol https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/speech_to_text/protocol/
serving https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/speech_to_text/serving/
speech_to_text https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/speech_to_text/speech_to_text/
translations https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/translations/
api_router https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/translations/api_router/
protocol https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/translations/protocol/
serving https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/translations/serving/
speech_to_text https://docs.vllm.ai/en/latest/api/vllm/entrypoints/openai/translations/speech_to_text/
pooling https://docs.vllm.ai/en/latest/api/vllm/entrypoints/pooling/
utils https://docs.vllm.ai/en/latest/api/vllm/entrypoints/pooling/utils/
base https://docs.vllm.ai/en/latest/api/vllm/entrypoints/pooling/base/
protocol https://docs.vllm.ai/en/latest/api/vllm/entrypoints/pooling/base/protocol/
classify https://docs.vllm.ai/en/latest/api/vllm/entrypoints/pooling/classify/
api_router https://docs.vllm.ai/en/latest/api/vllm/entrypoints/pooling/classify/api_router/
protocol https://docs.vllm.ai/en/latest/api/vllm/entrypoints/pooling/classify/protocol/
serving https://docs.vllm.ai/en/latest/api/vllm/entrypoints/pooling/classify/serving/
embed https://docs.vllm.ai/en/latest/api/vllm/entrypoints/pooling/embed/
api_router https://docs.vllm.ai/en/latest/api/vllm/entrypoints/pooling/embed/api_router/
protocol https://docs.vllm.ai/en/latest/api/vllm/entrypoints/pooling/embed/protocol/
serving https://docs.vllm.ai/en/latest/api/vllm/entrypoints/pooling/embed/serving/
pooling https://docs.vllm.ai/en/latest/api/vllm/entrypoints/pooling/pooling/
api_router https://docs.vllm.ai/en/latest/api/vllm/entrypoints/pooling/pooling/api_router/
protocol https://docs.vllm.ai/en/latest/api/vllm/entrypoints/pooling/pooling/protocol/
serving https://docs.vllm.ai/en/latest/api/vllm/entrypoints/pooling/pooling/serving/
score https://docs.vllm.ai/en/latest/api/vllm/entrypoints/pooling/score/
api_router https://docs.vllm.ai/en/latest/api/vllm/entrypoints/pooling/score/api_router/
protocol https://docs.vllm.ai/en/latest/api/vllm/entrypoints/pooling/score/protocol/
serving https://docs.vllm.ai/en/latest/api/vllm/entrypoints/pooling/score/serving/
utils https://docs.vllm.ai/en/latest/api/vllm/entrypoints/pooling/score/utils/
sagemaker https://docs.vllm.ai/en/latest/api/vllm/entrypoints/sagemaker/
api_router https://docs.vllm.ai/en/latest/api/vllm/entrypoints/sagemaker/api_router/
serve https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/
cache https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/cache/
api_router https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/cache/api_router/
disagg https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/disagg/
api_router https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/disagg/api_router/
protocol https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/disagg/protocol/
serving https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/disagg/serving/
elastic_ep https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/elastic_ep/
api_router https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/elastic_ep/api_router/
middleware https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/elastic_ep/middleware/
instrumentator https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/instrumentator/
basic https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/instrumentator/basic/
health https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/instrumentator/health/
metrics https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/instrumentator/metrics/
offline_docs https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/instrumentator/offline_docs/
server_info https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/instrumentator/server_info/
lora https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/lora/
api_router https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/lora/api_router/
protocol https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/lora/protocol/
profile https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/profile/
api_router https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/profile/api_router/
rlhf https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/rlhf/
api_router https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/rlhf/api_router/
rpc https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/rpc/
api_router https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/rpc/api_router/
sleep https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/sleep/
api_router https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/sleep/api_router/
tokenize https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/tokenize/
api_router https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/tokenize/api_router/
protocol https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/tokenize/protocol/
serving https://docs.vllm.ai/en/latest/api/vllm/entrypoints/serve/tokenize/serving/
grpc https://docs.vllm.ai/en/latest/api/vllm/grpc/
compile_protos https://docs.vllm.ai/en/latest/api/vllm/grpc/compile_protos/
inputs https://docs.vllm.ai/en/latest/api/vllm/inputs/
data https://docs.vllm.ai/en/latest/api/vllm/inputs/data/
parse https://docs.vllm.ai/en/latest/api/vllm/inputs/parse/
preprocess https://docs.vllm.ai/en/latest/api/vllm/inputs/preprocess/
kernels https://docs.vllm.ai/en/latest/api/vllm/kernels/
helion https://docs.vllm.ai/en/latest/api/vllm/kernels/helion/
config_manager https://docs.vllm.ai/en/latest/api/vllm/kernels/helion/config_manager/
register https://docs.vllm.ai/en/latest/api/vllm/kernels/helion/register/
utils https://docs.vllm.ai/en/latest/api/vllm/kernels/helion/utils/
ops https://docs.vllm.ai/en/latest/api/vllm/kernels/helion/ops/
silu_mul_fp8 https://docs.vllm.ai/en/latest/api/vllm/kernels/helion/ops/silu_mul_fp8/
logging_utils https://docs.vllm.ai/en/latest/api/vllm/logging_utils/
access_log_filter https://docs.vllm.ai/en/latest/api/vllm/logging_utils/access_log_filter/
dump_input https://docs.vllm.ai/en/latest/api/vllm/logging_utils/dump_input/
formatter https://docs.vllm.ai/en/latest/api/vllm/logging_utils/formatter/
lazy https://docs.vllm.ai/en/latest/api/vllm/logging_utils/lazy/
log_time https://docs.vllm.ai/en/latest/api/vllm/logging_utils/log_time/
lora https://docs.vllm.ai/en/latest/api/vllm/lora/
lora_model https://docs.vllm.ai/en/latest/api/vllm/lora/lora_model/
lora_weights https://docs.vllm.ai/en/latest/api/vllm/lora/lora_weights/
model_manager https://docs.vllm.ai/en/latest/api/vllm/lora/model_manager/
peft_helper https://docs.vllm.ai/en/latest/api/vllm/lora/peft_helper/
request https://docs.vllm.ai/en/latest/api/vllm/lora/request/
resolver https://docs.vllm.ai/en/latest/api/vllm/lora/resolver/
utils https://docs.vllm.ai/en/latest/api/vllm/lora/utils/
worker_manager https://docs.vllm.ai/en/latest/api/vllm/lora/worker_manager/
layers https://docs.vllm.ai/en/latest/api/vllm/lora/layers/
base https://docs.vllm.ai/en/latest/api/vllm/lora/layers/base/
base_linear https://docs.vllm.ai/en/latest/api/vllm/lora/layers/base_linear/
column_parallel_linear https://docs.vllm.ai/en/latest/api/vllm/lora/layers/column_parallel_linear/
fused_moe https://docs.vllm.ai/en/latest/api/vllm/lora/layers/fused_moe/
logits_processor https://docs.vllm.ai/en/latest/api/vllm/lora/layers/logits_processor/
replicated_linear https://docs.vllm.ai/en/latest/api/vllm/lora/layers/replicated_linear/
row_parallel_linear https://docs.vllm.ai/en/latest/api/vllm/lora/layers/row_parallel_linear/
utils https://docs.vllm.ai/en/latest/api/vllm/lora/layers/utils/
vocal_parallel_embedding https://docs.vllm.ai/en/latest/api/vllm/lora/layers/vocal_parallel_embedding/
ops https://docs.vllm.ai/en/latest/api/vllm/lora/ops/
ipex_ops https://docs.vllm.ai/en/latest/api/vllm/lora/ops/ipex_ops/
lora_ops https://docs.vllm.ai/en/latest/api/vllm/lora/ops/ipex_ops/lora_ops/
torch_ops https://docs.vllm.ai/en/latest/api/vllm/lora/ops/torch_ops/
lora_ops https://docs.vllm.ai/en/latest/api/vllm/lora/ops/torch_ops/lora_ops/
triton_ops https://docs.vllm.ai/en/latest/api/vllm/lora/ops/triton_ops/
fused_moe_lora_op https://docs.vllm.ai/en/latest/api/vllm/lora/ops/triton_ops/fused_moe_lora_op/
kernel_utils https://docs.vllm.ai/en/latest/api/vllm/lora/ops/triton_ops/kernel_utils/
lora_expand_op https://docs.vllm.ai/en/latest/api/vllm/lora/ops/triton_ops/lora_expand_op/
lora_kernel_metadata https://docs.vllm.ai/en/latest/api/vllm/lora/ops/triton_ops/lora_kernel_metadata/
lora_shrink_op https://docs.vllm.ai/en/latest/api/vllm/lora/ops/triton_ops/lora_shrink_op/
utils https://docs.vllm.ai/en/latest/api/vllm/lora/ops/triton_ops/utils/
punica_wrapper https://docs.vllm.ai/en/latest/api/vllm/lora/punica_wrapper/
punica_base https://docs.vllm.ai/en/latest/api/vllm/lora/punica_wrapper/punica_base/
punica_cpu https://docs.vllm.ai/en/latest/api/vllm/lora/punica_wrapper/punica_cpu/
punica_gpu https://docs.vllm.ai/en/latest/api/vllm/lora/punica_wrapper/punica_gpu/
punica_selector https://docs.vllm.ai/en/latest/api/vllm/lora/punica_wrapper/punica_selector/
punica_xpu https://docs.vllm.ai/en/latest/api/vllm/lora/punica_wrapper/punica_xpu/
utils https://docs.vllm.ai/en/latest/api/vllm/lora/punica_wrapper/utils/
model_executor https://docs.vllm.ai/en/latest/api/vllm/model_executor/
custom_op https://docs.vllm.ai/en/latest/api/vllm/model_executor/custom_op/
parameter https://docs.vllm.ai/en/latest/api/vllm/model_executor/parameter/
utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/utils/
layers https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/
activation https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/activation/
attention_layer_base https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/attention_layer_base/
batch_invariant https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/batch_invariant/
conv https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/conv/
kda https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/kda/
layernorm https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/layernorm/
lightning_attn https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/lightning_attn/
linear https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/linear/
logits_processor https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/logits_processor/
mla https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/mla/
resampler https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/resampler/
sparse_attn_indexer https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/sparse_attn_indexer/
utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/utils/
vocab_parallel_embedding https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/vocab_parallel_embedding/
attention https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/attention/
attention https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/attention/attention/
chunked_local_attention https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/attention/chunked_local_attention/
cross_attention https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/attention/cross_attention/
encoder_only_attention https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/attention/encoder_only_attention/
kv_transfer_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/attention/kv_transfer_utils/
mla_attention https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/attention/mla_attention/
mm_encoder_attention https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/attention/mm_encoder_attention/
static_sink_attention https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/attention/static_sink_attention/
fla https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fla/
ops https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fla/ops/
chunk https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fla/ops/chunk/
chunk_delta_h https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fla/ops/chunk_delta_h/
chunk_o https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fla/ops/chunk_o/
chunk_scaled_dot_kkt https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fla/ops/chunk_scaled_dot_kkt/
cumsum https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fla/ops/cumsum/
fused_recurrent https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fla/ops/fused_recurrent/
index https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fla/ops/index_py/
kda https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fla/ops/kda/
l2norm https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fla/ops/l2norm/
layernorm_guard https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fla/ops/layernorm_guard/
op https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fla/ops/op/
solve_tril https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fla/ops/solve_tril/
utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fla/ops/utils/
wy_fast https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fla/ops/wy_fast/
fused_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/
activation https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/activation/
all2all_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/all2all_utils/
batched_deep_gemm_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/batched_deep_gemm_moe/
config https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/config/
cpu_fused_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/cpu_fused_moe/
cutlass_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/cutlass_moe/
deep_gemm_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/deep_gemm_moe/
deep_gemm_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/deep_gemm_utils/
deepep_ht_prepare_finalize https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/deepep_ht_prepare_finalize/
deepep_ll_prepare_finalize https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/deepep_ll_prepare_finalize/
fallback https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/fallback/
flashinfer_a2a_prepare_finalize https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/flashinfer_a2a_prepare_finalize/
flashinfer_cutedsl_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/flashinfer_cutedsl_moe/
flashinfer_cutlass_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/flashinfer_cutlass_moe/
flashinfer_trtllm_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/flashinfer_trtllm_moe/
fused_batched_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/fused_batched_moe/
fused_marlin_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/fused_marlin_moe/
fused_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/fused_moe/
fused_moe_method_base https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/fused_moe_method_base/
fused_moe_modular_method https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/fused_moe_modular_method/
gpt_oss_triton_kernels_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/gpt_oss_triton_kernels_moe/
layer https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/layer/
modular_kernel https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/modular_kernel/
moe_align_block_size https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/moe_align_block_size/
moe_permute_unpermute https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/moe_permute_unpermute/
mori_prepare_finalize https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/mori_prepare_finalize/
pplx_prepare_finalize https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/pplx_prepare_finalize/
prepare_finalize https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/prepare_finalize/
rocm_aiter_fused_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/rocm_aiter_fused_moe/
routed_experts_capturer https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/routed_experts_capturer/
shared_fused_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/shared_fused_moe/
topk_weight_and_reduce https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/topk_weight_and_reduce/
triton_cutlass_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/triton_cutlass_moe/
triton_deep_gemm_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/triton_deep_gemm_moe/
trtllm_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/trtllm_moe/
unquantized_fused_moe_method https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/unquantized_fused_moe_method/
utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/utils/
xpu_fused_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/xpu_fused_moe/
zero_expert_fused_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/zero_expert_fused_moe/
oracle https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/oracle/
fp8 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/oracle/fp8/
nvfp4 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/oracle/nvfp4/
unquantized https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/oracle/unquantized/
router https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/router/
base_router https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/router/base_router/
custom_routing_router https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/router/custom_routing_router/
fused_moe_router https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/router/fused_moe_router/
fused_topk_bias_router https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/router/fused_topk_bias_router/
fused_topk_router https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/router/fused_topk_router/
grouped_topk_router https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/router/grouped_topk_router/
router_factory https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/router/router_factory/
routing_simulator_router https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/router/routing_simulator_router/
runner https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/runner/
default_moe_runner https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/runner/default_moe_runner/
moe_runner https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/fused_moe/runner/moe_runner/
mamba https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/mamba/
abstract https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/mamba/abstract/
linear_attn https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/mamba/linear_attn/
mamba_mixer https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/mamba/mamba_mixer/
mamba_mixer2 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/mamba/mamba_mixer2/
mamba_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/mamba/mamba_utils/
short_conv https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/mamba/short_conv/
ops https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/mamba/ops/
causal_conv1d https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/mamba/ops/causal_conv1d/
layernorm_gated https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/mamba/ops/layernorm_gated/
mamba_ssm https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/mamba/ops/mamba_ssm/
ssd_bmm https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/mamba/ops/ssd_bmm/
ssd_chunk_scan https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/mamba/ops/ssd_chunk_scan/
ssd_chunk_state https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/mamba/ops/ssd_chunk_state/
ssd_combined https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/mamba/ops/ssd_combined/
ssd_state_passing https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/mamba/ops/ssd_state_passing/
pooler https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/pooler/
abstract https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/pooler/abstract/
activations https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/pooler/activations/
common https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/pooler/common/
special https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/pooler/special/
seqwise https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/pooler/seqwise/
heads https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/pooler/seqwise/heads/
methods https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/pooler/seqwise/methods/
poolers https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/pooler/seqwise/poolers/
tokwise https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/pooler/tokwise/
heads https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/pooler/tokwise/heads/
methods https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/pooler/tokwise/methods/
poolers https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/pooler/tokwise/poolers/
quantization https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/
awq https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/awq/
awq_marlin https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/awq_marlin/
awq_triton https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/awq_triton/
base_config https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/base_config/
bitsandbytes https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/bitsandbytes/
cpu_wna16 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/cpu_wna16/
experts_int8 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/experts_int8/
fbgemm_fp8 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/fbgemm_fp8/
fp8 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/fp8/
fp_quant https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/fp_quant/
gguf https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/gguf/
gptq https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/gptq/
gptq_marlin https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/gptq_marlin/
inc https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/inc/
input_quant_fp8 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/input_quant_fp8/
kv_cache https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/kv_cache/
modelopt https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/modelopt/
moe_wna16 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/moe_wna16/
mxfp4 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/mxfp4/
petit https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/petit/
ptpc_fp8 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/ptpc_fp8/
qutlass_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/qutlass_utils/
schema https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/schema/
torchao https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/torchao/
compressed_tensors https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/compressed_tensors/
compressed_tensors https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors/
compressed_tensors_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe/
triton_scaled_mm https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/compressed_tensors/triton_scaled_mm/
utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/compressed_tensors/utils/
schemes https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/compressed_tensors/schemes/
compressed_tensors_24 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_24/
compressed_tensors_scheme https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_scheme/
compressed_tensors_w4a4_nvfp4 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a4_nvfp4/
compressed_tensors_w4a8_fp8 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a8_fp8/
compressed_tensors_w4a8_int https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a8_int/
compressed_tensors_w4a16_mxfp4 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a16_mxfp4/
compressed_tensors_w4a16_nvfp4 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a16_nvfp4/
compressed_tensors_w8a8_fp8 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_fp8/
compressed_tensors_w8a8_int8 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_int8/
compressed_tensors_w8a16_fp8 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a16_fp8/
compressed_tensors_wNa16 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_wNa16/
transform https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/compressed_tensors/transform/
linear https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/compressed_tensors/transform/linear/
module https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/compressed_tensors/transform/module/
utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/compressed_tensors/transform/utils/
schemes https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/compressed_tensors/transform/schemes/
linear_qutlass_nvfp4 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/compressed_tensors/transform/schemes/linear_qutlass_nvfp4/
kernels https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/kernels/
mixed_precision https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/kernels/mixed_precision/
allspark https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/kernels/mixed_precision/allspark/
conch https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/kernels/mixed_precision/conch/
cpu https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/kernels/mixed_precision/cpu/
cutlass https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/kernels/mixed_precision/cutlass/
dynamic_4bit https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/kernels/mixed_precision/dynamic_4bit/
exllama https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/kernels/mixed_precision/exllama/
MPLinearKernel https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/kernels/mixed_precision/MPLinearKernel/
machete https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/kernels/mixed_precision/machete/
marlin https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/kernels/mixed_precision/marlin/
xpu https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/kernels/mixed_precision/xpu/
scaled_mm https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/kernels/scaled_mm/
aiter https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/kernels/scaled_mm/aiter/
cpu https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/kernels/scaled_mm/cpu/
cutlass https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/kernels/scaled_mm/cutlass/
flashinfer https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/kernels/scaled_mm/flashinfer/
pytorch https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/kernels/scaled_mm/pytorch/
rocm https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/kernels/scaled_mm/rocm/
ScaledMMLinearKernel https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/kernels/scaled_mm/ScaledMMLinearKernel/
triton https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/kernels/scaled_mm/triton/
xpu https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/kernels/scaled_mm/xpu/
quark https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/quark/
quark https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/quark/quark/
quark_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/quark/quark_moe/
utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/quark/utils/
schemes https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/quark/schemes/
quark_ocp_mx https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/quark/schemes/quark_ocp_mx/
quark_scheme https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/quark/schemes/quark_scheme/
quark_w8a8_fp8 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/quark/schemes/quark_w8a8_fp8/
quark_w8a8_int8 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/quark/schemes/quark_w8a8_int8/
utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/utils/
allspark_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/utils/allspark_utils/
flashinfer_fp4_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/utils/flashinfer_fp4_moe/
flashinfer_mxint4_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/utils/flashinfer_mxint4_moe/
flashinfer_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/utils/flashinfer_utils/
fp8_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/utils/fp8_utils/
gptq_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/utils/gptq_utils/
int8_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/utils/int8_utils/
layer_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/utils/layer_utils/
machete_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/utils/machete_utils/
marlin_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/utils/marlin_utils/
marlin_utils_fp4 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/utils/marlin_utils_fp4/
marlin_utils_fp8 https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/utils/marlin_utils_fp8/
marlin_utils_test https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/utils/marlin_utils_test/
mxfp4_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/utils/mxfp4_utils/
mxfp6_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/utils/mxfp6_utils/
mxfp8_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/utils/mxfp8_utils/
nvfp4_emulation_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/utils/nvfp4_emulation_utils/
nvfp4_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/utils/nvfp4_utils/
ocp_mx_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/utils/ocp_mx_utils/
petit_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/utils/petit_utils/
quant_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/utils/quant_utils/
w8a8_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/utils/w8a8_utils/
rotary_embedding https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/rotary_embedding/
base https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/rotary_embedding/base/
common https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/rotary_embedding/common/
deepseek_scaling_rope https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/rotary_embedding/deepseek_scaling_rope/
dual_chunk_rope https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/rotary_embedding/dual_chunk_rope/
dynamic_ntk_alpha_rope https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/rotary_embedding/dynamic_ntk_alpha_rope/
dynamic_ntk_scaling_rope https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/rotary_embedding/dynamic_ntk_scaling_rope/
ernie45_vl_rope https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/rotary_embedding/ernie45_vl_rope/
fope https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/rotary_embedding/fope/
linear_scaling_rope https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/rotary_embedding/linear_scaling_rope/
llama3_rope https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/rotary_embedding/llama3_rope/
llama4_vision_rope https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/rotary_embedding/llama4_vision_rope/
mrope https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/rotary_embedding/mrope/
mrope_interleaved https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/rotary_embedding/mrope_interleaved/
ntk_scaling_rope https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/rotary_embedding/ntk_scaling_rope/
phi3_long_rope_scaled_rope https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/rotary_embedding/phi3_long_rope_scaled_rope/
xdrope https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/rotary_embedding/xdrope/
yarn_scaling_rope https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/rotary_embedding/yarn_scaling_rope/
model_loader https://docs.vllm.ai/en/latest/api/vllm/model_executor/model_loader/
base_loader https://docs.vllm.ai/en/latest/api/vllm/model_executor/model_loader/base_loader/
bitsandbytes_loader https://docs.vllm.ai/en/latest/api/vllm/model_executor/model_loader/bitsandbytes_loader/
default_loader https://docs.vllm.ai/en/latest/api/vllm/model_executor/model_loader/default_loader/
dummy_loader https://docs.vllm.ai/en/latest/api/vllm/model_executor/model_loader/dummy_loader/
gguf_loader https://docs.vllm.ai/en/latest/api/vllm/model_executor/model_loader/gguf_loader/
runai_streamer_loader https://docs.vllm.ai/en/latest/api/vllm/model_executor/model_loader/runai_streamer_loader/
sharded_state_loader https://docs.vllm.ai/en/latest/api/vllm/model_executor/model_loader/sharded_state_loader/
tensorizer https://docs.vllm.ai/en/latest/api/vllm/model_executor/model_loader/tensorizer/
tensorizer_loader https://docs.vllm.ai/en/latest/api/vllm/model_executor/model_loader/tensorizer_loader/
utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/model_loader/utils/
weight_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/model_loader/weight_utils/
reload https://docs.vllm.ai/en/latest/api/vllm/model_executor/model_loader/reload/
layerwise https://docs.vllm.ai/en/latest/api/vllm/model_executor/model_loader/reload/layerwise/
meta https://docs.vllm.ai/en/latest/api/vllm/model_executor/model_loader/reload/meta/
sanitize https://docs.vllm.ai/en/latest/api/vllm/model_executor/model_loader/reload/sanitize/
torchao_decorator https://docs.vllm.ai/en/latest/api/vllm/model_executor/model_loader/reload/torchao_decorator/
types https://docs.vllm.ai/en/latest/api/vllm/model_executor/model_loader/reload/types/
utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/model_loader/reload/utils/
models https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/
adapters https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/adapters/
afmoe https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/afmoe/
aimv2 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/aimv2/
apertus https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/apertus/
arcee https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/arcee/
arctic https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/arctic/
aria https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/aria/
audioflamingo3 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/audioflamingo3/
aya_vision https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/aya_vision/
bagel https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/bagel/
baichuan https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/baichuan/
bailing_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/bailing_moe/
bamba https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/bamba/
bee https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/bee/
bert https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/bert/
bert_with_rope https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/bert_with_rope/
blip https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/blip/
blip2 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/blip2/
bloom https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/bloom/
chameleon https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/chameleon/
chatglm https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/chatglm/
clip https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/clip/
cohere2_vision https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/cohere2_vision/
colbert https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/colbert/
colqwen3 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/colqwen3/
commandr https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/commandr/
config https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/config/
dbrx https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/dbrx/
deepencoder https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/deepencoder/
deepencoder2 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/deepencoder2/
deepseek_eagle https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/deepseek_eagle/
deepseek_mtp https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/deepseek_mtp/
deepseek_ocr https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/deepseek_ocr/
deepseek_ocr2 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/deepseek_ocr2/
deepseek_v2 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/deepseek_v2/
deepseek_vl2 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/deepseek_vl2/
dots1 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/dots1/
dots_ocr https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/dots_ocr/
eagle2_5_vl https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/eagle2_5_vl/
ernie45 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/ernie45/
ernie45_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/ernie45_moe/
ernie45_vl https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/ernie45_vl/
ernie45_vl_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/ernie45_vl_moe/
ernie_mtp https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/ernie_mtp/
exaone https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/exaone/
exaone4 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/exaone4/
exaone_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/exaone_moe/
exaone_moe_mtp https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/exaone_moe_mtp/
fairseq2_llama https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/fairseq2_llama/
falcon https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/falcon/
falcon_h1 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/falcon_h1/
flex_olmo https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/flex_olmo/
funasr https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/funasr/
funaudiochat https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/funaudiochat/
fuyu https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/fuyu/
gemma https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/gemma/
gemma2 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/gemma2/
gemma3 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/gemma3/
gemma3_mm https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/gemma3_mm/
gemma3n https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/gemma3n/
gemma3n_audio_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/gemma3n_audio_utils/
gemma3n_mm https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/gemma3n_mm/
glm https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/glm/
glm4 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/glm4/
glm4_1v https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/glm4_1v/
glm4_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/glm4_moe/
glm4_moe_lite https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/glm4_moe_lite/
glm4_moe_lite_mtp https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/glm4_moe_lite_mtp/
glm4_moe_mtp https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/glm4_moe_mtp/
glm4v https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/glm4v/
glm_ocr https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/glm_ocr/
glm_ocr_mtp https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/glm_ocr_mtp/
glmasr https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/glmasr/
glmasr_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/glmasr_utils/
gpt2 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/gpt2/
gpt_bigcode https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/gpt_bigcode/
gpt_j https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/gpt_j/
gpt_neox https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/gpt_neox/
gpt_oss https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/gpt_oss/
granite https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/granite/
granite_speech https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/granite_speech/
granitemoe https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/granitemoe/
granitemoehybrid https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/granitemoehybrid/
granitemoeshared https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/granitemoeshared/
gritlm https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/gritlm/
grok1 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/grok1/
h2ovl https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/h2ovl/
hunyuan_v1 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/hunyuan_v1/
hunyuan_vision https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/hunyuan_vision/
hyperclovax_vision https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/hyperclovax_vision/
idefics2_vision_model https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/idefics2_vision_model/
idefics3 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/idefics3/
interfaces https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/interfaces/
interfaces_base https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/interfaces_base/
intern_vit https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/intern_vit/
internlm2 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/internlm2/
internlm2_ve https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/internlm2_ve/
interns1 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/interns1/
interns1_pro https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/interns1_pro/
interns1_vit https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/interns1_vit/
internvl https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/internvl/
iquest_loopcoder https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/iquest_loopcoder/
isaac https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/isaac/
jais https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/jais/
jais2 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/jais2/
jamba https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/jamba/
jina_vl https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/jina_vl/
kanana_v https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/kanana_v/
keye https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/keye/
keye_vl1_5 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/keye_vl1_5/
kimi_k25 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/kimi_k25/
kimi_k25_vit https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/kimi_k25_vit/
kimi_linear https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/kimi_linear/
kimi_vl https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/kimi_vl/
lfm2 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/lfm2/
lfm2_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/lfm2_moe/
lfm2_siglip2 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/lfm2_siglip2/
lfm2_vl https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/lfm2_vl/
lightonocr https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/lightonocr/
llama https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/llama/
llama4 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/llama4/
llama4_eagle https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/llama4_eagle/
llama_eagle https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/llama_eagle/
llama_eagle3 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/llama_eagle3/
llava https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/llava/
llava_next https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/llava_next/
llava_next_video https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/llava_next_video/
llava_onevision https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/llava_onevision/
longcat_flash https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/longcat_flash/
longcat_flash_mtp https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/longcat_flash_mtp/
mamba https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/mamba/
mamba2 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/mamba2/
medusa https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/medusa/
midashenglm https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/midashenglm/
mimo https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/mimo/
mimo_mtp https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/mimo_mtp/
mimo_v2_flash https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/mimo_v2_flash/
minicpm https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/minicpm/
minicpm3 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/minicpm3/
minicpm_eagle https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/minicpm_eagle/
minicpmo https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/minicpmo/
minicpmv https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/minicpmv/
minimax_m2 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/minimax_m2/
minimax_text_01 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/minimax_text_01/
minimax_vl_01 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/minimax_vl_01/
mistral https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/mistral/
mistral3 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/mistral3/
mistral_large_3 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/mistral_large_3/
mistral_large_3_eagle https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/mistral_large_3_eagle/
mixtral https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/mixtral/
mllama4 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/mllama4/
mlp_speculator https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/mlp_speculator/
modernbert https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/modernbert/
module_mapping https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/module_mapping/
molmo https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/molmo/
molmo2 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/molmo2/
moonvit https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/moonvit/
mpt https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/mpt/
musicflamingo https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/musicflamingo/
nano_nemotron_vl https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/nano_nemotron_vl/
nemotron https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/nemotron/
nemotron_h https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/nemotron_h/
nemotron_nas https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/nemotron_nas/
nemotron_parse https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/nemotron_parse/
nemotron_vl https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/nemotron_vl/
nvlm_d https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/nvlm_d/
olmo https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/olmo/
olmo2 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/olmo2/
olmoe https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/olmoe/
opencua https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/opencua/
openpangu https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/openpangu/
openpangu_mtp https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/openpangu_mtp/
openpangu_vl https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/openpangu_vl/
opt https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/opt/
orion https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/orion/
ouro https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/ouro/
ovis https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/ovis/
ovis2_5 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/ovis2_5/
paddleocr_vl https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/paddleocr_vl/
paligemma https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/paligemma/
persimmon https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/persimmon/
phi https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/phi/
phi3 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/phi3/
phi3v https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/phi3v/
phi4mm https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/phi4mm/
phi4mm_audio https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/phi4mm_audio/
phi4mm_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/phi4mm_utils/
phimoe https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/phimoe/
pixtral https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/pixtral/
plamo2 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/plamo2/
plamo3 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/plamo3/
qwen https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/qwen/
qwen2 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/qwen2/
qwen2_5_omni_thinker https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/qwen2_5_omni_thinker/
qwen2_5_vl https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/qwen2_5_vl/
qwen2_audio https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/qwen2_audio/
qwen2_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/qwen2_moe/
qwen2_rm https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/qwen2_rm/
qwen2_vl https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/qwen2_vl/
qwen3 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/qwen3/
qwen3_5 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/qwen3_5/
qwen3_5_mtp https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/qwen3_5_mtp/
qwen3_asr https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/qwen3_asr/
qwen3_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/qwen3_moe/
qwen3_next https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/qwen3_next/
qwen3_next_mtp https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/qwen3_next_mtp/
qwen3_omni_moe_thinker https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/qwen3_omni_moe_thinker/
qwen3_vl https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/qwen3_vl/
qwen3_vl_moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/qwen3_vl_moe/
qwen_vl https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/qwen_vl/
radio https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/radio/
registry https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/registry/
roberta https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/roberta/
rvl https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/rvl/
seed_oss https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/seed_oss/
siglip https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/siglip/
siglip2navit https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/siglip2navit/
skyworkr1v https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/skyworkr1v/
smolvlm https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/smolvlm/
solar https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/solar/
stablelm https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/stablelm/
starcoder2 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/starcoder2/
step1 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/step1/
step3_text https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/step3_text/
step3_vl https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/step3_vl/
step3p5 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/step3p5/
step3p5_mtp https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/step3p5_mtp/
step_vl https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/step_vl/
swin https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/swin/
tarsier https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/tarsier/
telechat2 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/telechat2/
teleflm https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/teleflm/
terratorch https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/terratorch/
ultravox https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/ultravox/
utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/utils/
vision https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/vision/
voxtral https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/voxtral/
voxtral_realtime https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/voxtral_realtime/
voyage https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/voyage/
whisper https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/whisper/
whisper_causal https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/whisper_causal/
whisper_utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/whisper_utils/
zamba2 https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/zamba2/
transformers https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/transformers/
base https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/transformers/base/
causal https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/transformers/causal/
legacy https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/transformers/legacy/
moe https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/transformers/moe/
multimodal https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/transformers/multimodal/
pooling https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/transformers/pooling/
utils https://docs.vllm.ai/en/latest/api/vllm/model_executor/models/transformers/utils/
warmup https://docs.vllm.ai/en/latest/api/vllm/model_executor/warmup/
deep_gemm_warmup https://docs.vllm.ai/en/latest/api/vllm/model_executor/warmup/deep_gemm_warmup/
kernel_warmup https://docs.vllm.ai/en/latest/api/vllm/model_executor/warmup/kernel_warmup/
multimodal https://docs.vllm.ai/en/latest/api/vllm/multimodal/
audio https://docs.vllm.ai/en/latest/api/vllm/multimodal/audio/
cache https://docs.vllm.ai/en/latest/api/vllm/multimodal/cache/
encoder_budget https://docs.vllm.ai/en/latest/api/vllm/multimodal/encoder_budget/
evs https://docs.vllm.ai/en/latest/api/vllm/multimodal/evs/
hasher https://docs.vllm.ai/en/latest/api/vllm/multimodal/hasher/
image https://docs.vllm.ai/en/latest/api/vllm/multimodal/image/
inputs https://docs.vllm.ai/en/latest/api/vllm/multimodal/inputs/
parse https://docs.vllm.ai/en/latest/api/vllm/multimodal/parse/
registry https://docs.vllm.ai/en/latest/api/vllm/multimodal/registry/
utils https://docs.vllm.ai/en/latest/api/vllm/multimodal/utils/
video https://docs.vllm.ai/en/latest/api/vllm/multimodal/video/
media https://docs.vllm.ai/en/latest/api/vllm/multimodal/media/
audio https://docs.vllm.ai/en/latest/api/vllm/multimodal/media/audio/
base https://docs.vllm.ai/en/latest/api/vllm/multimodal/media/base/
connector https://docs.vllm.ai/en/latest/api/vllm/multimodal/media/connector/
image https://docs.vllm.ai/en/latest/api/vllm/multimodal/media/image/
video https://docs.vllm.ai/en/latest/api/vllm/multimodal/media/video/
processing https://docs.vllm.ai/en/latest/api/vllm/multimodal/processing/
context https://docs.vllm.ai/en/latest/api/vllm/multimodal/processing/context/
dummy_inputs https://docs.vllm.ai/en/latest/api/vllm/multimodal/processing/dummy_inputs/
processor https://docs.vllm.ai/en/latest/api/vllm/multimodal/processing/processor/
parser https://docs.vllm.ai/en/latest/api/vllm/parser/
abstract_parser https://docs.vllm.ai/en/latest/api/vllm/parser/abstract_parser/
minimax_m2_parser https://docs.vllm.ai/en/latest/api/vllm/parser/minimax_m2_parser/
parser_manager https://docs.vllm.ai/en/latest/api/vllm/parser/parser_manager/
platforms https://docs.vllm.ai/en/latest/api/vllm/platforms/
cpu https://docs.vllm.ai/en/latest/api/vllm/platforms/cpu/
cuda https://docs.vllm.ai/en/latest/api/vllm/platforms/cuda/
interface https://docs.vllm.ai/en/latest/api/vllm/platforms/interface/
rocm https://docs.vllm.ai/en/latest/api/vllm/platforms/rocm/
tpu https://docs.vllm.ai/en/latest/api/vllm/platforms/tpu/
xpu https://docs.vllm.ai/en/latest/api/vllm/platforms/xpu/
plugins https://docs.vllm.ai/en/latest/api/vllm/plugins/
io_processors https://docs.vllm.ai/en/latest/api/vllm/plugins/io_processors/
interface https://docs.vllm.ai/en/latest/api/vllm/plugins/io_processors/interface/
lora_resolvers https://docs.vllm.ai/en/latest/api/vllm/plugins/lora_resolvers/
filesystem_resolver https://docs.vllm.ai/en/latest/api/vllm/plugins/lora_resolvers/filesystem_resolver/
hf_hub_resolver https://docs.vllm.ai/en/latest/api/vllm/plugins/lora_resolvers/hf_hub_resolver/
profiler https://docs.vllm.ai/en/latest/api/vllm/profiler/
layerwise_profile https://docs.vllm.ai/en/latest/api/vllm/profiler/layerwise_profile/
utils https://docs.vllm.ai/en/latest/api/vllm/profiler/utils/
wrapper https://docs.vllm.ai/en/latest/api/vllm/profiler/wrapper/
ray https://docs.vllm.ai/en/latest/api/vllm/ray/
lazy_utils https://docs.vllm.ai/en/latest/api/vllm/ray/lazy_utils/
ray_env https://docs.vllm.ai/en/latest/api/vllm/ray/ray_env/
reasoning https://docs.vllm.ai/en/latest/api/vllm/reasoning/
abs_reasoning_parsers https://docs.vllm.ai/en/latest/api/vllm/reasoning/abs_reasoning_parsers/
basic_parsers https://docs.vllm.ai/en/latest/api/vllm/reasoning/basic_parsers/
deepseek_r1_reasoning_parser https://docs.vllm.ai/en/latest/api/vllm/reasoning/deepseek_r1_reasoning_parser/
deepseek_v3_reasoning_parser https://docs.vllm.ai/en/latest/api/vllm/reasoning/deepseek_v3_reasoning_parser/
ernie45_reasoning_parser https://docs.vllm.ai/en/latest/api/vllm/reasoning/ernie45_reasoning_parser/
gptoss_reasoning_parser https://docs.vllm.ai/en/latest/api/vllm/reasoning/gptoss_reasoning_parser/
granite_reasoning_parser https://docs.vllm.ai/en/latest/api/vllm/reasoning/granite_reasoning_parser/
hunyuan_a13b_reasoning_parser https://docs.vllm.ai/en/latest/api/vllm/reasoning/hunyuan_a13b_reasoning_parser/
identity_reasoning_parser https://docs.vllm.ai/en/latest/api/vllm/reasoning/identity_reasoning_parser/
minimax_m2_reasoning_parser https://docs.vllm.ai/en/latest/api/vllm/reasoning/minimax_m2_reasoning_parser/
mistral_reasoning_parser https://docs.vllm.ai/en/latest/api/vllm/reasoning/mistral_reasoning_parser/
olmo3_reasoning_parser https://docs.vllm.ai/en/latest/api/vllm/reasoning/olmo3_reasoning_parser/
qwen3_reasoning_parser https://docs.vllm.ai/en/latest/api/vllm/reasoning/qwen3_reasoning_parser/
seedoss_reasoning_parser https://docs.vllm.ai/en/latest/api/vllm/reasoning/seedoss_reasoning_parser/
step3_reasoning_parser https://docs.vllm.ai/en/latest/api/vllm/reasoning/step3_reasoning_parser/
step3p5_reasoning_parser https://docs.vllm.ai/en/latest/api/vllm/reasoning/step3p5_reasoning_parser/
renderers https://docs.vllm.ai/en/latest/api/vllm/renderers/
base https://docs.vllm.ai/en/latest/api/vllm/renderers/base/
deepseek_v32 https://docs.vllm.ai/en/latest/api/vllm/renderers/deepseek_v32/
embed_utils https://docs.vllm.ai/en/latest/api/vllm/renderers/embed_utils/
grok2 https://docs.vllm.ai/en/latest/api/vllm/renderers/grok2/
hf https://docs.vllm.ai/en/latest/api/vllm/renderers/hf/
mistral https://docs.vllm.ai/en/latest/api/vllm/renderers/mistral/
params https://docs.vllm.ai/en/latest/api/vllm/renderers/params/
registry https://docs.vllm.ai/en/latest/api/vllm/renderers/registry/
terratorch https://docs.vllm.ai/en/latest/api/vllm/renderers/terratorch/
inputs https://docs.vllm.ai/en/latest/api/vllm/renderers/inputs/
preprocess https://docs.vllm.ai/en/latest/api/vllm/renderers/inputs/preprocess/
tokenize https://docs.vllm.ai/en/latest/api/vllm/renderers/inputs/tokenize/
tokenizers https://docs.vllm.ai/en/latest/api/vllm/tokenizers/
deepseek_v32 https://docs.vllm.ai/en/latest/api/vllm/tokenizers/deepseek_v32/
deepseek_v32_encoding https://docs.vllm.ai/en/latest/api/vllm/tokenizers/deepseek_v32_encoding/
detokenizer_utils https://docs.vllm.ai/en/latest/api/vllm/tokenizers/detokenizer_utils/
grok2 https://docs.vllm.ai/en/latest/api/vllm/tokenizers/grok2/
hf https://docs.vllm.ai/en/latest/api/vllm/tokenizers/hf/
mistral https://docs.vllm.ai/en/latest/api/vllm/tokenizers/mistral/
protocol https://docs.vllm.ai/en/latest/api/vllm/tokenizers/protocol/
registry https://docs.vllm.ai/en/latest/api/vllm/tokenizers/registry/
tool_parsers https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/
abstract_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/abstract_tool_parser/
deepseekv3_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/deepseekv3_tool_parser/
deepseekv31_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/deepseekv31_tool_parser/
deepseekv32_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/deepseekv32_tool_parser/
ernie45_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/ernie45_tool_parser/
functiongemma_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/functiongemma_tool_parser/
gigachat3_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/gigachat3_tool_parser/
glm4_moe_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/glm4_moe_tool_parser/
glm47_moe_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/glm47_moe_tool_parser/
granite_20b_fc_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/granite_20b_fc_tool_parser/
granite_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/granite_tool_parser/
hermes_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/hermes_tool_parser/
hunyuan_a13b_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/hunyuan_a13b_tool_parser/
internlm2_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/internlm2_tool_parser/
jamba_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/jamba_tool_parser/
kimi_k2_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/kimi_k2_tool_parser/
llama4_pythonic_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/llama4_pythonic_tool_parser/
llama_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/llama_tool_parser/
longcat_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/longcat_tool_parser/
minimax_m2_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/minimax_m2_tool_parser/
minimax_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/minimax_tool_parser/
mistral_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/mistral_tool_parser/
olmo3_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/olmo3_tool_parser/
openai_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/openai_tool_parser/
phi4mini_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/phi4mini_tool_parser/
pythonic_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/pythonic_tool_parser/
qwen3coder_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/qwen3coder_tool_parser/
qwen3xml_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/qwen3xml_tool_parser/
seed_oss_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/seed_oss_tool_parser/
step3_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/step3_tool_parser/
step3p5_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/step3p5_tool_parser/
utils https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/utils/
xlam_tool_parser https://docs.vllm.ai/en/latest/api/vllm/tool_parsers/xlam_tool_parser/
tracing https://docs.vllm.ai/en/latest/api/vllm/tracing/
otel https://docs.vllm.ai/en/latest/api/vllm/tracing/otel/
utils https://docs.vllm.ai/en/latest/api/vllm/tracing/utils/
transformers_utils https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/
config https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/config/
config_parser_base https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/config_parser_base/
dynamic_module https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/dynamic_module/
gguf_utils https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/gguf_utils/
model_arch_config_convertor https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/model_arch_config_convertor/
processor https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/processor/
repo_utils https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/repo_utils/
runai_utils https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/runai_utils/
s3_utils https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/s3_utils/
tokenizer https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/tokenizer/
utils https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/utils/
chat_templates https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/chat_templates/
registry https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/chat_templates/registry/
configs https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/
afmoe https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/afmoe/
arctic https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/arctic/
bagel https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/bagel/
chatglm https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/chatglm/
colqwen3 https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/colqwen3/
deepseek_vl2 https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/deepseek_vl2/
dotsocr https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/dotsocr/
eagle https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/eagle/
falcon https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/falcon/
flex_olmo https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/flex_olmo/
funaudiochat https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/funaudiochat/
hunyuan_vl https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/hunyuan_vl/
isaac https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/isaac/
jais https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/jais/
kimi_k25 https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/kimi_k25/
kimi_linear https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/kimi_linear/
kimi_vl https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/kimi_vl/
lfm2_moe https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/lfm2_moe/
medusa https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/medusa/
midashenglm https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/midashenglm/
mistral https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/mistral/
mlp_speculator https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/mlp_speculator/
moonvit https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/moonvit/
nemotron https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/nemotron/
nemotron_h https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/nemotron_h/
olmo3 https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/olmo3/
ovis https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/ovis/
qwen3_5 https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/qwen3_5/
qwen3_5_moe https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/qwen3_5_moe/
qwen3_asr https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/qwen3_asr/
qwen3_next https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/qwen3_next/
radio https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/radio/
step3_vl https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/step3_vl/
step3p5 https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/step3p5/
tarsier2 https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/tarsier2/
ultravox https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/ultravox/
speculators https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/speculators/
algos https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/speculators/algos/
base https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/configs/speculators/base/
processors https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/processors/
bagel https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/processors/bagel/
deepseek_ocr https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/processors/deepseek_ocr/
deepseek_vl2 https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/processors/deepseek_vl2/
funasr_processor https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/processors/funasr_processor/
hunyuan_vl https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/processors/hunyuan_vl/
hunyuan_vl_image https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/processors/hunyuan_vl_image/
ovis https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/processors/ovis/
ovis2_5 https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/processors/ovis2_5/
qwen3_asr https://docs.vllm.ai/en/latest/api/vllm/transformers_utils/processors/qwen3_asr/
triton_utils https://docs.vllm.ai/en/latest/api/vllm/triton_utils/
importing https://docs.vllm.ai/en/latest/api/vllm/triton_utils/importing/
usage https://docs.vllm.ai/en/latest/api/vllm/usage/
usage_lib https://docs.vllm.ai/en/latest/api/vllm/usage/usage_lib/
utils https://docs.vllm.ai/en/latest/api/vllm/utils/
argparse_utils https://docs.vllm.ai/en/latest/api/vllm/utils/argparse_utils/
async_utils https://docs.vllm.ai/en/latest/api/vllm/utils/async_utils/
cache https://docs.vllm.ai/en/latest/api/vllm/utils/cache/
collection_utils https://docs.vllm.ai/en/latest/api/vllm/utils/collection_utils/
counter https://docs.vllm.ai/en/latest/api/vllm/utils/counter/
deep_gemm https://docs.vllm.ai/en/latest/api/vllm/utils/deep_gemm/
flashinfer https://docs.vllm.ai/en/latest/api/vllm/utils/flashinfer/
func_utils https://docs.vllm.ai/en/latest/api/vllm/utils/func_utils/
gc_utils https://docs.vllm.ai/en/latest/api/vllm/utils/gc_utils/
hashing https://docs.vllm.ai/en/latest/api/vllm/utils/hashing/
import_utils https://docs.vllm.ai/en/latest/api/vllm/utils/import_utils/
jsontree https://docs.vllm.ai/en/latest/api/vllm/utils/jsontree/
math_utils https://docs.vllm.ai/en/latest/api/vllm/utils/math_utils/
mem_constants https://docs.vllm.ai/en/latest/api/vllm/utils/mem_constants/
mem_utils https://docs.vllm.ai/en/latest/api/vllm/utils/mem_utils/
nccl https://docs.vllm.ai/en/latest/api/vllm/utils/nccl/
network_utils https://docs.vllm.ai/en/latest/api/vllm/utils/network_utils/
nvtx_pytorch_hooks https://docs.vllm.ai/en/latest/api/vllm/utils/nvtx_pytorch_hooks/
platform_utils https://docs.vllm.ai/en/latest/api/vllm/utils/platform_utils/
print_utils https://docs.vllm.ai/en/latest/api/vllm/utils/print_utils/
profiling https://docs.vllm.ai/en/latest/api/vllm/utils/profiling/
registry https://docs.vllm.ai/en/latest/api/vllm/utils/registry/
serial_utils https://docs.vllm.ai/en/latest/api/vllm/utils/serial_utils/
system_utils https://docs.vllm.ai/en/latest/api/vllm/utils/system_utils/
tensor_schema https://docs.vllm.ai/en/latest/api/vllm/utils/tensor_schema/
torch_utils https://docs.vllm.ai/en/latest/api/vllm/utils/torch_utils/
tqdm_utils https://docs.vllm.ai/en/latest/api/vllm/utils/tqdm_utils/
v1 https://docs.vllm.ai/en/latest/api/vllm/v1/
cudagraph_dispatcher https://docs.vllm.ai/en/latest/api/vllm/v1/cudagraph_dispatcher/
kv_cache_interface https://docs.vllm.ai/en/latest/api/vllm/v1/kv_cache_interface/
outputs https://docs.vllm.ai/en/latest/api/vllm/v1/outputs/
request https://docs.vllm.ai/en/latest/api/vllm/v1/request/
serial_utils https://docs.vllm.ai/en/latest/api/vllm/v1/serial_utils/
utils https://docs.vllm.ai/en/latest/api/vllm/v1/utils/
attention https://docs.vllm.ai/en/latest/api/vllm/v1/attention/
backend https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backend/
selector https://docs.vllm.ai/en/latest/api/vllm/v1/attention/selector/
backends https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/
cpu_attn https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/cpu_attn/
fa_utils https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/fa_utils/
flash_attn https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/flash_attn/
flash_attn_diffkv https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/flash_attn_diffkv/
flashinfer https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/flashinfer/
flex_attention https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/flex_attention/
gdn_attn https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/gdn_attn/
linear_attn https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/linear_attn/
mamba1_attn https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/mamba1_attn/
mamba2_attn https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/mamba2_attn/
mamba_attn https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/mamba_attn/
registry https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/registry/
rocm_aiter_fa https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/rocm_aiter_fa/
rocm_aiter_unified_attn https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/rocm_aiter_unified_attn/
rocm_attn https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/rocm_attn/
short_conv_attn https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/short_conv_attn/
tree_attn https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/tree_attn/
triton_attn https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/triton_attn/
utils https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/utils/
mla https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/mla/
aiter_triton_mla https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/mla/aiter_triton_mla/
cutlass_mla https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/mla/cutlass_mla/
flashattn_mla https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/mla/flashattn_mla/
flashinfer_mla https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/mla/flashinfer_mla/
flashinfer_mla_sparse https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/mla/flashinfer_mla_sparse/
flashmla https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/mla/flashmla/
flashmla_sparse https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/mla/flashmla_sparse/
indexer https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/mla/indexer/
rocm_aiter_mla https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/mla/rocm_aiter_mla/
rocm_aiter_mla_sparse https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/mla/rocm_aiter_mla_sparse/
sparse_utils https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/mla/sparse_utils/
triton_mla https://docs.vllm.ai/en/latest/api/vllm/v1/attention/backends/mla/triton_mla/
ops https://docs.vllm.ai/en/latest/api/vllm/v1/attention/ops/
chunked_prefill_paged_decode https://docs.vllm.ai/en/latest/api/vllm/v1/attention/ops/chunked_prefill_paged_decode/
common https://docs.vllm.ai/en/latest/api/vllm/v1/attention/ops/common/
flashmla https://docs.vllm.ai/en/latest/api/vllm/v1/attention/ops/flashmla/
merge_attn_states https://docs.vllm.ai/en/latest/api/vllm/v1/attention/ops/merge_attn_states/
paged_attn https://docs.vllm.ai/en/latest/api/vllm/v1/attention/ops/paged_attn/
prefix_prefill https://docs.vllm.ai/en/latest/api/vllm/v1/attention/ops/prefix_prefill/
rocm_aiter_mla_sparse https://docs.vllm.ai/en/latest/api/vllm/v1/attention/ops/rocm_aiter_mla_sparse/
triton_decode_attention https://docs.vllm.ai/en/latest/api/vllm/v1/attention/ops/triton_decode_attention/
triton_merge_attn_states https://docs.vllm.ai/en/latest/api/vllm/v1/attention/ops/triton_merge_attn_states/
triton_prefill_attention https://docs.vllm.ai/en/latest/api/vllm/v1/attention/ops/triton_prefill_attention/
triton_reshape_and_cache_flash https://docs.vllm.ai/en/latest/api/vllm/v1/attention/ops/triton_reshape_and_cache_flash/
triton_unified_attention https://docs.vllm.ai/en/latest/api/vllm/v1/attention/ops/triton_unified_attention/
vit_attn_wrappers https://docs.vllm.ai/en/latest/api/vllm/v1/attention/ops/vit_attn_wrappers/
core https://docs.vllm.ai/en/latest/api/vllm/v1/core/
block_pool https://docs.vllm.ai/en/latest/api/vllm/v1/core/block_pool/
encoder_cache_manager https://docs.vllm.ai/en/latest/api/vllm/v1/core/encoder_cache_manager/
kv_cache_coordinator https://docs.vllm.ai/en/latest/api/vllm/v1/core/kv_cache_coordinator/
kv_cache_manager https://docs.vllm.ai/en/latest/api/vllm/v1/core/kv_cache_manager/
kv_cache_metrics https://docs.vllm.ai/en/latest/api/vllm/v1/core/kv_cache_metrics/
kv_cache_utils https://docs.vllm.ai/en/latest/api/vllm/v1/core/kv_cache_utils/
single_type_kv_cache_manager https://docs.vllm.ai/en/latest/api/vllm/v1/core/single_type_kv_cache_manager/
sched https://docs.vllm.ai/en/latest/api/vllm/v1/core/sched/
async_scheduler https://docs.vllm.ai/en/latest/api/vllm/v1/core/sched/async_scheduler/
interface https://docs.vllm.ai/en/latest/api/vllm/v1/core/sched/interface/
output https://docs.vllm.ai/en/latest/api/vllm/v1/core/sched/output/
request_queue https://docs.vllm.ai/en/latest/api/vllm/v1/core/sched/request_queue/
scheduler https://docs.vllm.ai/en/latest/api/vllm/v1/core/sched/scheduler/
utils https://docs.vllm.ai/en/latest/api/vllm/v1/core/sched/utils/
engine https://docs.vllm.ai/en/latest/api/vllm/v1/engine/
async_llm https://docs.vllm.ai/en/latest/api/vllm/v1/engine/async_llm/
coordinator https://docs.vllm.ai/en/latest/api/vllm/v1/engine/coordinator/
core https://docs.vllm.ai/en/latest/api/vllm/v1/engine/core/
core_client https://docs.vllm.ai/en/latest/api/vllm/v1/engine/core_client/
detokenizer https://docs.vllm.ai/en/latest/api/vllm/v1/engine/detokenizer/
exceptions https://docs.vllm.ai/en/latest/api/vllm/v1/engine/exceptions/
input_processor https://docs.vllm.ai/en/latest/api/vllm/v1/engine/input_processor/
llm_engine https://docs.vllm.ai/en/latest/api/vllm/v1/engine/llm_engine/
logprobs https://docs.vllm.ai/en/latest/api/vllm/v1/engine/logprobs/
output_processor https://docs.vllm.ai/en/latest/api/vllm/v1/engine/output_processor/
parallel_sampling https://docs.vllm.ai/en/latest/api/vllm/v1/engine/parallel_sampling/
utils https://docs.vllm.ai/en/latest/api/vllm/v1/engine/utils/
executor https://docs.vllm.ai/en/latest/api/vllm/v1/executor/
abstract https://docs.vllm.ai/en/latest/api/vllm/v1/executor/abstract/
multiproc_executor https://docs.vllm.ai/en/latest/api/vllm/v1/executor/multiproc_executor/
ray_distributed_executor https://docs.vllm.ai/en/latest/api/vllm/v1/executor/ray_distributed_executor/
ray_executor https://docs.vllm.ai/en/latest/api/vllm/v1/executor/ray_executor/
ray_utils https://docs.vllm.ai/en/latest/api/vllm/v1/executor/ray_utils/
uniproc_executor https://docs.vllm.ai/en/latest/api/vllm/v1/executor/uniproc_executor/
kv_offload https://docs.vllm.ai/en/latest/api/vllm/v1/kv_offload/
abstract https://docs.vllm.ai/en/latest/api/vllm/v1/kv_offload/abstract/
arc_manager https://docs.vllm.ai/en/latest/api/vllm/v1/kv_offload/arc_manager/
backend https://docs.vllm.ai/en/latest/api/vllm/v1/kv_offload/backend/
cpu https://docs.vllm.ai/en/latest/api/vllm/v1/kv_offload/cpu/
factory https://docs.vllm.ai/en/latest/api/vllm/v1/kv_offload/factory/
lru_manager https://docs.vllm.ai/en/latest/api/vllm/v1/kv_offload/lru_manager/
mediums https://docs.vllm.ai/en/latest/api/vllm/v1/kv_offload/mediums/
spec https://docs.vllm.ai/en/latest/api/vllm/v1/kv_offload/spec/
backends https://docs.vllm.ai/en/latest/api/vllm/v1/kv_offload/backends/
cpu https://docs.vllm.ai/en/latest/api/vllm/v1/kv_offload/backends/cpu/
worker https://docs.vllm.ai/en/latest/api/vllm/v1/kv_offload/worker/
cpu_gpu https://docs.vllm.ai/en/latest/api/vllm/v1/kv_offload/worker/cpu_gpu/
worker https://docs.vllm.ai/en/latest/api/vllm/v1/kv_offload/worker/worker/
metrics https://docs.vllm.ai/en/latest/api/vllm/v1/metrics/
loggers https://docs.vllm.ai/en/latest/api/vllm/v1/metrics/loggers/
perf https://docs.vllm.ai/en/latest/api/vllm/v1/metrics/perf/
prometheus https://docs.vllm.ai/en/latest/api/vllm/v1/metrics/prometheus/
ray_wrappers https://docs.vllm.ai/en/latest/api/vllm/v1/metrics/ray_wrappers/
reader https://docs.vllm.ai/en/latest/api/vllm/v1/metrics/reader/
stats https://docs.vllm.ai/en/latest/api/vllm/v1/metrics/stats/
pool https://docs.vllm.ai/en/latest/api/vllm/v1/pool/
metadata https://docs.vllm.ai/en/latest/api/vllm/v1/pool/metadata/
sample https://docs.vllm.ai/en/latest/api/vllm/v1/sample/
metadata https://docs.vllm.ai/en/latest/api/vllm/v1/sample/metadata/
rejection_sampler https://docs.vllm.ai/en/latest/api/vllm/v1/sample/rejection_sampler/
sampler https://docs.vllm.ai/en/latest/api/vllm/v1/sample/sampler/
logits_processor https://docs.vllm.ai/en/latest/api/vllm/v1/sample/logits_processor/
builtin https://docs.vllm.ai/en/latest/api/vllm/v1/sample/logits_processor/builtin/
interface https://docs.vllm.ai/en/latest/api/vllm/v1/sample/logits_processor/interface/
state https://docs.vllm.ai/en/latest/api/vllm/v1/sample/logits_processor/state/
ops https://docs.vllm.ai/en/latest/api/vllm/v1/sample/ops/
bad_words https://docs.vllm.ai/en/latest/api/vllm/v1/sample/ops/bad_words/
logprobs https://docs.vllm.ai/en/latest/api/vllm/v1/sample/ops/logprobs/
penalties https://docs.vllm.ai/en/latest/api/vllm/v1/sample/ops/penalties/
topk_topp_sampler https://docs.vllm.ai/en/latest/api/vllm/v1/sample/ops/topk_topp_sampler/
topk_topp_triton https://docs.vllm.ai/en/latest/api/vllm/v1/sample/ops/topk_topp_triton/
spec_decode https://docs.vllm.ai/en/latest/api/vllm/v1/spec_decode/
draft_model https://docs.vllm.ai/en/latest/api/vllm/v1/spec_decode/draft_model/
eagle https://docs.vllm.ai/en/latest/api/vllm/v1/spec_decode/eagle/
medusa https://docs.vllm.ai/en/latest/api/vllm/v1/spec_decode/medusa/
metadata https://docs.vllm.ai/en/latest/api/vllm/v1/spec_decode/metadata/
metrics https://docs.vllm.ai/en/latest/api/vllm/v1/spec_decode/metrics/
ngram_proposer https://docs.vllm.ai/en/latest/api/vllm/v1/spec_decode/ngram_proposer/
suffix_decoding https://docs.vllm.ai/en/latest/api/vllm/v1/spec_decode/suffix_decoding/
utils https://docs.vllm.ai/en/latest/api/vllm/v1/spec_decode/utils/
structured_output https://docs.vllm.ai/en/latest/api/vllm/v1/structured_output/
backend_guidance https://docs.vllm.ai/en/latest/api/vllm/v1/structured_output/backend_guidance/
backend_lm_format_enforcer https://docs.vllm.ai/en/latest/api/vllm/v1/structured_output/backend_lm_format_enforcer/
backend_outlines https://docs.vllm.ai/en/latest/api/vllm/v1/structured_output/backend_outlines/
backend_types https://docs.vllm.ai/en/latest/api/vllm/v1/structured_output/backend_types/
backend_xgrammar https://docs.vllm.ai/en/latest/api/vllm/v1/structured_output/backend_xgrammar/
request https://docs.vllm.ai/en/latest/api/vllm/v1/structured_output/request/
utils https://docs.vllm.ai/en/latest/api/vllm/v1/structured_output/utils/
worker https://docs.vllm.ai/en/latest/api/vllm/v1/worker/
block_table https://docs.vllm.ai/en/latest/api/vllm/v1/worker/block_table/
cp_utils https://docs.vllm.ai/en/latest/api/vllm/v1/worker/cp_utils/
cpu_model_runner https://docs.vllm.ai/en/latest/api/vllm/v1/worker/cpu_model_runner/
cpu_worker https://docs.vllm.ai/en/latest/api/vllm/v1/worker/cpu_worker/
dp_utils https://docs.vllm.ai/en/latest/api/vllm/v1/worker/dp_utils/
ec_connector_model_runner_mixin https://docs.vllm.ai/en/latest/api/vllm/v1/worker/ec_connector_model_runner_mixin/
gpu_input_batch https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu_input_batch/
gpu_model_runner https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu_model_runner/
gpu_ubatch_wrapper https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu_ubatch_wrapper/
gpu_worker https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu_worker/
kv_connector_model_runner_mixin https://docs.vllm.ai/en/latest/api/vllm/v1/worker/kv_connector_model_runner_mixin/
lora_model_runner_mixin https://docs.vllm.ai/en/latest/api/vllm/v1/worker/lora_model_runner_mixin/
mamba_utils https://docs.vllm.ai/en/latest/api/vllm/v1/worker/mamba_utils/
tpu_input_batch https://docs.vllm.ai/en/latest/api/vllm/v1/worker/tpu_input_batch/
ubatch_utils https://docs.vllm.ai/en/latest/api/vllm/v1/worker/ubatch_utils/
ubatching https://docs.vllm.ai/en/latest/api/vllm/v1/worker/ubatching/
utils https://docs.vllm.ai/en/latest/api/vllm/v1/worker/utils/
worker_base https://docs.vllm.ai/en/latest/api/vllm/v1/worker/worker_base/
workspace https://docs.vllm.ai/en/latest/api/vllm/v1/worker/workspace/
xpu_model_runner https://docs.vllm.ai/en/latest/api/vllm/v1/worker/xpu_model_runner/
xpu_worker https://docs.vllm.ai/en/latest/api/vllm/v1/worker/xpu_worker/
gpu https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/
async_utils https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/async_utils/
attn_utils https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/attn_utils/
block_table https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/block_table/
buffer_utils https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/buffer_utils/
cudagraph_utils https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/cudagraph_utils/
dp_utils https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/dp_utils/
input_batch https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/input_batch/
kv_connector https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/kv_connector/
lora_utils https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/lora_utils/
model_runner https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/model_runner/
pp_utils https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/pp_utils/
states https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/states/
structured_outputs https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/structured_outputs/
metrics https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/metrics/
logits https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/metrics/logits/
mm https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/mm/
encoder_runner https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/mm/encoder_runner/
mrope_utils https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/mm/mrope_utils/
sample https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/sample/
bad_words https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/sample/bad_words/
gumbel https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/sample/gumbel/
logit_bias https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/sample/logit_bias/
logprob https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/sample/logprob/
min_p https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/sample/min_p/
output https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/sample/output/
penalties https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/sample/penalties/
prompt_logprob https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/sample/prompt_logprob/
sampler https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/sample/sampler/
states https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/sample/states/
spec_decode https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/spec_decode/
eagle https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/spec_decode/eagle/
eagle_cudagraph https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/spec_decode/eagle_cudagraph/
rejection_sample https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/spec_decode/rejection_sample/
utils https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu/spec_decode/utils/
CLI Reference https://docs.vllm.ai/en/latest/cli/
vllm serve https://docs.vllm.ai/en/latest/cli/serve/
vllm chat https://docs.vllm.ai/en/latest/cli/chat/
vllm complete https://docs.vllm.ai/en/latest/cli/complete/
vllm run-batch https://docs.vllm.ai/en/latest/cli/run-batch/
vllm bench latency https://docs.vllm.ai/en/latest/cli/bench/latency/
vllm bench mm-processor https://docs.vllm.ai/en/latest/cli/bench/mm_processor/
vllm bench serve https://docs.vllm.ai/en/latest/cli/bench/serve/
vllm bench sweep plot https://docs.vllm.ai/en/latest/cli/bench/sweep/plot/
vllm bench sweep plot_pareto https://docs.vllm.ai/en/latest/cli/bench/sweep/plot_pareto/
vllm bench sweep serve https://docs.vllm.ai/en/latest/cli/bench/sweep/serve/
vllm bench sweep serve_sla https://docs.vllm.ai/en/latest/cli/bench/sweep/serve_sla/
vllm bench throughput https://docs.vllm.ai/en/latest/cli/bench/throughput/
Contact Us https://docs.vllm.ai/en/latest/community/contact_us/
Meetups https://docs.vllm.ai/en/latest/community/meetups/
Sponsors https://docs.vllm.ai/en/latest/community/sponsors/
Collaboration Policy https://docs.vllm.ai/en/latest/governance/collaboration/
Committers https://docs.vllm.ai/en/latest/governance/committers/
Governance Process https://docs.vllm.ai/en/latest/governance/process/
Blog https://blog.vllm.ai
Forum https://discuss.vllm.ai
Slack https://slack.vllm.ai
Sky Computing Lab https://sky.cs.berkeley.edu
Quickstart Guide https://docs.vllm.ai/en/latest/getting_started/quickstart/
User Guide https://docs.vllm.ai/en/latest/usage/
Developer Guide https://docs.vllm.ai/en/latest/contributing/
Roadmap https://roadmap.vllm.ai
Releases https://github.com/vllm-project/vllm/releases
PagedAttention https://blog.vllm.ai/2023/06/20/vllm.html
GPTQ https://arxiv.org/abs/2210.17323
AWQ https://arxiv.org/abs/2306.00978
vLLM announcing blog post https://blog.vllm.ai/2023/06/20/vllm.html
vLLM paper https://arxiv.org/abs/2309.06180
How continuous batching enables 23x throughput in LLM inference while reducing p50 latency https://www.anyscale.com/blog/continuous-batching-llm-inference
vLLM Meetups https://docs.vllm.ai/en/latest/community/meetups/