# CUDA out of memory during inference with SQL-R1-14B

Issue #29 · DataArcTech/SQL-R1 — https://github.com/DataArcTech/SQL-R1/issues/29
Opened by **thatmee** on 2026-01-20

Hello! I tried to run inference with SQL-R1-14B on a single A100 80GB. No matter whether I set `gpu_memory_utilization` to 0.9, 0.8, or 0.5, I always get a CUDA out-of-memory error. SQL-R1-3B and 7B both run successfully on my device, and I can also run other models of around 14B on it. Do you have any idea what causes this error? Thanks :)

Here is the vLLM config log:

```
Initializing an LLM engine (vdev) with config: model='MPX0222forHF/SQL-R1-14B', speculative_config=None, tokenizer='MPX0222forHF/SQL-R1-14B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=True, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=MPX0222forHF/SQL-R1-14B, use_v2_block_manager=True, num_scheduler_steps=1, chunked_prefill_enabled=False, multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=False, use_cached_outputs=False, mm_processor_kwargs=None)
```

Here is the error message:

```
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 270.00 MiB. GPU 0 has a total capacity of 79.15 GiB of which 236.69 MiB is free. Process 2457479 has 60.40 GiB memory in use. Including non-PyTorch memory, this process has 18.51 GiB memory in use. Of the allocated memory 18.01 GiB is allocated by PyTorch, and 12.92 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
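One reading of the figures in that traceback (my own interpretation, not confirmed in the thread): process 2457479 is a *different* process already holding 60.40 GiB of GPU 0, so the vLLM process never sees more than roughly 18.75 GiB of the 80 GB card, and `gpu_memory_utilization` cannot help because vLLM's budget is a fraction of memory that is simply not free. A quick sanity check of the reported numbers:

```python
# Figures copied verbatim from the torch.OutOfMemoryError message above.
total_gib = 79.15          # GPU 0 total capacity
other_process_gib = 60.40  # held by process 2457479 (a different process)
this_process_gib = 18.51   # this vLLM process, including non-PyTorch memory

# Memory the vLLM process could ever get on this card:
available_gib = total_gib - other_process_gib
print(f"available to vLLM: {available_gib:.2f} GiB")   # 18.75 GiB

# bfloat16 weights of a 14B-parameter model, before KV cache or activations:
weights_gib = 14e9 * 2 / 2**30   # 2 bytes per parameter
print(f"bf16 weights alone: {weights_gib:.1f} GiB")    # ~26.1 GiB
```

On this reading the 14B model fails where 3B and 7B succeed simply because their bf16 weights (~6 GiB and ~14 GiB) fit in the leftover ~18.75 GiB while 14B's ~26 GiB do not.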
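If the diagnosis above is right, possible checks before retrying (a sketch assuming a standard CUDA install with `nvidia-smi` available; these commands are my suggestion, not from the repo):

```shell
# 1. See which processes hold GPU memory (PID 2457479 appears in the error):
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv

# 2. If another job owns GPU 0, kill it or run on a free GPU instead, e.g.:
export CUDA_VISIBLE_DEVICES=1

# 3. The traceback's own suggestion for fragmentation (only relevant when
#    "reserved but unallocated" is large; here it is only 12.92 MiB):
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
```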