[DLRM/Pytorch] Cuda error: illegal memory access after changing embedding size to 64 · Issue #830 · NVIDIA/DeepLearningExamples
https://github.com/NVIDIA/DeepLearningExamples/issues/830
{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"[DLRM/Pytorch] Cuda error: illegal memory access after changing embedding size to 64","articleBody":"Related to **DLRM/Pytorch** \r\n\r\n**Describe the bug**\r\nChanged embedding size to 64 (default 128)\r\nChanged the last layer of bottom MLP size to 64 (default 128)\r\nThis caused crash as shown below.\r\n```\r\nTraceback (most recent call last):\r\n File \"/opt/conda/lib/python3.6/runpy.py\", line 193, in _run_module_as_main\r\n \"__main__\", mod_spec)\r\n File \"/opt/conda/lib/python3.6/runpy.py\", line 85, in _run_code\r\n exec(code, run_globals)\r\n File \"/workspace/dlrm/dlrm/scripts/main.py\", line 519, in \u003cmodule\u003e\r\n app.run(main)\r\n File \"/opt/conda/lib/python3.6/site-packages/absl/app.py\", line 299, in run\r\n _run_main(main, args)\r\n File \"/opt/conda/lib/python3.6/site-packages/absl/app.py\", line 250, in _run_main\r\n sys.exit(main(argv))\r\n File \"/workspace/dlrm/dlrm/scripts/main.py\", line 264, in main\r\n train(model, loss_fn, optimizer, data_loader_train, data_loader_test, scaled_lr)\r\n File \"/workspace/dlrm/dlrm/scripts/main.py\", line 361, in train\r\n loss.backward()\r\n File \"/opt/conda/lib/python3.6/site-packages/torch/tensor.py\", line 184, in backward\r\n torch.autograd.backward(self, gradient, retain_graph, create_graph)\r\n File \"/opt/conda/lib/python3.6/site-packages/torch/autograd/__init__.py\", line 123, in backward\r\n allow_unreachable=True) # allow_unreachable flag\r\nRuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`\r\nException raised from createCublasHandle at ../aten/src/ATen/cuda/CublasHandlePool.cpp:8 (most recent call first):\r\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string\u003cchar, std::char_traits\u003cchar\u003e, std::allocator\u003cchar\u003e \u003e) + 0x6b (0x7ff5f440a82b in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)\r\nframe #1: \u003cunknown function\u003e + 0x327d0c2 (0x7ff4bbe1c0c2 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)\r\nframe #2: at::cuda::getCurrentCUDABlasHandle() + 0xb82 (0x7ff4bbe1d9d2 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)\r\nframe #3: \u003cunknown function\u003e + 0x326945f (0x7ff4bbe0845f in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)\r\nframe #4: at::native::addmm_out_cuda_impl(at::Tensor\u0026, at::Tensor const\u0026, at::Tensor const\u0026, at::Tensor const\u0026, c10::Scalar, c10::Scalar) + 0x78e (0x7ff4bacef5ee in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)\r\nframe #5: at::native::mm_cuda(at::Tensor const\u0026, at::Tensor const\u0026) + 0x15b (0x7ff4bacf04bb in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)\r\nframe #6: \u003cunknown function\u003e + 0x3293808 (0x7ff4bbe32808 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)\r\nframe #7: \u003cunknown function\u003e + 0x330f734 (0x7ff4bbeae734 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)\r\nframe #8: \u003cunknown function\u003e + 0x2ba029b (0x7ff537b0b29b in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)\r\nframe #9: \u003cunknown function\u003e + 0x7a8224 (0x7ff535713224 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)\r\nframe #10: at::Tensor c10::Dispatcher::call\u003cat::Tensor, at::Tensor const\u0026, at::Tensor const\u0026\u003e(c10::OperatorHandle const\u0026, at::Tensor 
const\u0026, at::Tensor const\u0026) const + 0xc5 (0x7ff5c6f346e5 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)\r\nframe #11: \u003cunknown function\u003e + 0x28fe447 (0x7ff537869447 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)\r\nframe #12: torch::autograd::generated::AddmmBackward::apply(std::vector\u003cat::Tensor, std::allocator\u003cat::Tensor\u003e \u003e\u0026\u0026) + 0x155 (0x7ff5378aeca5 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)\r\nframe #13: \u003cunknown function\u003e + 0x2ee2f75 (0x7ff537e4df75 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)\r\nframe #14: torch::autograd::Engine::evaluate_function(std::shared_ptr\u003ctorch::autograd::GraphTask\u003e\u0026, torch::autograd::Node*, torch::autograd::InputBuffer\u0026, std::shared_ptr\u003ctorch::autograd::ReadyQueue\u003e const\u0026) + 0x1808 (0x7ff537e48f68 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)\r\nframe #15: torch::autograd::Engine::thread_main(std::shared_ptr\u003ctorch::autograd::GraphTask\u003e const\u0026, bool) + 0x551 (0x7ff537e49e01 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)\r\nframe #16: torch::autograd::Engine::thread_init(int, std::shared_ptr\u003ctorch::autograd::ReadyQueue\u003e const\u0026) + 0xa3 (0x7ff537e3f863 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)\r\nframe #17: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr\u003ctorch::autograd::ReadyQueue\u003e const\u0026) + 0x50 (0x7ff5c7236b20 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)\r\nframe #18: \u003cunknown function\u003e + 0xbd6df (0x7ff5f4af76df in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)\r\nframe #19: \u003cunknown function\u003e + 0x76db (0x7ff5fffcf6db in /lib/x86_64-linux-gnu/libpthread.so.0)\r\nframe #20: clone + 0x3f (0x7ff5ffcf888f in /lib/x86_64-linux-gnu/libc.so.6)\r\n\r\nterminate called after throwing an instance of 'c10::Error'\r\n what(): CUDA error: an illegal memory access was encountered\r\nException raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):\r\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string\u003cchar, std::char_traits\u003cchar\u003e, std::allocator\u003cchar\u003e \u003e) + 0x6b (0x7ff5f440a82b in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)\r\nframe #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xc10 (0x7ff5f41a5500 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10_cuda.so)\r\nframe #2: c10::TensorImpl::release_resources() + 0x4d (0x7ff5f43f2c9d in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)\r\nframe #3: \u003cunknown function\u003e + 0x59f1e2 (0x7ff5c724b1e2 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)\r\n\u003comitting python frames\u003e\r\nframe #16: __libc_start_main + 0xe7 (0x7ff5ffbf8b97 in /lib/x86_64-linux-gnu/libc.so.6)\r\n\r\nFatal Python error: Aborted\r\n\r\nThread 0x00007ff59fda0700 (most recent call first):\r\n\r\nThread 0x00007ff56b58b700 (most recent call first):\r\n\r\nCurrent thread 0x00007ff6003fc740 (most recent call first):\r\nAborted\r\n```\r\n\r\n**To Reproduce**\r\nuse the command line:\r\n--embedding_dim 64 --bottom_mlp_sizes 512,256,64\r\n\r\n**Expected behavior**\r\nit should not crash.\r\n\r\n**Environment**\r\nPlease provide at least:\r\n* Container version (e.g. pytorch:20.06-py3):\r\n* GPUs in the system: (e.g. 
1x Tesla V100 32GB):\r\n* CUDA driver version (e.g. 418.67):\r\n","author":{"url":"https://github.com/junshi15","@type":"Person","name":"junshi15"},"datePublished":"2021-02-13T06:18:00.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":4},"url":"https://github.com/830/DeepLearningExamples/issues/830"}
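For context on the configuration in **To Reproduce**: DLRM's dot interaction stacks the bottom-MLP output together with the embedding vectors, so the bottom MLP's last layer width has to equal the embedding dimension, and `--embedding_dim 64 --bottom_mlp_sizes 512,256,64` keeps that constraint (64 == 64). Below is a minimal PyTorch sketch of that shape relationship only; it is not the repository's code, and the feature counts, table sizes, and batch size are illustrative assumptions.

```python
import torch
import torch.nn as nn

embedding_dim = 64                 # mirrors --embedding_dim 64
bottom_mlp_sizes = [512, 256, 64]  # mirrors --bottom_mlp_sizes 512,256,64
num_numerical = 13                 # assumption: Criteo-style dense feature count
num_categorical = 26               # assumption: Criteo-style sparse feature count
batch_size = 8                     # arbitrary toy batch size

# Bottom MLP whose last layer width equals embedding_dim (64).
layers, in_features = [], num_numerical
for out_features in bottom_mlp_sizes:
    layers += [nn.Linear(in_features, out_features), nn.ReLU()]
    in_features = out_features
bottom_mlp = nn.Sequential(*layers)

# One toy embedding table per categorical feature, all with the same width.
embeddings = nn.ModuleList(
    [nn.Embedding(1000, embedding_dim) for _ in range(num_categorical)]
)

dense = torch.randn(batch_size, num_numerical)
sparse = torch.randint(0, 1000, (batch_size, num_categorical))

bottom_out = bottom_mlp(dense)                              # (B, 64)
emb_out = [emb(sparse[:, i]) for i, emb in enumerate(embeddings)]

# Dot interaction: stacking only works because the widths agree (64 == 64).
stacked = torch.stack([bottom_out] + emb_out, dim=1)        # (B, 27, 64)
interactions = torch.bmm(stacked, stacked.transpose(1, 2))  # (B, 27, 27)
print(interactions.shape)  # torch.Size([8, 27, 27])
```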