René's URL Explorer Experiment


Title: [DLRM/Pytorch] Cuda error: illegal memory access after changing embedding size to 64 · Issue #830 · NVIDIA/DeepLearningExamples · GitHub

Description: Related to DLRM/Pytorch Describe the bug Changed embedding size to 64 (default 128) Changed the last layer of bottom MLP size to 64 (default 128) This caused crash as shown below. Traceback (most recent call last): File "/opt/conda/lib/p...

Opengraph URL: https://github.com/NVIDIA/DeepLearningExamples/issues/830

Domain: patch-diff.githubusercontent.com


JSON-LD script found on the page:
{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"[DLRM/Pytorch] Cuda error: illegal memory access after changing embedding size to 64","articleBody":"Related to **DLRM/Pytorch** \r\n\r\n**Describe the bug**\r\nChanged embedding size to 64 (default 128)\r\nChanged the last layer of bottom MLP size to 64 (default 128)\r\nThis caused crash as shown below.\r\n```\r\nTraceback (most recent call last):\r\n  File \"/opt/conda/lib/python3.6/runpy.py\", line 193, in _run_module_as_main\r\n    \"__main__\", mod_spec)\r\n  File \"/opt/conda/lib/python3.6/runpy.py\", line 85, in _run_code\r\n    exec(code, run_globals)\r\n  File \"/workspace/dlrm/dlrm/scripts/main.py\", line 519, in \u003cmodule\u003e\r\n    app.run(main)\r\n  File \"/opt/conda/lib/python3.6/site-packages/absl/app.py\", line 299, in run\r\n    _run_main(main, args)\r\n  File \"/opt/conda/lib/python3.6/site-packages/absl/app.py\", line 250, in _run_main\r\n    sys.exit(main(argv))\r\n  File \"/workspace/dlrm/dlrm/scripts/main.py\", line 264, in main\r\n    train(model, loss_fn, optimizer, data_loader_train, data_loader_test, scaled_lr)\r\n  File \"/workspace/dlrm/dlrm/scripts/main.py\", line 361, in train\r\n    loss.backward()\r\n  File \"/opt/conda/lib/python3.6/site-packages/torch/tensor.py\", line 184, in backward\r\n    torch.autograd.backward(self, gradient, retain_graph, create_graph)\r\n  File \"/opt/conda/lib/python3.6/site-packages/torch/autograd/__init__.py\", line 123, in backward\r\n    allow_unreachable=True)  # allow_unreachable flag\r\nRuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`\r\nException raised from createCublasHandle at ../aten/src/ATen/cuda/CublasHandlePool.cpp:8 (most recent call first):\r\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string\u003cchar, std::char_traits\u003cchar\u003e, std::allocator\u003cchar\u003e \u003e) + 0x6b (0x7ff5f440a82b in 
/opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)\r\nframe #1: \u003cunknown function\u003e + 0x327d0c2 (0x7ff4bbe1c0c2 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)\r\nframe #2: at::cuda::getCurrentCUDABlasHandle() + 0xb82 (0x7ff4bbe1d9d2 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)\r\nframe #3: \u003cunknown function\u003e + 0x326945f (0x7ff4bbe0845f in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)\r\nframe #4: at::native::addmm_out_cuda_impl(at::Tensor\u0026, at::Tensor const\u0026, at::Tensor const\u0026, at::Tensor const\u0026, c10::Scalar, c10::Scalar) + 0x78e (0x7ff4bacef5ee in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)\r\nframe #5: at::native::mm_cuda(at::Tensor const\u0026, at::Tensor const\u0026) + 0x15b (0x7ff4bacf04bb in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)\r\nframe #6: \u003cunknown function\u003e + 0x3293808 (0x7ff4bbe32808 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)\r\nframe #7: \u003cunknown function\u003e + 0x330f734 (0x7ff4bbeae734 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)\r\nframe #8: \u003cunknown function\u003e + 0x2ba029b (0x7ff537b0b29b in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)\r\nframe #9: \u003cunknown function\u003e + 0x7a8224 (0x7ff535713224 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)\r\nframe #10: at::Tensor c10::Dispatcher::call\u003cat::Tensor, at::Tensor const\u0026, at::Tensor const\u0026\u003e(c10::OperatorHandle const\u0026, at::Tensor const\u0026, at::Tensor const\u0026) const + 0xc5 (0x7ff5c6f346e5 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)\r\nframe #11: \u003cunknown function\u003e + 0x28fe447 (0x7ff537869447 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)\r\nframe #12: 
torch::autograd::generated::AddmmBackward::apply(std::vector\u003cat::Tensor, std::allocator\u003cat::Tensor\u003e \u003e\u0026\u0026) + 0x155 (0x7ff5378aeca5 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)\r\nframe #13: \u003cunknown function\u003e + 0x2ee2f75 (0x7ff537e4df75 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)\r\nframe #14: torch::autograd::Engine::evaluate_function(std::shared_ptr\u003ctorch::autograd::GraphTask\u003e\u0026, torch::autograd::Node*, torch::autograd::InputBuffer\u0026, std::shared_ptr\u003ctorch::autograd::ReadyQueue\u003e const\u0026) + 0x1808 (0x7ff537e48f68 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)\r\nframe #15: torch::autograd::Engine::thread_main(std::shared_ptr\u003ctorch::autograd::GraphTask\u003e const\u0026, bool) + 0x551 (0x7ff537e49e01 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)\r\nframe #16: torch::autograd::Engine::thread_init(int, std::shared_ptr\u003ctorch::autograd::ReadyQueue\u003e const\u0026) + 0xa3 (0x7ff537e3f863 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)\r\nframe #17: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr\u003ctorch::autograd::ReadyQueue\u003e const\u0026) + 0x50 (0x7ff5c7236b20 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)\r\nframe #18: \u003cunknown function\u003e + 0xbd6df (0x7ff5f4af76df in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)\r\nframe #19: \u003cunknown function\u003e + 0x76db (0x7ff5fffcf6db in /lib/x86_64-linux-gnu/libpthread.so.0)\r\nframe #20: clone + 0x3f (0x7ff5ffcf888f in /lib/x86_64-linux-gnu/libc.so.6)\r\n\r\nterminate called after throwing an instance of 'c10::Error'\r\n  what():  CUDA error: an illegal memory access was encountered\r\nException raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):\r\nframe #0: c10::Error::Error(c10::SourceLocation, 
std::__cxx11::basic_string\u003cchar, std::char_traits\u003cchar\u003e, std::allocator\u003cchar\u003e \u003e) + 0x6b (0x7ff5f440a82b in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)\r\nframe #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xc10 (0x7ff5f41a5500 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10_cuda.so)\r\nframe #2: c10::TensorImpl::release_resources() + 0x4d (0x7ff5f43f2c9d in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)\r\nframe #3: \u003cunknown function\u003e + 0x59f1e2 (0x7ff5c724b1e2 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)\r\n\u003comitting python frames\u003e\r\nframe #16: __libc_start_main + 0xe7 (0x7ff5ffbf8b97 in /lib/x86_64-linux-gnu/libc.so.6)\r\n\r\nFatal Python error: Aborted\r\n\r\nThread 0x00007ff59fda0700 (most recent call first):\r\n\r\nThread 0x00007ff56b58b700 (most recent call first):\r\n\r\nCurrent thread 0x00007ff6003fc740 (most recent call first):\r\nAborted\r\n```\r\n\r\n**To Reproduce**\r\nuse the command line:\r\n--embedding_dim 64 --bottom_mlp_sizes 512,256,64\r\n\r\n**Expected behavior**\r\nit should not crash.\r\n\r\n**Environment**\r\nPlease provide at least:\r\n* Container version (e.g. pytorch:20.06-py3):\r\n* GPUs in the system: (e.g. 1x Tesla V100 32GB):\r\n* CUDA driver version (e.g. 418.67):\r\n","author":{"url":"https://github.com/junshi15","@type":"Person","name":"junshi15"},"datePublished":"2021-02-13T06:18:00.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":4},"url":"https://github.com/830/DeepLearningExamples/issues/830"}
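
The traceback above first reports `CUBLAS_STATUS_ALLOC_FAILED` during `loss.backward()` and only afterwards "an illegal memory access was encountered". Because CUDA reports kernel errors asynchronously, the Python frame shown may not be the operation that actually faulted. A minimal sketch of rerunning the reported command line with synchronous kernel launches follows. The flags are copied from the "To Reproduce" section; the module name `dlrm.scripts.main` is an assumption inferred from the traceback paths, and `build_debug_invocation` is a hypothetical helper, not part of the repository.

```python
import os
import shlex

# Flags copied verbatim from the issue's "To Reproduce" section.
REPRO_FLAGS = ["--embedding_dim", "64", "--bottom_mlp_sizes", "512,256,64"]

# Assumption: module path inferred from the traceback
# (/workspace/dlrm/dlrm/scripts/main.py started through runpy).
ASSUMED_MODULE = "dlrm.scripts.main"


def build_debug_invocation(base_env=None):
    """Return (argv, env) for rerunning the repro with synchronous CUDA launches.

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ if base_env is None else base_env)
    # CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so an error
    # surfaces at the launching call instead of at a later, unrelated one.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    argv = ["python", "-m", ASSUMED_MODULE, *REPRO_FLAGS]
    return argv, env


if __name__ == "__main__":
    argv, env = build_debug_invocation()
    # Print the command rather than running it; running needs the container and a GPU.
    print("CUDA_LAUNCH_BLOCKING=1 " + shlex.join(argv))
```

Passing the returned `argv` and `env` to `subprocess.run` inside the training container would rerun the failing configuration; synchronous launches slow training down, so this is only useful for locating the faulting call.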

Links:

[DLRM/Pytorch] Cuda error: illegal memory access after changing embedding size to 64: https://patch-diff.githubusercontent.com/NVIDIA/DeepLearningExamples/issues/830#top
bug (Something isn't working): https://github.com/NVIDIA/DeepLearningExamples/issues?q=state%3Aopen%20label%3A%22bug%22
junshi15: https://github.com/junshi15
on Feb 13, 2021: https://github.com/NVIDIA/DeepLearningExamples/issues/830#issue-807687937
