Title: Memory consumption issue · Issue #53 · DeepGraphLearning/KnowledgeGraphEmbedding · GitHub
Description: I use the command: bash run.sh train RotatE FB15k-237 0 0 1024 256 1000 9.0 1.0 0.00005 100000 16 -de to train RotatE on an 11 GB GPU. I ensure it is completely free. I still get the following error: 2022-03-31 19:32:37,370 INFO negative_...
Opengraph URL: https://github.com/DeepGraphLearning/KnowledgeGraphEmbedding/issues/53
Author: p6jain (https://github.com/p6jain)
Date Published: 2022-03-31T14:21:18.000Z
Comments: 1
URL: https://github.com/DeepGraphLearning/KnowledgeGraphEmbedding/issues/53

I use the command:
```
bash run.sh train RotatE FB15k-237 0 0 1024 256 1000 9.0 1.0 0.00005 100000 16 -de
```
to train RotatE on an 11 GB GPU. I made sure the GPU was completely free, but I still get the following error:
```
2022-03-31 19:32:37,370 INFO negative_adversarial_sampling = False
2022-03-31 19:32:37,370 INFO learning_rate = 0
2022-03-31 19:32:39,079 INFO Training average positive_sample_loss at step 0: 5.635527
2022-03-31 19:32:39,079 INFO Training average negative_sample_loss at step 0: 0.003591
2022-03-31 19:32:39,079 INFO Training average loss at step 0: 2.819559
2022-03-31 19:32:39,079 INFO Evaluating on Valid Dataset...
2022-03-31 19:32:39,552 INFO Evaluating the model... (0/2192)
2022-03-31 19:33:38,650 INFO Evaluating the model... (1000/2192)
2022-03-31 19:34:38,503 INFO Evaluating the model... (2000/2192)
2022-03-31 19:34:49,981 INFO Valid MRR at step 0: 0.005509
2022-03-31 19:34:49,982 INFO Valid MR at step 0: 6894.798660
2022-03-31 19:34:49,982 INFO Valid HITS@1 at step 0: 0.004733
2022-03-31 19:34:49,982 INFO Valid HITS@3 at step 0: 0.005076
2022-03-31 19:34:49,982 INFO Valid HITS@10 at step 0: 0.005646
Traceback (most recent call last):
  File "codes/run.py", line 371, in <module>
    main(parse_args())
  File "codes/run.py", line 315, in main
    log = kge_model.train_step(kge_model, optimizer, train_iterator, args)
  File "/home/prachi/related_work/KnowledgeGraphEmbedding/codes/model.py", line 315, in train_step
    loss.backward()
  File "/home/prachi/anaconda3/envs/py36/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/prachi/anaconda3/envs/py36/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 1.95 GiB (GPU 0; 10.92 GiB total capacity; 7.41 GiB already allocated; 1.51 GiB free; 1.52 GiB cached)
run.sh: line 79: 
CUDA_VISIBLE_DEVICES=$GPU_DEVICE python -u $CODE_PATH/run.py --do_train \
    --cuda \
    --do_valid \
    --do_test \
    --data_path $FULL_DATA_PATH \
    --model $MODEL \
    -n $NEGATIVE_SAMPLE_SIZE -b $BATCH_SIZE -d $HIDDEN_DIM \
    -g $GAMMA -a $ALPHA -adv \
    -lr $LEARNING_RATE --max_steps $MAX_STEPS \
    -save $SAVE --test_batch_size $TEST_BATCH_SIZE \
    ${14} ${15} ${16} ${17} ${18} ${19} ${20}

: No such file or directory
```
I get similar errors when I try to train on FB15k using the command in the best_config.sh file.
I reduced the batch size to 500 and training worked, but the performance is much lower than the numbers reported in the paper.

I am not sure what the issue is.
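
As a rough sanity check on the 1.95 GiB allocation in the error message, the size of a single batch-by-negatives embedding tensor can be estimated by hand. The sketch below rests on assumptions the issue does not confirm: that the positional arguments `1024 256 1000` are the batch size, negative sample size, and hidden dimension (in that order), that `-de` doubles the entity embedding width, and that tensors are stored as float32. The helper `tensor_gib` is hypothetical and not part of the repository.

```python
# Back-of-the-envelope size of one float32 tensor of shape
# (batch_size, negative_sample_size, entity_dim).
# ASSUMPTIONS (not confirmed by the issue):
#   - "1024 256 1000" in the command are batch size, negative sample size,
#     and hidden dimension, in that order
#   - "-de" doubles the entity embedding width (entity_dim = 2 * hidden_dim)
#   - tensors are float32 (4 bytes per element)

BYTES_PER_FLOAT32 = 4
GIB = 2 ** 30


def tensor_gib(batch_size, negative_sample_size, hidden_dim, double_entity=True):
    """Return the size in GiB of one (batch, negatives, entity_dim) float32 tensor."""
    entity_dim = hidden_dim * 2 if double_entity else hidden_dim
    n_elements = batch_size * negative_sample_size * entity_dim
    return n_elements * BYTES_PER_FLOAT32 / GIB


full = tensor_gib(1024, 256, 1000)     # settings from the failing command
reduced = tensor_gib(500, 256, 1000)   # batch size that reportedly worked

print(f"batch 1024: {full:.2f} GiB per tensor")
print(f"batch 500:  {reduced:.2f} GiB per tensor")
```

Under these assumptions the first figure comes out at about 1.95 GiB, the same size as the failed allocation. That suggests, without proving it, that one tensor of this shape is what no longer fits during `loss.backward()`. Autograd usually keeps several intermediates of similar size alive for the backward pass, which would be consistent with 7.41 GiB already being allocated at the point of failure.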