Title: Reproducing StarCoder2-Instruct · Issue #6 · bigcode-project/selfcodealign · GitHub
Description: I am trying to recreate the StarCoder2-Instruct-v0.1 model; however, the model produced by the provided command in the README (copied below) does not match the evaluation of the StarCoder2-Instruct-v0.1 model on HF. I actually see quite ...
Opengraph URL: https://github.com/bigcode-project/selfcodealign/issues/6
Author: mstallone (https://github.com/mstallone)
Date published: 2024-05-04T20:36:28.000Z
Comments: 5

I am trying to recreate the `StarCoder2-Instruct-v0.1` model; however, the model produced by the provided command in the README (copied below) does not match the evaluation of the `StarCoder2-Instruct-v0.1` model on HF.

I actually see quite a bit of discrepancy between the two models' evaluations: `humaneval` on your HF version is 7 points higher than on my reproduced model (both models were evaluated locally by me in the same environment).

```bash
MODEL_KEY=bigcode/starcoder2-15b
LR=1e-5
EPOCH=4
SEQ_LEN=1280
WARMUP_RATIO=0.05
OUTPUT_DIR=/path/to/output_model
DATASET_FILE=/path/to/50k-dataset.jsonl
accelerate launch -m star_align.train \
    --model_key $MODEL_KEY \
    --model_name_or_path $MODEL_KEY \
    --use_flash_attention True \
    --datafile_paths $DATASET_FILE \
    --output_dir $OUTPUT_DIR \
    --bf16 True \
    --num_train_epochs $EPOCH \
    --max_training_seq_length $SEQ_LEN \
    --pad_to_max_length False \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 64 \
    --group_by_length False \
    --ddp_find_unused_parameters False \
    --logging_steps 1 \
    --log_level info \
    --optim adafactor \
    --max_grad_norm -1 \
    --warmup_ratio $WARMUP_RATIO \
    --learning_rate $LR \
    --lr_scheduler_type linear
```

Are the parameters in the README correct for the released model? Are you adding anything in your `accelerate` config? i.e. any model wrappers or something else?

For the data, I just ran:

```python
>>> from datasets import load_dataset
>>> load_dataset("bigcode/self-oss-instruct-sc2-exec-filter-50k", split="train").to_json("/path/to/50k-dataset.jsonl", lines=True)
```

Do you have any ideas on how I can reproduce your model?

Thanks!
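
One factor that may be worth ruling out when comparing the two runs is the effective global batch size, since the command fixes `--per_device_train_batch_size 1` and `--gradient_accumulation_steps 64` but the number of processes comes from the `accelerate` config. The sketch below is only a rough illustration: the helper names are hypothetical, the 50,000-example count is assumed from the dataset name, and the step count is an approximation that does not reproduce the trainer's exact rounding.

```python
def effective_batch_size(per_device_batch_size: int,
                         grad_accum_steps: int,
                         num_processes: int) -> int:
    """Number of examples contributing to one optimizer step across all processes."""
    return per_device_batch_size * grad_accum_steps * num_processes


def approx_optimizer_steps(num_examples: int, global_batch: int, epochs: int) -> int:
    """Approximate total optimizer steps (drops the partial batch at the end of each epoch)."""
    return (num_examples // global_batch) * epochs


if __name__ == "__main__":
    NUM_EXAMPLES = 50_000  # assumption: inferred from the "50k" in the dataset name
    EPOCHS = 4             # EPOCH=4 in the command above
    # num_processes is not stated in the issue; these values are purely illustrative.
    for num_processes in (1, 4, 8):
        batch = effective_batch_size(1, 64, num_processes)
        steps = approx_optimizer_steps(NUM_EXAMPLES, batch, EPOCHS)
        print(f"processes={num_processes} global_batch={batch} approx_steps={steps}")
```

If the released model was trained with a different number of processes than the reproduction, the same flags would give a different global batch size and a different number of optimizer steps at the same learning rate, which could plausibly contribute to an evaluation gap.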