Title: Cannot reproduce Action-Prediction Score · Issue #7 · VisualWebBench/VisualWebBench · GitHub
Description: Thanks for a great benchmark! I tried to reproduce the VisualWebBench results using the llava-v1.6-vicuna-7b model. The results are as follows | Task type | Metric | Score | |-------------------+----------+----------| | web_caption | rouge...
Opengraph URL: https://github.com/VisualWebBench/VisualWebBench/issues/7
Author: Ryosuke0104 (https://github.com/Ryosuke0104)
Date published: 2024-08-23T08:19:18.000Z
Comments: 1

Thanks for a great benchmark!

I tried to reproduce the VisualWebBench results using the `llava-v1.6-vicuna-7b` model.

The results are as follows:
```
| Task type         | Metric   | Score    |
|-------------------+----------+----------|
| web_caption       | rouge_1  | 29.985   |
| webqa             | f1       | 39.599   |
| heading_ocr       | rouge_1  | 57.33    |
| element_ocr       | rouge_1  | 55.5956  |
| element_ground    | accuracy | 31.477   |
| action_prediction | accuracy | 1.06762  |
| action_ground     | accuracy | 10.6796  |
```

**These results differ significantly from the paper on the "Action Prediction" task: the paper reports 30.6, while my experiment scored 1.06762.**
The scores for the other tasks seem to be reproduced well.

Could you tell me what environment you ran your experiments in?
Also, has anyone else obtained similar results?
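To make the size of the discrepancy concrete, here is a minimal sketch that compares the reproduced scores with the paper's value. It only uses numbers quoted in this issue; the function name `score_gaps` and the dictionary layout are hypothetical and are not part of the VisualWebBench code. Paper scores for the other tasks are not stated here, so they are left out rather than guessed.

```python
# Reproduced scores, copied from the table in this issue.
reproduced = {
    "web_caption": 29.985,
    "webqa": 39.599,
    "heading_ocr": 57.33,
    "element_ocr": 55.5956,
    "element_ground": 31.477,
    "action_prediction": 1.06762,
    "action_ground": 10.6796,
}

# Only the paper value quoted in this issue is listed (assumption: same metric, same scale).
paper = {"action_prediction": 30.6}


def score_gaps(reproduced, paper):
    """Return {task: (reproduced, paper, absolute_gap, ratio)} for tasks present in both."""
    gaps = {}
    for task, ref in paper.items():
        if task in reproduced:
            got = reproduced[task]
            gaps[task] = (got, ref, ref - got, got / ref)
    return gaps


gaps = score_gaps(reproduced, paper)
for task, (got, ref, diff, ratio) in gaps.items():
    print(f"{task}: reproduced={got:.2f} paper={ref:.2f} gap={diff:.2f} ratio={ratio:.3f}")
```

A ratio this far below 1 on a single task, while the other tasks look normal, may point to a task-specific problem rather than a general environment difference, though this issue does not confirm the cause.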