Title: Cannot reproduce Action-Prediction Score · Issue #7 · VisualWebBench/VisualWebBench · GitHub

Description: Thanks for a great benchmark!! I tried to reproduce the results of VisualWebBench using the lava-v1.6-vicuna-7b model. The results are as follows | Task type | Metric | Score | |-------------------+----------+----------| | web_caption | rouge...

Opengraph URL: https://github.com/VisualWebBench/VisualWebBench/issues/7

Domain: patch-diff.githubusercontent.com

Hey, it has JSON-LD scripts:

{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"Cannot reproduce Action-Prediction Score","articleBody":"Thanks for a great benchmark!!\r\n\r\n\r\nI tried to reproduce the results of VisualWebBench using the `lava-v1.6-vicuna-7b` model.\r\n\r\nThe results are as follows:\r\n```\r\n| Task type         | Metric   |    Score |\r\n|-------------------+----------+----------|\r\n| web_caption       | rouge_1  | 29.985   |\r\n| webqa             | f1       | 39.599   |\r\n| heading_ocr       | rouge_1  | 57.33    |\r\n| element_ocr       | rouge_1  | 55.5956  |\r\n| element_ground    | accuracy | 31.477   |\r\n| action_prediction | accuracy |  1.06762 |\r\n| action_ground     | accuracy | 10.6796  |\r\n```\r\n\r\n**These results differ significantly from the paper on the “Action Prediction” task: the paper reports 30.6, while my run scores 1.06762.**\r\nThe scores for the other tasks seem to be reproduced well.\r\n\r\nI would like to know what environment you ran your experiments in.\r\nAlso, has anyone else obtained similar results?","author":{"url":"https://github.com/Ryosuke0104","@type":"Person","name":"Ryosuke0104"},"datePublished":"2024-08-23T08:19:18.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":1},"url":"https://github.com/VisualWebBench/VisualWebBench/issues/7"}
