Title: RedCode

Open Graph Title: RedCode

Description: RedCode Benchmark Official Webpage

Open Graph Description: A Risky Code Execution and Generation Benchmark for Code Agents

Domain: redcode-agent.github.io

Links:

RedCode	https://redcode-agent.github.io#motivation
Motivation	https://redcode-agent.github.io#motivation
Results	https://redcode-agent.github.io#results
Benchmark	https://redcode-agent.github.io#benchmark
Leaderboard	https://redcode-agent.github.io#leaderboard
Cite us	https://redcode-agent.github.io#BibTeX
Chengquan Guo	https://www.chengquanguo.com
Xun Liu	https://antiquality.github.io
Chulin Xie	https://alphapav.github.io
Andy Zhou	https://www.andyzhou.ai
Yi Zeng	https://www.yi-zeng.com
Zinan Lin	https://zinanlin.me
Dawn Song	https://dawnsong.io
Bo Li	https://aisecure.github.io
Paper	https://arxiv.org/abs/2411.07781
Code	https://github.com/AI-secure/RedCode
Dataset	https://github.com/AI-secure/RedCode/tree/main/dataset
Finding 1	https://redcode-agent.github.io#demo1
Finding 4	https://redcode-agent.github.io#demo4
Finding 2	https://redcode-agent.github.io#demo2
Finding 5	https://redcode-agent.github.io#demo5
Finding 3	https://redcode-agent.github.io#demo3
Finding 1: OpenCodeInterpreter is 🛡️safer than ReAct and CodeAct agents.	https://redcode-agent.github.io#demo1
The heatmaps below	https://redcode-agent.github.io#demo4
Finding 2: Agents are more likely to reject executing unsafe operations in the operating system domain.	https://redcode-agent.github.io#demo2
Finding 3: Agents are less likely to reject risky queries given in natural language than in programming language inputs, or in Bash code than in Python code inputs.	https://redcode-agent.github.io#demo3
Finding 4: More capable base models, such as the GPT series, tend to have a higher rejection rate for unsafe operations under the same agent structure.	https://redcode-agent.github.io#demo4
Finding 5: More capable base models tend to produce more sophisticated and effective harmful software.	https://redcode-agent.github.io#demo5
R-Judge	https://arxiv.org/html/2401.10019v2
CWE	https://cwe.mitre.org/data/published/cwe_v4.13.pdf
paper	https://redcode-agent.github.io
ToolEmu	https://arxiv.org/abs/2309.15817
AgentMonitor	https://arxiv.org/abs/2311.10538
SORRY-Bench	https://sorry-bench.github.io/index.html
Academic Project Page Template	https://github.com/eliahuhorwitz/Academic-project-page-template
Nerfies	https://nerfies.github.io
Creative Commons Attribution-ShareAlike 4.0 International License	http://creativecommons.org/licenses/by-sa/4.0/