Title: Feature Request: Integrate Scalpel for Call Graph and Control/Data Flow Analysis · Issue #11 · codellm-devkit/codeanalyzer-python · GitHub
Description: Is your feature request related to a problem? Please describe. Currently, codeanalyzer-python provides basic symbol table generation and has planned call graph analysis (marked as not yet implemented for --analysis-level=2). However, it ...
Opengraph URL: https://github.com/codellm-devkit/codeanalyzer-python/issues/11
# Feature Request: Integrate Scalpel for Call Graph and Control/Data Flow Analysis

Posted by rahlk (https://github.com/rahlk) on 2025-07-11T15:44:04.000Z · 0 comments

## Is your feature request related to a problem? Please describe.

Currently, codeanalyzer-python provides basic symbol table generation and has planned call graph analysis (marked as not yet implemented for `--analysis-level=2`). However, it lacks program flow analysis capabilities that are essential for understanding code behavior and dependencies:

- **Call Graph Construction**: While planned, the current implementation doesn't provide comprehensive call graph analysis that handles Python's dynamic features (higher-order functions, nested definitions, dynamic calls)
- **Control Flow Graphs (CFG)**: No support for intra-procedural or inter-procedural control flow analysis
- **Data Flow Analysis**: No data flow tracking for understanding how data moves through the program

These limitations prevent users from performing advanced static analysis tasks like vulnerability propagation analysis, refactoring impact assessment, and comprehensive dependency tracking.

## Describe the solution you'd like

I would like to integrate specific components from the **Scalpel Python Static Analysis Framework** (https://github.com/SMAT-Lab/Scalpel) to enhance codeanalyzer-python with robust graph-based analysis:

### 1. Enhanced Analysis Levels

```bash
--analysis-level 2  # Call graph analysis (implement using Scalpel)
--analysis-level 3  # Call graph + Control flow graphs
--analysis-level 4  # Call graph + CFG + Data flow analysis
```

### 2. New CLI Options

```bash
--call-graph        # Generate comprehensive call graphs
--control-flow      # Generate control flow graphs
--data-flow         # Perform data flow analysis
--inter-procedural  # Enable inter-procedural analysis
```

### 3. Scalpel Integration Focus

Target specific Scalpel capabilities:

- **Function 8: Call Graph Construction** - Handles Python's dynamic features like higher-order functions and nested definitions
- **Function 2: Control-Flow Graph Construction** - Generates intra-procedural CFGs that can be combined for inter-procedural analysis
- **Function 5: Constant Propagation** - Provides data flow analysis capabilities

### 4. Enhanced Output Schema

```python
from typing import Dict, List

from pydantic import BaseModel

# CallNode, CallEdge, CFG, and BasicBlock are proposed types, still to be defined.

class PyCallGraph(BaseModel):
    nodes: List[CallNode]            # Function/method nodes
    edges: List[CallEdge]            # Call relationships
    entry_points: List[str]          # Program entry points

class PyControlFlowGraph(BaseModel):
    function_cfgs: Dict[str, CFG]    # Per-function CFGs
    basic_blocks: List[BasicBlock]   # Code basic blocks

class PyDataFlow(BaseModel):
    def_use_chains: Dict[str, List]  # Variable definitions and uses
    reaching_definitions: Dict       # Reaching definition analysis
```

## Describe alternatives you've considered

### 1. NetworkX-based custom implementation
The project already uses NetworkX, but building CFG/call graph analysis from scratch would be time-intensive and error-prone.

### 2. AST-only analysis
Python's `ast` module exposes the syntactic structure of a program but does not resolve what a called name refers to, which is needed for accurate call graphs in dynamic Python code.

### 3. Existing call graph tools
- **pycg**: Good for call graphs but limited CFG support
- **code2flow**: Visualization-focused, not programmatic analysis
- **vulture**: Dead code detection, not comprehensive flow analysis

## Additional context

### Specific Scalpel Advantages for Graph Analysis

- **Call Graph**: Handles Python's complex dynamic features (decorators, metaclasses, dynamic imports)
- **CFG Construction**: Provides precise basic block identification and control flow edges
- **Inter-procedural Analysis**: Can combine function-level CFGs into program-wide flow graphs

### Current Project Readiness

- Already has a placeholder for call graph analysis (`--analysis-level=2`)
- Uses NetworkX for graph operations
- Extensible CLI architecture with typer
- Established pattern for multiple analysis backends

### Implementation Plan

```
# New module: codeanalyzer/semantic_analysis/scalpel/
├── __init__.py
├── scalpel_analyzer.py      # Main integration class
├── call_graph_builder.py    # Scalpel call graph integration
├── cfg_builder.py           # Control flow graph integration
└── data_flow_analyzer.py    # Data flow analysis integration
```

### Expected Output Enhancement

```bash
# Current (Level 1)
codeanalyzer --input project --analysis-level 1  # Symbol table only

# Enhanced (Levels 2-4 with Scalpel)
codeanalyzer --input project --analysis-level 2  # + Call graphs
codeanalyzer --input project --analysis-level 3  # + Control flow graphs
codeanalyzer --input project --analysis-level 4  # + Data flow analysis
```

### Example Usage Scenarios

1. **Security Analysis**:
   ```bash
   codeanalyzer --input webapp --analysis-level 4 --data-flow
   # Trace data flow from user inputs to sensitive operations
   ```

2. **Refactoring Impact Assessment**:
   ```bash
   codeanalyzer --input legacy_code --call-graph --inter-procedural
   # Understand function dependencies before refactoring
   ```

3. **Performance Analysis**:
   ```bash
   codeanalyzer --input application --control-flow --analysis-level 3
   # Identify performance bottlenecks through CFG analysis
   ```

### Benefits

- **Comprehensive Analysis**: Complete the missing call graph functionality and add powerful control/data flow analysis
- **Python-Specific**: Handles Python's dynamic nature better than generic tools
- **Research-Backed**: Scalpel is published research (arXiv:2202.11840) with proven effectiveness
- **Compatible**: Both projects use Python 3.12+ and have compatible licenses
- **Modular**: Can integrate specific components without full framework overhead

This focused integration would make codeanalyzer-python a comprehensive tool for program flow analysis without overwhelming complexity.

---

## References

- [Scalpel Framework](https://github.com/SMAT-Lab/Scalpel)
- [Scalpel Documentation](https://python-scalpel.readthedocs.io/)
- [Scalpel Research Paper](https://arxiv.org/abs/2202.11840)
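
To make the "AST-only analysis" alternative concrete, here is a minimal sketch of a call graph builder that uses only the standard-library `ast` module. It is not Scalpel and not part of codeanalyzer-python: `CallEdge`, `CallGraph`, and `build_naive_call_graph` are hypothetical names, and the dataclasses are simplified stand-ins for the proposed `PyCallGraph` schema. It assumes that only direct `name(...)` calls inside top-level functions are recorded.

```python
import ast
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class CallEdge:
    """Hypothetical stand-in for the proposed CallEdge type."""
    caller: str
    callee: str


@dataclass
class CallGraph:
    """Hypothetical, simplified stand-in for the proposed PyCallGraph."""
    nodes: List[str] = field(default_factory=list)
    edges: List[CallEdge] = field(default_factory=list)


def build_naive_call_graph(source: str) -> CallGraph:
    """Record direct `name(...)` calls found inside top-level functions."""
    tree = ast.parse(source)
    functions = [n for n in tree.body if isinstance(n, ast.FunctionDef)]
    graph = CallGraph(nodes=[f.name for f in functions])
    for func in functions:
        for node in ast.walk(func):
            # Only syntactic calls on a bare name are visible at this level.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                edge = CallEdge(caller=func.name, callee=node.func.id)
                if edge not in graph.edges:
                    graph.edges.append(edge)
    return graph


SAMPLE = '''
def helper():
    return 1

def apply(fn):
    return fn()

def main():
    helper()
    apply(helper)
'''

graph = build_naive_call_graph(SAMPLE)
for edge in graph.edges:
    print(f"{edge.caller} -> {edge.callee}")
```

In this sample the sketch records an edge from `apply` to the parameter name `fn`, not to `helper`, because nothing in the syntax tree says what `fn` is bound to at the call site. Resolving that kind of higher-order call is the gap the Scalpel integration is meant to close.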