

Title: Feature: Add file/directory exclusion feature with glob pattern support by fishmingyu · Pull Request #199 · sourcegraph/scip-python · GitHub

Description:

Motivation

In many repositories there are files we don't want to index (e.g., tests*), and a repository may also contain files that are broken or meaningless. Such files not only increase pyright-scip's processing time but can also cause the indexer to abort. One example, shown below, is the failure log from indexing the sympy repo. I also attached the success log after applying the new exclude patterns.

Summary

This PR adds the ability to exclude files and directories from SCIP indexing using command-line flags or a configuration file. The exclusion feature supports both exact paths and glob patterns (e.g., test_*). It acts as a filter, so patterns that match nothing are handled gracefully without errors.

Changes Made

1. MainCommand.ts
   - Added exclude?: string[] to the IndexOptions interface
   - Added excludeConfig?: string to the IndexOptions interface
   - Added an --exclude flag that accepts multiple file/directory paths
   - Added an --exclude-config flag that accepts a config file listing exclusion paths
2. indexer.ts
   - Added import { minimatch } from 'minimatch' for glob pattern matching
   - Implemented the exclusion logic after targetOnly filtering (lines 122-179):
     - Reads patterns from the --exclude flag
     - Reads patterns from the config file if --exclude-config is provided
     - Config file supports one path or pattern per line plus # comments, as in the example .scipignore under Usage
   - Pattern matching features:
     - Exact path matching (original functionality)
     - Glob patterns: dir*, file*, tests/**, etc.
     - Relative and absolute paths
     - Works as a filter: patterns that match nothing don't cause errors
3. package.json
   - Added the minimatch dependency for glob pattern matching

Usage

Exclude specific files/directories via the command line:

    scip-python index --project-name=myproject --exclude path/to/broken.py --exclude path/to/circular/

Exclude using patterns:

    scip-python index --project-name=myproject --exclude "test_*" "build/**"

Exclude using a config file:

    scip-python index --project-name=myproject --exclude-config=.scipignore

Example config file format (.scipignore):

    # Broken files
    src/broken_module.py
    # Directories with circular dependencies
    src/experimental/
    tests/broken/
    # Glob patterns
    test_*
    build/**
    # Another problematic file
    lib/legacy.py

Benefits

- Flexibility: supports both exact paths and glob patterns
- Robustness: works as a filter, so patterns that match nothing raise no errors
- Usability: config file support for managing complex exclusion rules
- Consistency: follows the same pattern as the existing --target-only flag

Testing

The feature can be tested by:
- Using --exclude with exact paths
- Using --exclude with glob patterns like test_*
- Using --exclude-config with a file containing mixed patterns and comments
- Verifying that non-matching patterns don't cause errors

Log when indexing sympy directly (the run dies with a JavaScript heap out-of-memory error):

    (11:57:21) pyproject.toml file found at /home/zhongming/.codeminer/sympy_sympy.
    (11:57:21) Loading pyproject.toml file at /home/zhongming/.codeminer/sympy_sympy/pyproject.toml
    Assuming Python version 3.11
    Assuming Python platform Linux
    Auto-excluding **/node_modules
    Auto-excluding **/__pycache__
    Auto-excluding **/.*
    (11:57:21) Total Project Files 1522
    (11:57:21) Indexing /home/zhongming/.codeminer/sympy_sympy with version d293133e81194adc11177729af91c970f092a6e7
    (11:57:21) Evaluating python environment dependencies
    (11:57:21) Gathering environment information
    (11:57:22) Parse and search for dependencies
    (11:57:32) 152 / 1522
    (11:57:43) 211 / 1522
    (11:57:57) 377 / 1522
    (11:58:07) 577 / 1522
    (11:58:17) 864 / 1522
    (11:58:27) 958 / 1522
    (11:58:51) 1084 / 1522
    (11:59:01) 1276 / 1522
    (11:59:11) 1419 / 1522
    (11:59:14) Index workspace and track project files
    (11:59:14) Analyze project and dependencies
    (11:59:26) 76 / 1524
    (11:59:37) 114 / 1524
    (11:59:47) 165 / 1524
    (11:59:57) 224 / 1524
    (12:00:08) 264 / 1524
    (12:00:33) 301 / 1524
    (12:00:43) 432 / 1524
    (12:00:53) 477 / 1524
    (12:01:03) 526 / 1524
    (12:01:13) 584 / 1524
    (12:01:25) 614 / 1524
    (12:01:37) 642 / 1524

    <--- Last few GCs --->

    [2024902:0x7c30a30] 258240 ms: Mark-Compact 3985.9 (4128.9) -> 3970.3 (4129.4) MB, 1982.51 / 0.00 ms (average mu = 0.180, current mu = 0.020) allocation failure; scavenge might not succeed
    [2024902:0x7c30a30] 260659 ms: Mark-Compact 3986.5 (4129.4) -> 3970.9 (4129.9) MB, 2370.20 / 0.00 ms (average mu = 0.103, current mu = 0.020) allocation failure; scavenge might not succeed

    <--- JS stacktrace --->

    FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
    ----- Native stack trace -----
     1: 0xb8d0a3 node::OOMErrorHandler(char const*, v8::OOMDetails const&) [node]
     2: 0xf06250 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
     3: 0xf06537 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
     4: 0x11180d5 [node]
     5: 0x1118664 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [node]
     6: 0x112f554 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::internal::GarbageCollectionReason, char const*) [node]
     7: 0x112fd6c v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
     8: 0x1106071 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
     9: 0x1107205 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
    10: 0x10e4856 v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [node]
    11: 0x1540686 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [node]
    12: 0x7ecdc3cd9ef6

Log after applying the exclude feature:

    INFO Running in conda environment: ['scip-python', 'index', '--cwd', '/home/zhongming/.codeminer/sympy_sympy', '--project-name', 'test_swebench', '--output', '/home/zhongming/.codeminer/sympy__sympy-27223/index.scip', '--exclude', 'sympy/polys/numberfields/resolvent_lookup.py', '--exclude', 'test_*']
    (13:28:16) No configuration file found.
    (13:28:16) pyproject.toml file found at /home/zhongming/.codeminer/sympy_sympy.
    (13:28:16) Loading pyproject.toml file at /home/zhongming/.codeminer/sympy_sympy/pyproject.toml
    Assuming Python version 3.11
    Assuming Python platform Linux
    Auto-excluding **/node_modules
    Auto-excluding **/__pycache__
    Auto-excluding **/.*
    (13:28:16) Total Project Files 915
    (13:28:16) Indexing /home/zhongming/.codeminer/sympy_sympy with version d293133e81194adc11177729af91c970f092a6e7
    (13:28:16) Evaluating python environment dependencies
    (13:28:17) Gathering environment information
    (13:28:17) Parse and search for dependencies
    (13:28:28) 101 / 915
    (13:28:43) 226 / 915
    (13:28:53) 515 / 915
    (13:29:04) 591 / 915
    (13:29:14) 786 / 915
    (13:29:17) Index workspace and track project files
    (13:29:17) Analyze project and dependencies
    (13:29:27) 28 / 917
    (13:29:37) 145 / 917
    (13:29:49) 163 / 917
    (13:29:59) 480 / 917
    (13:30:11) 508 / 917
    (13:30:21) 684 / 917
    (13:30:28) Parse and emit SCIP
    (13:30:29) - (14/916): /home/zhongming/.codeminer/sympy_sympy/sympy/assumptions/facts.py
    (13:30:30) - (48/916): /home/zhongming/.codeminer/sympy_sympy/sympy/calculus/tests/__init__.py
    (13:30:31) - (74/916): /home/zhongming/.codeminer/sympy_sympy/sympy/combinatorics/free_groups.py
    (13:30:33) - (85/916): /home/zhongming/.codeminer/sympy_sympy/sympy/combinatorics/permutations.py
    (13:30:34) - (120/916): /home/zhongming/.codeminer/sympy_sympy/sympy/core/expr.py
    (13:30:35) - (129/916): /home/zhongming/.codeminer/sympy_sympy/sympy/core/multidimensional.py
    (13:30:36) - (177/916): /home/zhongming/.codeminer/sympy_sympy/sympy/functions/elementary/exponential.py
    (13:30:38) - (218/916): /home/zhongming/.codeminer/sympy_sympy/sympy/holonomic/holonomicerrors.py
    (13:30:39) - (264/916): /home/zhongming/.codeminer/sympy_sympy/sympy/logic/inference.py
    (13:30:40) - (339/916): /home/zhongming/.codeminer/sympy_sympy/sympy/ntheory/generate.py
    (13:30:41) - (355/916): /home/zhongming/.codeminer/sympy_sympy/sympy/parsing/autolev/_listener_autolev_antlr.py
    (13:30:42) - (381/916): /home/zhongming/.codeminer/sympy_sympy/sympy/parsing/latex/__init__.py
    (13:30:43) - (474/916): /home/zhongming/.codeminer/sympy_sympy/sympy/physics/quantum/qft.py
    (13:30:44) - (573/916): /home/zhongming/.codeminer/sympy_sympy/sympy/polys/polyconfig.py
    (13:30:46) - (581/916): /home/zhongming/.codeminer/sympy_sympy/sympy/polys/polyutils.py
    (13:30:47) - (654/916): /home/zhongming/.codeminer/sympy_sympy/sympy/polys/numberfields/galoisgroups.py
    (13:30:48) - (699/916): /home/zhongming/.codeminer/sympy_sympy/sympy/printing/pretty/pretty_symbology.py
    (13:30:49) - (771/916): /home/zhongming/.codeminer/sympy_sympy/sympy/solvers/solveset.py
    (13:30:50) - (808/916): /home/zhongming/.codeminer/sympy_sympy/sympy/stats/sampling/__init__.py
    (13:30:51) - (832/916): /home/zhongming/.codeminer/sympy_sympy/sympy/tensor/toperators.py
    (13:30:52) - (902/916): /home/zhongming/.codeminer/sympy_sympy/sympy/vector/deloperator.py
    (13:30:53) Writing external symbols to SCIP index
    (13:30:53) Sucessfully wrote SCIP index to /home/zhongming/.codeminer/sympy__sympy-27223/index.scip
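The .scipignore format and the --exclude/--exclude-config pair described above are simple enough to sketch. The TypeScript below is a minimal illustration under stated assumptions, not the code in indexer.ts. It assumes one path or glob per line, treats only whole lines starting with # as comments, and skips blank lines. The function names parseExcludeConfig and collectExcludePatterns are hypothetical. flagPatterns plays the role of IndexOptions.exclude. Reading the config file from disk (for example with fs.readFileSync on IndexOptions.excludeConfig) is left to the caller.

```typescript
// Hypothetical helper (assumption, not the PR's parser): one path or glob per
// line, whole-line '#' comments, blank lines ignored. Inline comments that
// follow a pattern on the same line are NOT stripped.
function parseExcludeConfig(contents: string): string[] {
    return contents
        .split(/\r?\n/) // accept both LF and CRLF line endings
        .map((line) => line.trim())
        .filter((line) => line.length > 0 && !line.startsWith('#'));
}

// Hypothetical helper: merge patterns from repeated --exclude flags with those
// read from the --exclude-config file, dropping duplicates but keeping the
// order in which each pattern was first seen.
function collectExcludePatterns(
    flagPatterns: string[] | undefined,
    configContents: string | undefined,
): string[] {
    const patterns = [...(flagPatterns ?? [])];
    if (configContents !== undefined) {
        patterns.push(...parseExcludeConfig(configContents));
    }
    return Array.from(new Set(patterns));
}
```

Deduplicating with a Set keeps the merged list predictable when the same pattern appears both on the command line and in the config file. A Set also preserves insertion order, so flag patterns come first.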
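The filtering step from the Changes Made list can also be sketched: exact paths, glob patterns, relative and absolute paths, and non-matching patterns as no-ops. The PR uses minimatch. To keep this example dependency-free, globToRegExp below is a deliberately simplified stand-in that understands only *, ? and **, and none of minimatch's other syntax.

How slash-free patterns such as test_* are matched is an assumption here: they are tested against every path segment. This is consistent with the success log above, where sympy/calculus/tests/__init__.py is still indexed after --exclude test_*. It may still differ from the options the PR actually passes to minimatch. All function names are hypothetical.

```typescript
// Simplified stand-in for minimatch (assumption, not the PR's code): supports
// '*', '?' and '**' only; no braces, character classes, negation or dotfile rules.
function globToRegExp(glob: string): RegExp {
    let re = '';
    for (let i = 0; i < glob.length; i++) {
        const c = glob.charAt(i);
        if (c === '*' && glob.charAt(i + 1) === '*') {
            if (glob.charAt(i + 2) === '/') {
                re += '(?:.*/)?'; // '**/' matches zero or more leading directories
                i += 2;
            } else {
                re += '.*'; // '**' elsewhere matches anything, including '/'
                i += 1;
            }
        } else if (c === '*') {
            re += '[^/]*'; // '*' stays inside a single path segment
        } else if (c === '?') {
            re += '[^/]'; // '?' matches one non-separator character
        } else {
            re += c.replace(/[.+^${}()|[\]\\]/g, '\\$&'); // escape regex metacharacters
        }
    }
    return new RegExp(`^${re}$`);
}

// Make a path relative to the project root so absolute and relative inputs compare equally.
function toRelative(p: string, projectRoot: string): string {
    const root = projectRoot.replace(/\/+$/, ''); // drop trailing slashes
    const rel = p.startsWith(root + '/') ? p.slice(root.length + 1) : p;
    return rel.replace(/^\.\//, ''); // './src/x.py' -> 'src/x.py'
}

type Matcher = (relPath: string) => boolean;

function compilePattern(pattern: string): Matcher {
    // A trailing '/' marks a directory: exclude everything beneath it.
    const glob = pattern.endsWith('/') ? pattern + '**' : pattern;
    const re = globToRegExp(glob);
    if (!glob.includes('/')) {
        // Assumption: slash-free patterns like 'test_*' match any single path segment.
        return (relPath) => relPath.split('/').some((segment) => re.test(segment));
    }
    // Otherwise match the full relative path; an exact directory path also covers its contents.
    return (relPath) => re.test(relPath) || relPath.startsWith(glob + '/');
}

// Drop every file matched by at least one pattern; patterns that match nothing are harmless.
function filterExcluded(files: string[], patterns: string[], projectRoot: string): string[] {
    const matchers = patterns.map((p) => compilePattern(toRelative(p, projectRoot)));
    return files.filter((file) => {
        const rel = toRelative(file, projectRoot);
        return !matchers.some((matches) => matches(rel));
    });
}
```

Each pattern is compiled once and then only tested against paths. A pattern that matches nothing therefore never fires, and the "works as a filter" behavior needs no special error handling.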

Opengraph URL: https://github.com/sourcegraph/scip-python/pull/199

Links:

fishmingyu https://patch-diff.githubusercontent.com/fishmingyu
sourcegraph:scip https://patch-diff.githubusercontent.com/sourcegraph/scip-python/tree/scip
fishmingyu:exclude-config https://patch-diff.githubusercontent.com/fishmingyu/scip-python/tree/exclude-config
Conversation 0 https://patch-diff.githubusercontent.com/sourcegraph/scip-python/pull/199
Commits 3 https://patch-diff.githubusercontent.com/sourcegraph/scip-python/pull/199/commits
Checks 1 https://patch-diff.githubusercontent.com/sourcegraph/scip-python/pull/199/checks
Files changed https://patch-diff.githubusercontent.com/sourcegraph/scip-python/pull/199/files
semgrep-cloud-platform/scan https://patch-diff.githubusercontent.com/sourcegraph/scip-python/pull/199/checks?check_run_id=52493912802
View more details on Semgrep Code - sourcegraph https://semgrep.dev/orgs/sourcegraph/projects/3641495/scans/99540387