Title: Remove pyarrow as required dependency, relying on Arrow PyCapsule Interface · Issue #1227 · apache/datafusion-python · GitHub
Opengraph URL: https://github.com/apache/datafusion-python/issues/1227
Author: kylebarron (https://github.com/kylebarron)
Date published: 2025-09-03T17:16:12.000Z
Comments: 14

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

PyArrow is a massive dependency. Unpacked, it tends to be >100MB in size, and, until the latest versions (I think?) also required numpy as its own non-optional dependency.

It's also, in effect the only current dependency
https://github.com/apache/datafusion-python/blob/f0bbad7543717c5f08ba2acb92d42c9d30fd2355/pyproject.toml#L46

It would be great if we could remove it, and that would greatly lessen the minimal environment size for datafusion python.

[Many other Python Arrow libraries](https://github.com/apache/arrow/issues/39195#issuecomment-2245718008) implement the PyCapsule Interface, so the user can use nanoarrow, arro3, Polars, DuckDB, etc, or pyarrow. Whatever is best for them.

**Describe the solution you'd like**

The Arrow PyCapsule Interface is a lightweight, decentralized protocol for sharing Arrow data between Python libraries. We already implement the PyCapsule Interface, so it's just a matter of removing places where we hard-code use of pyarrow.

**Describe alternatives you've considered**

Keep pyarrow dependency.

**Additional context**
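As a rough illustration of what "removing places where we hard-code use of pyarrow" could look like, the sketch below recognizes Arrow-compatible inputs by the protocol methods the Arrow PyCapsule Interface defines (`__arrow_c_stream__`, `__arrow_c_array__`, `__arrow_c_schema__`) rather than by `isinstance` checks against pyarrow types. The names `arrow_export_kind` and `FakeTable` are hypothetical and exist only for this example; they are not part of datafusion-python.

```python
from typing import Any, Optional

# Dunder methods defined by the Arrow PyCapsule Interface.
# They are checked in this order: stream, then array, then schema.
_PROTOCOL_METHODS = (
    ("__arrow_c_stream__", "stream"),
    ("__arrow_c_array__", "array"),
    ("__arrow_c_schema__", "schema"),
)


def arrow_export_kind(obj: Any) -> Optional[str]:
    """Return which PyCapsule export `obj` offers, or None if it offers none.

    Hypothetical helper: it only inspects attributes, so it never imports pyarrow.
    """
    for method_name, kind in _PROTOCOL_METHODS:
        if callable(getattr(obj, method_name, None)):
            return kind
    return None


class FakeTable:
    """Stand-in for a table object from any Arrow library (demo assumption)."""

    def __arrow_c_stream__(self, requested_schema=None):
        raise NotImplementedError("a real library returns a PyCapsule here")


print(arrow_export_kind(FakeTable()))  # stream
print(arrow_export_kind([1, 2, 3]))    # None
```

Because the check is purely structural, the same code path would accept objects from pyarrow, nanoarrow, arro3, Polars, or any other library that implements the interface, without importing any of them.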