Title: Add support for scalar values with extension types · Issue #1301 · apache/datafusion-python · GitHub
Description: Is your feature request related to a problem or challenge? Please describe what you are trying to do. Suppose I have a pyarrow scalar value that contains an extension type. If I try turning that into a literal expression in datafusion, w...
Opengraph URL: https://github.com/apache/datafusion-python/issues/1301
Author: timsaucer (https://github.com/timsaucer)
Date Published: 2025-11-06T22:19:42.000Z
Comments: 0

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

Suppose I have a pyarrow scalar value with an extension type. If I turn it into a literal expression in datafusion, the associated metadata should carry over transparently to the user.

Consider this minimal example:

```python
import pyarrow as pa
import uuid
from datafusion import lit

value = pa.scalar(uuid.uuid4().bytes, pa.uuid())

print(lit(value))
```

This currently fails with `ArrowTypeError: Expected bytes, got a 'UUID' object`. That can be overcome with this simple patch:

```patch
--- a/src/pyarrow_util.rs
+++ b/src/pyarrow_util.rs
@@ -30,7 +30,11 @@ impl FromPyArrow for PyScalarValue {
 fn from_pyarrow_bound(value: &Bound<'_, PyAny>) -> PyResult<Self> {
 let py = value.py();
 let typ = value.getattr("type")?;
- let val = value.call_method0("as_py")?;
+ let val = if value.hasattr("value")? {
+ value.getattr("value")?
+ } else {
+ value.call_method0("as_py")?
+ };
```

But then we still don't have the metadata: it is lost, and we get a bare fixed-size binary.

**Describe the solution you'd like**

The above code should *just work*. I have done a little investigation, and using the PyCapsule interface we *can* get the schema of the array we generate inside `PyScalarValue::from_pyarrow_bound`. We can then plumb this through when calling `lit()`.

Ideally we would take this opportunity to ensure that `PyScalarValue::from_pyarrow_bound` also supports libraries other than `pyarrow`. There have been complaints a few times that we are too tightly coupled to `pyarrow`. In particular, it would be good to demonstrate that converting a Python scalar value works for:

- pyarrow
- nanoarrow
- arro3
- polars

I don't think we necessarily need to support pandas, since it is not an Arrow library.

**Describe alternatives you've considered**

Alternatively, the user can manually turn their data into the underlying storage and then attach the metadata from their extension type. This feels like a poor user experience.

**Additional context**

This came up during a different investigation:

> Also worth evaluating while we're doing this: For scalar values, is it possible for them to contain metadata? If I do `pa.scalar(uuid.uuid4().bytes, type=pa.uuid())` and I check the `type` I should have the extension data. Maybe this is already supported, but as part of this PR I want to evaluate that as well.

_Originally posted by @timsaucer in https://github.com/apache/datafusion-python/issues/1299#issuecomment-3497558869_
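
For illustration, the manual workaround described under "Describe alternatives you've considered" might look roughly like the sketch below. It uses only the standard library and does not call pyarrow or datafusion: the field metadata is modeled as a plain dict, and the helper name `uuid_storage_and_metadata` is hypothetical. The two metadata keys are the ones the Arrow columnar format reserves for extension types, and `arrow.uuid` is the canonical UUID extension name, whose storage is a 16-byte fixed-size binary.

```python
import uuid

# Field-metadata keys reserved by the Arrow columnar format for extension types.
EXTENSION_NAME_KEY = "ARROW:extension:name"
EXTENSION_METADATA_KEY = "ARROW:extension:metadata"


def uuid_storage_and_metadata(value: uuid.UUID) -> tuple[bytes, dict[str, str]]:
    """Hypothetical helper: split a UUID into storage bytes and extension metadata."""
    # The storage value is the raw 16 bytes (fixed-size binary of width 16).
    storage = value.bytes
    metadata = {
        # Canonical extension name for UUIDs.
        EXTENSION_NAME_KEY: "arrow.uuid",
        # Assumption: arrow.uuid has no parameters, so nothing is serialized here.
        EXTENSION_METADATA_KEY: "",
    }
    return storage, metadata


storage, metadata = uuid_storage_and_metadata(
    uuid.UUID("12345678-1234-5678-1234-567812345678")
)
```

A user following this route would then have to attach `metadata` to the field that describes the literal themselves, which is the extra step this issue asks `lit()` to handle automatically.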