Title: Cannot do udaf that returns list of timestamps · Issue #1339 · apache/datafusion-python · GitHub
Opengraph URL: https://github.com/apache/datafusion-python/issues/1339
Posted by ntjohnson1 (https://github.com/ntjohnson1) on 2026-01-15T15:53:22.000Z · 0 comments

**Describe the bug**
I'm trying to generate a udaf that returns multiple timestamps for each partition id.

**To Reproduce**
```python
import datafusion as dfn
from datafusion import udf, udaf, Accumulator, col
import pyarrow as pa
import pyarrow.compute as pc
import numpy as np

class ResampleAccumulator(Accumulator):
    def __init__(self):
        self._min = float('inf')
        self._max = 0
        # 10 Hz
        self._timestep = 100  # ms

    def update(self, array):
        # Logic to update the sum and count from an input array
        # In a real implementation, you would process the pyarrow array efficiently
        print("Enter update")
        local_min, local_max = pc.min_max(array).values()
        local_min_ns = local_min.cast(pa.timestamp('ns')).value
        local_max_ns = local_max.cast(pa.timestamp('ns')).value

        self._min = min(local_min_ns, self._min)
        self._max = max(local_max_ns, self._max)
        print(f"update {self._min=}, {self._max=}")

    def merge(self, states_array):
        print("Enter merge")
        # Is there a better way to do this with pc?
        # or maybe just throw it into numpy
        self._min = min(states_array[0][0].as_py(), self._min)
        self._max = max(states_array[1][0].as_py(), self._max)
        print(f"merge {self._min=}, {self._max=}")


    def state(self):
        print("Enter state")
        # Return the current state as a list of scalars
        return pa.array([self._min, self._max], type=pa.int64())

    def evaluate(self):
        print("Enter evaluate")
        desired_timestamps = np.arange(np.datetime64(self._min, 'ns'), np.datetime64(self._max, 'ns'), np.timedelta64(self._timestep, "ms"))
        print(f"{len(desired_timestamps)=}")
        array_result = pa.array(desired_timestamps, type=pa.timestamp('ns'))
        print(array_result)
        return array_result
resample_udaf = udaf(ResampleAccumulator, [pa.timestamp('ns')], pa.list_(pa.timestamp('ns')), [pa.int64(), pa.int64()], volatility="stable")

ctx = dfn.SessionContext()

df = ctx.from_pydict({"id": [0,1], "time": [np.datetime64(0, 'ns'), np.datetime64(1_000_000_000, 'ns')]})
print(df)

result = df.aggregate(
    "id",
    [resample_udaf(col("time"))]
)
print(result.schema())
result.collect()
```
Output
```bash
Traceback (most recent call last):
  File "<path>/<file>.py", line 60, in <module>
    result.collect()
  File "<path>/.venv/lib/python3.12/site-packages/datafusion/dataframe.py", line 729, in collect
    return self.df.collect()
           ^^^^^^^^^^^^^^^^^
Exception: DataFusion error: Execution("ArrowTypeError: object of type <class 'pyarrow.lib.TimestampArray'> cannot be converted to int")
```

**Expected behavior**
This works or provides a clearer error.

**Additional context**
Fails on datafusion 51. If I return just a single timestamp and update the udaf call then this works.
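
For readers following the reproduction, the accumulator's intended lifecycle (fold batches into a running min/max, exchange that state between partial aggregates, then emit a fixed-step grid between the two bounds) can be sketched in plain Python with no DataFusion or PyArrow dependency. This is only an illustration of the logic: `MinMaxGridSketch` and its method signatures are hypothetical stand-ins, not the `datafusion.Accumulator` API, and the sketch does not reproduce or resolve the reported conversion error. It also assumes, unlike the reproduction, that a merge may receive several partial states and that an empty accumulator should yield an empty grid.

```python
from typing import Iterable, List, Optional, Tuple

NS_PER_MS = 1_000_000

State = Tuple[Optional[int], Optional[int]]


class MinMaxGridSketch:
    """Hypothetical plain-Python stand-in for the accumulator lifecycle."""

    def __init__(self, timestep_ms: int = 100) -> None:
        self._min: Optional[int] = None  # earliest timestamp seen, in ns
        self._max: Optional[int] = None  # latest timestamp seen, in ns
        self._step_ns = timestep_ms * NS_PER_MS

    def update(self, values_ns: Iterable[int]) -> None:
        # Fold one batch of raw timestamps into the running bounds.
        for value in values_ns:
            self._min = value if self._min is None else min(self._min, value)
            self._max = value if self._max is None else max(self._max, value)

    def state(self) -> State:
        # Intermediate state exchanged between partial aggregates.
        return (self._min, self._max)

    def merge(self, states: Iterable[State]) -> None:
        # Combine every partial state, skipping accumulators that saw no rows.
        for low, high in states:
            if low is None or high is None:
                continue
            self.update([low, high])

    def evaluate(self) -> List[int]:
        # Half-open grid [min, max) at a fixed step, like np.arange.
        if self._min is None or self._max is None:
            return []
        return list(range(self._min, self._max, self._step_ns))


# Two partial accumulators for the same group, merged before evaluation.
a = MinMaxGridSketch()
b = MinMaxGridSketch()
a.update([0, 250_000_000])
b.update([1_000_000_000])
a.merge([b.state()])
grid = a.evaluate()
print(len(grid), grid[:3])
```

Using `None` rather than `float('inf')` as the "no data yet" marker keeps the state integer-only in this sketch; whether that matters for the real accumulator's `int64` state is an assumption, not something the report establishes.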