Title: first_value doesn't work when applied to window function output · Issue #1300 · apache/datafusion-python · GitHub
Description: Describe the bug A clear and concise description of what the bug is. If I generate a column based on a window function then try to filter and select the first value it barfs To Reproduce Steps to reproduce the behavior: import datafusion...
Opengraph URL: https://github.com/apache/datafusion-python/issues/1300
Author: ntjohnson1 (https://github.com/ntjohnson1)
Date published: 2025-11-05T20:22:38.000Z
Comments: 5

**Describe the bug**
If I generate a column based on a window function, then try to filter and select the first value, it barfs.

**To Reproduce**
Steps to reproduce the behavior:
```python
import datafusion as dfn
from datafusion import lit, col, functions as F
from datafusion.expr import Window, WindowFrame

def main() -> None:
    ctx = dfn.SessionContext()
    df = ctx.from_pydict(
        {"any_row": list(range(10))},
    )
    df = df.select(
        "any_row",
        lit(1).alias("ones"),
    )
    df = df.select(
        "any_row",
        F.sum(col("ones"))\
            .over(Window(window_frame=WindowFrame("rows", None, 0), order_by=col("any_row").sort(ascending=True))) \
            .alias("forward_row_sum"),
        F.sum(col("ones"))\
            .over(Window(window_frame=WindowFrame("rows", None, 0), order_by=col("any_row").sort(ascending=False))) \
            .alias("reverse_row_sum"),
    )
    df.collect()
    df.select(
        F.first_value(col("forward_row_sum"), order_by=col("any_row"))
    ).collect()

    df.select(
        F.last_value(col("reverse_row_sum"), filter=col("reverse_row_sum") >= 5, order_by=col("any_row").sort(ascending=True))
    ).collect()

if __name__ == "__main__":
    main()
```

```console
Traceback (most recent call last):
  File "/Users/nick/repos/bug.py", line 39, in <module>
    main()
    ~~~~^^
  File "/Users/nick/repos/bug.py", line 26, in main
    ).collect()
    ~~~~~~~^^
  File "/Users/nick/repos/.venv/lib/python3.13/site-packages/datafusion/dataframe.py", line 681, in collect
    return self.df.collect()
           ~~~~~~~~~~~~~~~^^
Exception: DataFusion error: NotImplemented("Physical plan does not support logical expression AggregateFunction(AggregateFunction { func: AggregateUDF { inner: FirstValue { name: \"first_value\", signature: Signature { type_signature: Any(1), volatility: Immutable }, accumulator: \"<FUNC>\" } }, params: AggregateFunctionParams { args: [Column(Column { relation: None, name: \"sum(ones) ORDER BY [c19e557aec20e49b985bb070e969ba68f.any_row ASC NULLS FIRST] ROWS BETWEEN UNBOUNDED PRECEDING AND 0 FOLLOWING\" })], distinct: false, filter: None, order_by: [Sort { expr: Column(Column { relation: Some(Bare { table: \"c19e557aec20e49b985bb070e969ba68f\" }), name: \"any_row\" }), asc: true, nulls_first: true }], null_treatment: Some(RespectNulls) } })")
```

**Expected behavior**
That I get the first (or last) value.

**Additional context**
```python
>>> import datafusion as dfn
>>> dfn.__version__
'50.1.0'
```
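
To make the expected behavior concrete, here is a minimal plain-Python sketch of the values the two final queries in the reproduction would presumably return if they ran. It does not use DataFusion at all. It assumes the `WindowFrame("rows", None, 0)` frame is a running sum from the first row through the current row, as the `ROWS BETWEEN UNBOUNDED PRECEDING AND 0 FOLLOWING` text in the error message suggests. The variable names `first_forward` and `last_reverse` are introduced here for illustration only.

```python
from itertools import accumulate

# Same input as the report: any_row = 0..9, plus a constant "ones" column.
any_row = list(range(10))
ones = [1] * len(any_row)

# Running sum with rows ordered by any_row ascending, keyed by any_row.
forward_row_sum = dict(zip(any_row, accumulate(ones)))

# The same running sum, but with rows ordered by any_row descending.
reverse_row_sum = dict(zip(reversed(any_row), accumulate(ones)))

# Analogue of first_value(forward_row_sum) ordered by any_row ascending.
first_forward = forward_row_sum[min(forward_row_sum)]

# Analogue of last_value(reverse_row_sum) ordered by any_row ascending,
# restricted to rows where reverse_row_sum >= 5.
kept = [k for k in sorted(reverse_row_sum) if reverse_row_sum[k] >= 5]
last_reverse = reverse_row_sum[kept[-1]]

print(first_forward, last_reverse)  # prints "1 5"
```

Under these assumptions the first query would yield 1 and the second would yield 5, which is a handy reference point when checking any fix or workaround against the reproduction.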