Title: Dispatched sql from dry run does not match sql from Dataframe attribute · Issue #2191 · googleapis/python-bigquery-dataframes · GitHub
Open Graph Title: Dispatched sql from dry run does not match sql from Dataframe attribute · Issue #2191 · googleapis/python-bigquery-dataframes
X Title: Dispatched sql from dry run does not match sql from Dataframe attribute · Issue #2191 · googleapis/python-bigquery-dataframes
Description: I have noticed a wild discrepancy between the sql obtained from DataFrame.sql and the dispatched sql from DataFrame.to_pandas(dry_run=True). The sql code I get from DataFrame.sql is much cleaner and, accordingly to the big query UI, it w...
Open Graph Description: I have noticed a wild discrepancy between the sql obtained from DataFrame.sql and the dispatched sql from DataFrame.to_pandas(dry_run=True). The sql code I get from DataFrame.sql is much cleaner an...
X Description: I have noticed a wild discrepancy between the sql obtained from DataFrame.sql and the dispatched sql from DataFrame.to_pandas(dry_run=True). The sql code I get from DataFrame.sql is much cleaner an...
Opengraph URL: https://github.com/googleapis/python-bigquery-dataframes/issues/2191
X: @github
Domain: github.com
{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"Dispatched sql from dry run does not match sql from Dataframe attribute","articleBody":"I have noticed a wild discrepancy between the sql obtained from `DataFrame.sql` and the dispatched sql from `DataFrame.to_pandas(dry_run=True)`. The sql code I get from `DataFrame.sql` is much cleaner and, accordingly to the big query UI, it would consume much less bytes when executed (in my example 156.35 MB). The dispatched sql obtained via `DataFrame.to_pandas(dry_run=True)` is much heavier and less optimised, requiring a full table scan (~2TB in my example). As far as I understood, the sql from `DataFrame.sql` does not rely on any cached table.\n\nIt would be nice if the dispatched sql would be the optimised one in order to avoid a full table scan on partitioned tables.\n\n#### Environment details\n\n - OS type and version: Debian GNU/Linux 11 (bullseye)\n - Python version: 3.10.16\n - pip version: `pip 25.0.1`\n - `bigframes` version: `2.26.0`\n\n\n```python\nPython: 3.10.16 | packaged by conda-forge | (main, Dec 5 2024, 14:16:10) [GCC 13.3.0]\nbigframes==2.26.0\ngoogle-cloud-bigquery==3.38.0\npandas==2.3.3\npyarrow==21.0.0\nsqlglot==27.28.1\n```\n\n#### Code example\n\n```python\nimport datetime\nimport bigframes.pandas as bpd\n\nbpd.options.bigquery.project = \"\u003credacted\u003e\"\ndf = bpd.read_gbq(\"bigquery-public-data.crypto_bitcoin.transactions\")\n# This table is partitioned by block_timestamp_month\nselection = df[df.block_timestamp_month == datetime.date(2025, 10, 1)][[\"block_timestamp_month\", \"size\"]]\ndry_run = selection.to_pandas(dry_run=True)\n```\n\n```python\ndry_run[dry_run.index == \"totalBytesProcessed\"].values[0]\n#Output: 2365056405079\n# ~2.3TB\n```\n\n```python\nselection.sql\n```\nOutputs\n```sql\nSELECT\n`bfuid_col_12` AS `block_timestamp_month`,\n`bfuid_col_5` AS `size`\nFROM\n(SELECT\n `t0`.`size`,\n `t0`.`block_timestamp_month`,\n `t0`.`size` AS `bfuid_col_5`,\n `t0`.`block_timestamp_month` AS `bfuid_col_12`,\n `t0`.`block_timestamp_month` = DATE(2025, 10, 1) AS `bfuid_col_21`\nFROM (\n SELECT\n `size`,\n `block_timestamp_month`\n FROM `bigquery-public-data.crypto_bitcoin.transactions` FOR SYSTEM_TIME AS OF TIMESTAMP('2025-10-23T12:47:48.073960+00:00')\n) AS `t0`\nWHERE\n `t0`.`block_timestamp_month` = DATE(2025, 10, 1))\n```\n\nWhich gives an estimated processed bytes of 156.35 MB when pasted on the big query editor.\n\nChecking the dispatched sql I get:\n\n```python\nprint(dry_run[dry_run.index == \"dispatchedSql\"].values[0])\n```\n\n```sql\nSELECT\n`bfuid_col_3` AS `bfuid_col_3`,\n`bfuid_col_12` AS `bfuid_col_12`,\n`bfuid_col_5` AS `bfuid_col_5`\nFROM\n(SELECT\n `t1`.`bfuid_col_3`,\n `t1`.`bfuid_col_12`,\n `t1`.`bfuid_col_5`,\n `t1`.`bfuid_col_28` AS `bfuid_col_29`\nFROM (\n SELECT\n ROW_NUMBER() OVER (\n ORDER BY CONCAT(\n CAST(FARM_FINGERPRINT(\n CONCAT(\n CONCAT('\\\\', REPLACE(COALESCE(`t0`.`hash`, ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`size` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`virtual_size` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`version` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`lock_time` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(`t0`.`block_hash`, ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`block_number` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`block_timestamp` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT(\n '\\\\',\n REPLACE(COALESCE(CAST(`t0`.`block_timestamp_month` AS STRING), ''), '\\\\', '\\\\\\\\')\n ),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`input_count` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`output_count` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`input_value` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`output_value` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`is_coinbase` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`fee` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(to_json_string(`t0`.`inputs`), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(to_json_string(`t0`.`outputs`), ''), '\\\\', '\\\\\\\\'))\n )\n ) AS STRING),\n CAST(FARM_FINGERPRINT(\n CONCAT(\n CONCAT(\n CONCAT('\\\\', REPLACE(COALESCE(`t0`.`hash`, ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`size` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`virtual_size` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`version` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`lock_time` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(`t0`.`block_hash`, ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`block_number` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`block_timestamp` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT(\n '\\\\',\n REPLACE(COALESCE(CAST(`t0`.`block_timestamp_month` AS STRING), ''), '\\\\', '\\\\\\\\')\n ),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`input_count` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`output_count` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`input_value` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`output_value` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`is_coinbase` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`fee` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(to_json_string(`t0`.`inputs`), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(to_json_string(`t0`.`outputs`), ''), '\\\\', '\\\\\\\\'))\n ),\n '_'\n )\n ) AS STRING),\n CAST(RAND() AS STRING)\n ) ASC\n ) - 1 AS `bfuid_col_3`,\n `t0`.`size` AS `bfuid_col_5`,\n `t0`.`block_timestamp_month` AS `bfuid_col_12`,\n `t0`.`block_timestamp_month` = DATE(2025, 10, 1) AS `bfuid_col_21`,\n ROW_NUMBER() OVER (\n ORDER BY CONCAT(\n CAST(FARM_FINGERPRINT(\n CONCAT(\n CONCAT('\\\\', REPLACE(COALESCE(`t0`.`hash`, ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`size` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`virtual_size` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`version` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`lock_time` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(`t0`.`block_hash`, ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`block_number` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`block_timestamp` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT(\n '\\\\',\n REPLACE(COALESCE(CAST(`t0`.`block_timestamp_month` AS STRING), ''), '\\\\', '\\\\\\\\')\n ),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`input_count` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`output_count` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`input_value` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`output_value` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`is_coinbase` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`fee` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(to_json_string(`t0`.`inputs`), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(to_json_string(`t0`.`outputs`), ''), '\\\\', '\\\\\\\\'))\n )\n ) AS STRING),\n CAST(FARM_FINGERPRINT(\n CONCAT(\n CONCAT(\n CONCAT('\\\\', REPLACE(COALESCE(`t0`.`hash`, ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`size` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`virtual_size` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`version` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`lock_time` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(`t0`.`block_hash`, ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`block_number` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`block_timestamp` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT(\n '\\\\',\n REPLACE(COALESCE(CAST(`t0`.`block_timestamp_month` AS STRING), ''), '\\\\', '\\\\\\\\')\n ),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`input_count` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`output_count` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`input_value` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`output_value` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`is_coinbase` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(CAST(`t0`.`fee` AS STRING), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(to_json_string(`t0`.`inputs`), ''), '\\\\', '\\\\\\\\')),\n CONCAT('\\\\', REPLACE(COALESCE(to_json_string(`t0`.`outputs`), ''), '\\\\', '\\\\\\\\'))\n ),\n '_'\n )\n ) AS STRING),\n CAST(RAND() AS STRING)\n ) ASC\n ) - 1 AS `bfuid_col_28`\n FROM (\n SELECT\n `hash`,\n `size`,\n `virtual_size`,\n `version`,\n `lock_time`,\n `block_hash`,\n `block_number`,\n `block_timestamp`,\n `block_timestamp_month`,\n `input_count`,\n `output_count`,\n `input_value`,\n `output_value`,\n `is_coinbase`,\n `fee`,\n `inputs`,\n `outputs`\n FROM `bigquery-public-data.crypto_bitcoin.transactions` FOR SYSTEM_TIME AS OF TIMESTAMP('2025-10-23T12:47:48.073960+00:00')\n ) AS `t0`\n) AS `t1`\nWHERE\n `t1`.`bfuid_col_21`)\nORDER BY `bfuid_col_29` ASC NULLS LAST\n```\n","author":{"url":"https://github.com/ClaudioSalvatoreArcidiacono","@type":"Person","name":"ClaudioSalvatoreArcidiacono"},"datePublished":"2025-10-23T13:12:09.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":0},"url":"https://github.com/2191/python-bigquery-dataframes/issues/2191"}
| route-pattern | /_view_fragments/issues/show/:user_id/:repository/:id/issue_layout(.:format) |
| route-controller | voltron_issues_fragments |
| route-action | issue_layout |
| fetch-nonce | v2:1801e5db-1b05-e675-5f4f-8b9f755e4757 |
| current-catalog-service-hash | 81bb79d38c15960b92d99bca9288a9108c7a47b18f2423d0f6438c5b7bcd2114 |
| request-id | DC0A:36CCCB:14843E8:1BA2F21:6964A61D |
| html-safe-nonce | a39a8dbadab44ae9f5eb0b7290cd4f054ab8cab5278652ef2b301c106869b5c1 |
| visitor-payload | eyJyZWZlcnJlciI6IiIsInJlcXVlc3RfaWQiOiJEQzBBOjM2Q0NDQjoxNDg0M0U4OjFCQTJGMjE6Njk2NEE2MUQiLCJ2aXNpdG9yX2lkIjoiNjIyNjkyMzE0NzQ4MzE5NDkwOSIsInJlZ2lvbl9lZGdlIjoiaWFkIiwicmVnaW9uX3JlbmRlciI6ImlhZCJ9 |
| visitor-hmac | 8b4b5560b334a85691d40fe28b56b05cc1f2b598927d024c1bd8cd719456f593 |
| hovercard-subject-tag | issue:3544644936 |
| github-keyboard-shortcuts | repository,issues,copilot |
| google-site-verification | Apib7-x98H0j5cPqHWwSMm6dNU4GmODRoqxLiDzdx9I |
| octolytics-url | https://collector.github.com/github/collect |
| analytics-location | / |
| fb:app_id | 1401488693436528 |
| apple-itunes-app | app-id=1477376905, app-argument=https://github.com/_view_fragments/issues/show/googleapis/python-bigquery-dataframes/2191/issue_layout |
| twitter:image | https://opengraph.githubassets.com/b40b8c2aad0ae87e3cbcad4b78618886b47679bc6125ff81f1ea032864581e16/googleapis/python-bigquery-dataframes/issues/2191 |
| twitter:card | summary_large_image |
| og:image | https://opengraph.githubassets.com/b40b8c2aad0ae87e3cbcad4b78618886b47679bc6125ff81f1ea032864581e16/googleapis/python-bigquery-dataframes/issues/2191 |
| og:image:alt | I have noticed a wild discrepancy between the sql obtained from DataFrame.sql and the dispatched sql from DataFrame.to_pandas(dry_run=True). The sql code I get from DataFrame.sql is much cleaner an... |
| og:image:width | 1200 |
| og:image:height | 600 |
| og:site_name | GitHub |
| og:type | object |
| og:author:username | ClaudioSalvatoreArcidiacono |
| hostname | github.com |
| expected-hostname | github.com |
| None | 15579c46431b7fd25941c3b09010f74fd1890c7a35226839bbbf40ce70fb3057 |
| turbo-cache-control | no-preview |
| go-import | github.com/googleapis/python-bigquery-dataframes git https://github.com/googleapis/python-bigquery-dataframes.git |
| octolytics-dimension-user_id | 16785467 |
| octolytics-dimension-user_login | googleapis |
| octolytics-dimension-repository_id | 667598363 |
| octolytics-dimension-repository_nwo | googleapis/python-bigquery-dataframes |
| octolytics-dimension-repository_public | true |
| octolytics-dimension-repository_is_fork | false |
| octolytics-dimension-repository_network_root_id | 667598363 |
| octolytics-dimension-repository_network_root_nwo | googleapis/python-bigquery-dataframes |
| turbo-body-classes | logged-out env-production page-responsive |
| disable-turbo | false |
| browser-stats-url | https://api.github.com/_private/browser/stats |
| browser-errors-url | https://api.github.com/_private/browser/errors |
| release | 499abb347cb197601d399c346cfeb4b3fa135d5c |
| ui-target | full |
| theme-color | #1e2327 |
| color-scheme | light dark |
Links:
Viewport: width=device-width