Title: How to insert into VARIANT column? · Issue #681 · databricks/databricks-sql-python · GitHub
Description: As we all know, Databricks has a VARIANT data type that is more performant than standard JSON strings for nested data. However, I don't understand how I am supposed to insert into such a column using databricks-sql-connector==4.0.5 thoug...
Opengraph URL: https://github.com/databricks/databricks-sql-python/issues/681
{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"How to insert into VARIANT column?","articleBody":"As we all know, Databricks has a [VARIANT data type](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-datatypes) that is more performant than standard JSON strings for nested data. However, I don't understand how I am supposed to insert into such a column using databricks-sql-connector==4.0.5.\n\nIf you want to write to a [PostgreSQL JSON column in psycopg2](https://stackoverflow.com/a/31796487), you use `json.dumps`. This trick did not seem to work here. I also tried custom-escaped JSON and a plain dict, but neither worked.\n\n```python\nimport databricks.sql\nimport json\n\n# Replace with your Databricks SQL warehouse details\nserver_hostname = \"\u003cSERVER_HOSTNAME\u003e\"\nhttp_path = \"\u003cHTTP_PATH\u003e\"\naccess_token = \"\u003cACCESS_TOKEN\u003e\"\n\n# Example data to insert\ncontent = {\n    \"age\": 29,\n    \"city\": \"New York\"\n}\ndata = [\n    (1, \"Alice\", json.dumps(content)),\n    (2, \"Bob\", None),  # (2, \"Bob\", content) raises an error\n    (3, \"Charlie\", \"\"\"{\"age\": 29, \"city\": \"New York\"}\"\"\"),\n]\n\n# Connect to Databricks SQL\nwith databricks.sql.connect(\n    server_hostname=server_hostname,\n    http_path=http_path,\n    access_token=access_token\n) as connection:\n    with connection.cursor() as cursor:\n        # Create the Delta table, then insert the rows\n        cursor.execute(\"CREATE OR REPLACE TABLE test_catalog.ad_hoc.variant_test (id INT, name STRING, content VARIANT) USING DELTA\")\n        cursor.executemany(\n            \"INSERT INTO test_catalog.ad_hoc.variant_test (id, name, content) VALUES (?, ?, ?)\",\n            data\n        )\n```\n\nIf I query the resulting table,\n```sql\nSELECT *, content:age, parse_json(\"{\\\"age\\\": 29, \\\"city\\\": \\\"New York\\\"}\") FROM ad_hoc.variant_test ORDER BY id;\n```\n\nrows 1 and 3 succeed, but the value seems to end up as a \"flat\" `STRING` (according to `schema_of_variant(content)`) instead of `OBJECT\u003cage: BIGINT, city: STRING\u003e`.\n\n\u003cimg width=\"815\" height=\"255\" alt=\"Image\" src=\"https://github.com/user-attachments/assets/4bc58cfe-1a69-47d9-af25-4eb728a76c57\" /\u003e\n\nAttempt 2 (inserting a dict) raises the error\n\n```databricks.sql.exc.ServerOperationError: [DATATYPE_MISMATCH.CAST_WITHOUT_SUGGESTION] Cannot resolve \"content\" due to data type mismatch: cannot cast \"MAP\u003cVOID, VOID\u003e\" to \"VARIANT\". SQLSTATE: 42K09; line 1 pos 0```\n\nCould we add documentation for how this is supposed to work? I guess MAP, STRUCT and VARIANT all map to the Python type dict in some sense. The difference is only that VARIANT doesn't have a predefined schema like MAP or STRUCT and, unlike MAP, can be nested?","author":{"url":"https://github.com/excavator-matt","@type":"Person","name":"excavator-matt"},"datePublished":"2025-08-20T07:35:42.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":1},"url":"https://github.com/databricks/databricks-sql-python/issues/681"}
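One possible workaround, sketched here under assumptions rather than taken from the connector's documentation: send the value as JSON text and let the server convert it by wrapping the placeholder in `parse_json(?)`, the same SQL function the issue's own query uses. The helper names (`to_variant_param`, `prepare_rows`, `insert_rows`) are hypothetical, the table name is copied from the issue's `CREATE` statement, and neither the `parse_json(?)` placeholder nor its handling of `NULL` parameters has been verified against databricks-sql-connector 4.0.5.

```python
import json

# Assumption: parse_json() around the placeholder turns the JSON text into a
# real VARIANT on the server instead of a "flat" string value.
INSERT_SQL = (
    "INSERT INTO test_catalog.ad_hoc.variant_test (id, name, content) "
    "VALUES (?, ?, parse_json(?))"
)


def to_variant_param(value):
    """Return JSON text for a Python value; None stays None (sent as NULL)."""
    if value is None:
        return None
    if isinstance(value, str):
        # Assumed to already be JSON text, as in row 3 of the issue.
        return value
    return json.dumps(value)


def prepare_rows(rows):
    """Convert the third field of each (id, name, content) row to JSON text."""
    return [
        (row_id, name, to_variant_param(content))
        for row_id, name, content in rows
    ]


def insert_rows(cursor, rows):
    """Run the parameterized insert on any DB-API style cursor."""
    cursor.executemany(INSERT_SQL, prepare_rows(rows))
```

With a cursor opened as in the issue, `insert_rows(cursor, [(2, "Bob", {"age": 29, "city": "New York"})])` would pass the dict as JSON text rather than as a `MAP` parameter, which is the conversion the error message complains about.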