Title: Allow Feast Spark materialization to use staging for large dataset spill (avoid driver OOM) · Issue #5671 · feast-dev/feast · GitHub
Description: Is your feature request related to a problem? Please describe. When running store.materialize_incremental() with a Spark-based offline store, Feast currently converts Spark DataFrames to Arrow tables using toPandas() or collect(). This l...
Opengraph URL: https://github.com/feast-dev/feast/issues/5671
{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"Allow Feast Spark materialization to use staging for large dataset spill (avoid driver OOM)","articleBody":"**Is your feature request related to a problem? Please describe.**\n\nWhen running `store.materialize_incremental()` with a Spark-based offline store, Feast currently converts Spark DataFrames to Arrow tables using `toPandas()` or `collect()`.\nThis loads the entire dataset into driver memory, causing **`OutOfMemoryError`** or **`spark.driver.maxResultSize` exceeded** for large datasets.\n\nAlthough Feast already supports a `staging` configuration for exporting data, it is **not yet used during materialization**. As a result, large-scale jobs cannot leverage staging to spill data to disk or remote storage.\n\n**Describe the solution you'd like**\n\nEnable the existing `staging` configuration to be used for **materialization** in the Spark offline store.\nWhen enabled, Feast should write intermediate Spark DataFrames to the staging location (e.g. local disk or S3) before reading them back as Arrow tables for online ingestion.\n\nExample configuration:\n\n```yaml\noffline_store:\n type: spark\n staging_location: s3://bucket/tmp/feast_arrow\n staging_allow_materialize: true\n```\n\nThis would allow Feast to handle large datasets safely without driver OOM, by spilling intermediate data to the configured staging location.\n\n**Describe alternatives you've considered**\n\n* Increasing driver memory (`spark.driver.memory`, `spark.driver.maxResultSize`): only delays the problem.\n* Using `toLocalIterator()`: too slow and still limited by memory.\n\n**Additional context**\n\nDuring `feast materialize` with large Spark datasets, jobs fail due to driver OOM even though executors have available resources.\nExample error:\n\n```\nTotal size of serialized results (2.1 GiB) is bigger than spark.driver.maxResultSize (2.0 GiB)\nCaused by: java.lang.OutOfMemoryError: Java heap space\n```\n\nAllowing materialization to use `staging` would make Feast’s Spark integration far more scalable and production-ready.\n","author":{"url":"https://github.com/chimeyrock999","@type":"Person","name":"chimeyrock999"},"datePublished":"2025-10-16T07:56:25.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":2},"url":"https://github.com/feast-dev/feast/issues/5671"}