Title: Make materialization more scalable + performant · Issue #2594 · feast-dev/feast · GitHub
Description: This issue discusses common issues users face when materializing features to the online store in Feast. User problems Generally, users with large datasets can face issues on reliably loading data into the online store to meet their onlin...
Opengraph URL: https://github.com/feast-dev/feast/issues/2594
Author: adchia
Date published: 2022-04-21T21:25:32.000Z
Comments: 2
URL: https://github.com/feast-dev/feast/issues/2594

This issue discusses common problems users face when materializing features to the online store in Feast.

## User problems
Users with large datasets can have trouble reliably loading data into the online store to meet their online needs.

### 1. Materialization in the default provider is not scalable
As per https://github.com/feast-dev/feast/issues/2071:

> Currently, the materialization process loads all the data from the Offline Store to an Arrow table, then converts all the data to Protobuf, then writes all the data to the Online Store. This process requires holding the entire dataset in memory which is not practical.

### 2. Materialization can be slow
For users working with a small number of feature views or a small number of unique entities, Feast's Python-based materialization works fine. However, this does not hold true for many users.

The default provider is slow to materialize data. Users report that incremental materialization can take multiple hours or, worse, never complete.

Users have had to build custom providers to solve this (e.g. by kicking off Dataflow or Spark jobs to materialize large amounts of data more quickly).

### 3. Materialization is not always reliable
There are several datastore-specific issues, such as https://github.com/feast-dev/feast/issues/2027 and https://github.com/feast-dev/feast/issues/2323, where batch write transactions can time out:

```
File "/usr/local/lib/python3.7/dist-packages/google/api_core/grpc_helpers.py", line 69, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
google.api_core.exceptions.InvalidArgument: 400 The referenced transaction has expired or is no longer valid.
```

In Datastore, there are also contention errors (https://github.com/feast-dev/feast/issues/1575):

```
Materializing 1 feature views from 2021-04-29 21:19:00-07:00 to 2021-04-29 21:19:05-07:00 into the datastore online store.
my_fv:
 8%|████▋ | 2720/33120 [00:07<01:18, 388.10it/s]

google.api_core.exceptions.Aborted: 409 too much contention on these datastore entities. please try again. entity groups:
```

## Describe the solution you'd like
There are multiple ways of addressing this. Some ideas:
- Materialize feature views in parallel instead of sequentially (e.g. https://github.com/feast-dev/feast/issues/2421)
- Speed up the slowest Python parts via PyPy or Cython
- Speed up with online-store-specific bulk loading options (e.g. https://redis.io/docs/reference/patterns/bulk-loading/ or https://cloud.google.com/datastore/docs/reference/admin/rpc/google.datastore.admin.v1#google.datastore.admin.v1.DatastoreAdmin.ImportEntities)
- Materialize via Spark
- Materialize with serverless functions, e.g. AWS Lambda
- Spin up separate tasks to materialize each feature view separately
- Spin up separate tasks to materialize smaller intervals of data
- Fix Datastore materialization: https://github.com/feast-dev/feast/issues/1575
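Two of the ideas above, materializing feature views in parallel and splitting the time range into smaller intervals, can be combined in a rough sketch like the one below. It is not Feast's implementation: `split_interval`, `materialize_in_tasks`, and the `materialize_one` callback are hypothetical names introduced here for illustration, and the callback stands in for whatever actually writes one feature view's window to the online store.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from datetime import datetime, timedelta
from typing import Callable, Iterator, List, Tuple

Interval = Tuple[datetime, datetime]


def split_interval(start: datetime, end: datetime, step: timedelta) -> Iterator[Interval]:
    """Yield consecutive half-open [lo, hi) windows covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield lo, hi
        lo = hi


def materialize_in_tasks(
    feature_views: List[str],
    start: datetime,
    end: datetime,
    step: timedelta,
    # Hypothetical callback: materializes one feature view for one window.
    materialize_one: Callable[[str, datetime, datetime], None],
    max_workers: int = 4,
) -> List[Tuple[str, datetime, datetime, Exception]]:
    """Run one task per (feature view, window) and return the tasks that failed."""
    failures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(materialize_one, fv, lo, hi): (fv, lo, hi)
            for fv in feature_views
            for lo, hi in split_interval(start, end, step)
        }
        for future in as_completed(futures):
            fv, lo, hi = futures[future]
            try:
                future.result()
            except Exception as exc:
                # Record the failure so only this window needs to be retried.
                failures.append((fv, lo, hi, exc))
    return failures
```

Smaller windows bound how much data each task holds in memory, and a failed window can be retried on its own instead of restarting the whole run. The sketch assumes the write path tolerates windows of the same feature view finishing out of order; if it does not, windows for a given view would need to run sequentially.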