Title: Feast API: Adding a new historical store · Issue #482 · feast-dev/feast · GitHub
Opengraph URL: https://github.com/feast-dev/feast/issues/482
Author: woop (https://github.com/woop)
Date Published: 2020-02-19T06:55:50.000Z
Comments: 34

## 1. Introduction
We've had a lot of demand for either open source or AWS batch stores (#367, #259). Folks from the community have asked us how they can contribute code to add their own store types.

In this issue I will walk through how batch stores are currently being used and how a new batch store type can be added.

## 2. Overview
Feast interacts with a batch store in two places:
- **Data ingestion:** Ingestion jobs that load data into stores must be able to locate stores, apply migrations, and write data into feature set tables.
- **Feature serving (batch):** Feast serving executes batch retrieval jobs in order for users to export historical feature data.

## 3. Data ingestion
Feast creates and manages population jobs that stream in data from upstream data sources. Currently Feast only supports Kafka as a data source, meaning these jobs are all long-running. Batch ingestion pushes data to Kafka topics, after which it is picked up by these "population" jobs.

In order for the ingestion + population flow to complete, the destination store must be writable. This means that Feast must be able to create the appropriate tables/schemas in the store and also write data from the population job into the store.

Currently Feast Core starts and manages these population jobs that ingest data into stores, although we are planning to move this responsibility to the serving layer. Feast Core starts an Apache Beam job which synchronously runs migrations on the destination store and subsequently starts consuming from Kafka and publishing records.

Below is a "happy-path" example of a batch ingestion process:


In order to accommodate a new store type, the Apache Beam job needs to be updated to support:
- Setup (create tables/schemas): The current implementation for BigQuery/Redis is captured in [StoreUtil.java](https://github.com/gojek/feast/blob/master/ingestion/src/main/java/feast/ingestion/utils/StoreUtil.java)
- Writes: A store-specific client needs to be implemented that can write to a new store type in [WriteToStore.java](https://github.com/gojek/feast/blob/master/ingestion/src/main/java/feast/ingestion/transform/WriteToStore.java#L90)

## 4. Feature serving (batch)

Feast Serving is a web service that allows for the retrieval of feature data from a batch feature store. Below is a sequence diagram for a typical feature request from a batch store.



Currently we only have support for BigQuery as a batch store. The entry point for this implementation is the [BigQueryServingService](https://github.com/gojek/feast/blob/master/serving/src/main/java/feast/serving/service/BigQueryServingService.java), which implements the [ServingService](https://github.com/gojek/feast/blob/master/serving/src/main/java/feast/serving/service/ServingService.java#L28:18) interface.

```
public interface ServingService {
  GetFeastServingInfoResponse getFeastServingInfo(GetFeastServingInfoRequest getFeastServingInfoRequest);
  GetOnlineFeaturesResponse getOnlineFeatures(GetOnlineFeaturesRequest getFeaturesRequest);
  GetBatchFeaturesResponse getBatchFeatures(GetBatchFeaturesRequest getFeaturesRequest);
  GetJobResponse getJob(GetJobRequest getJobRequest);
}
```

The ServingService is called from the wrapping gRPC service [ServingService](https://github.com/gojek/feast/blob/master/protos/feast/serving/ServingService.proto#L29), where the functionality is more clearly described.

The interface defines the following methods:
- **getFeastServingInfo**: Get the store type, either online or offline.
- **getOnlineFeatures**: Get online features synchronously.
- **getBatchFeatures**: Get batch features asynchronously. Retrieval for batch features always happens asynchronously, because of the time taken for an export to complete. This method returns immediately with a job ID to the client. The client can then poll the job status until the query has reached a terminal state (succeeded or failed).
- **getJob**: Returns the job status for a specific job ID.

**Notes on the current design:**
Although the actual functionality will be retained, the structure of these interfaces will probably change away from extending a `service` interface and towards having a `store` interface. There are various problems with the current implementation:
1. Batch and online stores share a single interface. I believe the intention here was to allow some stores to support both online and historical/batch storage, but for most stores this isn't the case. There is also no reason why we can't have two interfaces here. Ideally this should be split in two.
2. The current approach is to extend services for each new store type, but this seems to be a poor abstraction. Ideally we would have both a batch and an online store interface (not a service interface), which are called from a single serving implementation. This approach would be a clearer separation of concerns and would prevent things like job management happening within a service implementation.
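
To make the proposed split a little more concrete, here is a minimal sketch of what a batch-only store interface and the client-side polling flow might look like. None of this is Feast's actual API: `BatchStore`, `JobStatus`, `InMemoryBatchStore`, and `BatchRetrievalClient` are hypothetical names introduced purely for illustration, and the in-memory store is a stand-in for a real backend such as BigQuery. It only demonstrates the behaviour described for getBatchFeatures and getJob: submission returns a job ID immediately, and the caller polls until the job reaches a terminal state.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical job states; the issue only names the terminal ones (succeeded or failed).
enum JobStatus {
  RUNNING, SUCCEEDED, FAILED;

  boolean isTerminal() {
    return this != RUNNING;
  }
}

// Hypothetical batch-only store interface, kept separate from any online store interface.
interface BatchStore {
  // Starts an export and returns a job ID immediately, without waiting for completion.
  String submitRetrieval(List<String> featureNames);

  // Returns the current status of a previously submitted job.
  JobStatus getJobStatus(String jobId);
}

// Toy implementation: a job reports RUNNING for a fixed number of polls, then SUCCEEDED.
class InMemoryBatchStore implements BatchStore {
  private final Map<String, Integer> remainingPolls = new ConcurrentHashMap<>();
  private final int pollsUntilDone;

  InMemoryBatchStore(int pollsUntilDone) {
    this.pollsUntilDone = pollsUntilDone;
  }

  @Override
  public String submitRetrieval(List<String> featureNames) {
    String jobId = UUID.randomUUID().toString();
    remainingPolls.put(jobId, pollsUntilDone);
    return jobId;
  }

  @Override
  public JobStatus getJobStatus(String jobId) {
    Integer remaining = remainingPolls.get(jobId);
    if (remaining == null) {
      return JobStatus.FAILED; // assumption of this sketch: unknown job IDs count as failed
    }
    if (remaining <= 0) {
      return JobStatus.SUCCEEDED;
    }
    remainingPolls.put(jobId, remaining - 1);
    return JobStatus.RUNNING;
  }
}

// Caller-side polling loop, as a single serving implementation or a client might run it.
class BatchRetrievalClient {
  static JobStatus awaitCompletion(BatchStore store, String jobId, int maxPolls, long sleepMillis) {
    for (int i = 0; i < maxPolls; i++) {
      JobStatus status = store.getJobStatus(jobId);
      if (status.isTerminal()) {
        return status;
      }
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag for the caller
        throw new IllegalStateException("Interrupted while waiting for job " + jobId, e);
      }
    }
    return JobStatus.RUNNING; // not finished after maxPolls checks
  }
}
```

With a shape like this, supporting a new historical store would mean writing one more `BatchStore` implementation, while job tracking and polling stay in the single serving layer instead of being re-implemented per service.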