Title: Feast API: Feature references, concept hierarchy, and data model · Issue #479 · feast-dev/feast · GitHub
Description: This issue is meant to be a discussion of the current Feast API as it relates to feature references, a key component of the user facing API. Additionally, it will also discuss the current data model and our concept hierarchy. 1. Backgrou...
Opengraph URL: https://github.com/feast-dev/feast/issues/479
Author: woop (https://github.com/woop)
Date Published: 2020-02-18T04:48:11.000Z
Comments: 23

This issue is meant to be a discussion of the current Feast API as it relates to `feature references`, a key component of the user-facing API. Additionally, it will also discuss the current data model and our concept hierarchy.

## 1. Background
The Feast user-facing API and data model changed dramatically from 0.1 to 0.2+. The original intention was to simplify the API as much as possible and gradually evolve it as new user requirements became available.

Two important reference documents on this topic are
* [Feast 0.3 RFC](https://docs.google.com/document/d/1QnUQWhwJ1fDnMQ4sZdUdLa_4C-x5Hhpw3C8nYLg4nsY/edit)
* [Feast Projects RFC](https://docs.google.com/document/d/14-QBz9X8zK_aGAY2ti43a7PqMxvqp0035ec7QYgmEBM/edit#)

## 2. Problem statement
The Feast API is evolving as more and more teams adopt the software and share their requirements with us. In most cases this means an expansion of the API, but in some cases it means a reversal.

With the introduction of projects into Feast ([Feast Projects RFC](https://docs.google.com/document/d/14-QBz9X8zK_aGAY2ti43a7PqMxvqp0035ec7QYgmEBM/edit#)), our API has evolved again. This change has affected feature references, the data model, and concept hierarchy.

The most critical feedback on this change has been that it introduces unnecessary complexity to address problems (isolation, namespacing, security) that could be solved in a different way.

## 3. Objective
The point of this GitHub issue is to settle our API for feature references, our concept hierarchy, and data model in such a way that we
* Meet all our known requirements for future development
* Minimize user-facing changes and migration requirements
* Maintain flexibility in accepting new user requirements and evolving our API

Put simply, we want to make sure that we are on the right path and make the necessary changes now, when it's least disruptive.

## 4. What are feature references?
Feature references (previously Feature Ids) are strings/objects within Feast that allow Feast and users of Feast to reference specific features. Feature references are primarily used as a means of indicating to Feast which features a user would like to retrieve.

Originally, feature references were defined as follows
`<feature-set>:<feature-name>:<feature-version>`
All parts of the above reference were required at the time.

Feature references have recently been updated (as part of the [Projects RFC](https://docs.google.com/document/d/14-QBz9X8zK_aGAY2ti43a7PqMxvqp0035ec7QYgmEBM/edit#))

The move towards project namespaces now moves feature sets and features/entities into the following hierarchy


Feature references are now defined as: `<project>/<feature-name>:<feature-version>`

The following constraints apply
- Versions are optional. If no version is provided then the latest version of a feature is used.
- Feature names must be unique within a project (even across feature sets within that project).
- Entity names must be unique within a project (but can be reused across feature sets).

One of our primary motivations was to allow users to reference features directly by name. With `versions` becoming optional and allowing the `project` to be set externally, this is now possible. Users can provide features as a list of feature names.

An example of feature references being used is shown below (from the Python SDK):
```
# `client` and `entity_rows` are assumed to be defined elsewhere.
online_features = client.get_online_features(
    feature_refs=[
        "daily_transactions",
        "total_transactions",
    ],
    entity_rows=entity_rows,
)
```

## 5. How are feature references used?
### 5.1 During online serving
During online serving the user will provide two sets of information to Feast during feature retrieval.
- A list of feature references
- A list of entities

Feast then constructs a response object with all of the data from these features for all of these entities.

For example, if a user sends a request with a single feature reference as `daily_transactions`, Feast will attempt to add the missing information. It will add the `project` id (which currently must be provided by the user), it will then determine the `feature set` that contains that feature name, and then finally it will determine the latest `version` of the feature set in which the feature occurs.

Internally, Feast is left with something that resembles the following
`my_customer_project/my_customer_feature_set:daily_transactions:3`

Since features are stored based on feature sets, Feast first converts the above into what we can informally define as a feature set reference, resembling the following
`<project>/<feature-set-name>:<feature-set-version>`
or tangibly
`my_customer_project/my_customer_feature_set:3`

In the case of Redis, Feast will use the above feature set reference, along with the entities the user has provided, to construct a list of keys to look up. The responses from the database are then used to build a response object that is returned to the user.

### 5.2 During batch serving
The batch serving case is very similar to the online serving case, but with more complexity on queries and joins.

The user provides the following during batch retrieval
- A list of feature references
- A list of entities paired with timestamps

Feature references are converted into their full form, as well as used to create feature set references (as in online serving). In the case of BigQuery, the feature set reference maps directly to a table. For each feature set table that Feast needs to query features from, Feast runs a point-in-time correct query using the entities+timestamps for the specific feature columns. This produces a resultant table with the user's requested feature data, over the timestamps and features, but only for one specific feature set.

Feast then uses the entity columns in each feature set table as a means of joining the results of these sub-queries into a single resultant dataframe.

### 5.3 During ingestion of data into stores
When loading data into Feast, data first needs to be converted into [FeatureRow](https://github.com/gojek/feast/blob/master/protos/feast/types/FeatureRow.proto#L28) format and then pushed into a Kafka stream.

During this conversion to feature row form, it is necessary to set a field called `feature_set` with the feature set reference. To reiterate, the feature set reference looks something like:
`<project>/<feature-set-name>:<feature-set-version>`

Ingestion jobs that pick up these rows are then able to easily identify the row as belonging to a specific project and feature set. The jobs then write all of these rows to all of the stores that subscribe to these feature sets.

## 6. Problems with the current implementation
### 6.1 **Feature set versions are unnecessary:**
The concept of feature set versions was introduced in order to allow users to reuse feature set names. However, they add additional complexity at both ingestion time and retrieval time. Users need to know the correct version of the feature set to ingest data to and to retrieve data from. If they don't pin their retrieval to a specific version then they risk having their system go down at a version increment.

### 6.2 **Projects could be unnecessary at the top of the concept hierarchy:**
The concept of projects was introduced to provide a means of
* **Isolation between users:** Users can register the same feature sets and features within their own project namespace without conflicts arising between users.
* **Access control:** Projects provide a top level hierarchy that makes access control more convenient to implement
* **Ease of feature retrieval:** By introducing naming constraints at the project level, it is easier to logically group and reference features by name. Thus, projects provide a way of grouping based on retrieval where feature sets provide a means of grouping based on ingestion.

The problem with `projects` is that they introduce a layer into the concept hierarchy that makes Feast harder to understand and could be introducing unnecessary complexity. It's possible that all of the above requirements for introducing projects could be addressed while still maintaining feature sets as the top level concept.

### 6.3 **Projects are a cause for code smell in the data model:**
There are currently three locations where projects occur.
1. Ingestion (FeatureRows)
2. Stores (tables and keys)
3. Serving/retrieval (incoming queries)

The current approach has code smell in the fact that FeatureRows have to know their own identity. Today, having each FeatureRow know its own identity allows Feast to consume from topics that contain mixed feature sets (versions and names). Feast is able to differentiate FeatureRows from each other and can know how to interpret their contents based on a feature reference contained within the row.

However, in the case that Feast were to consume features from an external stream that it had no control over (not even the data model), Feast would not have the feature set reference conveniently available inside the event payload.

The second occurrence of projects is in the store. Tables are currently named according to `projectName_featureSet_version`. Projects are a necessity here since feature set names can be duplicated across projects. However, projects are not essential complexity in the same way a feature set is, and do not seem natural to encode into the data model itself.

### 6.4 **Feature sets are a leaky abstraction**:
Feature sets are a core part of the existing data model. Feature data is stored on a feature set within a Feast store like Redis or BigQuery. In order to find the features a user is looking for, it is still necessary to determine the feature set they need from their `feature reference`. This seems to work at retrieval time since Feast Serving can maintain a cache of available feature sets (albeit introducing a new inefficiency during lookup). Two problems exist here:

1. There is a disconnect between how users are producing data (`feature set references`) and how users are consuming data (`feature references`). Users are loading FeatureRows into feature sets, but they are querying features out of projects. Ideally these two concepts wouldn't be so distinct.
2. Currently, feature references are defined as follows: `<project>/<feature-name>:<feature-version>`. However, the concept of a `feature-version` doesn't exist. Features are currently inheriting their version from their feature set. So right now a `feature reference` still contains trace information about the parent feature set.
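
To make the resolution steps from section 5.1 concrete, here is a minimal Python sketch of how a short feature reference might be expanded into its full form and into a feature set reference. It is not Feast's implementation: the `FeatureRef` class, the `parse_feature_ref` and `resolve` functions, and the in-memory `REGISTRY` are all hypothetical names invented for this illustration, and the registry simply stands in for whatever lookup Feast Serving performs against its cache of feature sets.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class FeatureRef:
    """Hypothetical parsed form of '<project>/<feature-name>:<feature-version>'."""
    project: Optional[str]
    name: str
    version: Optional[int]


# Hypothetical registry (an assumption for this sketch):
# project -> feature name -> (feature set name, latest feature set version)
REGISTRY: Dict[str, Dict[str, Tuple[str, int]]] = {
    "my_customer_project": {
        "daily_transactions": ("my_customer_feature_set", 3),
        "total_transactions": ("my_customer_feature_set", 3),
    }
}


def parse_feature_ref(ref: str) -> FeatureRef:
    """Split a reference into project, name, and version; project and version are optional."""
    project, _, rest = ref.rpartition("/")
    name, _, version = rest.partition(":")
    if not name:
        raise ValueError(f"feature reference has no feature name: {ref!r}")
    return FeatureRef(project or None, name, int(version) if version else None)


def resolve(ref: str, default_project: str) -> Tuple[str, str]:
    """Return (full feature reference, feature set reference) for a short reference."""
    parsed = parse_feature_ref(ref)
    # Step 1: add the project if the reference did not carry one.
    project = parsed.project or default_project
    # Step 2: find the feature set containing this feature name.
    feature_set, latest_version = REGISTRY[project][parsed.name]
    # Step 3: fall back to the latest version when none was pinned.
    version = parsed.version if parsed.version is not None else latest_version
    full_ref = f"{project}/{feature_set}:{parsed.name}:{version}"
    feature_set_ref = f"{project}/{feature_set}:{version}"
    return full_ref, feature_set_ref


full_ref, feature_set_ref = resolve("daily_transactions", "my_customer_project")
print(full_ref)
print(feature_set_ref)
```

The sketch also illustrates the point made in section 6.4: the version in the resolved reference comes from the feature set entry in the registry, not from the feature itself.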