Title: Undefined features should be rejected when being fetched via `get_historical_features` / `get_online_features` · Issue #2576 · feast-dev/feast · GitHub
Description: Context I want to create versioned feature views. Through various versions, features could be added or removed. Expected Behavior When doing feast.get_historical_features, features that are not defined should be rejected. Current Behavio...
Opengraph URL: https://github.com/feast-dev/feast/issues/2576
Author: benjamintanweihao (https://github.com/benjamintanweihao)
Date Published: 2022-04-20T02:43:29.000Z
Comments: 0

## Context

I want to create versioned feature views. Through various versions, features could be added or removed.

## Expected Behavior

When calling `feast.get_historical_features`, features that are not defined should be rejected.

## Current Behavior

The features get returned even though they have not been defined.

## Steps to reproduce

1. Initialize a new feast repository
2. Define the features:

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource, ValueType
from feast.types import Float32, Int64

driver_hourly_stats = FileSource(
    path="/home/benjamintan/workspace/feast-workflow-demo/feature_repo/data/driver_stats.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)

driver = Entity(name="driver_id", value_type=ValueType.INT64, description="driver id",)

# V1 defines a single feature.
driver_hourly_stats_view_v1 = FeatureView(
    name="driver_hourly_stats_v1",
    entities=["driver_id"],
    ttl=timedelta(days=1),
    schema=[
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    online=True,
    batch_source=driver_hourly_stats,
    tags={},
)

# V2 defines three features over the same batch source.
driver_hourly_stats_view_v2 = FeatureView(
    name="driver_hourly_stats_v2",
    entities=["driver_id"],
    ttl=timedelta(days=1),
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    online=True,
    batch_source=driver_hourly_stats,
    tags={},
)
```
3. `feast apply`
4. Query Feast:

```python
from datetime import datetime, timedelta

import pandas as pd
from feast import FeatureStore

fs = FeatureStore(repo_path='.')

entity_df = pd.DataFrame(
    {
        "event_timestamp": [
            pd.Timestamp(dt, unit="ms", tz="UTC").round("ms")
            for dt in pd.date_range(
                start=datetime.now() - timedelta(days=3),
                end=datetime.now(),
                periods=3,
            )
        ],
        "driver_id": [1001, 1002, 1003],
    }
)
```
I _do not_ expect the following to work:

```python
# THIS PART SHOULDN'T WORK
features_wrong = ['driver_hourly_stats_v1:conv_rate',  # doesn't exist in V1
                  'driver_hourly_stats_v1:acc_rate',  # doesn't exist in V1
                  'driver_hourly_stats_v1:avg_daily_trips',
                  ]

hist_features_wrong = fs.get_historical_features(
    entity_df=entity_df,
    features=features_wrong,
)
```

But I do get results:

```
                   event_timestamp  driver_id  ...  acc_rate  avg_daily_trips
0 2022-04-17 09:35:35.658000+00:00       1001  ...  0.536431            742.0
1 2022-04-18 21:35:35.658000+00:00       1002  ...  0.496901            678.0
2 2022-04-20 09:35:35.658000+00:00       1003  ...       NaN
```

I do not expect this to work because `driver_hourly_stats_v1:conv_rate` and `driver_hourly_stats_v1:acc_rate` were not defined in the `driver_hourly_stats_view_v1` FeatureView.

And just to double check that `driver_hourly_stats_v1` only has `avg_daily_trips` defined:

```
➜ feast feature-views describe driver_hourly_stats_v1
spec:
  name: driver_hourly_stats_v1
  entities:
  - driver_id
  features:
  - name: avg_daily_trips
    valueType: INT64
  ttl: 86400s
```

### Specifications

- Version: 0.20 (also tested on 0.19 and 0.18)
- Platform: Linux
- Subsystem: Ubuntu

## Possible Solution

The list of features being passed in should be checked against the registry. Currently the feature view name and feature name pairs are not validated _together_. Here's an example that modifies `get_historical_features`:

```python
    @log_exceptions_and_usage
    def get_historical_features(
        self,
        entity_df: Union[pd.DataFrame, str],
        features: Union[List[str], FeatureService],
        full_feature_names: bool = False,
    ) -> RetrievalJob:

        # Build a dictionary of feature view names -> feature names (not sure if this function already exists ...)
        fv_name_features = {
            fv.name: [f.name.split('-')[0] for f in fv.features]
            for fv in self.list_feature_views()
        }

        # Check that input features are found in the `fv_name_features` dictionary
        feature_views_not_found = []
        for feature in features:
            k, v = feature.split(":")
            # .get() avoids a KeyError when the feature view itself is unknown
            if v not in fv_name_features.get(k, []):
                feature_views_not_found.append(f'{k}:{v}')

        if feature_views_not_found:
            raise FeatureViewNotFoundException(', '.join(feature_views_not_found))
```

This returns:

```python
feast.errors.FeatureViewNotFoundException: Feature view driver_hourly_stats_v1:conv_rate, driver_hourly_stats_v1:acc_rate does not exist
```

This doesn't handle the case when a `FeatureService` is passed in, but that shouldn't be too hard to add.

This should also apply to `get_online_features`.
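
The pairwise check proposed above can also be sketched without a running feature store. The following is a minimal illustration, not Feast's implementation: `find_undefined_features` and `schema_by_view` are hypothetical names introduced here, and the mapping is hand-written to mirror the two feature views from the reproduction steps. It validates each `view:feature` reference as a pair, and also flags references to an unknown view or with no `:` separator.

```python
from typing import Dict, Iterable, List, Set


def find_undefined_features(
    feature_refs: Iterable[str],
    schema_by_view: Dict[str, Set[str]],
) -> List[str]:
    """Return the references whose (view, feature) pair is not registered.

    Hypothetical helper for illustration; it is not part of the Feast API.
    """
    undefined = []
    for ref in feature_refs:
        # Split "view:feature" once; `sep` is empty if there is no colon.
        view_name, sep, feature_name = ref.partition(":")
        # An unknown view is treated as having no features at all.
        if not sep or feature_name not in schema_by_view.get(view_name, set()):
            undefined.append(ref)
    return undefined


# Assumed registry contents, mirroring the V1 and V2 feature views above.
schema_by_view = {
    "driver_hourly_stats_v1": {"avg_daily_trips"},
    "driver_hourly_stats_v2": {"conv_rate", "acc_rate", "avg_daily_trips"},
}

features_wrong = [
    "driver_hourly_stats_v1:conv_rate",        # not defined in V1
    "driver_hourly_stats_v1:acc_rate",         # not defined in V1
    "driver_hourly_stats_v1:avg_daily_trips",  # defined in V1
]

undefined = find_undefined_features(features_wrong, schema_by_view)
print(undefined)
# ['driver_hourly_stats_v1:conv_rate', 'driver_hourly_stats_v1:acc_rate']
```

Against a real store, the mapping could presumably be built from `list_feature_views()` using each view's `name` and `features`, as the snippet in the issue does. A `FeatureService` input would need to be expanded into individual references first, which this sketch does not attempt.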