Title: OnDemandFeatureViews & online async feature retrieval & passing unknown entity values --> raises TypeError when it should not · Issue #4473 · feast-dev/feast · GitHub
Description: Context Factors to Consider I call the get_online_features_async method on the FeatureStore. I am passing an "entity value" that is not materialized in the online store. For example, only driver_id 1 and 2 are materialized in the online ...
Opengraph URL: https://github.com/feast-dev/feast/issues/4473
{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"OnDemandFeatureViews \u0026 online async feature retrieval \u0026 passing unknown entity values --\u003e raises TypeError when it should not","articleBody":"## Context\r\n\r\n### Factors to Consider\r\n1. I call the `get_online_features_async` method on the `FeatureStore`.\r\n2. I am passing an \"entity value\" that is not materialized in the online store. For example, only `driver_id` `1` and `2` are materialized in the online store. However, I’m passing `99999`.\r\n3. I’m using an `OnDemandFeatureView`.\r\n\r\n## Expected Behavior\r\nWhen only points 1 and 2 are true, I get a response that has `None` values for all the features. This functionality works as expected.\r\n\r\nWhen all three points are true, I would also expect to get a response that has `None` values for all the features. However, this is not the case.\r\n\r\n## Current Behavior\r\nWhen only points 1 and 2 are true, I get a response that has `None` values for all the features. This functionality works as expected.\r\n\r\nI would expect this to be true as well when using `OnDemandFeatureViews` (ODFVs). 
However, when all three points are true, the following error is raised: `TypeError: Couldn't infer value type from empty value`.\r\n\r\n## Steps to Reproduce\r\n[This branch](https://github.com/feast-dev/feast/compare/master...job-almekinders:feast:odfv-unknown-entity-id-type-error?expand=1) contains an additional integration test named `test_async_online_retrieval_with_event_timestamps_null_only`, which acts as a minimal failing example.\r\n\r\nThe error is raised when the `python_values_to_proto_values` [method](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/utils.py#L475) is called in the `sdk/python/feast/utils.py` file, which is invoked by the `get_online_features_async` [method](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/infra/online_stores/online_store.py#L275) in the `sdk/python/feast/infra/online_stores/online_store.py` file (these locations are marked with `# breakpoint()` in the linked branch).\r\n\r\nFurthermore, this error is only happening when we are \"only\" passing unknown entity values. For example, if we are only passing the unknown entity value 99999, it will fail. If we pass the known entity value 1 and the unknown value 99999, it will be successful. \r\n\r\nYou can run this test by creating a virtual environment, and run this command in the shell: \r\n```bash\r\nPYTHONPATH='.' 
\\\r\n FULL_REPO_CONFIGS_MODULE=sdk.python.feast.infra.online_stores.contrib.postgres_repo_configuration \\\r\n PYTEST_PLUGINS=sdk.python.tests.integration.feature_repos.universal.online_store.postgres \\\r\n python -m pytest --integration sdk/python/tests/integration/online_store/test_universal_online.py::test_async_online_retrieval_with_event_timestamps_null_only\r\n```\r\n\r\n### Another method to reproduce:\r\n\r\ndocker-compose.yml\r\n```yml\r\n---\r\nversion: \"3\"\r\nservices:\r\n offline_store:\r\n image: postgres:16-alpine\r\n container_name: offline_store\r\n ports:\r\n - \"6543:5432\"\r\n environment:\r\n - POSTGRES_DB=offline_store\r\n - POSTGRES_USER=postgres\r\n - POSTGRES_PASSWORD=postgres\r\n volumes:\r\n - ./postgres_init:/docker-entrypoint-initdb.d\r\n online_store:\r\n image: postgres:16-alpine\r\n container_name: online_store\r\n ports:\r\n - \"5432:5432\"\r\n environment:\r\n - POSTGRES_DB=online_store\r\n - POSTGRES_USER=postgres\r\n - POSTGRES_PASSWORD=postgres\r\n\r\n```\r\n\r\nfeature_store.yml\r\n```yml\r\nproject: feast_tryout\r\nprovider: local\r\nregistry:\r\n registry_type: sql\r\n path: postgresql+psycopg2://postgres:postgres@0.0.0.0:5432/online_store\r\n cache_ttl_seconds: 60\r\nonline_store:\r\n type: postgres\r\n host: 0.0.0.0\r\n port: 5432\r\n database: online_store\r\n db_schema: online\r\n user: postgres\r\n password: postgres\r\noffline_store:\r\n type: postgres\r\n host: 0.0.0.0\r\n port: 6543\r\n database: offline_store\r\n db_schema: offline\r\n user: postgres\r\n password: postgres\r\nentity_key_serialization_version: 2\r\n```\r\n\r\nInsert into offline store (postgres)\r\npostgres_init/create-offline-store-database.sql\r\n```sql\r\nCREATE SCHEMA offline;\r\n\r\nCREATE TABLE offline.features (\r\n \"ENTITY_ID\" VARCHAR,\r\n \"EVENT_TIMESTAMP\" TIMESTAMP,\r\n \"ENTITY_FLOAT\" FLOAT\r\n);\r\n\r\nINSERT INTO offline.features\r\nSELECT *\r\nFROM (\r\n VALUES ('11111111', '2024-01-01 13:00:00' :: TIMESTAMP, 1.1),\r\n 
('11111111', '2024-01-01 14:00:00' :: TIMESTAMP, 1.11),\r\n ('11111111', '2024-01-01 15:00:00' :: TIMESTAMP, 1.111),\r\n ('22222222', '2024-01-01 13:00:00' :: TIMESTAMP, 2.2),\r\n ('22222222', '2024-01-01 14:00:00' :: TIMESTAMP, 2.22),\r\n ('33333333', '2024-01-01 13:00:00' :: TIMESTAMP, 3.3),\r\n ('44444444', '2024-01-02 22:00:00' :: TIMESTAMP, 4.4)\r\n )\r\n```\r\n\r\nbootstrap.py\r\n```python\r\nfrom datetime import timedelta\r\nfrom typing import Any\r\n\r\nimport pandas as pd\r\nfrom feast import (\r\n Entity,\r\n FeatureService,\r\n FeatureStore,\r\n FeatureView,\r\n Field,\r\n RequestSource,\r\n ValueType,\r\n)\r\nfrom feast.infra.offline_stores.contrib.postgres_offline_store.postgres_source import (\r\n PostgreSQLSource as PostgresSource,\r\n)\r\nfrom feast.on_demand_feature_view import on_demand_feature_view\r\nfrom feast.types import Float32, Float64\r\n\r\nfeature_store = FeatureStore()\r\n\r\nfeatures_entity = Entity(\r\n name=\"entity_id\",\r\n join_keys=[\"ENTITY_ID\"],\r\n value_type=ValueType.STRING,\r\n)\r\n\r\nfeatures_source = PostgresSource(\r\n name=\"features\",\r\n timestamp_field=\"EVENT_TIMESTAMP\",\r\n table=\"offline.features\",\r\n)\r\n\r\nfeatures_feature_view = FeatureView(\r\n name=\"features_feature_view\",\r\n entities=[features_entity],\r\n ttl=timedelta(days=0),\r\n schema=[Field(name=\"ENTITY_FLOAT\", dtype=Float32)],\r\n online=True,\r\n source=features_source,\r\n)\r\n\r\nrequest_source = RequestSource(\r\n name=\"request_feature\",\r\n schema=[Field(name=\"REQUEST_FLOAT\", dtype=Float32)],\r\n)\r\n\r\n\r\n@on_demand_feature_view(\r\n sources=[features_feature_view, request_source],\r\n schema=[\r\n Field(name=\"ENTITY_FLOAT_TRANSFORMED_PANDAS\", dtype=Float64),\r\n Field(name=\"ENTITY_FLOAT_PLUS_REQUEST_SOURCE\", dtype=Float64),\r\n ],\r\n mode=\"pandas\",\r\n)\r\ndef odfv_pandas(input: pd.DataFrame) -\u003e pd.DataFrame:\r\n output = pd.DataFrame()\r\n output[\"ENTITY_FLOAT_TRANSFORMED_PANDAS\"] = input[\"ENTITY_FLOAT\"] * 
2\r\n output[\"ENTITY_FLOAT_PLUS_REQUEST_SOURCE\"] = (\r\n input[\"ENTITY_FLOAT\"] * input[\"REQUEST_FLOAT\"]\r\n )\r\n return output\r\n\r\n\r\n@on_demand_feature_view(\r\n sources=[features_feature_view, request_source],\r\n schema=[Field(name=\"ENTITY_FLOAT_TRANSFORMED_PYTHON\", dtype=Float64)],\r\n mode=\"python\",\r\n)\r\ndef odfv_python(input: dict[str, Any]) -\u003e dict[str, Any]:\r\n output = {}\r\n output[\"ENTITY_FLOAT_TRANSFORMED_PYTHON\"] = [\r\n value * 2 if value is not None else None for value in input[\"ENTITY_FLOAT\"]\r\n ]\r\n\r\n output[\"ENTITY_FLOAT_PLUS_REQUEST_SOURCE_PYTHON\"] = [\r\n (e + r) if e is not None and r is not None else None\r\n for e, r in zip(input[\"ENTITY_FLOAT\"], input[\"REQUEST_FLOAT\"])\r\n ]\r\n return output\r\n\r\n\r\nfeatures_feature_service_pandas = FeatureService(\r\n name=\"features_feature_service_pandas\",\r\n features=[features_feature_view, odfv_pandas],\r\n)\r\n\r\nfeatures_feature_service_python = FeatureService(\r\n name=\"features_feature_service_python\",\r\n features=[features_feature_view, odfv_python],\r\n)\r\n\r\nfeature_store.apply(\r\n [\r\n features_entity,\r\n features_source,\r\n features_feature_view,\r\n odfv_pandas,\r\n odfv_python,\r\n features_feature_service_pandas,\r\n features_feature_service_python,\r\n ]\r\n)\r\n```\r\n\r\nmaterialize\r\n```python\r\nfrom datetime import datetime\r\n\r\nfrom feast import FeatureStore\r\n\r\nfeature_store = FeatureStore()\r\nfeature_store.materialize(\r\n start_date=datetime(1900, 1, 1),\r\n end_date=datetime(9999, 1, 1),\r\n feature_views=[\"features_feature_view\"],\r\n)\r\n\r\n```\r\n\r\ninference\r\n```python\r\nimport pandas as pd\r\nfrom feast import FeatureStore\r\n\r\nfeature_store = FeatureStore()\r\nfeature_service_pandas = feature_store.get_feature_service(\r\n name=\"features_feature_service_pandas\"\r\n)\r\nfeature_service_python = feature_store.get_feature_service(\r\n name=\"features_feature_service_python\"\r\n)\r\n\r\nentity_rows = [\r\n 
# This entity ID is not in the offline or online store\r\n {\"ENTITY_ID\": \"1\", \"REQUEST_FLOAT\": 1.0},\r\n]\r\nentity_df = pd.DataFrame(entity_rows)\r\nentity_df[\"event_timestamp\"] = pd.to_datetime(\"now\", utc=True)\r\n\r\n# This works.\r\nprint(\"offline with pandas\")\r\noffline_features = feature_store.get_historical_features(\r\n entity_df=entity_df,\r\n features=feature_service_pandas,\r\n).to_df()\r\nprint(list(offline_features.to_dict().keys()))\r\n\r\n## This doesn't work, raises the error\r\n# print(\"online with pandas\")\r\n# online_features = feature_store.get_online_features(\r\n# entity_rows=entity_rows,\r\n# features=feature_service_pandas,\r\n# ).to_dict()\r\n# print(list(online_features.keys()))\r\n\r\n## This doesn't work, raises the error\r\n# print(\"online with python\")\r\n# online_features = feature_store.get_online_features(\r\n# entity_rows=entity_rows,\r\n# features=feature_service_python,\r\n# ).to_dict()\r\n# print(list(online_features.keys()))\r\n\r\n```\r\n\r\n### Specifications\r\n- **Version**: 0.36.0\r\n- **Platform**: macOS - M1\r\n- **Subsystem**: Sonoma 14.1.1\r\n\r\n## Possible Solution\r\nI’m not entirely sure why `ValueType.UNKNOWN` is passed to the `feature_type` argument of the `python_values_to_proto_values` method. If we were to pass another value, I believe the method would succeed, as the `if` statement that raises the error would not be triggered.\r\n","author":{"url":"https://github.com/job-almekinders","@type":"Person","name":"job-almekinders"},"datePublished":"2024-09-02T13:03:45.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":12},"url":"https://github.com/feast-dev/feast/issues/4473"}
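
The failure mode described in the issue can be illustrated with a small standalone sketch. This is not Feast's implementation: the `ValueType` enum, `infer_value_type`, and `to_typed_values` below are hypothetical stand-ins written only to show why a batch that contains nothing but missing values cannot be typed by inference, while a batch with at least one known value, or a call that passes a declared type, goes through.

```python
from enum import Enum
from typing import Any, List, Tuple


class ValueType(Enum):
    """Hypothetical stand-in for a feature value type enum (not Feast's)."""
    UNKNOWN = 0
    INT64 = 1
    DOUBLE = 2
    STRING = 3


def infer_value_type(values: List[Any]) -> ValueType:
    """Infer a type from the first non-null value; fail if every value is None."""
    sample = next((v for v in values if v is not None), None)
    if sample is None:
        # Mirrors the message reported in the issue.
        raise TypeError("Couldn't infer value type from empty value")
    if isinstance(sample, float):
        return ValueType.DOUBLE
    if isinstance(sample, int):
        return ValueType.INT64
    if isinstance(sample, str):
        return ValueType.STRING
    raise TypeError(f"Unsupported sample type: {type(sample).__name__}")


def to_typed_values(
    values: List[Any], feature_type: ValueType = ValueType.UNKNOWN
) -> List[Tuple[ValueType, Any]]:
    """Tag each value with a type, inferring it only when none is declared."""
    if feature_type is ValueType.UNKNOWN:
        feature_type = infer_value_type(values)
    return [(feature_type, value) for value in values]


# Known and unknown entity mixed: inference finds a sample and succeeds.
print(to_typed_values([1.1, None]))

# Only unknown entities, but the declared type is passed: no inference needed.
print(to_typed_values([None], ValueType.DOUBLE))

# Only unknown entities and no declared type: inference has nothing to use.
try:
    to_typed_values([None])
except TypeError as exc:
    print(f"TypeError: {exc}")
```

Under these assumptions, the sketch is consistent with the observation that passing `1` together with `99999` succeeds while passing only `99999` fails, and with the suggestion that supplying a concrete type instead of `ValueType.UNKNOWN` would skip the branch that raises.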