Title: get_historical_features fails with dask error for file offline store · Issue #2865 · feast-dev/feast · GitHub
Description: Expected Behavior feature_store.get_historical_features(df, features=fs_columns).to_df() where feature_store is a feature store with file offline store and fs_columns is a list of column names, and df is a Pandas data frame, should work....
Opengraph URL: https://github.com/feast-dev/feast/issues/2865
Author: elshize (https://github.com/elshize)
Date Published: 2022-06-27T20:31:54.000Z
Comments: 6
URL: https://github.com/feast-dev/feast/issues/2865

## Expected Behavior

```
feature_store.get_historical_features(df, features=fs_columns).to_df()
```

where `feature_store` is a feature store with a file offline store, `fs_columns` is a list of column names, and `df` is a Pandas data frame, should work.

## Current Behavior

It currently raises an error inside of dask:

```
E NotImplementedError: dd.DataFrame.apply only supports axis=1
E Try: df.apply(func, axis=1)
```

Stacktrace:

```
../../.cache/pypoetry/virtualenvs/w3-search-letor-SCEBvDm1-py3.9/lib/python3.9/site-packages/feast/infra/offline_stores/offline_store.py:81: in to_df
    features_df = self._to_df_internal()
../../.cache/pypoetry/virtualenvs/w3-search-letor-SCEBvDm1-py3.9/lib/python3.9/site-packages/feast/usage.py:280: in wrapper
    raise exc.with_traceback(traceback)
../../.cache/pypoetry/virtualenvs/w3-search-letor-SCEBvDm1-py3.9/lib/python3.9/site-packages/feast/usage.py:269: in wrapper
    return func(*args, **kwargs)
../../.cache/pypoetry/virtualenvs/w3-search-letor-SCEBvDm1-py3.9/lib/python3.9/site-packages/feast/infra/offline_stores/file.py:75: in _to_df_internal
    df = self.evaluation_function().compute()
../../.cache/pypoetry/virtualenvs/w3-search-letor-SCEBvDm1-py3.9/lib/python3.9/site-packages/feast/infra/offline_stores/file.py:231: in evaluate_historical_retrieval
    df_to_join = _normalize_timestamp(
../../.cache/pypoetry/virtualenvs/w3-search-letor-SCEBvDm1-py3.9/lib/python3.9/site-packages/feast/infra/offline_stores/file.py:530: in _normalize_timestamp
    df_to_join[timestamp_field] = df_to_join[timestamp_field].apply(
```

## Steps to reproduce

Here is my feature store definition:

```python
import datetime

import numpy as np
import pandas as pd
from feast import FeatureStore, RepoConfig, FileSource, FeatureView, ValueType, Entity, Feature
from feast.infra.offline_stores.file import FileOfflineStoreConfig
from google.protobuf.duration_pb2 import Duration

# tmp_path is assumed to be a pathlib.Path to a scratch directory
# (for example, the pytest tmp_path fixture).
source_path = tmp_path / "source.parquet"
timestamp = datetime.datetime(year=2022, month=4, day=29, tzinfo=datetime.timezone.utc)
df = pd.DataFrame(
    {
        "entity": [0, 1, 2, 3, 4],
        "f1": [1.0, 1.1, 1.2, 1.3, 1.4],
        "f2": ["a", "b", "c", "d", "e"],
        "timestamp": [
            timestamp,
            # this one should not be fetched as it is too far into the past
            timestamp - datetime.timedelta(days=2),
            timestamp,
            timestamp,
            timestamp,
        ],
    }
)
df.to_parquet(source_path)
source = FileSource(
    path=str(source_path),
    event_timestamp_column="timestamp",
    created_timestamp_column="timestamp",
)
entity = Entity(
    name="entity",
    value_type=ValueType.INT64,
    description="Entity",
)

view = FeatureView(
    name="view",
    entities=["entity"],
    ttl=Duration(seconds=86400 * 1),
    features=[
        Feature(name="f1", dtype=ValueType.FLOAT),
        Feature(name="f2", dtype=ValueType.STRING),
    ],
    online=True,
    batch_source=source,
    tags={},
)

config = RepoConfig(
    registry=str(tmp_path / "registry.db"),
    project="hello",
    provider="local",
    offline_store=FileOfflineStoreConfig(),
)

store = FeatureStore(config=config)
store.apply([entity, view])

expected = pd.DataFrame(
    {
        "event_timestamp": timestamp,
        "entity": [0, 1, 2, 3, 5],
        "someval": [0.0, 0.1, 0.2, 0.3, 0.5],
        "f1": [1.0, np.nan, 1.2, 1.3, np.nan],
        "f2": ["a", np.nan, "c", "d", np.nan],
    }
)
```

### Specifications

- Version: 0.21.3
- Platform: Linux
- Subsystem: Python 3.9

## Possible Solution

This works fine in at least version 0.18.1, but I think it fails for any version >0.20.

It might have something to do with the added Dask requirement; maybe the required version is insufficient? I used to use 2022.2 before, but the requirement is now for 2022.1.1. But this is just a guess, really.
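
The last frame of the stack trace is inside `_normalize_timestamp`, where a function is applied to the timestamp column. As a rough illustration of what a per-value timestamp normalization of this kind might look like, here is a standard-library sketch. `normalize_to_utc` is a hypothetical helper, not Feast's implementation, and treating naive timestamps as UTC is an assumption made for this example.

```python
import datetime


def normalize_to_utc(ts: datetime.datetime) -> datetime.datetime:
    """Hypothetical helper: return ts as a timezone-aware UTC datetime."""
    if ts.tzinfo is None:
        # Assumption for this sketch: naive timestamps are already in UTC.
        return ts.replace(tzinfo=datetime.timezone.utc)
    # Aware timestamps are converted to the UTC offset.
    return ts.astimezone(datetime.timezone.utc)


# A naive timestamp and an aware one at UTC+2 that denote the same instant.
naive = datetime.datetime(2022, 4, 29)
aware = datetime.datetime(
    2022, 4, 29, 2, 0, tzinfo=datetime.timezone(datetime.timedelta(hours=2))
)
normalized = [normalize_to_utc(t) for t in (naive, aware)]
```

Since the error message names `dd.DataFrame.apply`, the object that `.apply` was called on appears to have been a Dask DataFrame rather than a single column, so a per-value function like this one may not itself be at fault.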