Title: Parquet Schema Inference only supports File, not directory · Issue #2685 · feast-dev/feast · GitHub
Description: When using a FileSource that is in Parquet format, if the source happens to be a directory of partitioned Parquet files, the following lines throw an error: feast/sdk/python/feast/infra/offline_stores/file_source.py Lines 182 to 184 in 0...
Opengraph URL: https://github.com/feast-dev/feast/issues/2685
Domain: github.com
Author: dvanbrug (https://github.com/dvanbrug)
Date Published: 2022-05-13T19:56:03.000Z
Comments: 0

When using a FileSource that is in Parquet format, if the source happens to be a directory of partitioned Parquet files, the following lines throw an error:

https://github.com/feast-dev/feast/blob/01d3568168bb9febb9fbda4988283b3886c32a31/sdk/python/feast/infra/offline_stores/file_source.py#L182-L184

`OSError: Expected file path, but /home/ubuntu/project/data/driver_stats_partitioned is a directory`

How to replicate:

1. Start with a demo feast project (`feast init`)
2. Create a partitioned Parquet Dataset. Use the following to create a dataset with only a single timestamp for inference
```
import pyarrow.parquet as pq

# Read the demo data, drop the second timestamp column, and write it out as a dataset directory
df = pq.read_table("./data/driver_stats.parquet")
df = df.drop(["created"])
pq.write_to_dataset(df, "./data/driver_stats_partitioned")
```
3. Update the file source in `example.py` to look like this:
```
driver_hourly_stats = FileSource(
    path="/home/ubuntu/cado-feast/feature_store/exciting_sunbeam/data/driver_stats_partitioned2",
)
```

4. Run `feast apply`

For now, I've been able to fix this by updating the above lines to:
```
# ParquetDataset comes from pyarrow.parquet
schema = ParquetDataset(
    path if filesystem is None else filesystem.open_input_file(path)
).schema.to_arrow_schema()
```