René's URL Explorer Experiment


Title: In BigQueryRetrievalJob.to_remote_storage(), return value is incorrect (includes all parquet files created in gcs_staging_location, not those created in that specific call) · Issue #3712 · feast-dev/feast · GitHub

Description: Expected Behavior In BigQueryRetrievalJob, when I call to_remote_storage(), the return value that I would expect would be the paths of the parquet files that have been written to GCS... Current Behavior ...however, it turns out the p...

Opengraph URL: https://github.com/feast-dev/feast/issues/3712

X: @github

Domain: github.com


Hey, it has json ld scripts:
{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"In BigQueryRetrievalJob.to_remote_storage(), return value is incorrect (includes all parquet files created in gcs_staging_location, not those created in that specific call)","articleBody":"## Expected Behavior \r\nIn [BigQueryRetrievalJob](https://github.com/feast-dev/feast/blob/c75a01fce2d52cd18479ace748b8eb2e6c81c988/sdk/python/feast/infra/offline_stores/bigquery.py#L402), when I call [to_remote_storage](https://github.com/feast-dev/feast/blob/c75a01fce2d52cd18479ace748b8eb2e6c81c988/sdk/python/feast/infra/offline_stores/bigquery.py#L553)(), the [return value](https://github.com/feast-dev/feast/blob/c75a01fce2d52cd18479ace748b8eb2e6c81c988/sdk/python/feast/infra/offline_stores/bigquery.py#L588) that I would expect would be the paths of the parquet files that have been written to GCS...\r\n\r\n## Current Behavior\r\n...however, it turns out that the paths returned are those of all parquet files you have ever written under the export prefix of the staging bucket, not just those written by this call.\r\n\r\nFor example, say you set your gcs_staging_location in your feature_store.yaml to `feast-materialize-dev` and project_id to `my_feature_store`; then self._gcs_path, as defined [here](https://github.com/feast-dev/feast/blob/c75a01fce2d52cd18479ace748b8eb2e6c81c988/sdk/python/feast/infra/offline_stores/bigquery.py#L428-L432), will be: `gs://feast-materialize-dev/my_feature_store/export/ff67c43e-7174-475f-a02c-6c7587d89731` (or some other uuid string, but you get the idea). However, the rest of the code in the to_remote_storage method returns all paths under `gs://feast-materialize-dev/my_feature_store/export`, which is not what we want, as the parquets for this call are written to self._gcs_path.\r\n\r\n## Steps to reproduce\r\nYou can see that the code is wrong with a simple example:\r\n\r\nCurrent code (pretty much from [this](https://github.com/feast-dev/feast/blob/c75a01fce2d52cd18479ace748b8eb2e6c81c988/sdk/python/feast/infra/offline_stores/bigquery.py#L579C9-L588)). In this example you might imagine there are parquets created from the to_remote_storage call under `gs://feast-materialize-dev/my_feature_store/export/19a1c772-1f91-44da-8486-ea476f027d93/`, but from a previous call there are also some at `gs://feast-materialize-dev/my_feature_store/export/e00597db-78d5-40e1-b125-eac903802acd/`:\r\n\r\n```python\r\n\u003e\u003e\u003e from google.cloud.storage import Client as StorageClient\r\n\u003e\u003e\u003e _gcs_path = \"gs://feast-materialize-dev/my_feature_store/export/19a1c772-1f91-44da-8486-ea476f027d93\"\r\n\u003e\u003e\u003e bucket, prefix = _gcs_path[len(\"gs://\") :].split(\"/\", 1)\r\n\u003e\u003e\u003e print(bucket)\r\nfeast-materialize-dev\r\n\u003e\u003e\u003e print(prefix)\r\nmy_feature_store/export/19a1c772-1f91-44da-8486-ea476f027d93\r\n\u003e\u003e\u003e prefix = prefix.rsplit(\"/\", 1)[0]  # THIS IS THE LINE THAT WE DO NOT WANT\r\n\u003e\u003e\u003e print(prefix)\r\nmy_feature_store/export\r\n\u003e\u003e\u003e if prefix.startswith(\"/\"):\r\n...     prefix = prefix[1:]\r\n...\r\n\u003e\u003e\u003e print(prefix)\r\nmy_feature_store/export\r\n\r\n\u003e\u003e\u003e storage_client = StorageClient()\r\n\u003e\u003e\u003e blobs = storage_client.list_blobs(bucket, prefix=prefix)\r\n\u003e\u003e\u003e results = []\r\n\u003e\u003e\u003e for b in blobs:\r\n...     results.append(f\"gs://{b.bucket.name}/{b.name}\")\r\n...\r\n\u003e\u003e\u003e print(results)\r\n['gs://feast-materialize-dev/my_feature_store/export/19a1c772-1f91-44da-8486-ea476f027d93/000000000000.parquet', 'gs://feast-materialize-dev/my_feature_store/export/19a1c772-1f91-44da-8486-ea476f027d93/000000000001.parquet', 'gs://feast-materialize-dev/my_feature_store/export/e00597db-78d5-40e1-b125-eac903802acd/000000000000.parquet', 'gs://feast-materialize-dev/my_feature_store/export/e00597db-78d5-40e1-b125-eac903802acd/000000000001.parquet']\r\n```\r\n\r\nYou can see in this example that parquet paths are returned that are not part of self._gcs_path and therefore not part of the write to GCS that occurred in this call. This is not what I would expect.\r\n\r\n## Possible Solution\r\nThe corrected code would simply not include the line `prefix = prefix.rsplit(\"/\", 1)[0]`\r\n\r\n","author":{"url":"https://github.com/crispin-ki","@type":"Person","name":"crispin-ki"},"datePublished":"2023-08-07T17:28:47.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":2},"url":"https://github.com/feast-dev/feast/issues/3712"}
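
The prefix handling described in the issue body can be illustrated without a GCS connection. The following is a minimal sketch, not Feast's actual implementation: `split_gcs_path` and `paths_for_call` are hypothetical helper names, and the in-memory `blob_names` list is an assumed stand-in for what `storage_client.list_blobs(bucket, prefix=...)` would return. It compares filtering by the full per-call prefix with filtering by the parent prefix that results from the `rsplit` line.

```python
from typing import List, Tuple


def split_gcs_path(gcs_path: str) -> Tuple[str, str]:
    """Split 'gs://bucket/some/prefix' into (bucket, prefix). Hypothetical helper."""
    if not gcs_path.startswith("gs://"):
        raise ValueError(f"not a GCS path: {gcs_path}")
    bucket, _, prefix = gcs_path[len("gs://"):].partition("/")
    return bucket, prefix.strip("/")


def paths_for_call(gcs_path: str, blob_names: List[str]) -> List[str]:
    """Return gs:// paths of the blobs that live under gcs_path. Hypothetical helper."""
    bucket, prefix = split_gcs_path(gcs_path)
    # A trailing slash keeps a sibling directory with the same leading
    # characters from matching; an empty prefix matches everything.
    scoped = f"{prefix}/" if prefix else ""
    return [f"gs://{bucket}/{name}" for name in blob_names if name.startswith(scoped)]


# Assumed stand-in for the bucket contents, using the object names from the issue.
blob_names = [
    "my_feature_store/export/19a1c772-1f91-44da-8486-ea476f027d93/000000000000.parquet",
    "my_feature_store/export/19a1c772-1f91-44da-8486-ea476f027d93/000000000001.parquet",
    "my_feature_store/export/e00597db-78d5-40e1-b125-eac903802acd/000000000000.parquet",
    "my_feature_store/export/e00597db-78d5-40e1-b125-eac903802acd/000000000001.parquet",
]
current = "gs://feast-materialize-dev/my_feature_store/export/19a1c772-1f91-44da-8486-ea476f027d93"

# Full per-call prefix: only the files written by this call.
scoped = paths_for_call(current, blob_names)
# Parent prefix (what the rsplit line produces): files from every call.
too_broad = paths_for_call(current.rsplit("/", 1)[0], blob_names)

print(len(scoped), len(too_broad))
```

With the full prefix, only the two files under the `19a1c772-...` directory are returned; with the parent prefix, all four are, which matches the behavior reported in the issue.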


Links:

#3730 https://github.com/feast-dev/feast/pull/3730
In BigQueryRetrievalJob.to_remote_storage(), return value is incorrect (includes all parquet files created in gcs_staging_location, not those created in that specific call) https://github.com/feast-dev/feast/issues/3712#top
kind/bug https://github.com/feast-dev/feast/issues?q=state%3Aopen%20label%3A%22kind%2Fbug%22
priority/p2 https://github.com/feast-dev/feast/issues?q=state%3Aopen%20label%3A%22priority%2Fp2%22
crispin-ki https://github.com/crispin-ki
on Aug 7, 2023 https://github.com/feast-dev/feast/issues/3712#issue-1839925268
BigQueryRetrievalJob https://github.com/feast-dev/feast/blob/c75a01fce2d52cd18479ace748b8eb2e6c81c988/sdk/python/feast/infra/offline_stores/bigquery.py#L402
to_remote_storage https://github.com/feast-dev/feast/blob/c75a01fce2d52cd18479ace748b8eb2e6c81c988/sdk/python/feast/infra/offline_stores/bigquery.py#L553
return value https://github.com/feast-dev/feast/blob/c75a01fce2d52cd18479ace748b8eb2e6c81c988/sdk/python/feast/infra/offline_stores/bigquery.py#L588
here https://github.com/feast-dev/feast/blob/c75a01fce2d52cd18479ace748b8eb2e6c81c988/sdk/python/feast/infra/offline_stores/bigquery.py#L428-L432
this https://github.com/feast-dev/feast/blob/c75a01fce2d52cd18479ace748b8eb2e6c81c988/sdk/python/feast/infra/offline_stores/bigquery.py#L579C9-L588
