René's URL Explorer Experiment


Title: Ingestion with Spark: Job Management for Beam Spark Runner · Issue #362 · feast-dev/feast · GitHub

Description: We would like to run ingestion on Spark (Streaming), i.e. with the Beam Spark Runner. Thus, an implementation of Feast's job management is needed. There are a couple of factors that make this a bit less straightforward than Google Cloud ...

Opengraph URL: https://github.com/feast-dev/feast/issues/362

Domain: github.com


Issue body (from the page's JSON-LD, type DiscussionForumPosting):

Headline: Ingestion with Spark: Job Management for Beam Spark Runner
Author: ches (https://github.com/ches)
Date published: 2019-12-13T02:15:42.000Z
Comments: 6
URL: https://github.com/feast-dev/feast/issues/362

We would like to run ingestion on Spark (Streaming), i.e. with [the Beam Spark Runner][1]. Thus, an implementation of Feast's job management is needed.

There are a couple of factors that make this a bit less straightforward than Google Cloud Dataflow:

1. There is not a standard remote/HTTP API for job submission and management built into Spark*.
2. The Beam Spark Runner does not upload your executable job artifact and submit it for you like it does for Dataflow, because of 1 and because there is no assumption of a cloud service like GCS for where to put it. Conventions vary depending on how & where organizations run Spark: they might use S3, HDFS, or an artifact repository to ferry job packages to where they're accessible from the runtime (YARN, Mesos, Kubernetes, EMR).

\* *Other than starting a [SparkContext] connected to the remote cluster, in-process in Feast Core. I feel that isn't workable for a number of reasons, not least of which are heavy dependencies on Spark as a library, and the lifecycle of streaming ingestion jobs being unnecessarily coupled to that of the Feast Core instance.*

### Planned Approach

#### Job Management

We initially plan to implement `JobManager` using ~~the Java client library for~~ [Apache Livy][2], a REST interface to Spark. This will use only an HTTP client, so it is light on dependencies and shouldn't get in the way of alternative `JobManager`s for Spark, should another organization wish to implement one for something other than Livy.
_(Edit: turns out that Livy's `livy-http-client` artifact still depends on Spark as a library, it's not a plain REST client, so we'll avoid that…)_

We have internal experience and precedent using Livy, but not for Spark Streaming applications, so we have some uncertainties about whether it can work well. In case it doesn't, we'll probably look to try [spark-jobserver], which does explicitly claim support for Streaming jobs.

#### Ingestion Job Artifact

We're a bit less certain about how users should get the Feast ingestion Beam job artifact to their Spark cluster, due to the above-mentioned variation in deployments.

Roughly speaking, Feast Ingestion would be packaged as an assembly JAR that includes `beam-runners-spark` as well. So, a new `ingestion-spark` module may be added to the Maven build which is simply a POM for doing just that.

Deployment itself may then need to rely on documentation.

#### Beam Spark Runner

A minor note, but we will use the "legacy", non-portable Beam Spark Runner. As [the Beam docs][1] cover, the runner based on Spark Structured Streaming is incomplete and only supports batch jobs, and the non-portable runner is still recommended for Java-only needs.

In theory this is runtime configuration for Feast users: if they want to try the portable runner, it should be possible, but we'll most likely be testing with the non-portable one.

cc @smadarasmi

Reference issues to keep tabs on during implementation: #302, #361.

[1]: https://beam.apache.org/documentation/runners/spark/
[2]: https://livy.incubator.apache.org/
[SparkContext]: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-SparkContext.html
[spark-jobserver]: https://github.com/spark-jobserver/spark-jobserver
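
To make the Livy-based job management idea from the issue more concrete, here is a minimal sketch in Python of what a plain-HTTP submission could look like. It only builds the request and never sends it. It assumes Livy's batch endpoint (`POST /batches` with `file`, `className`, `args`, and `conf` fields) and a handful of Livy state strings as I understand them from the Livy REST documentation. `LivyBatchSpec`, `build_submit_request`, `to_job_status`, the status names, the JAR location, and the main class are all hypothetical names chosen for illustration, not Feast's actual implementation.

```python
import json
from dataclasses import dataclass, field
from urllib.request import Request


@dataclass
class LivyBatchSpec:
    """Hypothetical description of one ingestion job submission."""
    jar_uri: str      # where the assembly JAR lives (S3, HDFS, ...); deployment-specific
    main_class: str   # entry point of the ingestion job (assumed name)
    args: list = field(default_factory=list)
    conf: dict = field(default_factory=dict)


def build_submit_request(livy_url: str, spec: LivyBatchSpec) -> Request:
    """Build, but do not send, a POST /batches request for Livy."""
    payload = {
        "file": spec.jar_uri,
        "className": spec.main_class,
        "args": spec.args,
        "conf": spec.conf,
    }
    return Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed mapping from Livy state strings to illustrative job statuses.
# The status names on the right are placeholders, not a confirmed Feast enum.
_LIVY_TO_STATUS = {
    "not_started": "PENDING",
    "starting": "PENDING",
    "running": "RUNNING",
    "shutting_down": "ABORTING",
    "killed": "ABORTED",
    "success": "COMPLETED",
    "dead": "ERROR",
    "error": "ERROR",
}


def to_job_status(livy_state: str) -> str:
    """Translate a Livy state string; anything unrecognized becomes UNKNOWN."""
    return _LIVY_TO_STATUS.get(livy_state.lower(), "UNKNOWN")
```

A caller would pass the returned `Request` to `urllib.request.urlopen` (or any HTTP client) and keep the batch id from the response for later status polling and cancellation. Because only an HTTP client is involved, nothing here pulls in Spark as a library, which is the property the issue is after.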

Links:

Ingestion with Spark: Job Management for Beam Spark Runner: https://github.com/feast-dev/feast/issues/362#top
area/job-management: https://github.com/feast-dev/feast/issues?q=state%3Aopen%20label%3A%22area%2Fjob-management%22
keep-open: https://github.com/feast-dev/feast/issues?q=state%3Aopen%20label%3A%22keep-open%22
kind/discussion: https://github.com/feast-dev/feast/issues?q=state%3Aopen%20label%3A%22kind%2Fdiscussion%22
kind/feature (New feature or request): https://github.com/feast-dev/feast/issues?q=state%3Aopen%20label%3A%22kind%2Ffeature%22
ches: https://github.com/ches
on Dec 13, 2019: https://github.com/feast-dev/feast/issues/362#issue-537310994
the Beam Spark Runner: https://beam.apache.org/documentation/runners/spark/
SparkContext: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-SparkContext.html
Apache Livy: https://livy.incubator.apache.org/
spark-jobserver: https://github.com/spark-jobserver/spark-jobserver
the Beam docs: https://beam.apache.org/documentation/runners/spark/
@smadarasmi: https://github.com/smadarasmi
#302: https://github.com/feast-dev/feast/issues/302
#361: https://github.com/feast-dev/feast/pull/361