Title: Ingestion with Spark: Job Management for Beam Spark Runner · Issue #362 · feast-dev/feast · GitHub
URL: https://github.com/feast-dev/feast/issues/362
# Ingestion with Spark: Job Management for Beam Spark Runner

Opened by [ches](https://github.com/ches) on 2019-12-13 · 6 comments

We would like to run ingestion on Spark (Streaming), i.e. with [the Beam Spark Runner][1]. Thus, an implementation of Feast's job management is needed.

There are a couple of factors that make this a bit less straightforward than Google Cloud Dataflow:

1. There is no standard remote/HTTP API for job submission and management built into Spark\*.
2. The Beam Spark Runner does not upload your executable job artifact and submit it for you as it does for Dataflow, both because of 1 and because there is no assumed cloud service like GCS in which to put it. Conventions vary depending on how & where organizations run Spark: they might use S3, HDFS, or an artifact repository to ferry job packages to where they're accessible from the runtime (YARN, Mesos, Kubernetes, EMR).

\* *Other than starting a [SparkContext] connected to the remote cluster, in-process in Feast Core. I feel that isn't workable for a number of reasons, not least of which are the heavy dependency on Spark as a library, and the lifecycle of streaming ingestion jobs being unnecessarily coupled to that of the Feast Core instance.*

### Planned Approach

#### Job Management

We initially plan to implement `JobManager` using ~~the Java client library for~~ [Apache Livy][2], a REST interface to Spark. This will use only an HTTP client, so it is light on dependencies and shouldn't get in the way of alternative `JobManager`s for Spark, should another organization wish to implement one for something other than Livy. _(Edit: it turns out that Livy's `livy-http-client` artifact still depends on Spark as a library; it's not a plain REST client, so we'll avoid that…)_

We have internal experience and precedent using Livy, but not for Spark Streaming applications, so we have some uncertainty about whether it can work well. In case it doesn't, we'll probably try [spark-jobserver], which does explicitly claim support for Streaming jobs.

#### Ingestion Job Artifact

We're less certain about how users should get the Feast ingestion Beam job artifact to their Spark cluster, due to the above-mentioned variation in deployments.

Roughly speaking, Feast Ingestion would be packaged as an assembly JAR that also includes `beam-runners-spark`. So, a new `ingestion-spark` module may be added to the Maven build, which is simply a POM for doing just that.

Deployment itself may then need to rely on documentation.

#### Beam Spark Runner

A minor note, but we will use the "legacy", non-portable Beam Spark Runner. As [the Beam docs][1] cover, the runner based on Spark Structured Streaming is incomplete and only supports batch jobs, and the non-portable runner is still recommended for Java-only needs.

In theory this is runtime configuration for Feast users: if they want to try the portable runner, it should be possible, but we'll most likely be testing with the non-portable one.

cc @smadarasmi

Reference issues to keep tabs on during implementation: #302, #361.

[1]: https://beam.apache.org/documentation/runners/spark/
[2]: https://livy.incubator.apache.org/
[SparkContext]: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-SparkContext.html
[spark-jobserver]: https://github.com/spark-jobserver/spark-jobserver
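To make the Livy-only-over-HTTP idea concrete, here is a minimal sketch of submitting the ingestion JAR as a Livy batch session using nothing but the standard library. The endpoint (`POST /batches`) and field names (`file`, `className`, `args`) follow Livy's REST batch API; the Livy URL, JAR URI, main class, and pipeline arguments below are placeholders for illustration, not actual Feast configuration.

```python
import json
from urllib import request


def build_batch_payload(jar_uri, main_class, args):
    """Build the JSON body for Livy's POST /batches endpoint.

    `file` must be a URI reachable from the cluster (e.g. HDFS or S3),
    which is exactly the artifact-distribution question discussed above.
    """
    return {
        "file": jar_uri,          # assembly JAR bundling beam-runners-spark
        "className": main_class,  # Beam pipeline entry point
        "args": list(args),       # pipeline options passed through
    }


def submit_batch(livy_url, payload):
    """POST the payload to Livy; returns the created batch session as a dict."""
    req = request.Request(
        livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)


# Hypothetical values: the JAR URI and class name are placeholders.
payload = build_batch_payload(
    "hdfs:///jobs/feast-ingestion-spark.jar",
    "feast.ingestion.ImportJob",
    ["--jobName=my-feature-set"],
)
```

Because the payload construction is separate from the HTTP call, a `JobManager` built this way stays a plain REST client with no Spark library dependency, which was the point of avoiding `livy-http-client`.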
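After submission, the Livy-based job management described in the issue reduces to polling batch state (`GET /batches/{id}/state`) and stopping jobs (`DELETE /batches/{id}`). Below is an illustrative mapping from Livy's documented session states onto the coarse statuses a `JobManager` might track; the state strings on the left are from Livy's REST docs, while the status names on the right are assumptions for the sketch, not Feast's actual enum.

```python
# Livy session/batch states (per the Livy REST docs) mapped to coarse
# job statuses. The right-hand names are illustrative, not Feast's enum.
_LIVY_STATE_TO_STATUS = {
    "not_started": "PENDING",
    "starting": "PENDING",
    "idle": "RUNNING",
    "busy": "RUNNING",
    "running": "RUNNING",
    "shutting_down": "RUNNING",
    "success": "COMPLETED",
    "dead": "FAILED",
    "error": "FAILED",
    "killed": "ABORTED",
}


def job_status(livy_state: str) -> str:
    """Translate a Livy state string into a coarse job status.

    Unknown states map to "UNKNOWN" rather than raising, since a job
    manager should tolerate new states across Livy versions.
    """
    return _LIVY_STATE_TO_STATUS.get(livy_state.lower(), "UNKNOWN")
```

For long-running streaming ingestion this polling loop is where the uncertainty mentioned above bites: a Streaming job never reaches `success` in normal operation, so "healthy" has to be inferred from a sustained `running` state.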