What is oozie equivalent for Spark?
We have very complex pipelines that we need to compose and schedule. I see that the Hadoop ecosystem has Oozie for this. What are the choices for Spark-based jobs when I am running Spark on Mesos or Standalone and don't have a Hadoop cluster?
Best answer
Unlike with Hadoop, it is pretty easy to chain things with Spark, so writing a Spark Scala script might be enough. My first recommendation is trying that.
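To illustrate the point that chaining stages is just sequential calls in one driver script, here is a minimal sketch. The stage names and the plain-list data are hypothetical; in a real Spark job each stage would be a DataFrame/RDD transformation on a SparkSession rather than ordinary list handling.

```python
# Hypothetical pipeline stages. In a real Spark Scala or PySpark script,
# each of these would operate on a DataFrame/RDD instead of a plain list.
def extract():
    return [1, 2, 3, 4]

def transform(records):
    return [r * 10 for r in records]

def load(records):
    # Stand-in for writing results out; here we just aggregate.
    return sum(records)

# "Chaining" is nothing more than sequential calls in one driver script.
result = load(transform(extract()))
print(result)  # 100
```

No external workflow engine is needed for a linear flow like this: the script itself is the pipeline definition, and failures stop the chain at the failing stage.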
If you want to keep it SQL-like, you can try Spark SQL.
If you have a really complex flow, it is worth looking at Google Dataflow: https://github.com/GoogleCloudPlatform/DataflowJavaSDK.
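For a flow with branching dependencies, where a dedicated tool like Oozie or Dataflow starts to pay off, the core idea is a DAG of jobs run in dependency order. A minimal sketch of that scheduling logic, using Python's standard-library `graphlib` (the job names are hypothetical; in practice each one might wrap a `spark-submit` invocation):

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each key runs only after its listed dependencies.
jobs = {
    "ingest": [],
    "clean": ["ingest"],
    "features": ["clean"],
    "report": ["features", "clean"],
}

def run_order(graph):
    # TopologicalSorter takes {node: predecessors} and static_order()
    # yields one valid schedule respecting every dependency edge.
    return list(TopologicalSorter(graph).static_order())

order = run_order(jobs)
print(order)
```

A real scheduler adds retries, triggers, and monitoring on top of this ordering step, which is the part that is tedious to hand-roll and why workflow engines exist.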