What is the Oozie equivalent for Spark?

We have very complex pipelines that we need to compose and schedule. I see that the Hadoop ecosystem has Oozie for this. What are the options for Spark-based jobs when I'm running Spark on Mesos or in standalone mode and don't have a Hadoop cluster?

Best answer

Unlike with Hadoop, chaining steps together is fairly easy with Spark, so writing a Spark Scala script may be enough. My first recommendation is to try that.
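
As a rough illustration of chaining steps in one driver program instead of an Oozie workflow, here is a minimal sketch in plain Python (3.9+, for the standard-library `graphlib` module). The step names (`ingest`, `clean`, `aggregate`, `report`, `export`) and the `run_pipeline` helper are hypothetical. In a real Spark job each step would do Spark work, and the answer itself suggests writing the driver in Scala; the stubs here only record that they ran, so the dependency ordering can be shown on its own.

```python
from graphlib import TopologicalSorter

# Hypothetical steps. In a real Spark driver each would read, transform and
# write data; here they only record their name so the ordering is visible.
executed = []

def ingest():
    executed.append("ingest")

def clean():
    executed.append("clean")

def aggregate():
    executed.append("aggregate")

def report():
    executed.append("report")

def export():
    executed.append("export")

# Each step maps to the set of steps that must finish before it starts.
PIPELINE = {
    "ingest": set(),
    "clean": {"ingest"},
    "aggregate": {"clean"},
    "report": {"aggregate"},
    "export": {"clean"},
}

STEPS = {
    "ingest": ingest,
    "clean": clean,
    "aggregate": aggregate,
    "report": report,
    "export": export,
}

def run_pipeline(pipeline, steps):
    """Run every step once, each only after all of its dependencies.

    TopologicalSorter raises graphlib.CycleError if the dependencies loop.
    """
    order = list(TopologicalSorter(pipeline).static_order())
    for name in order:
        steps[name]()  # an exception here stops the run; later steps are skipped
    return order

if __name__ == "__main__":
    print(run_pipeline(PIPELINE, STEPS))
```

The design choice is deliberately simple: dependencies live in one dictionary, and a failing step aborts the run. Retries, partial reruns and time-based triggering are not covered by this sketch.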

If you'd like to keep it SQL-like, you can try Spark SQL.

If you have a really complex flow, Google Cloud Dataflow is worth looking at: https://github.com/GoogleCloudPlatform/DataflowJavaSDK