What is the best way to create a map in parallel with Scala?

Suppose I have a collection that should be converted to a Map, but not one-to-one the way the map method works.

```scala
val map = collection.mutable.HashMap[Int, Boolean]()
for (p <- dataList.par) {
  if (cond1(p)) {
    map += (p -> true) // unsynchronized: HashMap is not thread-safe
  } else {
    // do nothing
  }
}
```

I've come up with several solutions and want to know which is best:

```scala
map.synchronized { map += (p -> true) } // serialize updates to the shared map
```

Use an actor to update the map. But I don't know how to wait until all the actors' tasks have completed.

In the for loop, yield Some(p) or None, and then run foreach { case Some(p) => map += (p -> true); case None => } over the results. But I don't know how to make that second step sequential if the first generator is over a parallel collection.
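The second and third ideas can be combined without actors. The following is a minimal sketch that swaps actors for scala.concurrent.Future (a different technique from the actors in the question): each element is checked in its own Future and yields Some(entry) or None, Future.traverse produces a single Future that completes only when every task has finished, and the map is then built on the calling thread. The Int element type, cond1's body, and the names FutureMapSketch and buildMap are hypothetical placeholders.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical names: FutureMapSketch, cond1 and buildMap are placeholders.
object FutureMapSketch {
  // Hypothetical stand-in for the question's cond1
  def cond1(p: Int): Boolean = p % 2 == 0

  // Kept in a method rather than the object body: blocking inside an object's
  // initializer on tasks that call back into that object can deadlock.
  def buildMap(dataList: List[Int]): Map[Int, Boolean] = {
    // One Future per element; each yields Some(entry) or None
    val tasks: Future[List[Option[(Int, Boolean)]]] =
      Future.traverse(dataList)(p => Future(if (cond1(p)) Some(p -> true) else None))
    // Await returns only once every task has completed (or the timeout expires)
    val results = Await.result(tasks, 10.seconds)
    // Back on the calling thread: drop the Nones and build the map sequentially
    results.flatten.toMap
  }

  def main(args: Array[String]): Unit =
    println(buildMap((1 to 10).toList))
}
```

For the third approach as written in the question, calling .seq on a parallel collection (built into the standard library up to Scala 2.12) returns its sequential counterpart, so a foreach chained after it runs on a single thread.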

Best answer

I'm not sure this will actually perform best, but it should make the evaluation of the conditions parallel:

```scala
import scala.collection._

val map: mutable.Map[Int, Boolean] =
  dataList.par.collect { case p if cond1(p) => (p, true) }(breakOut)
```

(I used a mutable Map because that is what your code did, but it is not required.)
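A self-contained version of the code above, as a sketch assuming Scala 2.12 or earlier, where .par and breakOut are part of the standard library. cond1's body, the sample data, and the object and method names are hypothetical. It also shows the immutable Map variant, where only the expected type changes, and an alternative without breakOut that lets collect build a parallel sequence first and then converts it with toMap, at the cost of an intermediate collection.

```scala
// Assumes Scala 2.12 or earlier: .par and breakOut are built in there.
import scala.collection.{breakOut, mutable}

// Hypothetical names: ParallelCollectSketch, cond1 and the method names.
object ParallelCollectSketch {
  // Hypothetical stand-in for the question's cond1
  def cond1(p: Int): Boolean = p % 2 == 0

  // The answer's approach: the declared return type selects the builder via breakOut
  def withBreakOut(dataList: List[Int]): mutable.Map[Int, Boolean] =
    dataList.par.collect { case p if cond1(p) => (p, true) }(breakOut)

  // Same call, immutable Map as the expected type
  def immutableWithBreakOut(dataList: List[Int]): Map[Int, Boolean] =
    dataList.par.collect { case p if cond1(p) => (p, true) }(breakOut)

  // Without breakOut: collect into a ParSeq, convert to an immutable ParMap,
  // then take its sequential counterpart with .seq
  def withoutBreakOut(dataList: List[Int]): Map[Int, Boolean] =
    dataList.par.collect { case p if cond1(p) => (p, true) }.toMap.seq

  def main(args: Array[String]): Unit = {
    val data = (1 to 10).toList
    println(withBreakOut(data))
    println(immutableWithBreakOut(data))
    println(withoutBreakOut(data))
  }
}
```

All three methods are called after the object has finished initializing, which avoids the known risk of running parallel work from inside an object's initializer.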

The context must give the type of the expected result (hence the : mutable.Map[Int, Boolean] annotation) for breakOut to work.

Edit: breakOut is scala.collection.breakOut. Collection operations that return a collection (here collect) take an implicit argument bf: CanBuildFrom[SourceCollectionType, ElementType, ResultType]. The implicit CanBuildFroms provided by the library are arranged so that the best possible ResultType is returned, where "best" means closest to the source collection type. Passing breakOut in place of this implicit argument allows a different CanBuildFrom, and hence a different result type, to be selected. What breakOut does is select a CanBuildFrom irrespective of the source collection type: it asks for an implicit CanBuildFrom[Nothing, ElementType, ResultType], and because CanBuildFrom is contravariant in its first type parameter, any builder for that element and result type qualifies. But then many implicits are available and there is no rule to prioritize among them. That is why the result type must be given by the context, so that exactly one of the implicits can be selected.

To sum up, when breakOut is passed in place of the implicit argument, the result is built as the type expected by the context.
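The mechanism is easiest to see on an ordinary sequential List. A minimal sketch, again assuming Scala 2.12 or earlier (breakOut and CanBuildFrom were removed in Scala 2.13): the same map call yields a List with the default implicit, and a Map or a Set when breakOut is passed and the expected type asks for one. BreakOutSketch and demo are hypothetical names.

```scala
// Assumes Scala 2.12 or earlier; breakOut does not exist in 2.13+.
import scala.collection.breakOut

// Hypothetical names: BreakOutSketch and demo.
object BreakOutSketch {
  def demo(xs: List[Int]): (List[(Int, Int)], Map[Int, Int], Set[Int]) = {
    // Default implicit CanBuildFrom: the result type follows the source, so a List
    val asList = xs.map(x => x -> x * x)
    // breakOut + expected type Map[Int, Int]: the Map builder is used directly,
    // without first materializing a List of pairs
    val asMap: Map[Int, Int] = xs.map(x => x -> x * x)(breakOut)
    // Same call shape, expected type Set[Int]: a Set is built, duplicates collapse
    val asSet: Set[Int] = xs.map(_ % 2)(breakOut)
    (asList, asMap, asSet)
  }

  def main(args: Array[String]): Unit =
    println(demo(List(1, 2, 3)))
}
```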
