Dask array from_npy_stack misses info file

Action

I am trying to create a Dask array from a stack of .npy files that were not written by Dask.

Problem

Dask's from_npy_stack() expects an info file, which is normally created by the to_npy_stack() function when a .npy stack is written with Dask.
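For context, a minimal sketch of the normal round trip, where Dask writes the info file itself (the /tmp/npy_stack path and the array are placeholders for illustration):

import dask.array as da

x = da.ones((6, 4), chunks=(2, 4))            # any Dask array
da.to_npy_stack('/tmp/npy_stack', x, axis=0)  # writes 0.npy, 1.npy, 2.npy and 'info'
y = da.from_npy_stack('/tmp/npy_stack')       # succeeds because 'info' is present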

Attempts

I found this PR (https://github.com/dask/dask/pull/686), which describes how the info file is created:

def to_npy_info(dirname, dtype, chunks, axis):
    with open(os.path.join(dirname, 'info'), 'wb') as f:
        pickle.dump({'chunks': chunks, 'dtype': dtype, 'axis': axis}, f)
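For illustration, the same info file could be written by hand next to .npy files produced outside Dask. A sketch, assuming the directory holds three (2, 2) float64 arrays forming one chunk each along axis 0 (as in the example further down); these values are assumptions for the illustration, not something from_npy_stack infers:

import os
import pickle

import numpy as np
import dask.array as da

dirname = '/home/tom/data'

# Assumed layout: 0.npy, 1.npy, 2.npy, each a (2, 2) float64 array,
# forming one chunk each along axis 0.
chunks = ((2, 2, 2), (2,))
dtype = np.dtype('float64')
axis = 0

with open(os.path.join(dirname, 'info'), 'wb') as f:
    pickle.dump({'chunks': chunks, 'dtype': dtype, 'axis': axis}, f)

data = da.from_npy_stack(dirname)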

Question

How do I go about loading .npy stacks that are created outside of Dask?

Example

from pathlib import Path

import numpy as np
import dask.array as da

data_dir = Path('/home/tom/data/')

for i in range(3):
    data = np.zeros((2, 2))
    np.save(data_dir.joinpath('{}.npy'.format(i)), data)

data = da.from_npy_stack('/home/tom/data')

Resulting in the following error:

---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
<ipython-input-94-54315c368240> in <module>()
      9     np.save(data_dir.joinpath('{}.npy'.format(i)), data)
     10
---> 11 data = da.from_npy_stack('/home/tom/data/')

/home/tom/vue/env/local/lib/python2.7/site-packages/dask/array/core.pyc in from_npy_stack(dirname, mmap_mode)
   3722         Read data in memory map mode
   3723     """
-> 3724     with open(os.path.join(dirname, 'info'), 'rb') as f:
   3725         info = pickle.load(f)
   3726
IOError: [Errno 2] No such file or directory: '/home/tom/data/info'

Accepted answer

The function from_npy_stack is short and simple. I agree that it probably ought to accept the metadata as optional arguments for cases such as yours, but you could simply reuse the lines of code that run after the "info" file is loaded, assuming you have the right values. Some of these values, i.e., the dtype and the shape of each array (used to build chunks), could presumably be obtained by looking at the first of the data files:

name = 'from-npy-stack-%s' % dirname
keys = list(product([name], *[range(len(c)) for c in chunks]))
values = [(np.load, os.path.join(dirname, '%d.npy' % i), mmap_mode)
          for i in range(len(chunks[axis]))]
dsk = dict(zip(keys, values))
out = Array(dsk, name, chunks, dtype)

Also, note that we are constructing the names of the files here, but you might want to get those by doing a listdir or glob.
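Putting those pieces together, here is a sketch of that approach. The helper name from_external_npy_stack is made up for this example, and it assumes the files are named 0.npy, 1.npy, ... and all share the same shape and dtype:

import os
from glob import glob
from itertools import product

import numpy as np
import dask.array as da

def from_external_npy_stack(dirname, axis=0, mmap_mode=None):
    # Collect the chunk files; assumes they are named 0.npy, 1.npy, ...
    paths = sorted(glob(os.path.join(dirname, '*.npy')),
                   key=lambda p: int(os.path.splitext(os.path.basename(p))[0]))

    # Peek at the first file (memory-mapped, so the data is not read eagerly)
    # and assume every file shares its shape and dtype.
    first = np.load(paths[0], mmap_mode='r')
    shape, dtype = first.shape, first.dtype

    # One chunk per file along `axis`, a single chunk along every other axis.
    chunks = tuple((shape[i],) * len(paths) if i == axis else (shape[i],)
                   for i in range(len(shape)))

    name = 'from-npy-stack-%s' % dirname
    keys = list(product([name], *[range(len(c)) for c in chunks]))
    values = [(np.load, path, mmap_mode) for path in paths]
    dsk = dict(zip(keys, values))
    return da.Array(dsk, name, chunks, dtype)

data = from_external_npy_stack('/home/tom/data')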
