Why is a loop over enumerate so much faster than a generator?

Recently, I discovered, with the help of Jon Clements in this thread, that the following two pieces of code have very different execution times.
Do you have any idea why this is happening?
Comment: self.stream_data is a tuple containing many zeros and int16 values, and the create_ZS_data method performs so-called zero suppression (recording only the non-zero cells together with their coordinates).
Environment
Input: many (3.5k) small files (~120 kB each)
OS: Linux, 64-bit
Python version: 2.6.8
Solution based on a generator:
    def create_ZS_data(self):
        self.ZS_data = ([column, row, self.stream_data[column + row * self.rows]]
                        for row, column in itertools.product(xrange(self.rows),
                                                             xrange(self.columns))
                        if self.stream_data[column + row * self.rows])

Profiler info:
    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      3257    1.117    0.000   71.598    0.022  decode_from_merlin.py:302(create_ZS_file)
    463419   67.705    0.000   67.705    0.000  decode_from_merlin.py:86(<genexpr>)
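(An editorial aside, not part of the original question: the very large ncalls figure for <genexpr> is expected, because cProfile charges one call to <genexpr> for every element consumed from a generator expression. A small, self-contained sketch with made-up data illustrates this.)

    import cProfile

    # Each next() on a generator expression resumes the <genexpr> frame, and
    # cProfile counts every resumption as one call -- hence ncalls in the
    # hundreds of thousands once 3.5k files' worth of cells are consumed.
    data = tuple(i % 7 for i in xrange(100000))
    gen = (x for x in data if x)
    cProfile.run('list(gen)', sort='tottime')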
Jon's solution:

    def create_ZS_data(self):
        self.ZS_data = list()
        for rowno, cols in enumerate(self.stream_data[i:i + self.columns]
                                     for i in xrange(0, len(self.stream_data), self.columns)):
            for colno, col in enumerate(cols):
                # col == value, (rowno, colno) == index
                if col:
                    self.ZS_data.append([colno, rowno, col])

Profiler info:
    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
      3257   18.616    0.006   19.919    0.006  decode_from_merlin.py:83(create_ZS_data)

Accepted answer
I looked at the prior discussion; you seem to be troubled that your clever comprehension isn't as efficient in cycles as it is in characters of source code. What I didn't point out then was that this would be my preferred implementation to read:
    def sparse_table_elements(cells, columns, rows):
        ncells = len(cells)
        non_zeros = list()
        # enumerate() the row starts so that nrow is the row index;
        # the slice start alone is a cell offset, not a row number
        for nrow, start in enumerate(range(0, ncells, columns)):
            row = cells[start:start + columns]
            for ncol, cell in enumerate(row):
                if cell:
                    non_zeros.append([ncol, nrow, cell])
        return non_zeros

I've not tested it, but I can make sense of it.
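A quick illustrative check (my example, not the original author's): on a tiny 3-column table the function should return one [column, row, value] triple per non-zero cell.

    cells = (0, 7, 0,
             0, 0, 9)
    print(sparse_table_elements(cells, columns=3, rows=2))
    # -> [[1, 0, 7], [2, 1, 9]]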
There are a couple of things that jump out at me as potential inefficiencies. Recomputing the Cartesian product of two constant, monotonically "boring" index sequences has got to be expensive:

    itertools.product(xrange(self.rows), xrange(self.columns))

You then use the results [(0, 0), (0, 1), ...] to do single-element indexing into your source:
    stream_data[column + row * self.rows]

which is also more costly than handling larger slices, as Jon's implementation does.
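To make the two access patterns concrete, here is a rough micro-benchmark sketch (mine, not from the original answer). The table size and fill pattern are arbitrary assumptions, and the index arithmetic uses the column count rather than the original's self.rows; the two coincide for a square table.

    import itertools
    import timeit

    ROWS = COLUMNS = 256
    # mostly zeros with a few non-zero cells, similar to the described input
    STREAM = tuple(1 if i % 97 == 0 else 0 for i in xrange(ROWS * COLUMNS))

    def indexed():
        # product + per-element indexing, as in the generator-based version
        return [[c, r, STREAM[c + r * COLUMNS]]
                for r, c in itertools.product(xrange(ROWS), xrange(COLUMNS))
                if STREAM[c + r * COLUMNS]]

    def sliced():
        # row slices + enumerate, as in Jon's version
        out = []
        for rowno, start in enumerate(xrange(0, len(STREAM), COLUMNS)):
            for colno, col in enumerate(STREAM[start:start + COLUMNS]):
                if col:
                    out.append([colno, rowno, col])
        return out

    assert indexed() == sliced()
    print(timeit.timeit(indexed, number=100))  # expect this to be noticeably slower
    print(timeit.timeit(sliced, number=100))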
Generators are not some secret sauce that guarantees efficiency. In this particular case, with 135 kB of data that has already been read into core, a poorly constructed generator does seem to be costing you. If you want concise matrix operations, use APL; if you want readable code, don't strive for rabid minimization in Python.
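(A closing aside that goes beyond the original answer: in the Python ecosystem the "concise matrix operations" role is usually played by NumPy, where this zero suppression reduces to a single nonzero() call. A hedged sketch:)

    import numpy as np

    def sparse_table_elements_np(cells, columns):
        # Reshape the flat stream into a 2-D grid, then let NumPy find the
        # coordinates of every non-zero cell in one vectorized pass.
        grid = np.asarray(cells, dtype=np.int16).reshape(-1, columns)
        rows_idx, cols_idx = np.nonzero(grid)
        return [[int(c), int(r), int(grid[r, c])]
                for r, c in zip(cols_idx, rows_idx)]

    print(sparse_table_elements_np((0, 7, 0, 0, 0, 9), 3))
    # -> [[1, 0, 7], [2, 1, 9]]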