Fast one-to-one mapping from a binary 1-D NumPy array to an integer
Stack Overflow user

Asked 2017-01-12 08:16:49

Answers: 2 | Views: 551 | Followers: 0 | Votes: 2

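What is the fastest way to convert a 1-D NumPy array containing only 0s and 1s into a unique integer?

The best I have come up with so far is to use Cython and treat the array as a mirrored binary number*.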

@cython.boundscheck(False)
@cython.wraparound(False)
def _map_binary(np.ndarray[np.int64_t, ndim=1] x):

    cdef int tot = 0
    cdef int i
    cdef int n = x.shape[0]

    for i in xrange(n):
        if x[i]:
            tot += 2**i
    return tot

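Since this is still the bottleneck of my algorithm, I am wondering whether there is a smarter and faster way to do it.

*Obviously this mapping is only one-to-one for arrays of the same length (appending zeros to an array does not change the resulting integer), but for my purposes that is fine.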
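The footnote's claim can be checked with a small pure-Python sketch. The name `map_binary` is hypothetical and stands in for the Cython function above; it uses Python integers, so it is a slow reference rather than a replacement.

```python
from itertools import product

def map_binary(bits):
    """Mirrored-binary mapping: element i contributes 2**i when it is set."""
    return sum(1 << i for i, b in enumerate(bits) if b)

# For a fixed length n, every 0/1 array maps to a distinct integer in [0, 2**n).
n = 4
codes = {map_binary(bits) for bits in product((0, 1), repeat=n)}
is_one_to_one = codes == set(range(2 ** n))

# Appending zeros leaves the integer unchanged, so arrays of different
# lengths can collide.
collides = map_binary([1, 0, 1]) == map_binary([1, 0, 1, 0, 0])
```

Both `is_one_to_one` and `collides` come out true, which matches the caveat: the mapping is injective only among arrays of equal length.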

cython

python

performance

numpy

mapping

2 Answers

Stack Overflow user

Answered 2017-01-13 02:07:28

Collecting some points from the comments along with a few of my own.

Some pure Python (+numpy) options:

import numpy as np

def _map_binary_str_and_back(x):
    # from comment by @NickA
    return int("".join([str(c) for c in x]),2)

def _map_binary_np_dot(x):
    # from question http://stackoverflow.com/questions/41069825/convert-binary-01-numpy-to-integer-or-binary-string
    return np.dot(x,1 << np.arange(x.size))

def _map_binary_np_pack(x):
    # uses built in numpy function packbits, but unfortunately needs a bit of manipulation
    # afterwards
    x = np.packbits(x) # returns np.uint8
    x.resize((8,),refcheck=False)
    return x.view(dtype=np.int64)

Some Cython options (note that I have changed the output to a 64-bit integer so that it can handle arrays up to 64 elements long):

cimport cython
cimport numpy as np

def _map_binary_original(np.ndarray[np.int64_t, ndim=1] x):

    cdef np.uint64_t tot = 0
    cdef np.uint64_t i
    cdef int n = x.shape[0]

    for i in xrange(n):
        if x[i]:
            tot += 2**i
    return tot

@cython.boundscheck(False)
@cython.wraparound(False)
def _map_binary_contig(np.ndarray[np.int64_t, ndim=1, mode="c"] x):

    cdef np.uint64_t tot = 0
    cdef np.uint64_t i
    cdef int n = x.shape[0]

    for i in xrange(n):
        if x[i]:
            tot += 2**i
    return tot

@cython.boundscheck(False)
@cython.wraparound(False)    
def _map_binary_shift(np.ndarray[np.int64_t, ndim=1, mode="c"] x):

    cdef np.uint64_t tot = 0
    cdef np.uint64_t i
    cdef int n = x.shape[0]

    for i in xrange(n):
        if x[i]:
            tot += 1<<i
    return tot

@cython.boundscheck(False)
@cython.wraparound(False)    
def _map_binary_times2(np.ndarray[np.int64_t, ndim=1, mode="c"] x):
    # @FranciscoCouzo

    cdef np.uint64_t tot = 0
    cdef np.uint64_t i
    cdef int n = x.shape[0]

    for i in xrange(n):
        tot *= 2
        if x[i]:            
            tot +=1 
    return tot

@cython.boundscheck(False)
@cython.wraparound(False)    
def _map_binary_times2_as_shift(np.ndarray[np.int64_t, ndim=1, mode="c"] x):

    cdef np.uint64_t tot = 0
    cdef np.uint64_t i
    cdef int n = x.shape[0]

    for i in xrange(n):
        tot <<= 1
        if x[i]:            
            tot +=1 
    return tot
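One thing worth noting about the variants above: the power and shift versions weight element i by 2**i, while the times2 versions double the running total before adding each element, so they treat x[0] as the most significant bit. Both are one-to-one for a fixed length, but they return different integers for the same input. A minimal pure-Python sketch of the two orderings (the names `lsb_first` and `msb_first` are hypothetical, not part of the answer's module):

```python
def lsb_first(bits):
    """Element i contributes 2**i, as in the power and shift variants."""
    total = 0
    for i, b in enumerate(bits):
        if b:
            total += 1 << i
    return total

def msb_first(bits):
    """Double-then-add accumulation, as in the times2 variants."""
    total = 0
    for b in bits:
        total = total * 2 + (1 if b else 0)
    return total

bits = [1, 1, 0, 1]
print(lsb_first(bits))        # 1 + 2 + 8 = 11
print(msb_first(bits))        # 0b1101 = 13
print(msb_first(bits[::-1]))  # reversing the input recovers 11
```

If the two families need to agree, reversing the array (or the loop direction) in one of them is enough.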

And (for reference) some timing code:

from map_binary import (_map_binary_original,_map_binary_contig,
                        _map_binary_shift,_map_binary_times2,
                        _map_binary_times2_as_shift)

test_array = np.random.randint(2,size=(60,)).astype(dtype=np.int64)

def time_function(name):
    from timeit import timeit

    num = 10000
    timed = timeit("f(x)","from __main__ import {} as f, test_array as x".format(name),number=num)/num
    print(name, timed)

for n in list(globals().keys()):
    if n.startswith('_map_binary'):
        time_function(n)

Results, in seconds per call (formatting slightly adjusted for clarity):

_map_binary_str_and_back    9.774386967484043e-05
_map_binary_np_dot          7.402434574531678e-06
_map_binary_np_pack         1.5813756692768855e-06
_map_binary_original        7.462656716457738e-07
_map_binary_contig          7.208434833198e-07
_map_binary_shift           5.84043665719558e-07
_map_binary_times2          6.467991376011505e-07
_map_binary_times2_as_shift 6.412435894529889e-07

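In summary:

Of the non-Cython versions, np.packbits is the fastest option (but still clearly worse than the Cython versions). However, the non-Cython versions need further work to ensure they give the same answer (I think np.dot is suffering from integer overflow; packbits flips the endianness, so it gives a valid but different answer).
Specifying that the array is contiguous makes things slightly faster.
Bit shifting appears to be fastest, followed by multiplication, followed by power (bit shifting is only fastest when using unsigned integers).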
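The "further work" mentioned for the non-Cython versions could look roughly like the sketch below. It is not part of the timed code above: the function names are hypothetical, the `bitorder` argument of `np.packbits` assumes NumPy 1.17 or later, and the final conversion goes through Python's `int.from_bytes`, so its timing will differ from the figures reported here.

```python
import numpy as np

def map_binary_reference(x):
    """Slow reference: element i contributes 2**i, using Python ints."""
    return sum(1 << i for i, bit in enumerate(x) if bit)

def map_binary_dot_uint64(x):
    """np.dot variant with explicit uint64 operands (valid up to 64 elements)."""
    weights = np.uint64(1) << np.arange(x.size, dtype=np.uint64)
    return int(np.dot(x.astype(np.uint64), weights))

def map_binary_packbits_little(x):
    """packbits variant with little-endian bit order, so x[0] is bit 0."""
    packed = np.packbits(x.astype(np.uint8), bitorder="little")
    return int.from_bytes(packed.tobytes(), "little")

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=60).astype(np.int64)
expected = map_binary_reference(x)
same_dot = map_binary_dot_uint64(x) == expected
same_pack = map_binary_packbits_little(x) == expected
```

With unsigned 64-bit weights the dot product cannot overflow for arrays of up to 64 elements, and with `bitorder="little"` the packed bytes read as a little-endian integer reproduce the same value as the Cython power and shift variants.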

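Votes: 3

Stack Overflow user

Answered 2017-01-12 08:45:47

I am not sure whether your solution is the best approach from an algorithmic perspective, but to write more optimized Cython code I suggest the following changes: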

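Use memoryview arrays in Cython.
Write non-Pythonic code when using Cython, so that Cython can generate smaller C code from your Python code. So instead of using shape and indexing, since you have a one-dimensional array you can use size; and instead of using xrange with indexing, just loop over the array and increment the variable i on each iteration (or at least just use range()). This is because xrange is a generator and takes more work to convert to C.
Use C library functions such as pow.
Predefine the type of the function.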

cimport cython
cimport numpy as np
from libc.math cimport pow

@cython.boundscheck(False)
@cython.wraparound(False)
cdef int _map_binary(np.int32_t[:] x):
    cdef int tot = 0
    cdef int i = 0
    cdef int n = x.size
    cdef int item
    for item in x:
        if item:
            tot = tot + <int>pow(2, i)  # libc pow returns a double, so cast back to int
        i = i + 1
    return tot