哈希表如何在相同或碰撞值上呈线性?(How hashtables are linear on same or collision values?)

我在查看这个 StackOverflow答案,以更好地理解哈希,并看到以下内容(关于我们需要在固定时间内获取桶大小的事实):

如果使用类似线性探测或双重哈希的方法,查找所有散列为相同值的项目意味着您需要散列值,然后遍历表中非空项目的“链”,以查找其中有多少项哈希值相同。 对于散列为相同值的项目数量,这不是线性的 - 对散列为相同值或冲突值的项目数量是线性的。

这是什么意思,它是“散列到相同或相互冲突的值的项目的数量是线性的”? 难道它在哈希表中的项目总数是线性的,因为在线性探测期间它可能需要遍历每个值? 我不明白为什么它只需要通过碰撞的那些。

例如,如果我在散列表上使用线性探测(步长1),并且我有不同的键(不碰撞,所有散列到唯一值)映射到奇数索引槽1,3,5,7,9.....然后,我想插入许多所有散列到2的键,所以我用这些键填充所有我的索引点。 如果我想知道有多少键到2,我不需要通过整个哈希表? 但我不只是迭代散列到相同值或碰撞值的项,因为奇数索引插槽不会发生碰撞。

I was looking at this StackOverflow answer to understand hashing better and saw the following (regarding the fact that we would need to get bucket size in constant time):

if you use something like linear probing or double hashing, finding all the items that hashed to the same value means you need to hash the value, then walk through the "chain" of non-empty items in your table to find how many of those hashed to the same value. That's not linear on the number of items that hashed to the same value though--it's linear on the number of items that hashed to the same or a colliding value.

What does this mean that it's "linear on the number of items that hashed to the same or a colliding value"? Wouldn't it be linear on total number of items in the hashtable, since it's possible that it will need to walk through every value during linear probing? I don't see why it would just have to go through the ones that collided.

Like for example, if I am using linear probing (step size 1) on a hashtable and I have different keys (not colliding, all hash to unique values) mapping to the odd index slots 1,3,5,7,9..... Then, I want to insert many keys that all hash to 2, so I fill up all my even index spots with those keys. If I wanted to know how many keys hash to 2, wouldn't I need to go through the entire hash table? But I'm not just iterating through items that hashed to the same or colliding value, since the odd index slots are not colliding.

最满意答案

哈希表在概念上类似于链接列表(表中的存储桶 )的数组(表格)。 不同之处在于您如何管理和访问该数组:使用函数来生成用于计算数组索引的数字。

一旦你有两个元素放在同一个桶中(相同的计算值,即collission),那么问题就变成了列表中的搜索。 列表中元素的数量有望低于哈希表中的全部元素(意味着其他元素存在于其他桶中)。

但是,您正在跳过该段中的重要介绍:

如果你使用类似线性探测或双重哈希的方法 ,找到所有散列为相同值的项意味着你需要对值进行散列,然后遍历表中非空项的“链”来找出其中有多少项哈希值相同。 对于散列为相同值的项目数量,这不是线性的 - 对散列为相同值或冲突值的项目数量是线性的

线性探测是一个哈希表的不同实现,其中您不使用任何列表(链)作为您的collissions。 相反,您只需从预期的位置开始寻找阵列中最接近的可用点,然后继续前进。 数组越多,发现下一个位置的机会也越高,所以你只需要继续搜索。 尽管从来没有(你并不在意)这两种情况中的哪一种,除非你在那里明确地看到现有元素的散列,否则这些位置被散列为相同或相互冲突的值使用。

这个CppCon演示视频可以很好地介绍和深入分析哈希表。

A hash table is conceptually similar to an array (table) of linked lists (bucket in the table). The difference is in how you manage and access that array: using a function to generate a number that is used to compute the array index.

Once you have two elements placed in the same bucket (the same computed value, i.e. collission), then the problem turns out to be a search in a list. The number of elements in the list is hopefully lower than the total elements in the hash table (meaning that other elements exist in other buckets).

However, you are skipping the important introduction in that paragraph:

If you use something like linear probing or double hashing, finding all the items that hashed to the same value means you need to hash the value, then walk through the "chain" of non-empty items in your table to find how many of those hashed to the same value. That's not linear on the number of items that hashed to the same value though -- it's linear on the number of items that hashed to the same or a colliding value.

Linear probing is a different implementation of a hash table in which you don't use any list (chain) for your collissions. Instead, you just find the nearest available spot in the array, starting from the expected position and going forward. The more populated the array is, the higher the chance is to find that the next position is being used too, so you just need to keep searching. The positions are used by items that hashed to the same or colliding value, although you are never (and you don't really care) which of these two cases is, unless you explicitly see the hash of the existing element there.

This CppCon presentation video makes a good introduction and in-depth analysis of hash tables.

哈希表如何在相同或碰撞值上呈线性?(How hashtables are linear on same or collision values?)

我在查看这个 StackOverflow答案,以更好地理解哈希,并看到以下内容(关于我们需要在固定时间内获取桶大小的事实):

如果使用类似线性探测或双重哈希的方法,查找所有散列为相同值的项目意味着您需要散列值,然后遍历表中非空项目的“链”,以查找其中有多少项哈希值相同。 对于散列为相同值的项目数量,这不是线性的 - 对散列为相同值或冲突值的项目数量是线性的。

这是什么意思,它是“散列到相同或相互冲突的值的项目的数量是线性的”? 难道它在哈希表中的项目总数是线性的,因为在线性探测期间它可能需要遍历每个值? 我不明白为什么它只需要通过碰撞的那些。

例如,如果我在散列表上使用线性探测(步长1),并且我有不同的键(不碰撞,所有散列到唯一值)映射到奇数索引槽1,3,5,7,9.....然后,我想插入许多所有散列到2的键,所以我用这些键填充所有我的索引点。 如果我想知道有多少键到2,我不需要通过整个哈希表? 但我不只是迭代散列到相同值或碰撞值的项,因为奇数索引插槽不会发生碰撞。

I was looking at this StackOverflow answer to understand hashing better and saw the following (regarding the fact that we would need to get bucket size in constant time):

if you use something like linear probing or double hashing, finding all the items that hashed to the same value means you need to hash the value, then walk through the "chain" of non-empty items in your table to find how many of those hashed to the same value. That's not linear on the number of items that hashed to the same value though--it's linear on the number of items that hashed to the same or a colliding value.

What does this mean that it's "linear on the number of items that hashed to the same or a colliding value"? Wouldn't it be linear on total number of items in the hashtable, since it's possible that it will need to walk through every value during linear probing? I don't see why it would just have to go through the ones that collided.

Like for example, if I am using linear probing (step size 1) on a hashtable and I have different keys (not colliding, all hash to unique values) mapping to the odd index slots 1,3,5,7,9..... Then, I want to insert many keys that all hash to 2, so I fill up all my even index spots with those keys. If I wanted to know how many keys hash to 2, wouldn't I need to go through the entire hash table? But I'm not just iterating through items that hashed to the same or colliding value, since the odd index slots are not colliding.

最满意答案

哈希表在概念上类似于链接列表(表中的存储桶 )的数组(表格)。 不同之处在于您如何管理和访问该数组:使用函数来生成用于计算数组索引的数字。

一旦你有两个元素放在同一个桶中(相同的计算值,即collission),那么问题就变成了列表中的搜索。 列表中元素的数量有望低于哈希表中的全部元素(意味着其他元素存在于其他桶中)。

但是,您正在跳过该段中的重要介绍:

如果你使用类似线性探测或双重哈希的方法 ,找到所有散列为相同值的项意味着你需要对值进行散列,然后遍历表中非空项的“链”来找出其中有多少项哈希值相同。 对于散列为相同值的项目数量,这不是线性的 - 对散列为相同值或冲突值的项目数量是线性的

线性探测是一个哈希表的不同实现,其中您不使用任何列表(链)作为您的collissions。 相反,您只需从预期的位置开始寻找阵列中最接近的可用点,然后继续前进。 数组越多,发现下一个位置的机会也越高,所以你只需要继续搜索。 尽管从来没有(你并不在意)这两种情况中的哪一种,除非你在那里明确地看到现有元素的散列,否则这些位置被散列为相同或相互冲突的值使用。

这个CppCon演示视频可以很好地介绍和深入分析哈希表。

A hash table is conceptually similar to an array (table) of linked lists (bucket in the table). The difference is in how you manage and access that array: using a function to generate a number that is used to compute the array index.

Once you have two elements placed in the same bucket (the same computed value, i.e. collission), then the problem turns out to be a search in a list. The number of elements in the list is hopefully lower than the total elements in the hash table (meaning that other elements exist in other buckets).

However, you are skipping the important introduction in that paragraph:

If you use something like linear probing or double hashing, finding all the items that hashed to the same value means you need to hash the value, then walk through the "chain" of non-empty items in your table to find how many of those hashed to the same value. That's not linear on the number of items that hashed to the same value though -- it's linear on the number of items that hashed to the same or a colliding value.

Linear probing is a different implementation of a hash table in which you don't use any list (chain) for your collissions. Instead, you just find the nearest available spot in the array, starting from the expected position and going forward. The more populated the array is, the higher the chance is to find that the next position is being used too, so you just need to keep searching. The positions are used by items that hashed to the same or colliding value, although you are never (and you don't really care) which of these two cases is, unless you explicitly see the hash of the existing element there.

This CppCon presentation video makes a good introduction and in-depth analysis of hash tables.