Dictionary<TKey, TValue>是 C# 中非常流行的数据结构,也是面试问题的热门选择。我已经使用了 10 亿次,我非常确定我了解它们的工作原理。但是,当我更深入地研究它们并检查实际代码时,我发现它们比我想象的还要好(也许您也是如此)。在本文中,我们将一起进行深入研究,甚至编写我们自己的词典教育副本。所以和我一起开始吧!Dictionary
为了确保我们的副本与实际代码匹配,我们将首先探索原始代码中的内容,然后删除所有不必要的内容,并在需要的地方添加日志 ()。只需复制两个主要方法就足够了:Add 和 GetValueOrDefault 来重新创建 的所有基本要素,因此这就是我们要执行的操作。但首先,让我们看看 a 中的字段 :Console.WriteLineDictionaryDictionary
我将使用参考源中的 .NET Framework 4.8 中的代码。尽管现代 .NET 代码稍微复杂一些,但基本要素仍然相同,因此 Framework 版本应该可以。
private struct Entry {
public int hashCode;
public int next;
public TKey key;
public TValue value;
}
private int[] buckets;
private Entry[] entries;
private int count;
private int version;
private int freeList;
private int freeCount;
private IEqualityComparer<TKey> comparer;
和 properties 是优化技术的一部分,对于 a 的运行不是必需的 - 我们将把它们传递给我们的 replica。 只是一个 changes 计数器,我们也会传递它。为简单起见,我们还将使用而不是允许外部配置。添加打印当前状态(所有属性的值)的方法,我们将得到:freeListfreeCountDictionaryversionEqualityComparer<TKey>.Default
public class EducationalDictionary<TKey, TValue>
{
private struct Entry {
public int hashCode;
public int next;
public TKey key;
public TValue? value;
string ValueString => value == null ? "null" : value!.ToString()!;
public override string ToString()
{
return $"{key} - {ValueString} + (next = {next})";
}
}
private int[] buckets;
private Entry[] entries;
private int count;
private readonly EqualityComparer<TKey> comparer = EqualityComparer<TKey>.Default;
public void PrintFullState(string preface)
{
Console.WriteLine();
Console.Write(this.ToString(preface));
}
public string ToString(string preface)
{
StringBuilder result = new();
result.AppendLine($"{preface}, state:");
result.AppendLine();
result.AppendLine($"buckets: [{String.Join(", ", buckets!)}]");
result.AppendLine($"entries:");
for (int i = 0; i < entries.Length; i++)
{
result.AppendLine($" [{i}] = {entries[i]}");
}
result.AppendLine($"count: {count}");
return result.ToString();
}
}
源代码中的方法实质上是对另一个方法的调用:Add
public void Add(TKey key, TValue value) {
Insert(key, value, true);
}
该方法可能看起来很可怕,但别担心,我们会一点一点地剖析它:Insert
private void Insert(TKey key, TValue value, bool add) {
if( key == null ) {
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
}
if (buckets == null) Initialize(0);
int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
int targetBucket = hashCode % buckets.Length;
#if FEATURE_RANDOMIZED_STRING_HASHING
int collisionCount = 0;
#endif
for (int i = buckets[targetBucket]; i >= 0; i = entries[i].next) {
if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) {
if (add) {
ThrowHelper.ThrowArgumentException(ExceptionResource.Argument_AddingDuplicate);
}
entries[i].value = value;
version++;
return;
}
#if FEATURE_RANDOMIZED_STRING_HASHING
collisionCount++;
#endif
}
int index;
if (freeCount > 0) {
index = freeList;
freeList = entries[index].next;
freeCount--;
}
else {
if (count == entries.Length)
{
Resize();
targetBucket = hashCode % buckets.Length;
}
index = count;
count++;
}
entries[index].hashCode = hashCode;
entries[index].next = buckets[targetBucket];
entries[index].key = key;
entries[index].value = value;
buckets[targetBucket] = index;
version++;
#if FEATURE_RANDOMIZED_STRING_HASHING
#if FEATURE_CORECLR
// In case we hit the collision threshold we'll need to switch to the comparer which is using randomized string hashing
// in this case will be EqualityComparer<string>.Default.
// Note, randomized string hashing is turned on by default on coreclr so EqualityComparer<string>.Default will
// be using randomized string hashing
if (collisionCount > HashHelpers.HashCollisionThreshold && comparer == NonRandomizedStringEqualityComparer.Default)
{
comparer = (IEqualityComparer<TKey>) EqualityComparer<string>.Default;
Resize(entries.Length, true);
}
#else
if(collisionCount > HashHelpers.HashCollisionThreshold && HashHelpers.IsWellKnownEqualityComparer(comparer))
{
comparer = (IEqualityComparer<TKey>) HashHelpers.GetRandomizedEqualityComparer(comparer);
Resize(entries.Length, true);
}
#endif // FEATURE_CORECLR
#endif
}
它还调用其他方法:
private void Initialize(int capacity) {
int size = HashHelpers.GetPrime(capacity);
buckets = new int[size];
for (int i = 0; i < buckets.Length; i++) buckets[i] = -1;
entries = new Entry[size];
freeList = -1;
}
private void Resize(int newSize, bool forceNewHashCodes) {
Contract.Assert(newSize >= entries.Length);
int[] newBuckets = new int[newSize];
for (int i = 0; i < newBuckets.Length; i++) newBuckets[i] = -1;
Entry[] newEntries = new Entry[newSize];
Array.Copy(entries, 0, newEntries, 0, count);
if(forceNewHashCodes) {
for (int i = 0; i < count; i++) {
if(newEntries[i].hashCode != -1) {
newEntries[i].hashCode = (comparer.GetHashCode(newEntries[i].key) & 0x7FFFFFFF);
}
}
}
for (int i = 0; i < count; i++) {
if (newEntries[i].hashCode >= 0) {
int bucket = newEntries[i].hashCode % newSize;
newEntries[i].next = newBuckets[bucket];
newBuckets[bucket] = i;
}
}
buckets = newBuckets;
entries = newEntries;
}
最后,它们调用 上的方法。我们将复制这些,因为如果您不处理边缘情况,它们将非常微不足道。我们的版本将如下所示:HashHelpers
public static class HashHelpers
{
public static readonly int[] primes = {
3, 7, 11, 17, 23, 29, 37, 47, 59, 71, 89, 107, 131, 163, 197, 239, 293, 353, 431, 521, 631, 761, 919,
1103, 1327, 1597, 1931, 2333, 2801, 3371, 4049, 4861, 5839, 7013, 8419, 10103, 12143, 14591,
17519, 21023, 25229, 30293, 36353, 43627, 52361, 62851, 75431, 90523, 108631, 130363, 156437,
187751, 225307, 270371, 324449, 389357, 467237, 560689, 672827, 807403, 968897, 1162687, 1395263,
1674319, 2009191, 2411033, 2893249, 3471899, 4166287, 4999559, 5999471, 7199369};
public static int GetPrime(int min)
{
foreach (var prime in primes)
{
if (prime >= min) return prime;
}
return min;
}
public static int ExpandPrime(int oldSize)
{
return GetPrime(2 * oldSize);
}
}
因为我们将在调整大小完成后删除 logic 和 print state 。老实说,代码并不是很重要,它本质上是重新排列和针对新大小。这是我们的结果:Resizeif (forceNewHashCodes)entriesbuckets
private void Resize(int newSize) {
var newBuckets = new int[newSize];
for (var i = 0; i < newBuckets.Length; i++) newBuckets[i] = -1;
var newEntries = new Entry[newSize];
Array.Copy(entries, 0, newEntries, 0, count);
for (var i = 0; i < count; i++) {
if (newEntries[i].hashCode >= 0) {
var bucket = newEntries[i].hashCode % newSize;
newEntries[i].next = newBuckets[bucket];
newBuckets[bucket] = i;
}
}
buckets = newBuckets;
entries = newEntries;
PrintFullState("\u2194\ufe0f Resize");
}
我们将初始化逻辑从 straight 移动到构造函数(因为它将更容易处理可空性):if (buckets == null) Initialize(0);
以及初始化完成后的打印状态
public EducationalDictionary()
{
var size = HashHelpers.GetPrime(0);
buckets = new int[size];
for (var i = 0; i < buckets.Length; i++) buckets[i] = -1;
entries = new Entry[size];
PrintFullState("\ud83d\ude80 Initialized");
}
以下是我们将执行的更改列表:Add
public void Add(TKey key, TValue value) {
var hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
var targetBucket = hashCode % buckets!.Length;
if (count == entries!.Length)
{
Resize(HashHelpers.ExpandPrime(count));
targetBucket = hashCode % buckets.Length;
}
var index = count;
count++;
entries[index].hashCode = hashCode;
entries[index].next = buckets[targetBucket];
entries[index].key = key;
entries[index].value = value;
buckets[targetBucket] = index;
Console.WriteLine();
PrintFullState($"\ud83d\udce5 Add: {key} - {value}. hashCode = {hashCode}, targetBucket = {targetBucket}");
}
现在看起来简单多了,对吧?现在让我们完成我们的类,实现 .GetValueOrDefault
这一次的实现相当简单,甚至在原始代码中也是如此:GetValueOrDefault
internal TValue GetValueOrDefault(TKey key) {
int i = FindEntry(key);
if (i >= 0) {
return entries[i].value;
}
return default(TValue);
}
在搜索本身(在方法中):FindEntry
private int FindEntry(TKey key) {
if( key == null) {
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
}
if (buckets != null) {
int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next) {
if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) return i;
}
}
return -1;
}
这里没有太多需要清理的地方。但是我们将添加大量日志,因为这是我们最重要的逻辑
private int FindEntry(TKey key)
{
var hashCode = comparer.GetHashCode(key!) & 0x7FFFFFFF;
var initialBucketIndex = hashCode % buckets.Length;
Console.WriteLine();
Console.WriteLine($"\ud83d\udd0e Search. Key = {key}. Initial Bucket Index {initialBucketIndex}");
for (var i = buckets[initialBucketIndex]; i >= 0; i = entries[i].next)
{
Console.WriteLine();
Console.WriteLine($"Comparing key from entries[{i}] ({entries[i]}) to {key}");
if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key))
{
Console.WriteLine($"Key is equal returning {i}");
return i;
}
else {
Console.WriteLine($"Key is not equal, moving to the next linked index ({entries[i].next})");
}
}
Console.WriteLine();
Console.WriteLine("Search exit condition met (i >= 0). Returning -1 (as not found)");
return -1;
}
这样就完成了我们的 .现在让我们使用它并查看我们准备的详细日志!EducationalDictionary
我们将添加 4 条记录,搜索每条记录,外加一个不存在的键,该键应返回 ,表示找不到它。测试可能如下所示:null
[TestMethod]
public void FindsMultipleKeys()
{
var dict = new EducationalDictionary<int, string>();
dict.Add(48, "John");
dict.Add(34, "Josh");
dict.Add(22, "Jack");
dict.Add(11, "Alex");
dict.GetValueOrDefault(48).Should().Be("John");
dict.GetValueOrDefault(34).Should().Be("Josh");
dict.GetValueOrDefault(22).Should().Be("Jack");
dict.GetValueOrDefault(11).Should().Be("Alex");
dict.GetValueOrDefault(50).Should().Be(null);
}
这是我们从运行测试中得到的日志:
Initialized, state:
buckets: [-1, -1, -1]
entries:
[0] = 0 - null + (next = 0)
[1] = 0 - null + (next = 0)
[2] = 0 - null + (next = 0)
count: 0
Add: 48 - John. hashCode = 48, targetBucket = 0, state:
buckets: [0, -1, -1]
entries:
[0] = 48 - John + (next = -1)
[1] = 0 - null + (next = 0)
[2] = 0 - null + (next = 0)
count: 1
Add: 34 - Josh. hashCode = 34, targetBucket = 1, state:
buckets: [0, 1, -1]
entries:
[0] = 48 - John + (next = -1)
[1] = 34 - Josh + (next = -1)
[2] = 0 - null + (next = 0)
count: 2
Add: 22 - Jack. hashCode = 22, targetBucket = 1, state:
buckets: [0, 2, -1]
entries:
[0] = 48 - John + (next = -1)
[1] = 34 - Josh + (next = -1)
[2] = 22 - Jack + (next = 1)
count: 3
Resize, state:
buckets: [-1, 2, -1, -1, -1, -1, 1]
entries:
[0] = 48 - John + (next = -1)
[1] = 34 - Josh + (next = 0)
[2] = 22 - Jack + (next = -1)
[3] = 0 - null + (next = 0)
[4] = 0 - null + (next = 0)
[5] = 0 - null + (next = 0)
[6] = 0 - null + (next = 0)
count: 3
Add: 11 - Alex. hashCode = 11, targetBucket = 4, state:
buckets: [-1, 2, -1, -1, 3, -1, 1]
entries:
[0] = 48 - John + (next = -1)
[1] = 34 - Josh + (next = 0)
[2] = 22 - Jack + (next = -1)
[3] = 11 - Alex + (next = -1)
[4] = 0 - null + (next = 0)
[5] = 0 - null + (next = 0)
[6] = 0 - null + (next = 0)
count: 4
Search. Key = 48. Initial Bucket Index 6
Comparing key from entries[1] (34 - Josh + (next = 0)) to 48
Key is not equal, moving to the next linked index (0)
Comparing key from entries[0] (48 - John + (next = -1)) to 48
Key is equal returning 0
Search. Key = 34. Initial Bucket Index 6
Comparing key from entries[1] (34 - Josh + (next = 0)) to 34
Key is equal returning 1
Search. Key = 22. Initial Bucket Index 1
Comparing key from entries[2] (22 - Jack + (next = -1)) to 22
Key is equal returning 2
Search. Key = 11. Initial Bucket Index 4
Comparing key from entries[3] (11 - Alex + (next = -1)) to 11
Key is equal returning 3
Search. Key = 50. Initial Bucket Index 1
Comparing key from entries[2] (22 - Jack + (next = -1)) to 50
Key is not equal, moving to the next linked index (-1)
Search exit condition met (i >= 0). Returning -1 (as not found)
有了代码和日志,让我们把我们发现的 A 作品Dictionary
通过上面的代码和日志,我们可以解释 a 如何搜索值:Dictionary
使用该算法,搜索和插入操作都非常快,并且不需要太多内存!