当前位置：首页 > news >正文

【C++ 学习㉖】- 位图详解（哈希扩展）

news 2025/6/23 17:21:42

一、位图的概念

二、位图的实现

2.1 - bitset.h

2.2 - test.cpp

三、位图的应用

3.1 - 例题一

3.2 - 例题二

一、位图的概念

假设有这样一个需求：在 100 亿个整型数字中快速查询某个数是否存在其中，并假设是 32 位操作系统，4 GB 内存。

由于数字的数量如此之多，如果使用一个 int 型的数组进行存储，需要占用的内存空间为 $\dfrac{4 \times 10^{10}}{1024 \times 1024 \times 1024} \approx 37.25 GB$ ，那么如何用更小的空间来 "存储" 这些数字呢？

我们可以用比特位（bit）来标记数字，每个比特位中存放的值则表示其标记的数字是否存在，0 表示不存在，1 表示存在，这就是位图的基本思想。

例如，标记数字 1、2、4、6：

由于 int 总共有 $2^{32}$ 种取值，所以标记所有这些数字需要占用的内存空间为 $\dfrac{2^{32}}{8 \times 1024 \times 1024 \times 1024} = 0.5GB$ 。

二、位图的实现

2.1 - `bitset.h`

#pragma once

#include <vector>

namespace yzz
{template<size_t N>  // 总共有 N + 1 个比特位class bitset{public:bitset() : _v(N / 32 + 1) { }
void set(size_t x){size_t i = x / 32;size_t j = x % 32;_v[i] |= (1 << j);}
void reset(size_t x){size_t i = x / 32;size_t j = x % 32;_v[i] &= ~(1 << j);}
bool test(size_t x) const{size_t i = x / 32;size_t j = x % 32;return _v[i] & (1 << j);}private:std::vector<int> _v;};
}

2.2 - `test.cpp`

#include "bitset.h"
#include <iostream>
using namespace std;

int main()
{int arr[] = { -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5 };yzz::bitset<0xffffffff> bs;for (const int& e : arr){bs.set(e);}bs.reset(-3);bs.reset(3);for (const int& e : arr){if (bs.test(e))cout << e << " ";}// -5 -4 -2 -1 0 1 2 4 5cout << endl;return 0;
}

三、位图的应用

位图的应用是大量数据的快速排序、查找和去重。

3.1 - 例题一

给定 100 亿个整数，找到只出现一次的所有整数。

doublebitset.h：

#pragma once

#include "bitset.h"

namespace yzz
{template<size_t N>class doublebitset{public:void set(size_t x){if (_bs1.test(x) == 0 && _bs2.test(x) == 0)  // 00 -> 01{_bs2.set(x);}else if (_bs1.test(x) == 0 && _bs2.test(x) == 1)  // 01 -> 10{_bs1.set(x);_bs2.reset(x);}// 10 则不变}
bool isSingleNum(size_t x) const{return _bs1.test(x) == 0 && _bs2.test(x) == 1;}private:bitset<N> _bs1;bitset<N> _bs2;};
}

思路：用 2 个比特位来表示一个数字的状态，00 表示不存在，01 表示只出现一次，10 表示出现一次以上。

具体实现则是使用两个位图。

思考：给定 100 亿个整数，找到出现次数不超过 2 次的所有整数。

思路是类似的，用 2 个比特位来表示一个数字的状态，00 表示不存在，01 表示只出现一次，10 表示出现两次，11 表示出现两次以上。

test.cpp：

#include "doublebitset.h"
#include <iostream>
using namespace std;

int main()
{int arr[] = { -3, -3, -2, -1, -2, 0, 1, 1, 2, 2, 3 };yzz::doublebitset<0xffffffff> dbs;for (const int& e : arr){dbs.set(e);}for (const int& e : arr){if (dbs.isSingleNum(e))cout << e << " ";}// -1 0 3cout << endl;return 0;
}

3.2 - 例题二

给两个文件，分别有 100 亿个整数，求两个文件的交集。

法一：

#include "bitset.h"
#include <iostream>
using namespace std;

int main()
{int arr1[] = { -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5 };int arr2[] = { -3, -3, -2, -1, -2, 0, 1, 1, 2, 2, 3 };yzz::bitset<0xffffffff> bs1;yzz::bitset<0xffffffff> bs2;// 去重for (const int& e : arr1){bs1.set(e);}for (const int& e : arr2){bs2.set(e);}// 求交集for (int i = -10; i <= 10; ++i){if (bs1.test(i) && bs2.test(i))cout << i << " ";}// -3 -2 -1 0 1 2 3cout << endl;return 0;
}

法二：

#include "bitset.h"
#include <iostream>
using namespace std;

int main()
{int arr1[] = { -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5 };int arr2[] = { -3, -3, -2, -1, -2, 0, 1, 1, 2, 2, 3 };yzz::bitset<0xffffffff> bs;for (const int& e : arr1){bs.set(e);}for (const int& e : arr2){if (bs.test(e)){cout << e << " ";bs.reset(e);  // 避免出现重复的数字}}// -3 -2 -1 0 1 2 3cout << endl;return 0;
}

查看全文

http://www.lryc.cn/news/185977.html