新手用C++写了个泛型堆，效率竟比STL的更快？

文章目录关于为什么突然想写一个模板类？对刚用模板的C++新手而言的几大坑点习惯性写.h和.cpp文件对模板特化运用和理解很少我的Heap实现总览Heap类一、成员变量二、静态成员函数三、类的内部成员函数(放源代码详解)关于堆化函数的实现关于所用模板的说明整合所有代码，实现Heap类模板我的Heap测试正确性测试(与STL priority_queue对比)效率测试(与STL priority_

C+G

546人浏览 · 2021-10-03 02:44:13

C+G · 2021-10-03 02:44:13 发布

文章目录

关于为什么突然想写一个模板类？

嗯。。主要是因为最近在翻看《STL源码剖析》，然后发现原来STL源码是如此的庞大且复杂，而又及其具有条理，而其中最难的就是各个组件的关系，而对外所展现的效果就是泛型编程，对我这个初入门菜鸟来说的话，我之前对模板仅仅停留在知道，但没用过的阶段🤣

由于STL库中对模板的骚操作一个接一个，没个模板的基础，根本就看不懂，所以我痛定思痛，一定要亲手用模板实现一个数据结构(之前用非模板实现过一些)。

好了，目标有了，用模板实现一个数据结构，那数据结构这么多，我到底实现哪个呢？要不就把堆给冲了？我一拍大腿，可行！👍

最后关于它的具体实现，我肯定不会像STL那样考虑的那么周到，组件分工齐全，毕竟STL是包含六大组件的！
具体如下图：

对刚用模板的C++新手而言的几大坑点

习惯性写.h和.cpp文件

对于对C++有一定掌握，但基本没用过模板的人而言，声明与定义早就烂熟于心的.h和.cpp文件。

然而，当你使用模板的时候，再去吧声明和定义分开，将会产生一个错误！除非在.h文件末尾有包含.cpp文件。然而这样的操作是很多余的！

所以在用模板实现类的时候，最好只写一个.h文件！

对模板特化运用和理解很少

我现在就处于这个状态，说不上来该怎么规范🤣

我的Heap实现

总览Heap类

这是我画的树状图，我设计的Heap一共分为以下三个部分：

一、成员变量

这部分没啥好说的，就用一个 nums 指针存储变量数据，length 记录当前的 nums 已经使用的长度，capacity 存储当前 nums 的最大容量。

二、静态成员函数

我将堆的基本操作封装为静态成员变量的初衷是方便，对外实现堆化的功能，即使使用外部数组也能实现堆。
下面简单的介绍一下这几个函数（后面再详解）：

sift_down()：接受四个参数，依次是用于向下堆化的数组，起始的堆化位置，结束的堆化位置，以及一个仿函数(用于设定最大/最小堆)。

static void sift_down(T &nums, size_t start, size_t len, _CMP cmp);

sift_up()：向上堆化的操作，与上面的类似，少了结束信息，因为结束信息一般都是0.

static void sift_up(T &nums, size_t start, _CMP cmp);

heapify()：调用向下堆化函数，实现完全堆化，接收两个参数，待堆化数组和数组长度

 static void heapify(T &nums, size_t len);

print()：主要实现一个用于测试功能的泛型打印接口，接收的参数肯定就是数组和数组长度了。(略过，不重要)

三、类的内部成员函数(放源代码详解)

构造和析构函数：很简单的，我直接放源代码出来。

Heap() : length(0), capacity(1) {//暂时没有对构造函数拓展的打算
     nums = new _T[capacity];
}
~Heap(){
     delete []nums;
}

push()：每次入队后进行一次向上堆化，这个简单，直接调上面的静态接口就行了。注意：我这里采用的是倍增的方式进行延伸内存，一旦出现capacity不够用的情况，就重新分配内存，其中数据拷贝方面用了copy函数。

void push(_T val) {
    if (length >= capacity) {//两倍两倍的扩容
        _T *t = nums;
        capacity *= 2;
        nums = new _T[capacity];
        std::copy(t, t + length, nums);
        delete[] t;
    }
    nums[length] = val;
    sift_up(nums, length, _CMP());
    length++;
}

pop()：这个操作是需要一点技巧的–为了不破坏整体堆化的结构，我们直接把根部与尾部元素进行交换，再从根部往下重新堆化一次(注意此时堆化的终点应该要比原来小1)。

还采取了C语言的断言方式，防止pop操作的时候，堆中已经没有元素。

void pop() {
    if (length == 0)
        assert(0);
    length--;
    std::swap(nums[0], nums[length]);//实际上pop操作就相当于堆排的一次过程
    sift_down(nums, 0, length, _CMP());
}

top()：直接取根节点的值。

_T top() {
    if (length == 0)
        assert(0);
    return nums[0];
}

print()：用于打印内部数据的接口。

void print() {//内部print方便测试
    print(nums, length);//调用静态print
}

关于堆化函数的实现

我直接开放代码实现，具体的理解靠大家了。

sift_down()

template<typename T>
static void sift_down(T &nums, size_t start, size_t len, _CMP cmp) {//最小堆还是最大堆由cmp决定
    int end = len;
    int parent = start;
    int child = parent * 2 + 1;
    while (child < end) {
        if (child + 1 < end && cmp(nums[child], nums[child + 1]))
            child++;
        if (!cmp(nums[parent], nums[child])) {
            break;
        } else {
            std::swap(nums[parent], nums[child]);
            parent = child;
            child = parent * 2 + 1;
        }
    }
}

sift_up

template<typename T>
static void sift_up(T &nums, size_t start, _CMP cmp) {//最小堆还是最大堆由cmp决定
    int end = 0;
    int child = start;
    int parent = (child - 1) / 2;
    while (child > end) {
        if (!cmp(nums[parent], nums[child])) {
            break;
        } else {
            std::swap(nums[parent], nums[child]);
            child = parent;
            parent = (child - 1) / 2;
        }
    }
}

heapify()

template<typename T>
static void heapify(T &nums, size_t len) {//用于对数组进行堆化
    for (int i = len - 1; i >= 0; i--) {
        sift_down<T>(nums, i, len, _CMP());
    }
}

print()

template<typename T>
static void print(T &nums, size_t length) {
    for (int i = 0; i < length; i++) {
        std::cout << nums[i] << ' ';
    }
}

关于所用模板的说明

类模板(全局均可用的类型参数)：

其中 _T 用于表示数组元素的类型，默认为int，_CMP 则是一个类(仿函数)的类型，默认为已经提供的cmp仿函数，所以如果为自定义类型构建堆，则一定要自己写好相应的仿函数进行传递。

template<typename T = int> //用于默认排序的仿函数，默认为大顶堆
class cmp {
public:
    bool operator()(T &a, T &b) {
        return a < b;
    }
};
template<typename _T = int, typename _CMP = cmp<_T>>

成员函数模板

为了便于外界使用静态成员函数进行相应的操作，所以每个静态成员函数都提供了对应的类型模板，但比较操作的仿函数用的还是和类模板一样的类型。

如：

template<typename T>
static void sift_up(T &nums, size_t start, _CMP cmp) {//最小堆还是最大堆由cmp决定
    int end = 0;
    int child = start;
    int parent = (child - 1) / 2;
    while (child > end) {
        if (!cmp(nums[parent], nums[child])) {
            break;
        } else {
            std::swap(nums[parent], nums[child]);
            child = parent;
            parent = (child - 1) / 2;
        }
    }
}

整合所有代码，实现Heap类模板

这里用一个命名空间套住是最好，防止命名冲突！我这里用的自己常用的网名hhh

//
// Created by Alone on 2021/10/2.
//

#ifndef MY_TINY_STL_HEAP_H
#define MY_TINY_STL_HEAP_H

#include <iostream>
#include <cassert>
#include <algorithm>

namespace L_B__ {
    template<typename T=int>//用于默认排序的仿函数，默认为大顶堆
    class cmp {
    public:
        bool operator()(T &a, T &b) {
            return a < b;
        }
    };

//@模板类的实现
    template<typename _T=int, typename _CMP = cmp<_T>>
    class Heap {
        //私有成员
        _T *nums;
        size_t length;
        size_t capacity;
    public://@静态成员函数，对外对内都能实现功能
        template<typename T>
        static void sift_down(T &nums, size_t start, size_t len, _CMP cmp) {//最小堆还是最大堆由cmp决定
            int end = len;
            int parent = start;
            int child = parent * 2 + 1;
            while (child < end) {
                if (child + 1 < end && cmp(nums[child], nums[child + 1]))
                    child++;
                if (!cmp(nums[parent], nums[child])) {
                    break;
                } else {
                    std::swap(nums[parent], nums[child]);
                    parent = child;
                    child = parent * 2 + 1;
                }
            }
        }

        template<typename T>
        static void sift_up(T &nums, size_t start, _CMP cmp) {//最小堆还是最大堆由cmp决定
            int end = 0;
            int child = start;
            int parent = (child - 1) / 2;
            while (child > end) {
                if (!cmp(nums[parent], nums[child])) {
                    break;
                } else {
                    std::swap(nums[parent], nums[child]);
                    child = parent;
                    parent = (child - 1) / 2;
                }
            }
        }

        template<typename T>
        static void heapify(T &nums, size_t len) {//用于对数组进行堆化
            for (int i = len - 1; i >= 0; i--) {
                sift_down<T>(nums, i, len, _CMP());
            }
        }

        template<typename T>
        static void print(T &nums, size_t length) {//专为打印原始数组
            for (int i = 0; i < length; i++) {
                std::cout << nums[i] << ' ';
            }
        }

    public://@基本的内部成员函数
        Heap() : length(0), capacity(1) {//暂时没有对构造函数拓展的打算
            nums = new _T[capacity];
        }

        ~Heap() {
            delete[]nums;
        }

        void push(_T val) {
            if (length >= capacity) {//两倍两倍的扩容
                _T *t = nums;
                capacity *= 2;
                nums = new _T[capacity];
                std::copy(t, t + length, nums);
                delete[] t;
            }
            nums[length] = val;
            sift_up(nums, length, _CMP());
            length++;
        }

        void pop() {
            if (length == 0)
                assert(0);
            length--;
            std::swap(nums[0], nums[length]);//实际上pop操作就相当于堆排的一次过程
            sift_down(nums, 0, length, _CMP());
        }

        _T top() {
            if (length == 0)
                assert(0);
            return nums[0];
        }

        void print() {//内部print方便测试
            print(nums, length);
        }
    };
}

#endif //MY_TINY_STL_HEAP_H

我的Heap测试

正确性测试(与STL priority_queue对比)

经历这数千万数据的测试，仍然和STL的正确性是一样的，说明正确性是有了保障的。

测试代码

#include "Heap.h"
#include <ctime>
#include <queue>
using namespace std;
#define MAX_SIZE 10000000

int main() {
    L_B__::Heap<int> a;
    priority_queue<int> Q;
    int step = 1;
    for(int i=1;i<MAX_SIZE;i++){
        a.push(step%i);
        Q.push(step%i);
        step += 2;
    }
    cout<<"STL->";
    for (int i = 0; i < 10; i++) {
        auto t = move(Q.top());
        cout << t<<' ';
        Q.pop();
    }
    cout << endl;
    cout << "My->";
    for (int i = 0; i < 10; i++) {
        auto t = move(a.top());
        cout << t<<' ';
        a.pop();
    }
}

效率测试(与STL priority_queue对比)

1000w数据两者的push和pop速度对比

欧耶！比STL快两倍！！🐱‍🏍

其实这样不是很奇怪，STL本来就要实现和考虑很多事情，各种组件的套娃，而我们只是实现了这样一个简单的堆，那当然快很多了。

测试源代码

#include "Heap.h"
#include <ctime>
#include <queue>
using namespace std;
#define MAX_SIZE 10000000

int main() {
    L_B__::Heap<int> a;
    priority_queue<int> Q;
    auto start = clock();
    //@push测试
    int step = 1;
    for(int i=1;i<MAX_SIZE;i++){
        Q.push(step%i);
        step += 2;
    }
    cout<<"STL Push:"<<clock()-start<<"ms"<<endl;
    start = clock();
    step = 1;
    for(int i=1;i<MAX_SIZE;i++){
        a.push(step%i);
        step += 2;
    }
    cout<<"My Push:"<<clock()-start<<"ms"<<endl;

    //@pop测试
    start = clock();
    for(int i=1;i<MAX_SIZE;i++){
        Q.pop();
    }
    cout<<"STL Pop:"<<clock()-start<<"ms"<<endl;
    start = clock();
    for(int i=1;i<MAX_SIZE;i++){
        a.pop();
    }
    cout<<"My Pop:"<<clock()-start<<"ms"<<endl;
}

解题实测(LeetCode 347. 前 K 个高频元素)

我去，果然比stl的快不少🤣

直接把代码cv放到LC前面，然后测试使用自定义类型的情况。
解题代码：

struct S{
    int a,b;
};
class cmp{
    public:
    bool operator()(S&a,S&b){
        return a.b<b.b;
    }
};
namespace L_B__ {
    template<typename T=int>//用于默认排序的仿函数，默认为大顶堆
    class cmp {
    public:
        bool operator()(T &a, T &b) {
            return a < b;
        }
    };

//@模板类的实现
    template<typename _T=int, typename _CMP = cmp<_T>>
    class Heap {
        //私有成员
        _T *nums;
        size_t length;
        size_t capacity;
    public://@静态成员函数，对外对内都能实现功能
        template<typename T>
        static void sift_down(T &nums, size_t start, size_t len, _CMP cmp) {//最小堆还是最大堆由cmp决定
            int end = len;
            int parent = start;
            int child = parent * 2 + 1;
            while (child < end) {
                if (child + 1 < end && cmp(nums[child], nums[child + 1]))
                    child++;
                if (!cmp(nums[parent], nums[child])) {
                    break;
                } else {
                    std::swap(nums[parent], nums[child]);
                    parent = child;
                    child = parent * 2 + 1;
                }
            }
        }

        template<typename T>
        static void sift_up(T &nums, size_t start, _CMP cmp) {//最小堆还是最大堆由cmp决定
            int end = 0;
            int child = start;
            int parent = (child - 1) / 2;
            while (child > end) {
                if (!cmp(nums[parent], nums[child])) {
                    break;
                } else {
                    std::swap(nums[parent], nums[child]);
                    child = parent;
                    parent = (child - 1) / 2;
                }
            }
        }

        template<typename T>
        static void heapify(T &nums, size_t len) {//用于对数组进行堆化
            for (int i = len - 1; i >= 0; i--) {
                sift_down<T>(nums, i, len, _CMP());
            }
        }

        template<typename T>
        static void print(T &nums, size_t length) {//专为打印原始数组
            for (int i = 0; i < length; i++) {
                std::cout << nums[i] << ' ';
            }
        }

    public://@基本的内部成员函数
        Heap() : length(0), capacity(1) {//暂时没有对构造函数拓展的打算
            nums = new _T[capacity];
        }

        ~Heap() {
            delete[]nums;
        }

        void push(_T val) {
            if (length >= capacity) {//两倍两倍的扩容
                _T *t = nums;
                capacity *= 2;
                nums = new _T[capacity];
                std::copy(t, t + length, nums);
                delete[] t;
            }
            nums[length] = val;
            sift_up(nums, length, _CMP());
            length++;
        }

        void pop() {
            if (length == 0)
                assert(0);
            length--;
            std::swap(nums[0], nums[length]);//实际上pop操作就相当于堆排的一次过程
            sift_down(nums, 0, length, _CMP());
        }

        _T top() {
            if (length == 0)
                assert(0);
            return nums[0];
        }

        void print() {//内部print方便测试
            print(nums, length);
        }
    };
}



class Solution {
public:
    vector<int> topKFrequent(vector<int>& nums, int k) {
            unordered_map<int,int>t;
            //开始映射hashmap
            for(int s:nums){
                t[s]++;
            }
            L_B__::Heap<S,cmp>Q;
            for(unordered_map<int,int>::iterator it=t.begin();it!=t.end();it++){
                Q.push({it->first,it->second});
            }
            vector<int>res;
            //得到前k个最大的元素
            for(int i=0;i<k;i++){
                auto t = Q.top();Q.pop();
                res.push_back(t.a);
            }
        return res;
    }
};