PTA平台数据结构期末大作业：哈夫曼树小论文、代码和测试数据

文章目录0 引入1 大论文内容 (直接pta平台复制)2 代码3 测试数据string.txt4 部分图片4.1 小结2.3 1）4.2 小结2.3 2）4.3 小结2.3 4）0 引入记录我之前数据结构期末大作业，包括完整的小论文内容和C++代码。时间仓促，可能有些不足。代码相关图片请在源代码中截取，部分中间图片在文章结尾哇。1 大论文内容 (直接pta平台复制)# 1 引入**题目：*

因吉

791人浏览 · 2021-08-01 20:21:23

因吉 · 2021-08-01 20:21:23 发布

文章目录

0 引入
1 大论文内容 (直接pta平台复制)
2 代码
3 测试数据string.txt
4 部分图片

0 引入

记录我之前数据结构期末大作业，包括完整的小论文内容和C++代码。时间仓促，可能有些不足。
代码相关图片请在源代码中截取，部分中间图片在文章结尾哇。

1 大论文内容 (直接pta平台复制)

# 1 引入
**题目：**哈夫曼树
**要求：**为给定的英文文本构造哈夫曼编码，部分示例如下：
>Effificient and robust ... annotation

# 2 问题1
## 2.1 问题重述
计算每个字符出现的概率，并以概率为权重来构建哈夫曼树。
**要求：**
1）写出构造过程；
2）画出最终的哈夫曼树；
3）得到每个字符的哈夫曼编码。
**难点分析**：
字符的选取范围：根据百度百科，字符的定义为**类字形单位或符号，包括字母、数字、运算符号、标点符号和其他符号等**。
因此，本文中将大小字母、标点符号包括括号、逗号、空格等均算作单个字符。进一步考对实际应用进行考虑，故**使得大小写字母等同**。
## 2.2 字符概率的计算
字符概率的概率的计算依靠代码实现，主要步骤如下：
1）将字符存储在string.txt文件中；
2）读取文件；
3）将大写字母转换为小写字母，并统计不重复字符的个数；
4）统计每个字符出现的概率，并存储于哈希表中；
5）将哈希表中的元素存储于列表中，并返回排序后的结果。
**代码如下：**
1）函数声明及相关库：

2）主函数：

3）文件读取：

4）计算字符概率：

5）其他函数：

最终统计的字符总数为716，且**概率总结如下**：
>字符： ；概率：0.1341
字符：i；概率：0.0782
字符：e；概率：0.0768
字符：s；概率：0.0740
字符：a；概率：0.0726
字符：t；概率：0.0698
字符：n；概率：0.0628
字符：o；概率：0.0573
字符：l；概率：0.0545
字符：r；概率：0.0461
字符：f；概率：0.0377
字符：c；概率：0.0321
字符：m；概率：0.0321
字符：d；概率：0.0223
字符：h；概率：0.0209
字符：u；概率：0.0196
字符：y；概率：0.0182
字符：w；概率：0.0154
字符：g；概率：0.0140
字符：p；概率：0.0140
字符：,；概率：0.0084
字符：-；概率：0.0070
字符：.；概率：0.0070
字符：b；概率：0.0056
字符：k；概率：0.0056
字符：v；概率：0.0042
字符：(；概率：0.0028
字符：)；概率：0.0028
字符：1；概率：0.0028
字符：2；概率：0.0014

## 2.3 哈夫曼树的构造
1）选取概率最小的字符'2'和')'：

2）选取'('：

3）重复上述过程；
4）获得最终的哈夫曼树：

## 2.4 字符的哈夫曼编码
**编码如下：**
>字符： ；编码：100
字符：(；编码：010011111
字符：)；编码：00000100
字符：,；编码：000000
字符：-；编码：0100001
字符：.；编码：0100110
字符：1；编码：00000101
字符：2；编码：010011110
字符：a；编码：1100
字符：b；编码：0000011
字符：c；编码：01111
字符：d；编码：00011
字符：e；编码：1110
字符：f；编码：10111
字符：g；编码：010001
字符：h；编码：00010
字符：i；编码：1111
字符：k；编码：0100000
字符：l；编码：0011
字符：m；编码：01110
字符：n；编码：0110
字符：o；编码：0101
字符：p；编码：010010
字符：r；编码：0010
字符：s；编码：1101
字符：t；编码：1010
字符：u；编码：00001
字符：v；编码：01001110
字符：w；编码：101100
字符：y；编码：101101
# 3 问题2
## 3.1 问题重述
**要求：**
1）代码实现章节2.3中的构造过程；
2）输出每个字符的编码；
3）需要代码和运行结果截图。
## 3.2 代码实现
1）哈夫曼树的结构：

2）哈夫曼树函数声明：

3）主函数：

4）哈夫曼树的创建与声明：

5）获取哈夫曼编码：

## 3.3 运行结果

# 4 时间复杂度与空间复杂度分析
相应步骤的时间复杂度与空间度的总结如下表：


| 步骤 | 时间复杂度 | 空间复杂度 |解释|
| -------- | -------- | -------- |--|
| 文件读取     | $O(n_l)$     | $n_c$     | $n_l$为文件行数，$n_c$为字符个数
|计算权重|$O(n_c+m)$|9$m$|$m$为不重复字符的个数，char算1，double算8
|树的初始化|$O(m)$|9m|
|树的创建|$O(M)$|M|$M$为哈夫曼树的结点数
|求哈夫曼编码|$O(M\log_2M)$|不好算|

2 代码

>#pragma once
#ifndef __PTA__
#define __PTA__
>
>#include<iostream>
#include<fstream>
#include<string>
#include<map>
#include<vector>
#include <queue>
#include<algorithm>
>
>enum Status
{
	FAILED,
	SUCCESS,
}Status;
>
>// 哈夫曼树结构
class Node
{
public:
	char c;
	double weight;
	Node* left;
	Node* right;
>
>	Node(char para_c, double para_weight, Node* para_left = NULL, Node* para_right = NULL)
		:c(para_c), weight(para_weight), left(para_left), right(para_right) {}
	>
>	// 控制入队的顺序
	bool operator<(const Node& node) const
	{
		return weight > node.weight;
	}
};
>
>// 读取文件并将大写字母替换为小写字母
enum Status read_data(std::string* data, std::string file_name);
// 获取权重字典
std::map<char, double> char_count(std::string* data);
// 输出列表
std::ostream& operator<<(std::ostream& cout, std::vector<std::pair<char, double>> v);
std::ostream& operator<<(std::ostream& cout, std::vector<std::pair<char, std::string>> v);
// 用于字典排序
bool map_compare(const std::pair<char, double>& map1, const std::pair<char, double>& map2);
// 创建哈夫曼树
enum Status tree_init(std::priority_queue<Node> &q, std::vector<std::pair<char, double>> v);
// 构建哈夫曼树
enum Status tree_create(std::priority_queue<Node>& q);
// 获取哈夫曼编码
void tree_code(Node* root, std::string& prefix, std::map<char, std::string>& huff_map);;
// 展示哈夫曼编码
void show_code(std::map<char, std::string>& huff_map);
>
>int main()
{
	std::string file_name = "string.txt";
	std::string data;
	read_data(&data, file_name);
>
>	// 字符计数及权重计算
	std::cout << "字符总数：" <<data.size() << std::endl;
	std::map<char, double> map = char_count(&data);
	std::vector<std::pair<char, double>> v(map.begin(), map.end());
	std::sort(v.begin(), v.end(), map_compare);
	>
>	// 建立哈夫曼树
	std::priority_queue<Node> q;
	tree_init(q, v);
	tree_create(q);
>
>	// 获取哈夫曼编码
	Node root = q.top();
	std::string prefix = "";
	std::map<char, std::string> huff_map;
	tree_code(&root, prefix, huff_map);
	std::vector<std::pair<char, std::string>> v_huff(huff_map.begin(), huff_map.end());
	std::cout << v_huff;
	std::cout << "Done.\n";
	system("pause");
}
>
>enum Status read_data(std::string* data, std::string file_name)
{
	std::ifstream input;
	std::string temp_data;
	input.open(file_name);
	std::cout << "文件读取中...\n";
	while (getline(input, temp_data))
	{
		(*data).append(temp_data);
	}
	input.close();
>
>	std::cout << "读取完毕。\n";
	return SUCCESS;
}
>
>std::map<char, double> char_count(std::string* data)
{
	std::map<char, double> map;
	int num = 'A' - 'a';
	for (std::string::iterator i = (*data).begin(); i != (*data).end(); i++)
	{
		if (*i >= 'A' and *i <= 'Z')
		{
			*i -= num;
		}
		if (map.find(*i) == map.end())
		{
			map.insert(std::pair<char, double>(*i, 0));
		}
		(*map.find(*i)).second += 1;
	}
>
>	for (std::map<char, double>::iterator i = map.begin(); i != map.end(); i++)
	{
		(*i).second /= data->size();
	}
	return map;
}
>
>std::ostream& operator<<(std::ostream& cout, std::vector<std::pair<char, double>> v)
{
	for (std::vector<std::pair<char, double>>::iterator i = v.begin(); i != v.end(); i++)
	{
		printf("字符：%c；概率：%.4lf\n", (*i).first, (*i).second);
	}
	return cout;
}
>
>std::ostream& operator<<(std::ostream& cout, std::vector<std::pair<char, std::string>> v)
{
	for (std::vector<std::pair<char, std::string>>::iterator i = v.begin(); i != v.end(); i++)
	{
		std::cout << "字符：" << (*i).first << "；编码：" << (*i).second << std::endl;
	}
	return cout;
}
>
>bool map_compare(const std::pair<char, double>& p1, const std::pair<char, double>& p2)
{
	return p1.second > p2.second;
}
>
>enum Status tree_init(std::priority_queue<Node> &q, std::vector<std::pair<char, double>> v)
{
	for (std::vector<std::pair<char, double>>::iterator i = v.begin(); i != v.end(); i++)
	{
		Node node((*i).first, (*i).second);
		q.push(node);
	}
	return SUCCESS;
}
>
>enum Status tree_create(std::priority_queue<Node>& q)
{
	while (q.size() != 1)
	{
		Node* left = new Node(q.top());
		q.pop();
		Node* right = new Node(q.top());
		q.pop();
		Node node('#', left->weight + right->weight, left, right);
		q.push(node);
	}
	return SUCCESS;
}
>
>void tree_code(Node* root, std::string& prefix, std::map<char, std::string>& huff_map)
{
	std::string temp_prefix = prefix;
>
>	if (root->left == NULL)
	{
		return;
	}
	// 处理左子树
	prefix += "0";
	if (root->left->left == NULL)
	{
		huff_map[root->left->c] = prefix;
	}
	else
	{
		tree_code(root->left, prefix, huff_map);
	}
>
>	// 回溯
	prefix = temp_prefix;
	// 处理右子树
	prefix += "1";
	if (root->right->right == NULL)
	{
		huff_map[root->right->c] = prefix;
	}
	else
	{
		tree_code(root->right, prefix, huff_map);
	}
}
>
>#endif // !__PTA__

3 测试数据string.txt

Effificient and robust facial landmark localisation is crucial for the deployment of
real-time face analysis systems. This paper presents a new loss function, namely
Rectifified Wing (RWing) loss, for regression-based facial landmark localisation with
Convolutional Neural Networks (CNNs). We fifirst systemically analyse different loss
functions, including L2, L1 and smooth L1. The analysis suggests that the training of
a network should pay more attention to small-medium errors. Motivated by this
finding, we design a piece-wise loss that amplififies the impact of the samples with
small-medium errors. Besides, we rectify the loss function for very small errors to
mitigate the impact of inaccuracy of manual annotation