频度链表

相关知识

为了完成本关任务,你需要掌握:

  1. 建立单链表
  2. 单链表排序

建立单链表

  1. 频度链表数据结构
1
2
3
4
5
6
7
8
9
typedef struct ListNode{        //结点结构,哈夫曼树与频度链表共用
    char c;                     //结点的字符
    int frequency;              //字符的频度
    char *code;                 //字符的编码(对哈夫曼树结点有效)
    struct ListNode *parent;    //结点的双亲结点(对哈夫曼树结点有效)
    struct ListNode *left;      //结点的左子树(对哈夫曼树结点有效)
    struct ListNode *right;     //结点的右子树(对哈夫曼树结点有效)
    struct ListNode *next;      //结点的后继结点(对频度链表结点有效)
}ListNode,HuffmanTree;
  1. 若字符在频度链表中存在,则该字符的频度加1,否则创建新结点并将该结点插入到频度链表中

单链表排序

对所得到的频度链表进行排序,使得字符的频度按从高到低排列

编程要求

根据提示,在右侧编辑器补充代码,建立频度链表并排序,从高到低输出各个字符的频度。

测试说明

平台会对你编写的代码进行测试

测试输入:
hello world!

预期输出:
‘l’3
‘o’2
‘h’1
‘e’1
‘ ‘1
‘w’1
‘r’1
‘d’1
‘!’1

提示:
1.字符按频度从高到低排列,相同频度的字符按输入先后进行排列,先输入的字符排列在前,后输入的字符排列在后
2.输出示例为'字符' 频度 ,字符和频度之间有一个空格,输出完一个字符和频度后换行再输出下一个字符和频度
3.换行符字符输出示例'\n'10
4.最后一个字符和频度输出完后仍有一个换行符要输出

我的思路

由于这道题写的时间很久远了,当时没有把思路记录下来,这里就不再写了
虽然把各个功能写在一坨里面,但是注释写的还是很详细的QvQ

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
#include<stdio.h>
#include<stdlib.h>

typedef struct ListNode{ //结点结构,哈夫曼树与频度链表共用
char c; //结点的字符
int frequency; //字符的频度
char *code; //字符的编码(对哈夫曼树结点有效)
struct ListNode *parent; //结点的双亲结点(对哈夫曼树结点有效)
struct ListNode *left; //结点的左子树(对哈夫曼树结点有效)
struct ListNode *right; //结点的右子树(对哈夫曼树结点有效)
struct ListNode *next; //结点的后继结点(对频度链表结点有效)
}ListNode,HuffmanTree;

int main(){
char input[1000];//used to keep the input

gets(input);//Get the whole line.Using sancf would stop reading when meets ' '

//create the list,head node is not used to store data
ListNode* head = malloc(sizeof(ListNode));
head->frequency = 0;
head->next = NULL;

int i = 0;
while(input[i] != '\0'){
ListNode* p = head;
ListNode* pre = p;//pointer pointing the previous node of q,which can be helpful for some operations
while(p != NULL && p->c != input[i]){
//iterate the list until find the target or to the end
pre = p;
p = p->next;
}
if(p != NULL){
//enter this block means having found the target element in the existing list
p->frequency++;

//and after add 1 to the frequency,we need to change the position
//to make sure the frequency stored in the list is decreasing

//1. take this node out of this list
pre->next = p->next;
//2. find the right position for the node which frequency is changed
ListNode* cur = head->next;//assistant pointer
pre = head;//now we reuse the 'pre' to point at the previous node of cur
while(cur != NULL && cur->frequency > p->frequency){
pre = cur;
cur = cur->next;
printf("?");
}
//3. insert the node to the new position
//after the loop above, we can make sure the node should be insert after the 'pre'
p->next = cur;
pre->next = p;
}else {
//enter this block means we meet a new element,so add it to the list
//here pre is the last solid node of the list

//create a new node for the new element
ListNode* newNode = malloc(sizeof(ListNode));
newNode->c = input[i];
newNode->frequency = 1;

//add the new node to the tail of the list
//which is pratical,in that the frequency stored in the list is decreasing
newNode->next = pre->next;
pre->next = newNode;
}
i++;
}
ListNode *iterator = head->next;//use this pointer to iterate the list and print the answer
while(iterator != NULL){
printf("'%c' %d\n",iterator->c, iterator->frequency);
iterator = iterator->next;
}
return 0;
}

哈夫曼树

任务描述

本关任务:编写一个程序,利用上一关的频度链表,建立相应的哈夫曼树,并得到相应的哈夫曼编码。

相关知识

为了完成本关任务,你需要掌握:
1.建立二叉树
2.遍历二叉树

编程要求

采用原址建立的方法,利用上一关得到的频度链表,将频度链表中的结点作为哈夫曼树中的结点,建立哈夫曼树。算法循环执行的每一次循环中,从还没有指定双亲指针的结点中选择频度最小的元素和频度次小的两结点。创建其二者的双亲结点并设置二者的双亲指针指向该结点,同时使双亲结点的左右孩指针指向两结点。之后将生成的双亲结点插入到频度链表中。
循环执行至除根结点外的所有结点都具有双亲指针为止,哈夫曼树建立成功,先序遍历哈夫曼树,得到各个叶子结点的哈夫曼编码,并按上一关中的顺序输出相应的哈夫曼编码。
并在最后一行输出哈夫曼树的带权路径长度,权重为字符的频度。

测试说明

平台会对你编写的代码进行测试:
测试输入:

  1. Data structure experiment
  2. HuffmanTree
  3. metro line

Data structure experiment
HuffmanTree
metro line

预期输出:

  1. ‘e’ 8 110
  2. ‘t’ 5 001
  3. ‘r’ 5 010
  4. ‘a’ 3 0110
  5. ‘ ‘ 3 0111
  6. ‘u’ 3 10020
  7. ‘m’ 3 1001
  8. ‘n’ 3 1010
  9. ‘i’ 2 10111
  10. ‘\n’ 2 11100
  11. ‘f’ 2 11101
  12. ‘D’ 1 111100
  13. ‘s’ 1 111101
  14. ‘c’ 1 111110
  15. ‘x’ 1 111111
  16. ‘p’ 1 00000
  17. ‘H’ 1 00001
  18. ‘T’ 1 00010
  19. ‘o’1 00011
  20. ‘l’ 1 10110
  21. 193

提示:
1.由于同一文本建立的哈夫曼树不同,则得到的哈夫曼编码不唯一。上述样例给出的哈夫曼编码只是基中一种答案
2.带权路径长度输出后不用再输出换行符,

我的思路

构建Huffman树的算法设计

  1. 每次取出链表末尾的两个结点(原因在2中),构建新子树。将该子树的根节点的频度设置为两孩子的频度之和,并将其插入频度链表
  2. 频度链表在插入新结点时,需要和构建频度链表时插入新结点一致,根据频度找到合适的位置再插入;(但是与频度链表构建不完全相同的是,此时插入只需对结点的频度进行比较即可,因为这个时候每个结点所存储的字符一定是不一致的,且对于非叶子结点,其存储的字符是无意义的。)
  3. 当链表中不满两个结点时(也就是说只剩下根节点),循环结束(判断方式即首结点的next为空)。

但是每次需要找到倒数第二个结点,每次插入,都比较耗时,这里不再多考虑了QAQ……

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
//find the last two nodes and take them out of the list
ListNode* FetchSecondtoLast(ListNode* head){
ListNode* fast = head;
ListNode* slow = head;
ListNode* pre = slow;
fast = fast->next->next;
while(fast != NULL){
pre = slow;
slow = slow->next;
fast = fast->next;
}
pre->next = NULL;
return slow;
}

//insert new node(root of the newly built subtree) to the frequency list
void InsertNode(ListNode* head, ListNode* p){
ListNode* cur = head;
while(cur->next != NULL && cur->next->frequency > p->frequency){
cur = cur->next;
}
p->next = cur->next;
cur->next = p;
}

//build the Huffman tree and return its root node
ListNode* CreateHuffmanTree(ListNode* head){
while(head->next != NULL && head->next->next != NULL){
ListNode* c1 = FetchSecondtoLast(head);
ListNode* c2 = c1->next;
ListNode* p = malloc(sizeof(ListNode));
p->left = c2;
p->right = c1;
p->frequency = c1->frequency + c2->frequency;
p->code = NULL; // 初始化code字段
p->next = NULL;
p->parent = NULL;
p->c = '#';
InsertNode(head, p);
c1->parent = p;
c2->parent = p;
}
return head->next;;
}

遍历构建Huffman编码

  • 利用回溯的思想,在每层递归前copy当前的编码,作为参数向下递归,知道找到叶子结点,再把编码赋给结点即可
  • 规定左为0,右为1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
//generate code for each leaf node
void GenerateCodes(ListNode* node, char* code) {
if (node == NULL) {
return;
}
if (node->left == NULL && node->right == NULL) {
// if this is leaf node,then save the code
node->code = strdup(code);
} else {
// recursively traverse the left substree
char leftCode[100];
strcpy(leftCode, code);
strcat(leftCode, "0");
GenerateCodes(node->left, leftCode);

// recursively traverse the right substree
char rightCode[100];
strcpy(rightCode, code);
strcat(rightCode, "1");
GenerateCodes(node->right, rightCode);
}
}

void CreateHuffmanCode(ListNode* root){
char code[100] = "";
GenerateCodes(root, code);
}

计算WPL

  • 一颗二叉树中所有叶子结点的带权路径长度之和称为该树的带权路径长度,记为$WPL = \sum^{n_0 - 1}_{i = 0}(W_i * l_i)$ 其中n_0表示叶子结点个数,$w_i$和$l_i$分别表示叶子结点k_i的权值和根到k_i之间的路径长度
  • 将当前所处深度作为参数在递归中传递,直到找到叶子结点,用深度(等于路径长度)乘结点的频度即可,并且不断相加,即可得到WPL
1
2
3
4
5
6
7
8
9
int CalculateWPL(ListNode* node, int depth){
if(node == NULL){
return 0;
}
if (node->left == NULL && node->right == NULL) {
return node->frequency * depth;
}
return CalculateWPL(node->left, depth + 1) + CalculateWPL(node->right, depth + 1);
}

答案输出

  • 由于是按照频度递减输出的答案,与最初构建的链表的顺序是一致的,而我构建Huffman树的过程会破坏原有链表顺序,所以我选择在构建Huffman树之前先用一个数组记录原有的链表顺序
  • 遇到’\n’时需要判断,并使用转义符避免输出的是真的换行而非’\n’本身
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
//save origin order of the list for the final display and calculate the length for other operations
ListNode** SaveOriginOrder(ListNode* head, int* a){
ListNode* iterator = head->next;
int length = 0;
while (iterator != NULL)
{
length++;
iterator = iterator->next;
}
ListNode** nodes = malloc(length * sizeof(ListNode*));
iterator = head->next;
for(int i = 0; i < length; i++){
nodes[i] = iterator;
iterator = iterator->next;
}
*a = length;
return nodes;
}


void DisplayAns(ListNode** nodes, int length){
for(int i = 0; i < length; i++){
if(nodes[i]->c == '\n'){
printf("'\\n' %d %s\n", nodes[i]->frequency, nodes[i]->code);
continue;
}
printf("'%c' %d %s\n", nodes[i]->c, nodes[i]->frequency, nodes[i]->code);
}
}

接收数据

这是我最头疼的部分了QAQ
实在是不熟悉c的接受已经字符串的操作,而且题目并没有说清楚如何判断输入结束,我也找不到一个合适的方法能够接收不定数量行数的输入
所以最终我只能选择把多行输入拼接到同一个字符串中,并且根据样例的输出结果(三行输入只有两个‘\n’),判断删除最后一个换行符,将其替换为’\0’,最为字符串末尾
同时为了判断输入是否结束,我只能选择在末尾多输入一行空行,如果遇到空行就结束,所以不一定能满足最终的题目需要……

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
char buffer[1000];// the max size of each line is 1000
size_t size = 1; // initial size is set to 1,so that we can keep the '\0'
char *input = malloc(size);
input[0] = '\0';
while (fgets(buffer, sizeof(buffer), stdin)) {
if (buffer[0] == '\n') {
if(size > 1){
input[size - 2] = '\0';
size--;
}
break;
}
size_t buffer_length = strlen(buffer);
size_t new_size = size + buffer_length;
char *new_result = realloc(input, new_size);
input = new_result;
strcat(input, buffer);
size = new_size;
}

最终整体代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
#include<stdio.h>
#include<string.h>
#include<stdlib.h>

typedef struct ListNode{ //结点结构,哈夫曼树与频度链表共用
char c; //结点的字符
int frequency; //字符的频度
char *code; //字符的编码(对哈夫曼树结点有效)
struct ListNode *parent; //结点的双亲结点(对哈夫曼树结点有效)
struct ListNode *left; //结点的左子树(对哈夫曼树结点有效)
struct ListNode *right; //结点的右子树(对哈夫曼树结点有效)
struct ListNode *next; //结点的后继结点(对频度链表结点有效)
}ListNode,HuffmanTree;

ListNode* CreateFrequencyList(){
char buffer[1000];// the max size of each line is 1000
size_t size = 1; // initial size is set to 1,so that we can keep the '\0'
char *input = malloc(size);
input[0] = '\0';
while (fgets(buffer, sizeof(buffer), stdin)) {
if (buffer[0] == '\n') {
if(size > 1){
input[size - 2] = '\0';
size--;
}
break;
}
size_t buffer_length = strlen(buffer);
size_t new_size = size + buffer_length;
char *new_result = realloc(input, new_size);
input = new_result;
strcat(input, buffer);
size = new_size;
}

//create the list,head node is not used to store data
ListNode* head = malloc(sizeof(ListNode));
head->frequency = 0;
head->next = NULL;
head->parent = NULL;
head->left = NULL;
head->right = NULL;

int i = 0;
while(input[i] != '\0'){
ListNode* p = head;
ListNode* pre = p;//pointer pointing the previous node of q,which can be helpful for some operations
while(p != NULL && p->c != input[i]){
//iterate the list until find the target or to the end
pre = p;
p = p->next;
}
if(p != NULL){
//enter this block means having found the target element in the existing list
p->frequency++;

//and after add 1 to the frequency,we need to change the position
//to make sure the frequency stored in the list is decreasing

//1. take this node out of this list
pre->next = p->next;
//2. find the right position for the node which frequency is changed
ListNode* cur = head->next;//assistant pointer
pre = head;//now we reuse the 'pre' to point at the previous node of cur
while(cur != NULL && cur->frequency > p->frequency){
pre = cur;
cur = cur->next;
}
//3. insert the node to the new position
//after the loop above, we can make sure the node should be insert after the 'pre'
p->next = cur;
pre->next = p;
}else {
//enter this block means we meet a new element,so add it to the list
//here pre is the last solid node of the list

//create a new node for the new element
ListNode* newNode = malloc(sizeof(ListNode));
newNode->parent = NULL;
newNode->left = NULL;
newNode->right = NULL;
newNode->c = input[i];
newNode->frequency = 1;

//add the new node to the tail of the list
//which is pratical,in that the frequency stored in the list is decreasing
newNode->next = pre->next;
pre->next = newNode;
}
i++;
}
return head;
}

void DispFList(ListNode* head){
ListNode *iterator = head->next;//use this pointer to iterate the list and print the answer
while(iterator != NULL){
printf("'%c' %d\n",iterator->c, iterator->frequency);
iterator = iterator->next;
}
}

//save origin order of the list for the final display and calculate the length for other operations
ListNode** SaveOriginOrder(ListNode* head, int* a){
ListNode* iterator = head->next;
int length = 0;
while (iterator != NULL)
{
length++;
iterator = iterator->next;
}
ListNode** nodes = malloc(length * sizeof(ListNode*));
iterator = head->next;
for(int i = 0; i < length; i++){
nodes[i] = iterator;
iterator = iterator->next;
}
*a = length;
return nodes;
}

//find the last two nodes and take them out of the list
ListNode* FetchSecondtoLast(ListNode* head){
ListNode* fast = head;
ListNode* slow = head;
ListNode* pre = slow;
fast = fast->next->next;
while(fast != NULL){
pre = slow;
slow = slow->next;
fast = fast->next;
}
pre->next = NULL;
return slow;
}

//insert new node(root of the newly built subtree) to the frequency list
void InsertNode(ListNode* head, ListNode* p){
ListNode* cur = head;
while(cur->next != NULL && cur->next->frequency > p->frequency){
cur = cur->next;
}
p->next = cur->next;
cur->next = p;
}

//build the Huffman tree and return its root node
ListNode* CreateHuffmanTree(ListNode* head){
while(head->next != NULL && head->next->next != NULL){
ListNode* c1 = FetchSecondtoLast(head);
ListNode* c2 = c1->next;
ListNode* p = malloc(sizeof(ListNode));
p->left = c2;
p->right = c1;
p->frequency = c1->frequency + c2->frequency;
p->code = NULL; // 初始化code字段
p->next = NULL;
p->parent = NULL;
p->c = '#';
InsertNode(head, p);
c1->parent = p;
c2->parent = p;
}
return head->next;;
}

//generate code for each leaf node
void GenerateCodes(ListNode* node, char* code) {
if (node == NULL) {
return;
}
if (node->left == NULL && node->right == NULL) {
// if this is leaf node,then save the code
node->code = strdup(code);
} else {
// recursively traverse the left substree
char leftCode[100];
strcpy(leftCode, code);
strcat(leftCode, "0");
GenerateCodes(node->left, leftCode);

// recursively traverse the right substree
char rightCode[100];
strcpy(rightCode, code);
strcat(rightCode, "1");
GenerateCodes(node->right, rightCode);
}
}

void CreateHuffmanCode(ListNode* root){
char code[100] = "";
GenerateCodes(root, code);
}

int CalculateWPL(ListNode* node, int depth){
if(node == NULL){
return 0;
}
if (node->left == NULL && node->right == NULL) {
return node->frequency * depth;
}
return CalculateWPL(node->left, depth + 1) + CalculateWPL(node->right, depth + 1);
}

void DisplayAns(ListNode** nodes, int length){
for(int i = 0; i < length; i++){
if(nodes[i]->c == '\n'){
printf("'\\n' %d %s\n", nodes[i]->frequency, nodes[i]->code);
continue;
}
printf("'%c' %d %s\n", nodes[i]->c, nodes[i]->frequency, nodes[i]->code);
}
}

int main(){
ListNode* head = CreateFrequencyList();
int length;
ListNode** nodes = SaveOriginOrder(head, &length);
ListNode* root = CreateHuffmanTree(head);
CreateHuffmanCode(root);
DisplayAns(nodes, length);
int wpl = CalculateWPL(root, 0);
printf("%d", wpl);
return 0;
}