经常看到、听到的字典树到底是什么？

基本概念

首先，字典树（Trie）是一种数据结构，从名字来看，它还是一种树形结构，那它具体是什么样子的呢？答案可以从它的别名——前缀树得到。
字典树与一般的树结构类似，也由根节点和子节点组成，根节点不对应任何字符，是字典树的起点；每个子节点代表一个英文字符，每个节点都有 26 个孩子节点指针，所以字典树也可以叫做 26 叉树（好家伙，是 2 叉树的 13 倍呢~😂）。那么，在这种情况下，从根节点到叶子节点的路径就可以看作是一个单词（word，广义上理解就是一个字符串），拥有相同祖先节点的叶子节点，我们就可以认为它们具有相同的前缀，所以它才会被叫做前缀树。同样，当我们要查询某个单词是否在字典树中时，按照单词的字符顺序去遍历字典树，这个过程与查字典非常类似，所以它也被叫做字典树。

特点

在对字典树有了一定的认识后，我们可以分析出字典树的一些特点：

字典树也是一种树，所以它也具备树这种数据结构的特点
字典树的根节点不包含字符，除根节点外每一个节点都只包含一个字符
字典树的根节点到某一节点的路径上的字符连接起来，就是该节点对应的字符串

实现

现在我们考虑实现这种数据结构，为了方便，直接用 C 语言来完成。

定义

先考虑字典树结构体的定义，每个节点都有 26 个孩子节点，并且需要保存当前的字符，同时为了便于判断某个字符串是否在字典树中，增加一个布尔变量来判断是否结束：

typedef struct Node {
    struct Node* son[26];
    char c;
    bool is_end;
} TNode;

typedef TNode* Ptr_TNode;
typedef TNode* Trie;

操作函数

有了结构体的定义后，就可以围绕结构体定义设计对应的操作函数了。

trie_init

第一件事情，就是初始化的操作。
初始状态下，只有一个根节点，按照前面的定义，根节点不保存任何数据，但它有 26 孩子：

Trie trie_init() {
    Trie root = (Trie)calloc(1, sizeof(TNode));
    root->is_end = false;
    return root;
}

实际上，我们可以将根节点的is_end属性设置为true，表示空串也保存在当前这颗树中。

trie_new_node

字典树的初始化完成后，就可以考虑字典树的插入操作了。既然是插入操作，那必然需要创建新的节点。新节点一定是子节点，那么就需要保存数据，且默认不是终止节点：

Ptr_TNode trie_new_node(char c) {
    Ptr_TNode node = (Ptr_TNode)calloc(1, sizeof(TNode));
    node->c = c;
    node->is_end = false;
    return node;
}

trie_insert

再回到插入操作，整个过程应该从根节点开始，从原字符串的开头开始逐个插入每个字符，当当前节点不存在以当前字符为值的孩子时，就创建新节点：

void trie_insert(Trie root, const char *str) {
    if(root == NULL || str == NULL || *str == '\0') {
        return;
    }

    Ptr_TNode cur = root;
    int len = strlen(str);
    for(int i = 0; i < len; ++i) {
        int idx = str[i] - 'a';
        if(cur->son[idx] == NULL) {
            Ptr_TNode nnode = trie_new_node(str[i]);
            cur->son[idx] = nnode;
        }
        cur = cur->son[idx];
    }
    cur->is_end = true;
}

trie_search

插入操作完成后，这颗字典树就基本可以认为是建好了，接下来就该考虑如何查找了。分析一下，查找的过程应该与插入是类似的，也需要逐个字符进行查找，直到原字符串的最后一个字符找到了，且当前节点的is_end属性为true，才能认为当前字符串在字典树中：

bool trie_search(Trie root, const char *str) {
    if(root == NULL || str == NULL) {
        return false;
    }

    Ptr_TNode cur = root;
    int len = strlen(str), i;
    for(i = 0; i < len; ++i) {
        int idx = str[i] - 'a';
        if(cur->son[idx] == NULL) {
            break;
        }
        cur = cur->son[idx];
    }
    return (i == len) && (cur->is_end);
}

这个地方，我们可以思考一下，当原字符串的最后一个字符找到了，但当前节点的is_end属性为false，意味着什么？
意味着原字符串是字典树中某个字符串的前缀。

trie_destroy

围绕字典树的最后一个操作就是销毁字典树，避免内存泄漏：

void trie_destroy(Trie root) {
    if(root == NULL) {
        return;
    }
    for(int i = 0; i < 26; ++i) {
        trie_destroy(root->son[i]);
    }
    free(root);
}

是不是觉得和树的后序遍历很像？

测试

实现完成后，再简单的测试一下正确性：

void unit_test() {
    Trie root = trie_init();

    // 待插入的单词
    const char *s1 = "hello";
    const char *s2 = "hey";
    const char *s3 = "a";        // 单个字符
    const char *s4 = "apple";

    // 插入操作
    trie_insert(root, s1);
    trie_insert(root, s2);
    trie_insert(root, s3);
    trie_insert(root, s4);
    trie_insert(root, s4);  // 重复插入（测试无异常）

    // ============== 测试用例 ==============
    printf("=== Test Result ===\n");
    // 1. 存在的完整单词
    trie_search(root, s1) ? printf("%s exist.\n", s1) : printf("%s not exist.\n", s1);
    trie_search(root, s2) ? printf("%s exist.\n", s2) : printf("%s not exist.\n", s2);
    trie_search(root, s3) ? printf("%s exist.\n", s3) : printf("%s not exist.\n", s3);
    trie_search(root, s4) ? printf("%s exist.\n", s4) : printf("%s not exist.\n", s4);

    // 2. 不存在的单词
    const char *t1 = "hed";
    const char *t2 = "hellow";
    trie_search(root, t1) ? printf("%s exist.\n", t1) : printf("%s not exist.\n", t1);
    trie_search(root, t2) ? printf("%s exist.\n", t2) : printf("%s not exist.\n", t2);

    // 3. 单词前缀（核心测试：必须返回不存在）
    const char *t3 = "he";
    const char *t4 = "app";
    trie_search(root, t3) ? printf("%s exist.\n", t3) : printf("%s not exist.\n", t3);
    trie_search(root, t4) ? printf("%s exist.\n", t4) : printf("%s not exist.\n", t4);

    // 4. 边界测试：空字符串
    const char *t5 = "";
    trie_search(root, t5) ? printf("empty string exist.\n") : printf("empty string not exist.\n");

    trie_destroy(root);
}

编译运行，符合预期。

总结

本文主要在讨论字典树这种数据结构，先从其特点入手，然后逐步实现这种数据结构，并完成测试。但我们的实现比较简单，还不具备生产能力；同时，对我们设计的字典树而言，隐含了原数据（原字符串）必须由 26 个英文字母构成的条件。

全部代码：

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

/**
 * @brief The definition of trie node.
 */
typedef struct Node {
    struct Node* son[26];
    char c;
    bool is_end;
} TNode;

typedef TNode* Ptr_TNode;
typedef TNode* Trie;

/**
 * @brief The initialization of trie.
 * @return The root of trie.
 */
Trie trie_init() {
    Trie root = (Trie)calloc(1, sizeof(TNode));
    root->is_end = false;
    return root;
}

/**
 * @brief Create a new trie node.
 * @param c The node's value.
 * @return The pointer of new trie node.
 */
Ptr_TNode trie_new_node(char c) {
    Ptr_TNode node = (Ptr_TNode)calloc(1, sizeof(TNode));
    node->c = c;
    node->is_end = false;
    return node;
}

/**
 * @brief Insert the string to trie.
 * @param root The root of trie.
 * @param str The source string.
 */
void trie_insert(Trie root, const char *str) {
    if(root == NULL || str == NULL || *str == '\0') {
        return;
    }

    Ptr_TNode cur = root;
    int len = strlen(str);
    for(int i = 0; i < len; ++i) {
        int idx = str[i] - 'a';
        if(cur->son[idx] == NULL) {
            Ptr_TNode nnode = trie_new_node(str[i]);
            cur->son[idx] = nnode;
        }
        cur = cur->son[idx];
    }
    cur->is_end = true;
}

/**
 * @brief Search the string in trie.
 * @param root The root of trie.
 * @param str The source string.
 * @return True, the string in trie, otherwise false.
 */
bool trie_search(Trie root, const char *str) {
    if(root == NULL || str == NULL) {
        return false;
    }

    Ptr_TNode cur = root;
    int len = strlen(str), i;
    for(i = 0; i < len; ++i) {
        int idx = str[i] - 'a';
        if(cur->son[idx] == NULL) {
            break;
        }
        cur = cur->son[idx];
    }
    return (i == len) && (cur->is_end);
}

/**
 * @brief Destroy the trie.
 * @param root The root of trie.
 */
void trie_destroy(Trie root) {
    if(root == NULL) {
        return;
    }
    for(int i = 0; i < 26; ++i) {
        trie_destroy(root->son[i]);
    }
    free(root);
}

/**
 * @brief Unit test.
 */
void unit_test() {
    Trie root = trie_init();

    // 待插入的单词
    const char *s1 = "hello";
    const char *s2 = "hey";
    const char *s3 = "a";        // 单个字符
    const char *s4 = "apple";

    // 插入操作
    trie_insert(root, s1);
    trie_insert(root, s2);
    trie_insert(root, s3);
    trie_insert(root, s4);
    trie_insert(root, s4);  // 重复插入（测试无异常）

    // ============== 测试用例 ==============
    printf("=== Test Result ===\n");
    // 1. 存在的完整单词
    trie_search(root, s1) ? printf("%s exist.\n", s1) : printf("%s not exist.\n", s1);
    trie_search(root, s2) ? printf("%s exist.\n", s2) : printf("%s not exist.\n", s2);
    trie_search(root, s3) ? printf("%s exist.\n", s3) : printf("%s not exist.\n", s3);
    trie_search(root, s4) ? printf("%s exist.\n", s4) : printf("%s not exist.\n", s4);

    // 2. 不存在的单词
    const char *t1 = "hed";
    const char *t2 = "hellow";
    trie_search(root, t1) ? printf("%s exist.\n", t1) : printf("%s not exist.\n", t1);
    trie_search(root, t2) ? printf("%s exist.\n", t2) : printf("%s not exist.\n", t2);

    // 3. 单词前缀（核心测试：必须返回不存在）
    const char *t3 = "he";
    const char *t4 = "app";
    trie_search(root, t3) ? printf("%s exist.\n", t3) : printf("%s not exist.\n", t3);
    trie_search(root, t4) ? printf("%s exist.\n", t4) : printf("%s not exist.\n", t4);

    // 4. 边界测试：空字符串
    const char *t5 = "";
    trie_search(root, t5) ? printf("empty string exist.\n") : printf("empty string not exist.\n");

    trie_destroy(root);
}

int main() {
    
    unit_test();

    return 0;
}

bachzart's Blog

什么是字典树？