search

Home  >  Q&A  >  body text

c++ - 关于用C从txt文件查找单词的搜索算法优化问题

用C或C++实现从一个比较大的txt文件里查找一个单词,txt文件里每行一个单词,按a~z从上到下排列,有什么好的算法,用什么数据结构可以提高查询的速度和效率??

阿神阿神2885 days ago603

reply all(4)I'll reply

  • 阿神

    阿神2017-04-17 11:08:20

    In this case, consider the following data structure: Dictionary tree (DictTree)

    typedef struct _dict_tree_
    {
        struct _dict_tree_ * dt[TREENODENUM];
        char    c ;
        char    flag ;
    }DT ;
    

    The specific operation is

    The so-called 26-fork tree is stored according to the child nodes corresponding to each letter. Then read the words line by line and insert them into the tree, for example: Word: abandon The order of insertion into the tree is a->b->a->n->d->o->n Insert the child node corresponding to this tree for each letter

    int len = strlen(str);
    while( i < len )
    {
        index = str[i] - 97 ;        /*通过 index 来找到子节点*/
        if( pt->dt[index] == NULL )
        {
            pt->dt[index] = ( DT *)malloc( sizeof( DT) );
            pt->dt[index]->c = str[i] ;
            pt->dt[index]->flag = EMPTY ;
            for( j = 0 ; j < TREENODENUM ; j++ )
            {
                pt->dt[index]->dt[j] = NULL ;
            }
        }
        pt = pt->dt[index] ;
        i++;
    }
    

    The search is to follow the letters to determine whether the child node is empty

    int len = strlen(str);
    while( i != len )
    {
        index = str[i] - 97 ;
        if( pt->dt[index] == NULL )
            return 0 ;
        pt = pt->dt[index] ;
        i++;
    }
    

    reply
    0
  • 怪我咯

    怪我咯2017-04-17 11:08:20

    If preprocessing of txt files is allowed, it can be achieved using inverted index.

    If the txt file is only checked once and never used again, then no.

    reply
    0
  • 高洛峰

    高洛峰2017-04-17 11:08:20

    Based on your question, it seems that these words have been sorted? If it is in order, it can be done according to the dichotomy method. First read the total number of lines in the file, then search by the first character, then by the second character, and so on. The advantage of this algorithm is that it does not need to read all the files into memory.

    reply
    0
  • 大家讲道理

    大家讲道理2017-04-17 11:08:20

    If you modify the question, this txt file is 1T in size. How can I fix it better? If you don't have enough reputation, you can't change it.

    reply
    0
  • Cancelreply