How to use Redis word segmentation index method

王林
2023-05-26 17:28:52

Word Segmentation Index Method

Of the approaches suggested in the previous article, this is the only one that, after trying it myself, I found both feasible and consistent with the way Redis works. Even so, it still ends up less efficient than searching directly in memory.

For the detailed implementation idea, see the Redis author's blog (Reference 1). The example here still uses UserName, in English, and only segments each name into prefixes of up to 3 characters. Extend it yourself for other scenarios.

AutoComplete searches by the letters typed so far, so we first need to segment every Name into its prefixes, that is:

abc => (a, ab, abc)

When we enter a, we directly get the contents of the set a; when we enter ab, we directly get the contents of the set ab. Now we start the conversion. First we need to segment the names in the User table:
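The segmentation rule above can be sketched in isolation, without Redis or a database. This is only an illustration: the helper name `Prefixes` and its `maxLength` parameter are hypothetical and do not appear in the original code, and the sketch assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the article): returns the lower-cased
// prefixes of a name, from length 1 up to maxLength characters.
static List<string> Prefixes(string name, int maxLength = 3)
{
    var result = new List<string>();
    if (string.IsNullOrEmpty(name)) return result;

    var lower = name.ToLowerInvariant();
    for (var len = 1; len <= Math.Min(maxLength, lower.Length); len++)
    {
        result.Add(lower.Substring(0, len));
    }
    return result;
}

Console.WriteLine(string.Join(", ", Prefixes("abc")));    // a, ab, abc
Console.WriteLine(string.Join(", ", Prefixes("Alice")));  // a, al, ali
```

Each prefix produced this way corresponds to one set name, so a name contributes to at most three sets in this example.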

var redis = ConnectionMultiplexer.Connect("localhost");
var db = redis.GetDatabase();

// Build the prefixes of length 1, 2 and 3.
for (var i = 1; i < 4; i++)
{
    // For each prefix, keep the ids of the first 20 users ordered by name.
    var data = dbCon.Lookup<string, int>(string.Format(@"select words, id from (
                                    select Row_number() over (partition by words order by name) as rn,id,words from (
                                        select  id, SUBSTRING(name, 1, {0}) as words, name from User 
                                    ) as t
                                    ) t2 where rn <= {1} and words != '' and words is not null", i, 20));

    // One Redis Set per prefix; its members are the matching user ids.
    data.ForEach((key, item) =>
    {
        db.SetAdd("capqueen:Cache:user:" + key.ToLower(), item.Select<int, RedisValue>(j => j).ToArray());
    });
}

Step 1: Use SQL with a ranking per group to keep only the first 20 rows for each segment. The OrmLite syntax is used here.

Step 2: Save the result to a Redis Set. Note that this is only an index; it does not store the actual User content.
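To make the shape of the resulting index easier to see, here is a rough in-memory equivalent of the two steps, written with LINQ instead of SQL and a dictionary instead of Redis Sets. It is a sketch under assumptions: the sample rows are made up, the helper `BuildIndex` is hypothetical, and ordering by name ignores case, which may differ from the database's collation. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up sample rows standing in for the User table.
var users = new List<(int Id, string Name)>
{
    (1, "alice"), (2, "alan"), (3, "bob"), (4, "Albert"),
};

// Hypothetical helper: maps each prefix to at most `limit` user ids,
// ordered by name, mirroring "rn <= 20" in the SQL.
static Dictionary<string, List<int>> BuildIndex(
    IEnumerable<(int Id, string Name)> rows, int maxLength = 3, int limit = 20)
{
    return rows
        .Where(u => !string.IsNullOrEmpty(u.Name))
        .SelectMany(u => Enumerable.Range(1, Math.Min(maxLength, u.Name.Length))
            .Select(len => (Prefix: u.Name.Substring(0, len).ToLowerInvariant(), u.Id, u.Name)))
        .GroupBy(x => x.Prefix)
        .ToDictionary(
            g => g.Key,
            g => g.OrderBy(x => x.Name, StringComparer.OrdinalIgnoreCase)
                  .Take(limit)
                  .Select(x => x.Id)
                  .ToList());
}

var index = BuildIndex(users);
foreach (var entry in index.OrderBy(e => e.Key, StringComparer.Ordinal))
{
    // The key format matches the one used for the Redis Sets.
    Console.WriteLine($"capqueen:Cache:user:{entry.Key} -> [{string.Join(", ", entry.Value)}]");
}
```

Each dictionary entry plays the role of one Redis Set: the key is the prefix and the values are ids only, so the User records still have to be fetched separately.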

When searching, we can implement the following:

public List<User> SearchWords(string keywords)
{
    var redis = ConnectionMultiplexer.Connect("localhost");
    var db = redis.GetDatabase();
    var result = db.SetMembers("capqueen:Cache:user:" + keywords.ToLower());
    var users = new List<User>();
    if (result.Any())
    {
        // Convert the set members to ids
        var ids = result.ToList().Select<RedisValue, RedisKey>(i => i.ToString());
        // Fetch the values by key; the Users were stored in advance
        var values = db.StringGet(ids.ToArray());
        // Join the non-empty values into one JSON array (no trailing comma) to speed up parsing
        var portsJson = "[" + string.Join(",", values.Select(v => (string)v).Where(v => !string.IsNullOrWhiteSpace(v))) + "]";

        users = JsonConvert.DeserializeObject<List<User>>(portsJson);
    }
    return users;
}

In actual testing, this approach is indeed much better than the previous KEYS approach, but the performance is still unsatisfactory.

Scan search method

I found this method while consulting the Redis documentation. I have only tested it, and I expect it cannot be used for large-scale queries in a production environment.

Depending on the data structure, the scan commands are SCAN, HSCAN, SSCAN and ZSCAN. See the documentation for more details. We use ZSCAN here:

ZSCAN key cursor [MATCH pattern] [COUNT count]

Here cursor is the cursor for the scan iteration (I haven't fully figured it out yet), pattern is the matching rule, and count is a hint for how many elements to examine per call rather than an exact number of records returned.
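The cursor contract can be illustrated with a toy model: the client starts at cursor 0, passes each returned cursor to the next call, and stops when the server returns 0 again. The sketch below is a deliberately simplified assumption and not how Redis implements scanning. In Redis the cursor is an opaque value rather than a list offset, and an element may be returned more than once. The helper `ScanOnce` and the sample members are hypothetical; C# 9 or later is assumed for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up members standing in for a sorted set; real Redis does not expose them as a list.
var members = new List<string> { "alice", "bob", "alan", "carol", "albert", "dave", "alma" };

// Hypothetical model of one scan call: examine up to `count` members starting at `start`,
// keep those beginning with `prefix`, and return the cursor for the next call (0 = finished).
static (int NextCursor, List<string> Matches) ScanOnce(
    List<string> source, int start, string prefix, int count)
{
    var page = source.Skip(start).Take(count).ToList();
    var hits = page.Where(m => m.StartsWith(prefix, StringComparison.Ordinal)).ToList();
    var end = start + page.Count;
    return (end >= source.Count ? 0 : end, hits);
}

// Client loop: begin at cursor 0 and repeat until the returned cursor is 0 again.
var found = new List<string>();
var cursor = 0;
do
{
    var step = ScanOnce(members, cursor, "al", 3);
    found.AddRange(step.Matches);   // a call may contribute zero matches
    cursor = step.NextCursor;
} while (cursor != 0);

Console.WriteLine(string.Join(", ", found));  // alice, alan, albert, alma
```

The model also shows why a single call can return fewer matches than count, or none at all: the pattern is applied only to the elements examined in that call.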

Since I am using StackExchange.Redis, the zscan method it provides is:

IEnumerable<SortedSetEntry> SortedSetScan(RedisKey key, RedisValue pattern = null, int pageSize = 10, long cursor = 0, int pageOffset = 0, CommandFlags flags = CommandFlags.None);

public void CreateTerminalCache(List<User> users)
{
    if (users == null) return;
    // Assumes a local Redis instance, as in the earlier example
    var db = ConnectionMultiplexer.Connect("localhost").GetDatabase();
    var sourceData = new List<KeyValuePair<RedisKey, RedisValue>>();
    var list = new List<SortedSetEntry>();
    foreach (var item in users)
    {
        // Raw data: user key -> serialized user
        var value = JsonConvert.SerializeObject(item);
        sourceData.Add(new KeyValuePair<RedisKey, RedisValue>("capqueen:users:" + item.Id, value));
        // Index entry: member = name, score = id
        list.Add(new SortedSetEntry(item.Name, item.Id));
    }
    // Add to the sorted set as name - id
    db.SortedSetAdd("capqueen:users:index", list.ToArray());
    // Store the user data as key-value pairs
    db.StringSet(sourceData.ToArray(), When.Always, CommandFlags.None);
}

Then the search is as follows:

public List<User> GetUserByWord(string words)
{
    // Assumes a local Redis instance, as in the earlier example
    var db = ConnectionMultiplexer.Connect("localhost").GetDatabase();
    // Scan the index from the start (cursor 0) for names beginning with the input
    var result = db.SortedSetScan("capqueen:users:index", words + "*", 10, 0, 0, CommandFlags.None).Take(30).ToList();
    var users = new List<User>();
    if (result.Any())
    {
        // Rebuild the user keys from the ids stored as scores
        var ids = result.Select<SortedSetEntry, RedisKey>(i => "capqueen:users:" + (long)i.Score);
        // Fetch the values by key
        var values = db.StringGet(ids.ToArray());
        // Join the non-empty values into one JSON array (no trailing comma) to speed up parsing
        var portsJson = "[" + string.Join(",", values.Select(v => (string)v).Where(v => !string.IsNullOrWhiteSpace(v))) + "]";

        users = JsonConvert.DeserializeObject<List<User>>(portsJson);
    }
    return users;
}

Statement:
This article is reproduced from yisu.com. If there is any infringement, please contact admin@php.cn to have it deleted.