
python - How to sort a sequence of data by a particular element of a tuple or a particular key of a dictionary?

Suppose that post-analysis of a large volume of raw data produces data like this:

[(id,node,val),(id,node,val)...]
Each tuple holds a user id, a server (node), and a value, in that order. I want to split the data by server, sort each server's data by val, and then write it to Excel.
Alternatively, the data could be generated as [{"id":xxx,"node":xxx,"val":xxx},{"id":xxx,"node":xxx,"val":xxx}...]
If there were only one set of key-value pairs, sorted would be enough, but the node names are not known in advance, and the server names may change every day. Once I have such data, how do I split and sort it by server name?
The main problem is that the node names are not fixed. For example, I could create n lists and put the data of each node into its own list, but I don't know how many lists to create. Writing the processed data to Excel afterwards will also inevitably need a loop.
That makes a loop within a loop, and the names of the new data groups are unknown both after the data is classified and after it is sorted. Even exec does not solve this.

黄舟 2782 days ago 997

All replies (2)

  • 过去多啦不再A梦

    过去多啦不再A梦 2017-06-12 09:24:19

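    from collections import defaultdict

    # data has the shape [(id, node, val), (id, node, val), ...];
    # the rows below are made-up sample values for illustration only
    data = [(1, "node_a", 30), (2, "node_b", 10), (3, "node_a", 50)]

    def write_to_excel(rows):
        # placeholder: replace with your actual Excel-writing code
        print(rows)

    # group the rows by node (index 1 of each tuple)
    d = defaultdict(list)
    for x in data:
        d[x[1]].append(x)

    # write each group to Excel in turn
    for _, v in d.items():  # use d.iteritems() on Python 2
        # sort by val (index 2); set reverse=False for ascending order
        tmp = sorted(v, key=lambda x: x[2], reverse=True)
        # write to Excel
        write_to_excel(tmp)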
    

    Alternatively, you could write all the data to a CSV file with the columns id, node, val,
    and then process it with a shell script built from Linux command-line tools such as awk, uniq, and sort, which is also very fast.
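
    As a rough sketch of the CSV step only, the rows could be dumped with Python's standard csv module. The sample rows, the dump_rows helper, and the file name mentioned in the comments are assumptions made for illustration; they do not come from the question.

```python
import csv
import io

# Hypothetical sample rows in the (id, node, val) shape from the question.
data = [(1, "node_a", 30), (2, "node_b", 10), (3, "node_a", 50)]

def dump_rows(rows, fileobj):
    """Write each (id, node, val) tuple as one CSV line, without a header."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerows(rows)

# io.StringIO stands in for a real file here; for a file on disk use
# open("data.csv", "w", newline="") instead ("data.csv" is a made-up name).
buffer = io.StringIO()
dump_rows(data, buffer)
csv_text = buffer.getvalue()
```

    With no header line in the file, a command such as sort -t, -k2,2 -k3,3nr data.csv would then order the lines by node and, within each node, by val from largest to smallest.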

    Also, it is not clear how large your data actually is, or what order of magnitude it is. If the amount of data is really large, the Python code above may run out of memory, since it holds all of the data in memory at once. You will need to estimate this yourself.

  • 我想大声告诉你

    我想大声告诉你 2017-06-12 09:24:19

    If I understand your needs correctly, you can use a dictionary. The key of the dictionary is the name of the node, and the value of the dictionary is a list composed of items:

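    # data has the shape [{"id": ..., "node": ..., "val": ...}, ...];
    # the records below are made-up sample values for illustration only
    data = [
        {"id": 1, "node": "node_a", "val": 30},
        {"id": 2, "node": "node_b", "val": 10},
        {"id": 3, "node": "node_a", "val": 50},
    ]

    # map each node name to the list of records that belong to it
    result = {}
    for data_item in data:
        node_name = data_item["node"]
        if node_name in result:
            result[node_name].append(data_item)
        else:
            result[node_name] = [data_item]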

    Then fetch each value in the dictionary (that is, the data list) by its key (the server name), and sort it by passing a lambda as the key argument of sorted, so that each list is ordered by the chosen field of its items.
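
    The sorting step described here might look roughly like the following. The sample records and the sorted_ids summary are made up for illustration, and descending order by "val" is only an assumption about what is wanted.

```python
# Hypothetical sample records in the dict shape from the question.
data = [
    {"id": 1, "node": "node_a", "val": 30},
    {"id": 2, "node": "node_b", "val": 10},
    {"id": 3, "node": "node_a", "val": 50},
]

# Group the records by node name, however many distinct names there are.
result = {}
for data_item in data:
    result.setdefault(data_item["node"], []).append(data_item)

# Sort every group in place by "val", largest first.
for node_name, items in result.items():
    items.sort(key=lambda item: item["val"], reverse=True)

# Summary of the ids per node, in sorted order, just to inspect the result.
sorted_ids = {node: [item["id"] for item in items] for node, items in result.items()}
```

    Because the groups live in one dictionary keyed by node name, no per-node variable names are needed; the same loop over result.items() can also drive the code that writes each group to Excel.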
