Home  >  Article  >  Backend Development  >  Convert dot-separated values ​​to Go struct using Python

Convert dot-separated values ​​to Go struct using Python

王林
王林forward
2024-02-10 13:33:08958browse

使用 Python 将点分隔值转换为 Go 结构

php editor Youzi will introduce in this article how to use Python to convert dot-separated values ​​(such as "key1.subkey1.subkey2") into a structure in the Go language. This transformation process is useful for extracting and processing data from configuration files or API responses. We will use Python's recursive functions and Go language structures to implement this conversion, and give detailed code examples and explanations. After studying this article, readers will be able to easily process and convert point-separated values, improving the efficiency and flexibility of data processing.

Question content

This is a specific requirement for an application that can change the configuration (specifically the wso2 identity server, since I'm using go to write the kubernetes operator for it). But it's really not relevant here. I want to create a solution that allows easy management of large number of configuration maps to generate go structures. These configurations are mapped in .csv

Link to .csv - my_configs.csv

I want, Write a python script that automatically generates go structures so that any changes to the application configuration can be updated by simply executing the python script to create the corresponding go structure. I'm referring to the configuration of the application itself. For example, toml key names in csv can be changed/new values ​​can be added.

So far I have successfully created a python script that almost achieves my goal. The script is,

import pandas as pd

def convert_to_dict(data):
    result = {}
    for row in data:
        current_dict = result
        for item in row[:-1]:
            if item is not none:
                if item not in current_dict:
                    current_dict[item] = {}
                current_dict = current_dict[item]
    return result

def extract_json_key(yaml_key):
    if isinstance(yaml_key, str) and '.' in yaml_key:
        return yaml_key.split('.')[-1]
    else:
        return yaml_key

def add_fields_to_struct(struct_string,go_var,go_type,json_key,toml_key):
    struct_string += str(go_var) + " " + str(go_type) + ' `json:"' + str(json_key) + ',omitempty" toml:"' +str(toml_key) + '"` ' + "\n"
    return struct_string

def generate_go_struct(struct_name, struct_data):
    struct_name="configurations" if struct_name == "" else struct_name
    struct_string = "type " + struct_name + " struct {\n"
    yaml_key=df['yaml_key'].str.split('.').str[-1]
    
    # base case: generate fields for the current struct level    
    for key, value in struct_data.items():
        selected_rows = df[yaml_key == key]

        if len(selected_rows) > 1:
            go_var = selected_rows['go_var'].values[1]
            toml_key = selected_rows['toml_key'].values[1]
            go_type=selected_rows['go_type'].values[1]
            json_key=selected_rows['json_key'].values[1]
        else:
            go_var = selected_rows['go_var'].values[0]
            toml_key = selected_rows['toml_key'].values[0]
            go_type=selected_rows['go_type'].values[0]
            json_key=selected_rows['json_key'].values[0]

        # add fields to the body of the struct
        struct_string=add_fields_to_struct(struct_string,go_var,go_type,json_key,toml_key)   

    struct_string += "}\n\n"
    
    # recursive case: generate struct definitions for nested structs
    for key, value in struct_data.items():
        selected_rows = df[yaml_key == key]

        if len(selected_rows) > 1:
            go_var = selected_rows['go_var'].values[1]
        else:
            go_var = selected_rows['go_var'].values[0]

        if isinstance(value, dict) and any(isinstance(v, dict) for v in value.values()):
            nested_struct_name = go_var
            nested_struct_data = value
            struct_string += generate_go_struct(nested_struct_name, nested_struct_data)
    
    return struct_string

# read excel
csv_file = "~/downloads/my_configs.csv"
df = pd.read_csv(csv_file)

# remove rows where all columns are nan
df = df.dropna(how='all')
# create the 'json_key' column using the custom function
df['json_key'] = df['yaml_key'].apply(extract_json_key)

data=df['yaml_key'].values.tolist() # read the 'yaml_key' column
data = pd.dataframe({'column':data}) # convert to dataframe

data=data['column'].str.split('.', expand=true) # split by '.'

nested_list = data.values.tolist() # convert to nested list
data=nested_list 

result_json = convert_to_dict(data) # convert to dict (json)

# the generated co code
go_struct = generate_go_struct("", result_json)

# write to file
file_path = "output.go"
with open(file_path, "w") as file:
    file.write(go_struct)

The problem is (see below part of csv),

authentication.authenticator.basic
authentication.authenticator.basic.parameters
authentication.authenticator.basic.parameters.showAuthFailureReason
authentication.authenticator.basic.parameters.showAuthFailureReasonOnLoginPage
authentication.authenticator.totp
authentication.authenticator.totp.parameters
authentication.authenticator.totp.parameters.showAuthFailureReason
authentication.authenticator.totp.parameters.showAuthFailureReasonOnLoginPage
authentication.authenticator.totp.parameters.encodingMethod
authentication.authenticator.totp.parameters.timeStepSize

Here, since the basic and totp fields parameters are duplicated, the script confuses itself and generates two totpparameters structures. The expected result is to have basicparameters and totpparameters structures. There are many similar duplicate words in the yaml_key column of the csv.

I know this has to do with the index being hardcoded to 1 in go_var = selected_rows['go_var'].values[1], but it's hard to fix this.

Can someone give me an answer? I think,

  1. Problems with recursive functions
  2. There is a problem with the code that generates json may be the root cause of this issue.

Thanks!

I also tried using chatgpt, but since this has to do with nesting and recursion, the answer provided by chatgpt is not very efficient.

renew

I found a problem with the row containing the properties, pooloptions, endpoint and parameters fields. This is because they are duplicated in the yaml_key column.

Solution

I was able to resolve this issue. However, I had to use a completely new approach to the problem, which was to use a tree data structure and then iterate over it. This is the main logic behind it - https://www.geeksforgeeks.org/level-sequential tree traversal/

This is the working python code.

import pandas as pd
from collections import deque

structs=[]
class TreeNode:
    def __init__(self, name):
        self.name = name
        self.children = []
        self.path=""

    def add_child(self, child):
        self.children.append(child)

def create_tree(data):
    root = TreeNode('')
    for item in data:
        node = root
        for name in item.split('.'):
            existing_child = next((child for child in node.children if child.name == name), None)
            if existing_child:
                node = existing_child
            else:
                new_child = TreeNode(name)
                node.add_child(new_child)
                node = new_child
    return root

def generate_go_struct(struct_data):
    struct_name = struct_data['struct_name']
    fields = struct_data['fields']
    
    go_struct = f"type {struct_name} struct {{\n"

    for field in fields:
        field_name = field['name']
        field_type = field['type']
        field_default_val = str(field['default_val'])
        json_key=field['json_key']
        toml_key=field['toml_key']

        tail_part=f"\t{field_name} {field_type} `json:\"{json_key},omitempty\" toml:\"{toml_key}\"`\n\n"

        if pd.isna(field['default_val']):
            go_struct += tail_part
        else:
            field_default_val = "\t// +kubebuilder:default:=" + field_default_val
            go_struct += field_default_val + "\n" + tail_part

    go_struct += "}\n\n"
    return go_struct

def write_go_file(go_structs, file_path):
    with open(file_path, 'w') as file:
        for go_struct in go_structs:
            file.write(go_struct)

def create_new_struct(struct_name):
    struct_name = "Configurations" if struct_name == "" else struct_name
    struct_dict = {
        "struct_name": struct_name,
        "fields": []
    }
    
    return struct_dict

def add_field(struct_dict, field_name, field_type,default_val,json_key, toml_key):
    field_dict = {
        "name": field_name,
        "type": field_type,
        "default_val": default_val,
        "json_key":json_key,
        "toml_key":toml_key
    }
    struct_dict["fields"].append(field_dict)
    
    return struct_dict

def traverse_tree(root):
    queue = deque([root])  
    while queue:
        node = queue.popleft()
        filtered_df = df[df['yaml_key'] == node.path]
        go_var = filtered_df['go_var'].values[0] if not filtered_df.empty else None
        go_type = filtered_df['go_type'].values[0] if not filtered_df.empty else None

        if node.path=="":
            go_type="Configurations"

        # The structs themselves
        current_struct = create_new_struct(go_type)
        
        for child in node.children:  
            if (node.name!=""):
                child.path=node.path+"."+child.name   
            else:
                child.path=child.name

            filtered_df = df[df['yaml_key'] == child.path]
            go_var = filtered_df['go_var'].values[0] if not filtered_df.empty else None
            go_type = filtered_df['go_type'].values[0] if not filtered_df.empty else None
            default_val = filtered_df['default_val'].values[0] if not filtered_df.empty else None

            # Struct fields
            json_key = filtered_df['yaml_key'].values[0].split('.')[-1] if not filtered_df.empty else None
            toml_key = filtered_df['toml_key'].values[0].split('.')[-1] if not filtered_df.empty else None
            
            current_struct = add_field(current_struct, go_var, go_type,default_val,json_key, toml_key)

            if (child.children):
                # Add each child to the queue for processing
                queue.append(child)

        go_struct = generate_go_struct(current_struct)
        # print(go_struct,"\n")        
        structs.append(go_struct)

    write_go_file(structs, "output.go")

csv_file = "~/Downloads/my_configs.csv"
df = pd.read_csv(csv_file) 

sample_data=df['yaml_key'].values.tolist()

# Create the tree
tree = create_tree(sample_data)

# Traverse the tree
traverse_tree(tree)

Thank you for your help!

The above is the detailed content of Convert dot-separated values ​​to Go struct using Python. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:stackoverflow.com. If there is any infringement, please contact admin@php.cn delete