
Jia Qianghuai: Construction and Application of Ant Group's Large-Scale Knowledge Graph

Reposted by WBOY, 2023-09-10 15:05:08


1. Overview of the graph

First, let's introduce some basic concepts of knowledge graphs.

1. What is a knowledge graph?


A knowledge graph uses graph structures to model, identify, and infer the complex relationships between things and to accumulate domain knowledge. It is an important cornerstone for realizing cognitive intelligence and has been widely used in fields such as search engines, intelligent question answering, language understanding, and big-data decision analysis.

A knowledge graph models both the semantic relationships and the structural relationships between data. Combined with deep learning, these two kinds of relationships can be better integrated and represented.
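As a minimal illustration of how a graph structure models relationships, a knowledge graph can be held as a set of (head, relation, tail) triples with an index for walking edges. The class name, entities, and relations below are invented for the example; production graph stores are far more elaborate.

```python
from collections import defaultdict


class TripleStore:
    """A toy knowledge graph stored as (head, relation, tail) triples."""

    def __init__(self):
        self.triples = set()
        # Index outgoing edges by head entity for fast neighbor lookup.
        self.out_edges = defaultdict(set)

    def add(self, head, relation, tail):
        self.triples.add((head, relation, tail))
        self.out_edges[head].add((relation, tail))

    def neighbors(self, head, relation=None):
        """Tails linked from `head`, optionally filtered by relation, sorted."""
        return sorted(t for r, t in self.out_edges.get(head, set())
                      if relation is None or r == relation)

    def two_hop(self, head, rel1, rel2):
        """Follow rel1 and then rel2: a simple chained query over the graph."""
        return sorted({t2 for t1 in self.neighbors(head, rel1)
                       for t2 in self.neighbors(t1, rel2)})


# Hypothetical example data.
kg = TripleStore()
kg.add("Alipay", "type", "App")
kg.add("Alipay", "provides", "Mobile payment")
kg.add("Mobile payment", "is_a", "Financial service")

print(kg.neighbors("Alipay"))                    # ['App', 'Mobile payment']
print(kg.two_hop("Alipay", "provides", "is_a"))  # ['Financial service']
```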

2. Why should we build a knowledge graph?


We build a knowledge graph for two main reasons: the characteristics of Ant's own data sources, and the benefits a knowledge graph brings.

[1] The data sources are diverse and heterogeneous and lack a unified system for understanding knowledge.

[2] A knowledge graph brings many benefits, including:

  • Semantic standardization: graph construction techniques raise the level of standardization and normalization of entities, relationships, concepts, etc.
  • Domain knowledge accumulation: knowledge is represented and interconnected through semantics and graph structure, accumulating rich domain knowledge.
  • Knowledge reuse: a high-quality Ant knowledge graph serves multiple downstream businesses through fusion, linking, and other services, reducing business costs and improving efficiency.
  • Knowledge reasoning and discovery: graph reasoning uncovers more long-tail knowledge for scenarios such as risk control, credit, claims, merchant operations, and marketing recommendation.

3. Overview of how to build knowledge graphs


While building knowledge graphs for various businesses, we have distilled a general construction paradigm for Ant's knowledge graphs. It consists of five main parts:

  • Business data, an important data source for cold-starting the graph.
  • Knowledge graphs from other domains, fused with the existing graph through entity alignment.
  • Structured knowledge bases in the business domain, likewise fused with the existing graph through entity alignment.
  • Unstructured and semi-structured data such as text, from which information is extracted and used to update the existing graph through entity linking.
  • Domain concept systems and expert rules, whose concepts and rules are linked to the existing knowledge graph.
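The entity linking step for text can be sketched roughly as follows: a mention is looked up in an alias dictionary, and when several graph entities share the alias, the one whose graph context overlaps most with the surrounding words is chosen. This is a toy under stated assumptions (a hypothetical alias table and a word-overlap score), not Ant's linking model.

```python
def link_mention(mention, context_words, alias_index, entity_context):
    """Link a text mention to a graph entity id, or return None if unknown.

    alias_index:    alias (lower-case) -> list of candidate entity ids
    entity_context: entity id -> set of words tied to the entity in the graph
                    (for example, names of its neighbors)
    """
    candidates = alias_index.get(mention.lower(), [])
    if not candidates:
        return None  # unseen mention: a possible new entity for the graph
    context = {w.lower() for w in context_words}
    # Pick the candidate whose graph context overlaps most with the text;
    # max() keeps the first candidate on ties.
    return max(candidates,
               key=lambda e: len(entity_context.get(e, set()) & context))


# Hypothetical alias table and graph context.
alias_index = {"apple": ["Apple_Inc", "Apple_fruit"]}
entity_context = {
    "Apple_Inc": {"iphone", "company", "store"},
    "Apple_fruit": {"fruit", "orchard", "juice"},
}
words = "fresh juice from the orchard".split()
print(link_mention("Apple", words, alias_index, entity_context))  # Apple_fruit
```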


With a general construction paradigm in place, the next step is systematic construction, which can be viewed from two perspectives. From the algorithm perspective, there are various algorithmic capabilities, such as knowledge reasoning and knowledge matching. From the implementation perspective, the layers from bottom to top are:

  • Basic dependencies: the graph computing engine and cognitive base computing.
  • Graph base: the NLP & multimodal platform and the graph platform.
  • Graph construction techniques.
  • The Ant knowledge graph, built with those techniques.
  • Graph reasoning on top of the knowledge graph.
  • General algorithm capabilities.
  • Business applications.

2. Graph Construction

Next, we will share some of Ant Group’s core capabilities in building knowledge graphs, including graph construction, graph fusion, and graph cognition.

1. Graph construction


The graph construction process mainly includes six steps:

  • Data sources: obtain diverse, multi-source data.
  • Knowledge modeling: convert massive data into structured data, modeled in three domains (concepts, entities, and events).
  • Knowledge acquisition: acquire knowledge and build a knowledge processing R&D platform.
  • Knowledge storage: Ha3 storage, graph storage, etc.
  • Knowledge operations: knowledge editing, online querying, extraction, etc.
  • Continuous learning: the models learn and iterate automatically.

Three lessons from the construction process

Entity classification integrating expert knowledge


Building a knowledge graph requires classifying the input entities; in Ant's scenario this is a large-scale multi-label classification task. To integrate expert knowledge into entity classification, we made three main optimizations:

  • Semantic information enhancement: introduce label embeddings obtained by representation learning on the label semantic graph.
  • Contrastive learning: add contrastive supervision based on the label hierarchy.
  • Logical rule constraints: incorporate expert prior knowledge as logical rules.
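A rough sketch of two of these ideas, under assumptions: each label gets an independent probability from the dot product of an entity vector with a label embedding (in the approach above, such embeddings would come from representation learning on the label semantic graph), and an expert rule, assumed here to be "a child label may not be more probable than its parent", is enforced afterwards. The vectors, labels, and the max-based repair are illustrative; the contrastive component is omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def classify(entity_vec, label_embs):
    """Multi-label scoring: an independent probability for every label,
    from the dot product of the entity vector and that label's embedding."""
    return {label: sigmoid(sum(a * b for a, b in zip(entity_vec, emb)))
            for label, emb in label_embs.items()}


def apply_hierarchy_rule(probs, parent_of):
    """Assumed expert rule: P(parent) >= P(child).

    Violations are repaired by raising the parent to the child's probability,
    repeating until stable so that multi-level chains propagate upward."""
    probs = dict(probs)
    changed = True
    while changed:
        changed = False
        for child, parent in parent_of.items():
            if probs[parent] < probs[child]:
                probs[parent] = probs[child]
                changed = True
    return probs


# Hypothetical 2-d embeddings and a three-level label hierarchy.
label_embs = {"Catering": [0.2, 0.0], "Restaurant": [0.5, 0.0],
              "Hotpot restaurant": [1.0, 1.0]}
parent_of = {"Hotpot restaurant": "Restaurant", "Restaurant": "Catering"}

raw = classify([1.0, 0.5], label_embs)
fixed = apply_hierarchy_rule(raw, parent_of)
print({k: round(v, 3) for k, v in fixed.items()})
```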

Entity recognition with injected domain vocabulary


For entity recognition, we start from a graph structure in which matched vocabulary words become edges between tokens. The model learns to weight these edges appropriately and to down-weight edges contributed by noisy words. Two modules are proposed, boundary contrastive learning and semantic contrastive learning:

  • Boundary contrastive learning addresses boundary conflicts. After the vocabulary is injected, a fully connected graph is built and a GAT learns a representation for each token. Correctly bounded parts form a positive example graph and incorrectly bounded parts form a negative example graph; contrasting the two teaches the model the boundary information of each token.
  • Semantic contrastive learning addresses semantic conflicts. Borrowing the idea of prototype learning, it adds semantic representations of the labels to strengthen the association between each token and the label semantics.
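The semantic contrastive module can be illustrated with a standard InfoNCE-style loss over label prototypes, shown here as an assumption rather than the exact objective used: a token representation is pulled toward its gold label's prototype and pushed away from the others. The prototypes and token vector are made up; in the model they would come from the learned label representations and the GAT token encoder.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def prototype_contrastive_loss(token_vec, gold_label, prototypes, temperature=0.1):
    """InfoNCE over label prototypes: -log softmax(cos(token, p) / T)[gold].

    Minimizing it pulls the token toward its gold label's prototype and
    pushes it away from the prototypes of the other labels."""
    logits = {lab: cosine(token_vec, p) / temperature for lab, p in prototypes.items()}
    top = max(logits.values())  # subtract the max for a stable log-sum-exp
    log_z = top + math.log(sum(math.exp(x - top) for x in logits.values()))
    return log_z - logits[gold_label]


# Hypothetical label prototypes and a token representation.
prototypes = {"PER": [1.0, 0.0], "ORG": [0.0, 1.0], "O": [-1.0, -1.0]}
token = [0.9, 0.1]
print(prototype_contrastive_loss(token, "PER", prototypes))  # small: token is near PER
print(prototype_contrastive_loss(token, "ORG", prototypes))  # large: wrong prototype
```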

Few-shot relation extraction constrained by logical rules


In domain problems, annotated samples are scarce, so we face few-shot or zero-shot scenarios. In this setting, the core idea of our relation extraction is to introduce an external knowledge base. To address the performance degradation caused by differing semantic spaces, we designed a reasoning module based on logical rules; to address the rote learning caused by entity type matching, we designed a subtle-difference perception module.
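The internals of the logical-rule reasoning module are not given here. As a loose illustration of rule-based reasoning over an external knowledge base, the sketch forward-chains two-hop Horn rules in which r1(x, y) and r2(y, z) imply r(x, z); inferred facts could serve as extra evidence for an extractor that has seen few labeled examples. The rules, relation names, and that usage are assumptions.

```python
def infer_relations(facts, rules):
    """Forward-chain two-hop Horn rules until no new facts appear.

    facts: iterable of (head, relation, tail) triples from the external KB
    rules: list of (r1, r2, r_new), meaning r1(x, y) and r2(y, z) => r_new(x, z)
    """
    facts = set(facts)
    while True:
        new = {(x, r_new, z)
               for r1, r2, r_new in rules
               for (x, ra, y) in facts if ra == r1
               for (y2, rb, z) in facts if rb == r2 and y2 == y}
        new -= facts
        if not new:
            return facts
        facts |= new


# Hypothetical KB facts and rules.
kb = {("A", "subsidiary_of", "B"),
      ("B", "subsidiary_of", "C"),
      ("C", "controlled_by", "Person D")}
rules = [("subsidiary_of", "subsidiary_of", "subsidiary_of"),
         ("subsidiary_of", "controlled_by", "controlled_by")]

for fact in sorted(infer_relations(kb, rules) - kb):
    print(fact)
```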

2. Graph fusion

Graph fusion means fusing information across graphs from different business domains.


Benefits of graph fusion:

  • Cross-business knowledge reuse: knowledge is connected across businesses based on the graph ontology model.
  • Fewer redundant data copies: knowledge can be used as soon as it is connected, through standardized knowledge service pipelines.
  • Faster business value: lower the cost for businesses to find data and create more business value through knowledge reuse, reducing costs and improving efficiency.

Entity alignment in graph fusion


A core technique in knowledge graph fusion is entity alignment. Here we use BERT-INT, a state-of-the-art algorithm consisting of two modules: a representation module and an interaction module.

The implementation consists of two stages, recall and ranking:

Recall: using the representation module, candidate entities are recalled by the similarity of the BERT vectors of their title text.

Ranking model based on titles, attributes, and neighbors: the representation module produces vector representations of the titles, attributes, and neighbors, and then:

  • Compute the cosine similarity of the titles.
  • Compute the similarity matrices between the attribute sets and between the neighbor sets of the two entities, and extract a one-dimensional similarity feature from each.
  • Concatenate the three features into one feature vector, on which the loss is computed.
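The recall and ranking features can be sketched with plain vectors. The 2-d vectors below are hypothetical stand-ins for BERT embeddings, and the attribute/neighbor feature uses a simplified pooling (the mean of row-wise and column-wise maxima of the similarity matrix) rather than BERT-INT's exact aggregation; the learned scoring layer and the loss are omitted.

```python
import math


def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def recall(title_vec, candidates, k=2):
    """Stage 1: keep the top-k candidates by title-embedding cosine similarity."""
    return sorted(candidates, key=lambda c: cos(title_vec, c["title"]), reverse=True)[:k]


def set_similarity_feature(vecs_a, vecs_b):
    """Similarity matrix between two vector sets, pooled to one number:
    the mean of the row-wise and column-wise maxima (simplified pooling)."""
    if not vecs_a or not vecs_b:
        return 0.0
    sim = [[cos(a, b) for b in vecs_b] for a in vecs_a]
    row = sum(max(r) for r in sim) / len(vecs_a)
    col = sum(max(c) for c in zip(*sim)) / len(vecs_b)
    return (row + col) / 2


def pair_features(e1, e2):
    """Stage 2 input: [title cosine, attribute feature, neighbor feature]."""
    return [cos(e1["title"], e2["title"]),
            set_similarity_feature(e1["attrs"], e2["attrs"]),
            set_similarity_feature(e1["neighbors"], e2["neighbors"])]


# Hypothetical 2-d vectors standing in for BERT embeddings.
source = {"title": [1.0, 0.0], "attrs": [[1.0, 0.0], [0.0, 1.0]], "neighbors": [[0.6, 0.8]]}
candidates = [
    {"id": "x", "title": [0.9, 0.1], "attrs": [[1.0, 0.0], [0.0, 1.0]], "neighbors": [[0.6, 0.8]]},
    {"id": "y", "title": [0.0, 1.0], "attrs": [[1.0, 1.0]], "neighbors": [[-1.0, 0.0]]},
    {"id": "z", "title": [0.7, 0.7], "attrs": [[0.0, 1.0]], "neighbors": [[0.0, 1.0]]},
]
for cand in recall(source["title"], candidates):
    print(cand["id"], [round(f, 3) for f in pair_features(source, cand)])
```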

3. Graph cognition


This part introduces Ant's internal knowledge representation learning framework.

Ant proposed a knowledge representation learning framework based on an Encoder-Decoder architecture, in which the Encoder is a graph neural network method and the Decoder is a knowledge representation learning task such as link prediction. This framework produces general-purpose entity/relation embeddings through self-supervision, which has several benefits: 1) the embedding size is much smaller than the original feature space, reducing storage costs; 2) low-dimensional vectors are denser, which effectively alleviates data sparsity; 3) learning in a single vector space makes fusing heterogeneous multi-source data more natural; 4) the embeddings are fairly general and convenient for downstream businesses to use.
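A minimal sketch of the Encoder-Decoder split, under stated assumptions: the encoder is one mean-aggregation message-passing step (a simplified stand-in for a graph neural network), and the decoder is DistMult, one common link-prediction scoring function. Neither is claimed to be Ant's actual architecture, and training is omitted; the point is that the decoder only sees the encoder's low-dimensional embeddings.

```python
def encode(node_emb, edges):
    """Encoder (simplified GNN layer): each node's new embedding is the mean of
    its own embedding and its neighbors' embeddings, edges taken as undirected."""
    groups = {v: [v] for v in node_emb}
    for head, _, tail in edges:
        groups[head].append(tail)
        groups[tail].append(head)
    dim = len(next(iter(node_emb.values())))
    return {v: [sum(node_emb[u][i] for u in members) / len(members) for i in range(dim)]
            for v, members in groups.items()}


def distmult(h, r, t):
    """Decoder: DistMult triple score sum_i h_i * r_i * t_i (higher = more plausible)."""
    return sum(a * b * c for a, b, c in zip(h, r, t))


def rank_tails(head, relation, z, rel_emb):
    """Link prediction: rank all other entities as tails of (head, relation, ?)."""
    scores = {t: distmult(z[head], rel_emb[relation], z[t]) for t in z if t != head}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical initial features, one observed edge, and a relation embedding.
node_emb = {"user1": [1.0, 0.0], "shop1": [0.9, 0.1], "shop2": [0.0, 1.0]}
edges = [("user1", "pays_at", "shop1")]
rel_emb = {"pays_at": [1.0, 1.0]}

z = encode(node_emb, edges)
print(rank_tails("user1", "pays_at", z, rel_emb))  # ['shop1', 'shop2']
```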

3. Graph Application

Next, I will share some typical application cases of knowledge graphs at Ant Group.

1. Scenario application modes of the graph

Before the specific cases, let's first look at the modes in which the Ant Knowledge Graph is applied in business scenarios: knowledge acquisition, knowledge management and reasoning, and knowledge services, as shown below.

[Figure: scenario application modes of the Ant Knowledge Graph]

2. Some typical cases

Case 1: Structured matching recall based on knowledge graph


The business scenario is recalling mini program content in Alipay's main search. The business pain points to be solved are:

  • Product entities are missing, as are the relationships between goods and products.
  • Product-level understanding of mini programs is weak.

The solution is to build a merchant knowledge graph and use the product relationships in it to achieve a structured, product-level understanding of user queries.
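Structured matching recall might look roughly like this: product entities are recognized in the query, and mini programs are recalled through the merchant-to-product edges of the merchant graph rather than by surface text similarity alone. The graph slice, the mini program ids, and the longest-match query parser are hypothetical simplifications.

```python
def parse_query(query, product_names):
    """Find product entities mentioned in the query, trying longer names first
    so that 'iced latte' is not also matched as 'latte'."""
    found, rest = [], query.lower()
    for name in sorted(product_names, key=len, reverse=True):
        if name.lower() in rest:
            found.append(name)
            rest = rest.replace(name.lower(), " ")
    return found


def structured_recall(query, sells, mini_program_of):
    """Recall mini programs whose merchant sells a product named in the query.

    sells:           merchant -> set of product entities (merchant graph edges)
    mini_program_of: merchant -> mini program id
    """
    all_products = set().union(*sells.values())
    wanted = set(parse_query(query, all_products))
    return sorted(mini_program_of[m] for m, products in sells.items() if products & wanted)


# Hypothetical slice of a merchant knowledge graph.
sells = {"Merchant A": {"iced latte", "latte"},
         "Merchant B": {"milk tea"},
         "Merchant C": {"hotpot"}}
mini_program_of = {"Merchant A": "mp_a", "Merchant B": "mp_b", "Merchant C": "mp_c"}

print(structured_recall("order an iced latte", sells, mini_program_of))  # ['mp_a']
print(structured_recall("milk tea or hotpot", sells, mini_program_of))   # ['mp_b', 'mp_c']
```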

Case 2: Real-time prediction of user intent in recommendation system

[Figure: AlipayKG framework for real-time user intent prediction]

This case concerns real-time prediction of user intent for homepage recommendation. We built AlipayKG for it; the framework is shown in the figure above. The related work was published at WWW 2023; see the paper for details.

Case 3: Marketing coupon recommendation integrating knowledge representation


This is a consumer coupon recommendation scenario. The business pain points are:

  • The head effect is severe.
  • Users' coupon redemption and collection behavior is sparse.
  • There are many cold-start users and coupons that lack behavioral footprint data.

To solve these problems, we designed a deep vector recall algorithm that incorporates dynamic graph representations. We found that users' coupon consumption behavior is periodic, which a single static edge cannot model. We therefore first built a dynamic graph, then used the team's in-house dynamic graph algorithm to learn embedding representations, and finally fed these representations into a two-tower model for vector recall.
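The argument about periodic behavior can be made concrete. A dynamic graph keeps every timestamped interaction, from which features such as the typical reuse interval can be derived, whereas a single static edge cannot. The team's in-house dynamic graph algorithm is not described here, so in this sketch a time-decayed average of coupon vectors stands in for the learned user representation, the coupon tower is reduced to fixed vectors, and recall is a dot-product top-k. All names and data are hypothetical.

```python
from collections import defaultdict
from statistics import median


class DynamicGraph:
    """Keeps every timestamped (user, coupon) interaction; a static graph
    would collapse them into a single edge and lose the timing."""

    def __init__(self):
        self.events = defaultdict(list)  # (user, coupon) -> sorted timestamps (days)

    def add(self, user, coupon, day):
        self.events[(user, coupon)].append(day)
        self.events[(user, coupon)].sort()

    def typical_interval(self, user, coupon):
        """Median gap between consecutive uses, or None with fewer than two uses."""
        days = self.events.get((user, coupon), [])
        if len(days) < 2:
            return None
        return median([b - a for a, b in zip(days, days[1:])])


def user_tower(graph, user, coupon_vecs, now, half_life=30.0):
    """Stand-in user tower: time-decayed average of the coupon vectors the
    user interacted with (newer events weigh more)."""
    dim = len(next(iter(coupon_vecs.values())))
    acc, total = [0.0] * dim, 0.0
    for (u, coupon), days in graph.events.items():
        if u != user:
            continue
        for day in days:
            w = 0.5 ** ((now - day) / half_life)
            acc = [a + w * x for a, x in zip(acc, coupon_vecs[coupon])]
            total += w
    return [a / total for a in acc] if total else acc


def vector_recall(user_vec, coupon_vecs, k=2):
    """Two-tower style recall: top-k coupons by dot product with the user vector."""
    def score(c):
        return sum(a * b for a, b in zip(user_vec, coupon_vecs[c]))
    return sorted(coupon_vecs, key=score, reverse=True)[:k]


# Hypothetical data: a weekly coffee-coupon habit and one movie coupon.
g = DynamicGraph()
for day in (0, 7, 14, 21):
    g.add("u1", "coffee", day)
g.add("u1", "movie", 3)
coupon_vecs = {"coffee": [1.0, 0.0], "movie": [0.0, 1.0], "tea": [0.8, 0.2]}

print(g.typical_interval("u1", "coffee"))  # 7
print(vector_recall(user_tower(g, "u1", coupon_vecs, now=21), coupon_vecs))
```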

Case 4: Intelligent claims expert rule reasoning based on diagnosis and treatment events


The last case concerns graph rule reasoning. Take the medical insurance health graph as an example: it contains medical knowledge, claims rules, and personal health information, which are linked through entities and combined with logical rules as the basis for decision-making. The graph has improved the efficiency of expert claims settlement.
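Rule reasoning over such a graph can be sketched as follows: a claimant's diagnosis and treatment events are linked to medical-knowledge entities, and expert rules written as predicates over the linked facts produce a decision together with the rule that fired, which keeps the outcome explainable. Every entity, relation, and rule here is hypothetical.

```python
def linked_facts(events, medical_kg):
    """Link a person's events to medical-knowledge entities and pull in their facts.

    events:     list of ("person", relation, entity) triples
    medical_kg: entity -> list of (relation, object) pairs
    """
    facts = set(events)
    for _, _, entity in events:
        for rel, obj in medical_kg.get(entity, []):
            facts.add((entity, rel, obj))
    return facts


def chronic_before_policy(facts):
    """Rule: a chronic disease diagnosed before the policy started."""
    return any((d, "category", "chronic") in facts
               for s, r, d in facts if s == "person" and r == "diagnosed_before_policy")


def treatment_matches_diagnosis(facts):
    """Rule: the person received a treatment that the graph links to their diagnosis."""
    diagnoses = {o for s, r, o in facts if s == "person" and r == "diagnosed_with"}
    received = {o for s, r, o in facts if s == "person" and r == "received"}
    return any((d, "treated_by", t) in facts for d in diagnoses for t in received)


# Hypothetical expert rules, checked in order; the first match decides.
RULES = [
    ("chronic condition before policy", chronic_before_policy, "refer to expert"),
    ("treatment consistent with diagnosis", treatment_matches_diagnosis, "approve"),
]


def decide(events, medical_kg):
    facts = linked_facts(events, medical_kg)
    for name, check, decision in RULES:
        if check(facts):
            return decision, name  # the rule that fired explains the decision
    return "manual review", None


medical_kg = {"Acute bronchitis": [("category", "acute"), ("treated_by", "Drug A")],
              "Diabetes": [("category", "chronic"), ("treated_by", "Insulin")]}
claim = [("person", "diagnosed_with", "Acute bronchitis"), ("person", "received", "Drug A")]
print(decide(claim, medical_kg))  # ('approve', 'treatment consistent with diagnosis')
```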

4. Graphs and large models

Finally, let's briefly discuss the opportunities for knowledge graphs amid the rapid development of large models.

1. The relationship between knowledge graphs and large models


Knowledge graphs and large models each have their own strengths and weaknesses. The main strengths of large models are general knowledge modeling and universality, and their shortcomings can be offset by the strengths of knowledge graphs, which include high accuracy and strong interpretability. Large models and knowledge graphs can complement each other.

There are usually three routes to integrating graphs and large models: first, using knowledge graphs to enhance large models; second, using large models to enhance knowledge graphs; third, letting large models and knowledge graphs work together and complement each other. A large model can be regarded as a parameterized knowledge base, and a knowledge graph as an explicit knowledge base.

2. Application cases of large models and knowledge graphs

Applying large models to knowledge graph construction


In the process of knowledge graph construction, large models can be used for information extraction, knowledge modeling and relationship reasoning.

Using large models for information extraction in knowledge graphs


This work from DAMO Academy decomposes information extraction into two stages:

  • In the first stage, identify the entity, relation, or event types present in the text, which reduces the search space and computational complexity.
  • In the second stage, extract the relevant information based on the types found in the first stage and the corresponding given list.
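The two-stage decomposition can be expressed as two rounds of prompting. This is a minimal sketch of the idea, not the DAMO Academy implementation: `call_llm` is a hypothetical callback standing in for any model endpoint, the prompt wording and JSON reply format are assumptions, and the type-to-roles dictionary plays the part of the given list in stage two. A fake model is plugged in so the control flow runs.

```python
import json


def stage1_prompt(text, type_names):
    return (f"Text: {text}\n"
            f"Which of these types occur in the text? {', '.join(type_names)}\n"
            "Answer with a JSON list of type names.")


def stage2_prompt(text, type_name, roles):
    return (f"Text: {text}\n"
            f"Extract the {type_name} with fields: {', '.join(roles)}.\n"
            "Answer with a JSON object.")


def two_stage_extract(text, schema, call_llm):
    """schema: type name -> list of roles; call_llm: prompt str -> reply str."""
    # Stage 1: find which types are present, shrinking the search space.
    answered = json.loads(call_llm(stage1_prompt(text, list(schema))))
    present = [t for t in answered if t in schema]  # ignore unknown types
    # Stage 2: extract arguments only for the types found in stage 1.
    return {t: json.loads(call_llm(stage2_prompt(text, t, schema[t]))) for t in present}


def fake_llm(prompt):
    """Stand-in for a real model so the control flow can be run."""
    if "Which of these types" in prompt:
        return '["Acquisition"]'
    return '{"acquirer": "Company A", "target": "Company B"}'


schema = {"Acquisition": ["acquirer", "target"], "Lawsuit": ["plaintiff", "defendant"]}
print(two_stage_extract("Company A acquired Company B.", schema, fake_llm))
```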

Applying knowledge graphs to large models


Applying knowledge graphs to large models covers three main aspects:

Integrating the knowledge graph into the large model's input. The knowledge graph can be used for data cleaning, or its knowledge can be formatted and spliced directly into the input.

Integrating the knowledge graph into large model training. For example, two tasks are trained simultaneously, a knowledge representation task on the knowledge graph and MLM pre-training of the large model, and the two are modeled jointly.

Injecting the knowledge graph into large model inference. On the one hand, this addresses two problems of large models: the knowledge graph serves as a prior constraint that keeps the large model from producing "nonsense", and it mitigates the large model's lack of timeliness. On the other hand, knowledge graphs can make large model generation interpretable.
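For the first aspect, one simple form of splicing knowledge into the input is to select triples about entities mentioned in the question, serialize them as text, and prepend them to the prompt with an instruction to rely on them. The triple format, instruction wording, and naive substring matching are illustrative assumptions.

```python
def facts_for_question(question, triples, max_facts=5):
    """Select triples whose head entity is mentioned in the question (naive match)."""
    q = question.lower()
    return [t for t in triples if t[0].lower() in q][:max_facts]


def build_prompt(question, triples):
    """Splice the selected knowledge-graph facts into the model input."""
    facts = facts_for_question(question, triples)
    lines = [f"- ({h}, {r}, {t})" for h, r, t in facts] or ["- (no relevant facts found)"]
    header = "Answer using only the facts below. If they are insufficient, say so.\n"
    return header + "Facts:\n" + "\n".join(lines) + f"\nQuestion: {question}\nAnswer:"


# Hypothetical graph slice.
triples = [
    ("Alipay", "operated_by", "Ant Group"),
    ("Ant Group", "headquartered_in", "Hangzhou"),
    ("Hangzhou", "located_in", "Zhejiang"),
]
print(build_prompt("Who operates Alipay?", triples))
```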

Knowledge-enhanced question and answer system


There are two main categories. One is knowledge-graph-enhanced question answering, which uses a large model to optimize the KBQA model; the other is retrieval enhancement, similar to how LangChain, GopherCite, New Bing, etc. use large models for knowledge base question answering.

The knowledge-enhanced generative search Q&A system has the following advantages:

  • Connecting to the search system addresses timeliness issues.
  • Providing reference links enables manual verification, which helps resolve factual errors.
  • Introducing search results enriches the context and improves large model generation.
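These advantages map onto a simple pipeline: retrieve current documents, put them into the prompt as numbered sources, and return the answer along with the source links so that a person can check it. In this sketch the retriever is a toy word-overlap ranker over an in-memory list, `generate` is a hypothetical model callback, and the URLs are placeholders.

```python
def retrieve(query, docs, k=2):
    """Toy search: rank documents by word overlap with the query, drop non-matches."""
    q = set(query.lower().split())

    def overlap(doc):
        return len(q & set(doc["text"].lower().split()))

    return [d for d in sorted(docs, key=overlap, reverse=True)[:k] if overlap(d) > 0]


def answer_with_references(query, docs, generate):
    """Retrieve fresh sources, ground generation on them, and return their links."""
    hits = retrieve(query, docs)
    sources = "\n".join(f"[{i}] {d['text']}" for i, d in enumerate(hits, 1))
    prompt = (f"Sources:\n{sources}\n\nQuestion: {query}\n"
              "Answer using the sources and cite them as [n].")
    return {"answer": generate(prompt),
            "references": [d["url"] for d in hits]}  # for manual verification


docs = [
    {"url": "https://example.com/a", "text": "The event starts on Monday"},
    {"url": "https://example.com/b", "text": "Coupons expire at midnight"},
]
result = answer_with_references("when does the event start", docs,
                                generate=lambda prompt: "It starts on Monday [1].")
print(result["references"])  # ['https://example.com/a']
```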

3. Summary and Outlook


Future progress on making knowledge graphs and large models interact and collaborate better includes three directions:

  • Promote the in-depth application of knowledge graphs and large models in NLP, question answering systems and other fields.
  • Use knowledge graphs for hallucination detection and detoxification of large models.
  • Research and development of large domain models combined with knowledge graphs.


Source: this article is reproduced from 51cto.com.