Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering by LinkedIn Corporation

In this chapter, we review a paper by LinkedIn that serves as a notable case study of an industry-specific implementation of knowledge graphs and GenAI in customer support. The paper, "Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering," introduces an approach that integrates RAG with Knowledge Graphs (KGs) to improve customer service query responses. By leveraging historical issue tickets, the framework constructs a dual-layer KG that captures both intra-issue hierarchies and inter-issue relationships, preserving the structural and relational information often overlooked in traditional retrieval systems. Explicit connections are derived from metadata, while semantic embeddings infer implicit links, so the model can draw on both direct and contextual information. This integration improves query understanding and retrieval precision, resulting in more relevant and contextually aware responses. The approach demonstrates the potential of combining KGs and RAG to address the complexities of customer service queries.


Knowledge Graph Construction

The cornerstone of the RAG pipeline described in the research is the Knowledge Graph (KG). The KG provides a structured and interconnected representation of historical customer service data, enabling the system to navigate complex relationships between issues, solutions, and contexts. By constructing a graph-based representation, the approach overcomes the limitations of flat, unstructured data storage methods, ensuring both contextual richness and scalability.

Customer service issues often involve repetitive or related questions. For example, issues about “login problems” may share common underlying causes or resolutions. A well-constructed KG organizes such data into meaningful nodes and relationships, enabling effective retrieval for answering user queries.

Detailed Implementation

1. Graph Structure Definition

Intra-Issue Graph:

Each customer service issue is represented as a tree.
Nodes include fields such as Summary, Description, Steps to Reproduce, Expected Outcome, and Observed Outcome.
Edges represent relationships between these fields, such as hierarchical connections (e.g., Summary is the parent of Steps to Reproduce).

Inter-Issue Graph:

Connections between different issue trees.
Explicit Connections: Defined by metadata like “cloned from,” “related to,” or “duplicates.”
Implicit Connections: Determined by semantic similarity of issue content, calculated using embedding models.
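
The two layers described above can be sketched without any graph database, using plain Python dictionaries and tuples. This is only an illustration: the field names, ticket IDs, edge labels, the similarity score, and the `build_issue_tree` helper are assumptions made for the example, not the paper's schema.

```python
# Intra-issue layer: each ticket becomes a small tree rooted at its summary.
def build_issue_tree(ticket_id, fields):
    """Return (nodes, edges) for one ticket; node IDs look like 'T1:Summary'."""
    root = f"{ticket_id}:Summary"
    nodes = {root: fields["Summary"]}
    edges = []
    for name, text in fields.items():
        if name == "Summary":
            continue
        child = f"{ticket_id}:{name}"
        nodes[child] = text
        # Hypothetical edge label derived from the field name
        edges.append((root, "HAS_" + name.upper().replace(" ", "_"), child))
    return nodes, edges

t1_nodes, t1_edges = build_issue_tree("T1", {
    "Summary": "Login issue",
    "Description": "Cannot log in to LinkedIn",
    "Steps to Reproduce": "Open the login page and submit valid credentials",
})

# Inter-issue layer: edges between tickets, either explicit or implicit.
inter_issue_edges = [
    ("T1", "RELATED_TO", "T2", {"source": "metadata"}),                   # explicit
    ("T1", "SIMILAR_TO", "T3", {"source": "embedding", "score": 0.83}),   # implicit
]

for parent, relation, child in t1_edges:
    print(parent, relation, child)
```

Keeping the two layers separate makes it possible to retrieve a whole ticket tree first and then follow inter-issue edges to neighboring tickets.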

2. Parsing Customer Service Tickets

Predefined Fields:

Extract fields like code snippets, issue summary, and reproduction steps using simple rule-based parsers.
For instance, extract "Observed Outcome" by locating text under this label in a structured ticket.
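
A rule-based parser of this kind can be sketched with a regular expression. The sketch assumes each label starts a line and is followed by a colon; the label list and the sample ticket are made up for illustration.

```python
import re

# Assumed field labels; a real ticketing system may use different ones.
LABELS = ["Summary", "Description", "Steps to Reproduce",
          "Expected Outcome", "Observed Outcome"]

LABEL_PATTERN = re.compile(
    r"^(" + "|".join(re.escape(label) for label in LABELS) + r"):[ \t]*",
    re.MULTILINE,
)

def parse_ticket(text):
    """Split a labeled ticket into {label: body}."""
    matches = list(LABEL_PATTERN.finditer(text))
    fields = {}
    for i, match in enumerate(matches):
        # A field's body runs until the next label or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        fields[match.group(1)] = text[match.end():end].strip()
    return fields

raw_ticket = """Summary: Login issue
Steps to Reproduce:
1. Open the login page
2. Enter valid credentials
Observed Outcome: Error 500 is shown
"""

parsed = parse_ticket(raw_ticket)
print(parsed["Observed Outcome"])
```

Fields that are absent from the ticket (here, Description and Expected Outcome) are simply missing from the result, which is a natural hand-off point to LLM-assisted parsing.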

LLM-Assisted Parsing:

Use LLMs to process unstructured text.
Example: Parse a raw description into structured YAML output for consistency.

3. Establishing Relationships

Explicit Relationships:

Directly derive connections from the ticket metadata, such as:
Ticket A is a duplicate of Ticket B.
Ticket C is cloned from Ticket D.

Implicit Relationships:

Compute embeddings for ticket summaries using models like BERT.
Measure cosine similarity to identify tickets discussing similar issues.

Code Implementation

Below is a basic implementation of parsing and constructing the graph using a hypothetical dataset:

from py2neo import Graph, Node, Relationship
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Connect to a Neo4j database
graph_db = Graph("bolt://localhost:7687", auth=("neo4j", "password"))

# Load ticket data
tickets = [
    {"id": "T1", "summary": "Login issue", "description": "Cannot login to LinkedIn", "related_to": "T2"},
    {"id": "T2", "summary": "Account locked", "description": "User account locked after multiple login attempts"},
]

# Create nodes (merged on the "id" property so reruns do not duplicate them)
for ticket in tickets:
    node = Node("Issue", id=ticket["id"], summary=ticket["summary"], description=ticket["description"])
    graph_db.merge(node, "Issue", "id")

# Add explicit relationships
for ticket in tickets:
    if "related_to" in ticket:
        issue_node = graph_db.nodes.match("Issue", id=ticket["id"]).first()
        related_node = graph_db.nodes.match("Issue", id=ticket["related_to"]).first()
        if related_node:
            relation = Relationship(issue_node, "RELATED_TO", related_node)
            graph_db.merge(relation)

# Implicit relationships based on embeddings
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode([ticket["summary"] for ticket in tickets])
similarities = cosine_similarity(embeddings)

for i, ticket in enumerate(tickets):
    for j, similarity in enumerate(similarities[i]):
        if i != j and similarity > 0.8:
            node_a = graph_db.nodes.match("Issue", id=tickets[i]["id"]).first()
            node_b = graph_db.nodes.match("Issue", id=tickets[j]["id"]).first()
            if node_a and node_b:
                # Cast the NumPy scalar to a plain float before storing it as a property
                relation = Relationship(node_a, "SIMILAR_TO", node_b, similarity=float(similarity))
                graph_db.merge(relation)

Outcome

This structured KG ensures:

Improved Retrieval: Nodes and edges allow precise navigation to relevant data.
Context Preservation: Relationships within and across issues capture the full problem context.
Scalability: Semantic embeddings enable the system to adapt dynamically to new data, scaling efficiently with large datasets.

The constructed KG serves as the foundation for subsequent retrieval and response generation tasks, directly impacting the system's performance and accuracy.

Semantic Embedding and Knowledge Representation

The effectiveness of the RAG pipeline hinges on accurately representing customer service tickets in a semantic space. Semantic embedding bridges the gap between unstructured text and structured graph queries. By converting text into dense, context-aware vector representations, this step enables efficient similarity-based retrieval and enhances the ability to generalize across diverse issues.

Semantic embedding ensures that the system not only recognizes similar phrasing but also understands the contextual meaning of tickets. For example, “Unable to access account” and “Login problem” refer to similar issues, which embedding models can identify by mapping these phrases close to each other in the semantic space.

Detailed Implementation

Selection of Embedding Models

Pre-trained Models: Models from the Sentence Transformers library (e.g., ‘all-MiniLM-L6-v2’) or BERT are widely used for generating embeddings.
Domain-Specific Fine-Tuning: For improved accuracy, models can be fine-tuned on a corpus of customer service tickets.

Embedding Ticket Components

Each ticket is broken into semantically meaningful components (e.g., summary, description, steps to reproduce). Separate embeddings are computed for each component:

Summary Embedding: Captures the overall essence of the issue.
Description Embedding: Provides deeper contextual understanding.
Steps Embedding: Focuses on reproducibility and technical details.

Embedding Storage and Indexing

To facilitate efficient retrieval, embeddings are stored and indexed:

Vector Databases: A vector store such as Deep Lake indexes the embeddings for fast similarity search.
Metadata Augmentation: Each embedding is tagged with ticket metadata (e.g., ticket ID, creation date, severity).

Computing Similarities

Using cosine similarity, relationships between tickets are identified:

High Similarity: Indicates potential duplicates or related tickets.
Threshold Tuning: A similarity score threshold (e.g., 0.8) balances recall and precision.
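
One way to see the trade-off behind threshold tuning is to score a small set of hand-labeled ticket pairs at several thresholds. The pairs and scores below are invented for illustration; in practice they would come from tickets whose relationships are already known.

```python
# Hypothetical labeled pairs: (cosine similarity, is the pair truly related?)
labeled_pairs = [
    (0.92, True), (0.85, True), (0.81, False), (0.74, True),
    (0.66, False), (0.58, False), (0.41, False),
]

def precision_recall(pairs, threshold):
    """Treat pairs scoring above the threshold as predicted links."""
    tp = sum(1 for score, related in pairs if score > threshold and related)
    fp = sum(1 for score, related in pairs if score > threshold and not related)
    fn = sum(1 for score, related in pairs if score <= threshold and related)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

for threshold in (0.6, 0.7, 0.8, 0.9):
    p, r = precision_recall(labeled_pairs, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, raising the threshold from 0.7 to 0.9 lifts precision from 0.75 to 1.00 while recall falls from 1.00 to 0.33, which is the balance the threshold is meant to control.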

Code Implementation

Below is a step-by-step implementation for generating and storing embeddings:

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Example ticket dataset
tickets = [
    {"id": "T1", "summary": "Login issue", "description": "Cannot login to LinkedIn"},
    {"id": "T2", "summary": "Account locked", "description": "Account locked after multiple login attempts"},
    {"id": "T3", "summary": "Payment failed", "description": "Credit card payment failed on checkout"},
]

# Initialize the embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")

# Generate embeddings
summaries = [ticket["summary"] for ticket in tickets]
descriptions = [ticket["description"] for ticket in tickets]
summary_embeddings = model.encode(summaries)
description_embeddings = model.encode(descriptions)

# Combine embeddings (optional: use weights for each component)
combined_embeddings = np.hstack([summary_embeddings, description_embeddings])

# Calculate similarity matrix
similarity_matrix = cosine_similarity(combined_embeddings)

# Display similarity results
for i, ticket in enumerate(tickets):
    print(f"Similarities for Ticket {ticket['id']}:")
    for j, score in enumerate(similarity_matrix[i]):
        if i != j and score > 0.8:  # Adjust threshold as needed
            print(f" - Similar to Ticket {tickets[j]['id']} with score: {score:.2f}")

Outcome

Unified Representation: The semantic embeddings provide a consistent and context-aware representation of tickets.
Efficient Retrieval: Embedding similarity enables rapid identification of related tickets for complex queries.
Enhanced Graph Integration: The embeddings serve as a bridge to enrich the Knowledge Graph with implicit relationships based on contextual relevance.
Domain Adaptability: Fine-tuning or using domain-specific embeddings ensures the system adapts well to customer service-specific language and nuances.

This semantic embedding phase forms the backbone of retrieval in the RAG pipeline, ensuring relevance and precision in downstream tasks.

Graph Construction and Enrichment

Once the semantic embeddings are prepared, the next step is to structure the information as a graph. Graphs are powerful tools for representing relationships between entities, such as tickets, users, and potential solutions. This representation allows the system to:

Visualize connections between related tickets.
Enable traversal-based querying for complex relationships.
Incorporate additional metadata and inferred connections to enrich the graph.

Graph construction serves as a foundational layer for advanced querying and retrieval in the RAG pipeline. By encoding relationships such as “is duplicate of,” “relates to,” or “suggested solution,” the graph provides an intuitive and efficient way to analyze and retrieve information.

Detailed Implementation

1. Node and Edge Definition

Nodes:

Tickets: Represented as nodes containing metadata (e.g., ticket ID, summary, creation date).
Solutions: Nodes storing possible resolutions or fixes.
Users: Nodes representing the originator of the ticket.

Edges:

Semantic Relationships: Based on cosine similarity scores from the embedding phase.
Explicit Connections: E.g., “is duplicate of,” “suggested solution,” or “escalated by.”

2. Enrichment with Metadata

Each node and edge is enriched with metadata:

Nodes: Include fields such as creation date, severity, and status (e.g., open, resolved).
Edges: Carry relationship descriptions and confidence scores (e.g., "Similarity: 0.85").

3. Graph Representation and Storage

Graph Libraries: Tools such as NetworkX or Graphistry for graph manipulation.
Graph Databases: Persistent storage using graph databases like Neo4j or TigerGraph to enable scalable querying.

4. Relationship Inference

Edges between nodes are inferred based on:

Embedding Similarity: Tickets with similarity above a threshold are connected.
Rule-Based Relationships: Predefined rules (e.g., escalation paths) add additional edges.
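
Rule-based edges can be added alongside similarity edges. The sketch below uses a made-up rule (tickets filed by the same user within a week are linked); the rule, the `user` and `created` fields, and the edge attributes are assumptions for illustration only.

```python
from datetime import date
from itertools import combinations

# Hypothetical tickets with reporter and creation date metadata
sample_tickets = [
    {"id": "T1", "user": "U1", "created": date(2024, 5, 1)},
    {"id": "T2", "user": "U1", "created": date(2024, 5, 3)},
    {"id": "T3", "user": "U2", "created": date(2024, 5, 2)},
]

def same_user_within_days(a, b, max_days=7):
    """Hypothetical rule: tickets from one user filed close together are linked."""
    close_in_time = abs((a["created"] - b["created"]).days) <= max_days
    return a["user"] == b["user"] and close_in_time

# Each edge is (source, target, attributes), recording which rule produced it.
rule_edges = [
    (a["id"], b["id"], {"relationship": "same reporter", "rule": "same_user_within_days"})
    for a, b in combinations(sample_tickets, 2)
    if same_user_within_days(a, b)
]
print(rule_edges)
```

Recording the rule name on the edge keeps inferred links auditable, in the same way a similarity score does for embedding-based edges.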

Code Implementation

Below is an example for constructing and enriching a graph using NetworkX:

import networkx as nx

# Example ticket dataset
tickets = [
    {"id": "T1", "summary": "Login issue", "description": "Cannot login to LinkedIn", "status": "open"},
    {"id": "T2", "summary": "Account locked", "description": "Account locked after multiple login attempts", "status": "resolved"},
    {"id": "T3", "summary": "Payment failed", "description": "Credit card payment failed on checkout", "status": "open"},
]

# Similarity scores (from embedding phase)
similarities = [
    ("T1", "T2", 0.85),  # High similarity
    ("T1", "T3", 0.40),  # Low similarity
    ("T2", "T3", 0.60),  # Moderate similarity
]

# Initialize a directed graph
graph = nx.DiGraph()

# Add nodes with metadata
for ticket in tickets:
    graph.add_node(ticket["id"], summary=ticket["summary"], status=ticket["status"])

# Add edges based on similarity
for source, target, score in similarities:
    if score > 0.7:  # Threshold for creating an edge
        graph.add_edge(source, target, relationship="is similar to", similarity=score)

# Display graph information
print("Graph Nodes:")
print(graph.nodes(data=True))
print("\nGraph Edges:")
print(graph.edges(data=True))

Outcome

Node and Edge Mapping: The graph encapsulates relationships between tickets, providing a rich structure for queries.
Enhanced Queryability: Users can query the graph for tickets with specific relationships or metadata.
Scalability: Persistent graph databases like Neo4j allow the graph to scale efficiently with the dataset.
Dynamic Relationship Updates: The graph can evolve dynamically as new tickets and relationships are identified.

By structuring the dataset as a graph, the system creates a rich, context-aware representation of ticket relationships, forming the backbone of the GraphRAG pipeline. This step ensures seamless integration with downstream query and summarization tasks.

Graph-Based Retrieval Mechanism

With the graph structure in place, the next step is implementing a graph-based retrieval mechanism to answer complex queries. Traditional retrieval systems rely on keyword or semantic similarity but often fail to capture contextual relationships. In contrast, a graph-based system leverages the structured connections between nodes (tickets, solutions, and users) to provide contextually enriched and accurate retrieval results.

This mechanism ensures:

Context-Aware Retrieval: Queries can traverse relationships in the graph to fetch not just directly related nodes but also second-order or inferred connections.
Complex Query Resolution: Handles multi-faceted queries that require understanding of relationships, such as “What solutions are related to unresolved tickets from a specific user?”
Scalability and Performance: Efficient graph traversal and ranking algorithms ensure quick responses to complex queries.

Detailed Implementation

Query Processing

The retrieval mechanism begins by:

Parsing the user query into intent and entities.

Example: Query: “What are unresolved tickets similar to T1?”

Intent: Retrieve unresolved tickets.
Entity: Node ‘T1’ (specific ticket).
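
A very small rule-based version of this parsing step might look like the following. It assumes ticket IDs have the form `T<number>` and that "unresolved" maps to the `open` status; a production system would more likely use an intent classifier or an LLM.

```python
import re

# Assumed mapping from query wording to ticket status values
STATUS_WORDS = {"unresolved": "open", "open": "open", "resolved": "resolved"}

def parse_query(query):
    """Extract a status filter, ticket IDs, and whether similarity is requested."""
    lowered = query.lower()
    status = next(
        (value for word, value in STATUS_WORDS.items()
         if re.search(rf"\b{word}\b", lowered)),
        None,
    )
    ticket_ids = re.findall(r"\bT\d+\b", query)
    return {"status": status, "ticket_ids": ticket_ids, "similar": "similar" in lowered}

print(parse_query("What are unresolved tickets similar to T1?"))
```

The resulting dictionary maps directly onto traversal parameters: the ticket ID is the start node, and the status becomes a metadata filter.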

Graph Traversal Techniques

The retrieval process involves:

Direct Traversal: Fetching neighbors of a node based on edge relationships.
Weighted Traversal: Using edge weights (e.g., similarity scores) to prioritize results.
Path Finding: Identifying paths between nodes to uncover indirect relationships.

Re-Ranking Results

Results from traversal are ranked based on:

Edge weights (e.g., similarity scores).
Node metadata relevance (e.g., ticket status or creation date).

Integration with Semantic Retrieval

Graph traversal results can be combined with embedding-based retrieval for hybrid retrieval:

Use embeddings to shortlist relevant nodes.
Refine the results through graph-based traversal.
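
Weighted traversal and path finding, two of the traversal techniques listed in this section, can be sketched with the standard library alone. The edges and scores are hypothetical, and the path search is ordinary Dijkstra with cost `1 - similarity`, chosen here as one reasonable way to prefer strong links; the paper does not prescribe it.

```python
import heapq

# Undirected similarity edges (hypothetical scores), stored as an adjacency map.
edges = [("T1", "T2", 0.85), ("T2", "T4", 0.90), ("T1", "T3", 0.72), ("T3", "T4", 0.75)]
adjacency = {}
for a, b, score in edges:
    adjacency.setdefault(a, {})[b] = score
    adjacency.setdefault(b, {})[a] = score

def ranked_neighbors(node):
    """Weighted traversal: direct neighbors, strongest similarity first."""
    return sorted(adjacency[node].items(), key=lambda item: item[1], reverse=True)

def strongest_path(start, goal):
    """Path finding: Dijkstra with cost = 1 - similarity, so strong links are cheap."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, score in adjacency[node].items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + (1.0 - score), neighbor, path + [neighbor]))
    return None  # no path between the two nodes

print(ranked_neighbors("T1"))
print(strongest_path("T1", "T4"))
```

Here T1 and T4 share no direct edge, but the search finds the indirect connection through T2 because its links are stronger than those through T3.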

Code Implementation

Here’s an example of implementing a graph-based retrieval function using NetworkX:

def graph_retrieval(graph, query_node, relationship_filter=None, status_filter=None):
    """
    Retrieve nodes from the graph based on relationships and metadata filters.

    Args:
        graph (nx.Graph): The graph object.
        query_node (str): The starting node for traversal.
        relationship_filter (str, optional): Filter by specific relationship type (e.g., 'is similar to').
        status_filter (str, optional): Filter by node status (e.g., 'open').

    Returns:
        list: Retrieved nodes with metadata.
    """
    # Get neighbors of the query node
    neighbors = graph.neighbors(query_node)
    results = []

    for neighbor in neighbors:
        edge_data = graph.get_edge_data(query_node, neighbor)
        node_data = graph.nodes[neighbor]

        # Apply relationship filter
        if relationship_filter and edge_data["relationship"] != relationship_filter:
            continue

        # Apply status filter
        if status_filter and node_data["status"] != status_filter:
            continue

        # Append to results
        results.append({
            "node_id": neighbor,
            "relationship": edge_data["relationship"],
            "similarity": edge_data.get("similarity", None),
            "status": node_data["status"],
        })

    # Sort results by similarity; edges without a score sort as 0
    return sorted(results, key=lambda x: x["similarity"] or 0, reverse=True)

# Example Usage (reuses `graph` from the NetworkX example above)
# T1's only neighbor, T2, is "resolved", so the "open" filter yields an empty list here.
query_results = graph_retrieval(graph, query_node="T1", relationship_filter="is similar to", status_filter="open")
print(query_results)

Outcome

Efficient Query Handling: The mechanism retrieves directly connected nodes while adhering to relationship and metadata constraints.
Contextual Enrichment: Results are enriched with metadata and contextual relevance, ensuring precision.
Dynamic Adaptation: Queries can evolve dynamically by adjusting filters or traversal parameters.
Scalability: NetworkX keeps the whole graph in memory, which suits prototypes; for larger graphs, the same traversal logic can run against a graph database such as Neo4j.

By enabling graph-based retrieval, the system supports a powerful querying layer that complements the structured representation created earlier. This step ensures that even the most complex queries yield actionable and contextually relevant insights.

Integration of Semantic Embeddings with Graph Traversals

Graph-based retrieval systems, while excellent at leveraging structured relationships, may falter when nodes have limited connectivity or when the relationship semantics aren't explicit in the graph. Integrating semantic embeddings—vector representations of textual data—with graph traversals enhances the system's capability to retrieve contextually rich results.

This hybrid mechanism combines the strengths of:

Semantic Understanding: Capturing the underlying meaning of nodes and edges via embeddings.
Graph Traversal: Exploring explicit structural relationships within the data.

Key Goals

Improve retrieval precision by combining semantic similarity with graph-based relationships.
Support broader queries where explicit graph links might be missing but semantic overlap exists.
Provide robust adaptability to unseen queries by leveraging pretrained embedding models.

Detailed Implementation

Embedding Generation

Use an embedding model, such as OpenAI's embedding models or a Hugging Face transformer, to generate embeddings for nodes.
Embeddings are stored as node attributes in the graph, enabling fast similarity calculations during retrieval.

Hybrid Retrieval Workflow

Step 1: Perform semantic similarity search to identify initial candidates.
Step 2: Use graph traversal techniques to refine and contextualize these candidates based on relationships and metadata.

Distance Metrics for Similarity

Cosine Similarity: Common for comparing embeddings. Nodes with cosine similarity above a threshold are considered related.
Euclidean Distance: An alternative metric that, unlike cosine similarity, is sensitive to vector magnitude.
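
The difference between the two metrics is easiest to see on a tiny example. The vectors below are arbitrary two-dimensional stand-ins for embeddings, and `cosine` and `normalize` are helper names introduced here.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.hypot(*v)
    return [x / norm for x in v]

a = [3.0, 4.0]
b = [4.0, 3.0]
b_scaled = [10 * x for x in b]  # same direction as b, ten times longer

# Cosine similarity ignores magnitude; Euclidean distance does not.
print(cosine(a, b), cosine(a, b_scaled))
print(math.dist(a, b), math.dist(a, b_scaled))

# On unit vectors the two are tied together: distance^2 == 2 - 2 * cosine.
a_unit, b_unit = normalize(a), normalize(b)
print(math.dist(a_unit, b_unit) ** 2, 2 - 2 * cosine(a, b))
```

Because of that identity, ranking by Euclidean distance and ranking by cosine similarity give the same order once embeddings are normalized to unit length.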

Workflow Integration

Combine the semantic similarity scores with edge weights to compute a final relevance ranking.
Provide a flexible mechanism to adjust the weightage of semantic versus structural relevance based on the query type.
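
A simple way to combine the two signals is a weighted sum with a tunable mixing weight. The sketch assumes both scores are already scaled to the range 0 to 1; the `alpha` values per query type and the candidate scores are invented for illustration.

```python
def combined_score(semantic, structural, alpha=0.5):
    """Blend two scores in [0, 1]; alpha is the weight on semantic similarity."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * semantic + (1.0 - alpha) * structural

# Hypothetical weights: fact lookups lean on semantics, relationship queries on structure.
ALPHA_BY_QUERY_TYPE = {"fact": 0.8, "exploratory": 0.5, "relationship": 0.2}

candidates = {
    "T2": {"semantic": 0.85, "structural": 0.30},
    "T3": {"semantic": 0.55, "structural": 0.90},
}

def rank(candidates, query_type):
    """Order candidate node IDs by blended score for the given query type."""
    alpha = ALPHA_BY_QUERY_TYPE[query_type]

    def score(node_id):
        entry = candidates[node_id]
        return combined_score(entry["semantic"], entry["structural"], alpha)

    return sorted(candidates, key=score, reverse=True)

print(rank(candidates, "fact"))
print(rank(candidates, "relationship"))
```

With these numbers the ranking flips between the two query types, which shows how much the choice of weight matters and why it is worth tuning per query category.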

Code Implementation

Here’s an example implementation of hybrid retrieval, combining semantic similarity with graph traversals:

from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def generate_embeddings(data, model):
    """
    Generate embeddings for the given data using a pretrained model.

    Args:
        data (list of str): Text data for embedding generation.
        model: Pretrained embedding model.

    Returns:
        np.ndarray: Generated embeddings, one row per input text.
    """
    return np.array([model.encode(text) for text in data])

def hybrid_retrieval(graph, query_embedding, model, top_k=10, relationship_filter=None):
    """
    Perform hybrid retrieval combining semantic embeddings and graph traversal.

    Args:
        graph (nx.Graph): The graph object, with an "embedding" attribute on each node.
        query_embedding (np.ndarray): The 1-D embedding of the query.
        model: Pretrained embedding model (unused here; node embeddings are precomputed).
        top_k (int): Number of top results to return.
        relationship_filter (str, optional): Filter by specific relationship type.

    Returns:
        list: Retrieved nodes with combined relevance scores.
    """
    results = []

    for node in graph.nodes:
        node_embedding = graph.nodes[node].get("embedding", None)
        if node_embedding is not None:
            # Calculate semantic similarity
            similarity = cosine_similarity([query_embedding], [node_embedding])[0][0]
            neighbors = list(graph.neighbors(node))

            # Apply relationship filter if provided
            if relationship_filter:
                neighbors = [
                    n for n in neighbors
                    if graph.get_edge_data(node, n)["relationship"] == relationship_filter
                ]

            # Combine similarity with graph-based relevance
            score = similarity + len(neighbors)  # Example: semantic + structural score
            results.append({"node": node, "score": score, "neighbors": neighbors})

    # Return top-k results
    return sorted(results, key=lambda x: x["score"], reverse=True)[:top_k]

# Example Usage (reuses `tickets`, `model`, and `graph` from the earlier examples)
# Store an embedding on each node; nodes without one are skipped by hybrid_retrieval
summary_vectors = generate_embeddings([t["summary"] for t in tickets], model)
for ticket, vector in zip(tickets, summary_vectors):
    graph.nodes[ticket["id"]]["embedding"] = vector

query = "Unresolved tickets similar to T1"
query_embedding = generate_embeddings([query], model)[0]  # take the single 1-D query vector
results = hybrid_retrieval(graph, query_embedding, model, top_k=5, relationship_filter="is similar to")
print(results)

Outcome

Enhanced Precision: Integrating embeddings ensures results aren't solely dependent on explicit graph connections.
Query Flexibility: Supports queries that rely on semantic overlap or implicit relationships.
Scalability: Embedding-based retrieval can scale to large graphs by indexing embeddings and performing efficient similarity searches.
Adaptability: Pretrained embeddings can be fine-tuned for domain-specific tasks, improving retrieval accuracy in specialized contexts.

This step enriches the GraphRAG pipeline by marrying semantic depth with structural insights, creating a robust hybrid retrieval mechanism for handling diverse and complex queries.

Optimizing Query Execution and Real-Time Adaptation

As GraphRAG systems grow in complexity and handle larger datasets, ensuring optimal query execution becomes critical. Real-time adaptation focuses on dynamically optimizing retrieval processes based on the query type, user context, and data availability. This section addresses the challenges of:

Efficient Query Execution: Ensuring minimal latency for real-time applications.
Dynamic Adaptation: Tailoring retrieval strategies dynamically to fit specific query demands.
Resource Optimization: Balancing computational resources to handle multiple concurrent queries effectively.

Key Goals

Optimize query execution pipelines to handle high traffic efficiently.
Introduce dynamic adaptation mechanisms to adjust retrieval strategies in real-time.
Leverage advanced indexing and caching to improve performance.

Detailed Implementation

Adaptive Query Execution

Query Categorization: Classify incoming queries into predefined categories, such as fact-based, exploratory, or relationship-centric. This classification directs the retrieval mechanism.
Dynamic Weight Adjustment: Adjust the weightage between graph-based relevance and semantic similarity based on the query type.
Real-Time Feedback Loop: Incorporate a feedback loop to refine the retrieval process based on user interactions or query-specific outcomes.

Indexing and Caching

Use a vector index (e.g., Deep Lake) for rapid semantic similarity search.
Cache frequently accessed nodes and subgraphs for recurring queries to reduce computation overhead.
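
Caching of frequently accessed nodes can be sketched with `functools.lru_cache` from the standard library. The adjacency data and the lookup counter are stand-ins for an expensive graph or database call.

```python
from functools import lru_cache

# Stand-in for a graph store; tuples keep the cached values immutable.
ADJACENCY = {"T1": ("T2", "T3"), "T2": ("T1",), "T3": ("T1",)}
lookups = {"count": 0}

@lru_cache(maxsize=1024)
def cached_neighbors(node_id):
    """Return a node's neighbors, hitting the backing store only on a cache miss."""
    lookups["count"] += 1
    return ADJACENCY.get(node_id, ())

cached_neighbors("T1")
cached_neighbors("T1")  # served from the cache; the counter does not increase
print(lookups["count"])
print(cached_neighbors.cache_info())
```

When the graph changes, `cached_neighbors.cache_clear()` drops stale entries; a real deployment would need a more selective invalidation policy.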

Scalability Enhancements

Implement asynchronous processing for handling multiple queries simultaneously.
Utilize graph partitioning to parallelize traversal processes across subgraphs.
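
Asynchronous handling of concurrent queries can be sketched with `asyncio`. The `retrieve` coroutine is a placeholder that only sleeps; in a real system it would await a graph database or vector index call, and the concurrency limit of 2 is an arbitrary choice.

```python
import asyncio

async def retrieve(query):
    """Placeholder for an I/O-bound retrieval call."""
    await asyncio.sleep(0.01)
    return f"results for {query!r}"

async def handle_queries(queries, max_concurrent=2):
    """Run retrievals concurrently, capped at max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def limited(query):
        async with semaphore:
            return await retrieve(query)

    # gather returns results in the same order as the input queries
    return await asyncio.gather(*(limited(q) for q in queries))

results = asyncio.run(handle_queries(["login issue", "payment failed", "account locked"]))
print(results)
```

The semaphore keeps a burst of queries from overwhelming the backing store while still overlapping their waiting time.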

Query Optimization Techniques

Top-k Approximation: Retrieve only the top-k most relevant results rather than traversing the entire graph.
Relationship Filtering: Focus on specific types of edges or paths based on query intent.
Batch Processing: Aggregate multiple similar queries and execute them together to save computational resources.
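
Selecting only the top-k candidates does not require sorting every score. The sketch below uses `heapq.nlargest` on made-up relevance scores; note that this is exact top-k selection over already-scored candidates, not an approximate nearest-neighbor search.

```python
import heapq

# Hypothetical relevance scores for candidate tickets
scores = {"T1": 0.42, "T2": 0.91, "T3": 0.77, "T4": 0.15, "T5": 0.88}

def top_k(scores, k):
    """Return the k highest-scoring (node, score) pairs, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

print(top_k(scores, 3))
```

For large candidate sets this avoids a full sort, and the same function can cap the results of any of the retrieval steps described earlier.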

Outcome

Performance Boost: Optimized execution reduces latency, enabling real-time responsiveness for end-users.
Dynamic Flexibility: Adaptive mechanisms ensure tailored retrieval strategies for diverse query types.
Resource Efficiency: Advanced indexing, caching, and parallelization reduce resource utilization while maintaining accuracy.
Enhanced Scalability: Handle large datasets and high query volumes without significant performance degradation.

This final step ensures that the GraphRAG system remains robust, responsive, and adaptable, completing the pipeline with a focus on user experience and scalability.