search
HomeBackend DevelopmentPython TutorialPython draws stunning Sankey diagrams, have you learned it?

Python draws stunning Sankey diagrams, have you learned it?

Apr 12, 2023 pm 02:28 PM
pythondataSankey diagram

Introduction to Sankey Diagram

Many times, we need a situation where we must visualize how data flows between entities. For example, take how residents move from one country to another. Here's a demonstration of how many residents moved from England to Northern Ireland, Scotland and Wales.

Python draws stunning Sankey diagrams, have you learned it?

It is clear from this Sankey visualization that more residents moved from England to Wales than from Scotland or Northern Ireland.

What is a Sankey diagram?

Sankey diagrams usually depict the flow of data from one entity (or node) to another entity (or node).

The entities to which data flows are called nodes. The node where the data flow originates is the source node (for example, England on the left), and the node where the flow ends is the target node (for example, Wales on the right). Source and target nodes are usually represented as labeled rectangles.

The flow itself is represented by straight or curved paths, called links. The width of a stream/link is directly proportional to the volume/number of streams. In the example above, the movement from England to Wales (i.e. the migration of residents) is more extensive (i.e. the migration of residents) than the movement from England to Scotland or Northern Ireland (i.e. the migration of residents), indicating that more residents move to Wales than to other countries .

Sankey diagrams can be used to represent the flow of energy, money, costs, and anything that has a flow concept.

Minard's classic chart of Napoleon's invasion of Russia is probably the most famous example of a Sankey chart. This visualization using a Sankey diagram shows very effectively how the French army progressed (or decreased?) on its way to Russia and back.

Python draws stunning Sankey diagrams, have you learned it?

#In this article, we use python’s plotly to draw a Sankey diagram.

How to draw a Sankey diagram?

This article uses the 2021 Olympic Games data set to draw a Sankey diagram. The dataset contains detailed information about the total number of medals - country, total number of medals, and individual totals for gold, silver, and bronze medals. We plot a Sankey chart to find out how many gold, silver and bronze medals a country has won.

df_medals = pd.read_excel("data/Medals.xlsx")
print(df_medals.info())
df_medals.rename(columns={'Team/NOC':'Country', 'Total': 'Total Medals', 'Gold':'Gold Medals', 'Silver': 'Silver Medals', 'Bronze': 'Bronze Medals'}, inplace=True)
df_medals.drop(columns=['Unnamed: 7','Unnamed: 8','Rank by Total'], inplace=True)

df_medals
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 93 entries, 0 to 92
Data columns (total 9 columns):
 # Column Non-Null CountDtype
--------- -------------------
 0 Rank 93 non-null int64
 1 Team/NOC 93 non-null object 
 2 Gold 93 non-null int64
 3 Silver 93 non-null int64
 4 Bronze 93 non-null int64
 5 Total93 non-null int64
 6 Rank by Total93 non-null int64
 7 Unnamed: 7 0 non-nullfloat64
 8 Unnamed: 8 1 non-nullfloat64
dtypes: float64(2), int64(6), object(1)
memory usage: 6.7+ KB
None

Python draws stunning Sankey diagrams, have you learned it?

Sankey diagram drawing basics

Use plotly's go.Sankey, this method takes 2 parameters - nodes and links (nodes and links ).

Note: All nodes - source and target should have unique identifiers.

In the case of the Olympic medals data set in this article:

Source is the country. Consider the first 3 countries (United States, China, and Japan) as source nodes. Label these source nodes with the following (unique) identifiers, labels, and colors:

  • 0: United States: Green
  • 1: China: Blue
  • 2: Japan: Orange

Target is gold, silver or bronze. Label these target nodes with the following (unique) identifiers, labels, and colors:

  • 3: Gold Medal: Gold
  • 4: Silver Medal: Silver
  • 5 : Bronze: Brown

Link (between source node and target node) is the number of medals of each type. In each source there are 3 links, each ending with a target - Gold, Silver and Bronze. So there are 9 links in total. The width of each link should be the number of gold, silver and bronze medals. Tag these links to targets, values ​​and colors with the following sources:

  • 0 (US) to 3,4,5 : 39, 41, 33
  • 1 (China) to 3 ,4,5 : 38, 32, 18
  • 2 (Japan) to 3,4,5 : 27, 14, 17

You need to instantiate 2 python dict objects to Represents

  • nodes (source and target): labels and colors as separate lists and
  • links: source node, target node, value (width) and color of the link as separate List

and pass it to plotly's go.Sankey.

Each index of the list (label, source, target, value and color) corresponds to a node or link.

NODES = dict( 
# 0 1 23 4 5 
label = ["United States of America", "People's Republic of China", "Japan", "Gold", "Silver", "Bronze"],
color = ["seagreen", "dodgerblue", "orange", "gold", "silver", "brown" ],)
LINKS = dict( 
source = [0,0,0,1,1,1,2,2,2], # 链接的起点或源节点
target = [3,4,5,3,4,5,3,4,5], # 链接的目的地或目标节点
value =[ 39, 41, 33, 38, 32, 18, 27, 14, 17], # 链接的宽度(数量)
# 链接的颜色
# 目标节点: 3-Gold4-Silver5-Bronze
color = [ 
"lightgreen", "lightgreen", "lightgreen",# 源节点:0 - 美国 States of America
"lightskyblue", "lightskyblue", "lightskyblue",# 源节点:1 - 中华人民共和国China
"bisque", "bisque", "bisque"],)# 源节点:2 - 日本
data = go.Sankey(node = NODES, link = LINKS)
fig = go.Figure(data)
fig.show()

Python draws stunning Sankey diagrams, have you learned it?

This is a very basic Sankey diagram. But have you noticed that the chart is too wide and the silver medals appear before the gold medals?

Here’s how to adjust the position and width of nodes.

Adjust node positions and chart width

Add x and y positions to nodes to explicitly specify the node's position. Value should be between 0 and 1.

NODES = dict( 
# 0 1 23 4 5 
label = ["United States of America", "People's Republic of China", "Japan", "Gold", "Silver", "Bronze"],
color = ["seagreen", "dodgerblue", "orange", "gold", "silver", "brown" ],)
x = [ 0,0,0,0.5,0.5,0.5],
y = [ 0,0.5,1,0.1,0.5,1],)
data = go.Sankey(node = NODES, link = LINKS)
fig = go.Figure(data)
fig.update_layout(title="Olympics - 2021: Country &Medals",font_size=16)
fig.show()

So we got a compact Sankey diagram:

Python draws stunning Sankey diagrams, have you learned it?

Let’s take a look at how the various parameters passed in the code are mapped to the nodes and nodes in the graph. Link.

Python draws stunning Sankey diagrams, have you learned it?

代码如何映射到桑基图

添加有意义的悬停标签

我们都知道plotly绘图是交互的,我们可以将鼠标悬停在节点和链接上以获取更多信息。

Python draws stunning Sankey diagrams, have you learned it?

带有默认悬停标签的桑基图

当将鼠标悬停在图上,将会显示详细信息。悬停标签中显示的信息是默认文本:节点、节点名称、传入流数、传出流数和总值。

例如:

  • 节点美国共获得11枚奖牌(=39金+41银+33铜)
  • 节点金牌共有104枚奖牌(=美国39枚,中国38枚,日本27枚)

如果我们觉得这些标签太冗长了,我们可以对此进程改进。使用hovertemplate参数改进悬停标签的格式

  • 对于节点,由于hoverlabels 没有提供新信息,通过传递一个空hovertemplate = ""来去掉hoverlabel
  • 对于链接,可以使标签简洁,格式为-
  • 对于节点和链接,让我们使用后缀"Medals"显示值。例如 113 枚奖牌而不是 113 枚。这可以通过使用具有适当valueformat和valuesuffix的update_traces函数来实现。
NODES = dict( 
# 0 1 23 4 5
label = ["United States of America", "People's Republic of China", "Japan", "Gold", "Silver", "Bronze"],
color = ["seagreen", "dodgerblue","orange", "gold", "silver", "brown" ],
x = [ 0,0, 0,0.5,0.5,0.5],
y = [ 0,0.5, 1,0.1,0.5,1],
hovertemplate=" ",)

LINK_LABELS = []
for country in ["USA","China","Japan"]:
for medal in ["Gold","Silver","Bronze"]:
LINK_LABELS.append(f"{country}-{medal}")
LINKS = dict(source = [0,0,0,1,1,1,2,2,2], 
 # 链接的起点或源节点
 target = [3,4,5,3,4,5,3,4,5], 
 # 链接的目的地或目标节点
 value =[ 39, 41, 33, 38, 32, 18, 27, 14, 17], 
 # 链接的宽度(数量) 
 # 链接的颜色
 # 目标节点:3-Gold4 -Silver5-Bronze
 color = ["lightgreen", "lightgreen", "lightgreen", # 源节点:0 - 美国
"lightskyblue", "lightskyblue", "lightskyblue", # 源节点:1 - 中国
"bisque", "bisque", "bisque"],# 源节点:2 - 日本
 label = LINK_LABELS, 
 hovertemplate="%{label}",)

data = go.Sankey(node = NODES, link = LINKS)
fig = go.Figure(data)
fig.update_layout(title="Olympics - 2021: Country &Medals",
font_size=16, width=1200, height=500,)
fig.update_traces(valueformat='3d', 
valuesuffix='Medals', 
selector=dict(type='sankey'))
fig.update_layout(hoverlabel=dict(bgcolor="lightgray",
font_size=16,
font_family="Rockwell"))
fig.show("png") #fig.show()

Python draws stunning Sankey diagrams, have you learned it?

带有改进的悬停标签的桑基图

对多个节点和级别进行泛化相对于链接,节点被称为源和目标。作为一个链接目标的节点可以是另一个链接的源。

该代码可以推广到处理数据集中的所有国家。

还可以将图表扩展到另一个层次,以可视化各国的奖牌总数。

NUM_COUNTRIES = 5
X_POS, Y_POS = 0.5, 1/(NUM_COUNTRIES-1)
NODE_COLORS = ["seagreen", "dodgerblue", "orange", "palevioletred", "darkcyan"]
LINK_COLORS = ["lightgreen", "lightskyblue", "bisque", "pink", "lightcyan"]

source = []
node_x_pos, node_y_pos = [], []
node_labels, node_colors = [], NODE_COLORS[0:NUM_COUNTRIES]
link_labels, link_colors, link_values = [], [], [] 

# 第一组链接和节点
for i in range(NUM_COUNTRIES):
source.extend([i]*3)
node_x_pos.append(0.01)
node_y_pos.append(round(i*Y_POS+0.01,2))
country = df_medals['Country'][i]
node_labels.append(country) 
for medal in ["Gold", "Silver", "Bronze"]:
link_labels.append(f"{country}-{medal}")
link_values.append(df_medals[f"{medal} Medals"][i])
link_colors.extend([LINK_COLORS[i]]*3)

source_last = max(source)+1
target = [ source_last, source_last+1, source_last+2] * NUM_COUNTRIES
target_last = max(target)+1

node_labels.extend(["Gold", "Silver", "Bronze"])
node_colors.extend(["gold", "silver", "brown"])
node_x_pos.extend([X_POS, X_POS, X_POS])
node_y_pos.extend([0.01, 0.5, 1])

# 最后一组链接和节点
source.extend([ source_last, source_last+1, source_last+2])
target.extend([target_last]*3)
node_labels.extend(["Total Medals"])
node_colors.extend(["grey"])
node_x_pos.extend([X_POS+0.25])
node_y_pos.extend([0.5])

for medal in ["Gold","Silver","Bronze"]:
link_labels.append(f"{medal}")
link_values.append(df_medals[f"{medal} Medals"][:i+1].sum())
link_colors.extend(["gold", "silver", "brown"])

print("node_labels", node_labels)
print("node_x_pos", node_x_pos); print("node_y_pos", node_y_pos)
node_labels ['United States of America', "People's Republic of China", 
 'Japan', 'Great Britain', 'ROC', 'Gold', 'Silver', 
 'Bronze', 'Total Medals']
node_x_pos [0.01, 0.01, 0.01, 0.01, 0.01, 0.5, 0.5, 0.5, 0.75]
node_y_pos [0.01, 0.26, 0.51, 0.76, 1.01, 0.01, 0.5, 1, 0.5]
# 显示的图
NODES = dict(pad= 20, thickness = 20, 
 line = dict(color = "lightslategrey",
 width = 0.5),
 hovertemplate=" ",
 label = node_labels, 
 color = node_colors,
 x = node_x_pos, 
 y = node_y_pos, )
LINKS = dict(source = source, 
 target = target, 
 value = link_values, 
 label = link_labels, 
 color = link_colors,
 hovertemplate="%{label}",)
data = go.Sankey(arrangement='snap', 
 node = NODES, 
 link = LINKS)
fig = go.Figure(data)
fig.update_traces(valueformat='3d', 
valuesuffix=' Medals', 
selector=dict(type='sankey'))
fig.update_layout(title="Olympics - 2021: Country &Medals",
font_size=16,
width=1200,
height=500,)
fig.update_layout(hoverlabel=dict(bgcolor="grey", 
font_size=14, 
font_family="Rockwell"))
fig.show("png") 

Python draws stunning Sankey diagrams, have you learned it?

The above is the detailed content of Python draws stunning Sankey diagrams, have you learned it?. For more information, please follow other related articles on the PHP Chinese website!

Statement
This article is reproduced at:51CTO.COM. If there is any infringement, please contact admin@php.cn delete
Explain the performance differences in element-wise operations between lists and arrays.Explain the performance differences in element-wise operations between lists and arrays.May 06, 2025 am 12:15 AM

Arraysarebetterforelement-wiseoperationsduetofasteraccessandoptimizedimplementations.1)Arrayshavecontiguousmemoryfordirectaccess,enhancingperformance.2)Listsareflexiblebutslowerduetopotentialdynamicresizing.3)Forlargedatasets,arrays,especiallywithlib

How can you perform mathematical operations on entire NumPy arrays efficiently?How can you perform mathematical operations on entire NumPy arrays efficiently?May 06, 2025 am 12:15 AM

Mathematical operations of the entire array in NumPy can be efficiently implemented through vectorized operations. 1) Use simple operators such as addition (arr 2) to perform operations on arrays. 2) NumPy uses the underlying C language library, which improves the computing speed. 3) You can perform complex operations such as multiplication, division, and exponents. 4) Pay attention to broadcast operations to ensure that the array shape is compatible. 5) Using NumPy functions such as np.sum() can significantly improve performance.

How do you insert elements into a Python array?How do you insert elements into a Python array?May 06, 2025 am 12:14 AM

In Python, there are two main methods for inserting elements into a list: 1) Using the insert(index, value) method, you can insert elements at the specified index, but inserting at the beginning of a large list is inefficient; 2) Using the append(value) method, add elements at the end of the list, which is highly efficient. For large lists, it is recommended to use append() or consider using deque or NumPy arrays to optimize performance.

How can you make a Python script executable on both Unix and Windows?How can you make a Python script executable on both Unix and Windows?May 06, 2025 am 12:13 AM

TomakeaPythonscriptexecutableonbothUnixandWindows:1)Addashebangline(#!/usr/bin/envpython3)andusechmod xtomakeitexecutableonUnix.2)OnWindows,ensurePythonisinstalledandassociatedwith.pyfiles,oruseabatchfile(run.bat)torunthescript.

What should you check if you get a 'command not found' error when trying to run a script?What should you check if you get a 'command not found' error when trying to run a script?May 06, 2025 am 12:03 AM

When encountering a "commandnotfound" error, the following points should be checked: 1. Confirm that the script exists and the path is correct; 2. Check file permissions and use chmod to add execution permissions if necessary; 3. Make sure the script interpreter is installed and in PATH; 4. Verify that the shebang line at the beginning of the script is correct. Doing so can effectively solve the script operation problem and ensure the coding process is smooth.

Why are arrays generally more memory-efficient than lists for storing numerical data?Why are arrays generally more memory-efficient than lists for storing numerical data?May 05, 2025 am 12:15 AM

Arraysaregenerallymorememory-efficientthanlistsforstoringnumericaldataduetotheirfixed-sizenatureanddirectmemoryaccess.1)Arraysstoreelementsinacontiguousblock,reducingoverheadfrompointersormetadata.2)Lists,oftenimplementedasdynamicarraysorlinkedstruct

How can you convert a Python list to a Python array?How can you convert a Python list to a Python array?May 05, 2025 am 12:10 AM

ToconvertaPythonlisttoanarray,usethearraymodule:1)Importthearraymodule,2)Createalist,3)Usearray(typecode,list)toconvertit,specifyingthetypecodelike'i'forintegers.Thisconversionoptimizesmemoryusageforhomogeneousdata,enhancingperformanceinnumericalcomp

Can you store different data types in the same Python list? Give an example.Can you store different data types in the same Python list? Give an example.May 05, 2025 am 12:10 AM

Python lists can store different types of data. The example list contains integers, strings, floating point numbers, booleans, nested lists, and dictionaries. List flexibility is valuable in data processing and prototyping, but it needs to be used with caution to ensure the readability and maintainability of the code.

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Linux new version

SublimeText3 Linux new version

SublimeText3 Linux latest version

MantisBT

MantisBT

Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

PhpStorm Mac version

PhpStorm Mac version

The latest (2018.2.1) professional PHP integrated development tool