How do I write a pandas groupby table from a DataFrame to a Word document?

小能豆

I'm new to pandas and python-docx. I have a table built with groupby in pandas:

Printing df with print(df)

Output:

Change Type      typeA        typeB     typeC    typeD    typeE    typeF
Component
A                  0            2        6         0         0       6
B                  0            3        2         1         1       3
C                  0            1        0         0         0       4
D                  0            2        2         0         0       3
E                  0            0        0         0         0       1
F                  0            3        0         0         1       2
G                  2            1        3         0         2       3
H                  0            0        0         0         0       1
I                  0            1        0         0         0       0

I write this DataFrame df to a Word document with the following code:

t = doc.add_table(df.shape[0]+1, df.shape[1])

for j in range(df.shape[-1]):
    t.cell(0,j).text = df.columns[j]


for i in range(df.shape[0]):
    for j in range(df.shape[-1]):
        t.cell(i+1,j).text = str(df.values[i,j])

This is what ends up in the document:

typeA        typeB     typeC    typeD    typeE    typeF
  0            2        6         0         0       6
  0            3        2         1         1       3
  0            1        0         0         0       4
  0            2        2         0         0       3
  0            0        0         0         0       1
  0            3        0         0         1       2
  2            1        3         0         2       3
  0            0        0         0         0       1
  0            1        0         0         0       0

I'm a beginner and can't figure out where I went wrong. How can I get the entire table written?

2025-01-04

1 answer

小能豆

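The problem is that your code never writes the DataFrame's index (Component). df.shape, df.columns and df.values cover only the data columns, not the index, so the generated Word table is missing that column.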
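To make the cause concrete, here is a minimal sketch on a hypothetical two-row frame (not the asker's actual data) with a named index. It shows that shape and values leave the index out, and that reset_index() turns it into an ordinary first column.

```python
import pandas as pd

# Hypothetical two-row frame with a named index, mirroring the question's layout
df = pd.DataFrame(
    {"typeA": [0, 2], "typeB": [2, 1]},
    index=pd.Index(["A", "G"], name="Component"),
)

print(df.shape)            # (2, 2): the index is not counted as a column
print(df.values.tolist())  # [[0, 2], [2, 1]]: the index labels are absent

# reset_index() moves the index into a regular first column
flat = df.reset_index()
print(flat.columns.tolist())  # ['Component', 'typeA', 'typeB']
print(flat.shape)             # (2, 3)
```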

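Fixed code

When you create the Word table, you need to handle the DataFrame's index yourself so that it gets written: add one extra column to the table and fill it from df.index. Here is the corrected code: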

from docx import Document
import pandas as pd

# Build a sample DataFrame matching the one in the question.
# The dict below is keyed by component, so it is transposed afterwards
# to put components on the rows and change types on the columns.
data = {
    'Change Type': ['typeA', 'typeB', 'typeC', 'typeD', 'typeE', 'typeF'],
    'A': [0, 2, 6, 0, 0, 6],
    'B': [0, 3, 2, 1, 1, 3],
    'C': [0, 1, 0, 0, 0, 4],
    'D': [0, 2, 2, 0, 0, 3],
    'E': [0, 0, 0, 0, 0, 1],
    'F': [0, 3, 0, 0, 1, 2],
    'G': [2, 1, 3, 0, 2, 3],
    'H': [0, 0, 0, 0, 0, 1],
    'I': [0, 1, 0, 0, 0, 0]
}
df = pd.DataFrame(data).set_index('Change Type').T
df.index.name = 'Component'

# Create the Word document
doc = Document()

# Add a table: one extra row for the header, one extra column for the index
t = doc.add_table(rows=df.shape[0] + 1, cols=df.shape[1] + 1)

# Fill in the header row
t.cell(0, 0).text = 'Component'  # header for the index column
for j in range(df.shape[1]):
    t.cell(0, j + 1).text = str(df.columns[j])

# Fill in the table body
for i in range(df.shape[0]):
    t.cell(i + 1, 0).text = str(df.index[i])  # write the index label
    for j in range(df.shape[1]):
        t.cell(i + 1, j + 1).text = str(df.iloc[i, j])

# Save the document
doc.save('output.docx')

Word table after the fix

The Word table now contains the complete DataFrame: the index column (Component) plus all data columns.

Component	typeA	typeB	typeC	typeD	typeE	typeF
A	0	2	6	0	0	6
B	0	3	2	1	1	3
C	0	1	0	0	0	4
D	0	2	2	0	0	3
E	0	0	0	0	0	1
F	0	3	0	0	1	2
G	2	1	3	0	2	3
H	0	0	0	0	0	1
I	0	1	0	0	0	0

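Notes

Make sure python-docx is installed: run pip install python-docx.
A Word table is a plain grid and has no equivalent for some pandas structures (such as a MultiIndex), so flatten or format the DataFrame before writing it.
If the table is large, consider adjusting the column widths or storing the data some other way (for example in Excel) so the Word table does not become overcrowded.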
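As an illustration of the MultiIndex note, one possible way to flatten two-level column labels before writing is to join each tuple into a single string. The frame and the "_" separator below are assumptions made for the example.

```python
import pandas as pd

# Hypothetical frame whose columns are a two-level MultiIndex,
# similar to what groupby(...).agg([...]) can produce
cols = pd.MultiIndex.from_tuples(
    [("typeA", "count"), ("typeA", "sum"), ("typeB", "count")]
)
df = pd.DataFrame(
    [[1, 4, 2], [0, 0, 3]],
    index=pd.Index(["A", "B"], name="Component"),
    columns=cols,
)

flat = df.copy()
# Join each (level0, level1) tuple into one label, e.g. "typeA_count"
flat.columns = ["_".join(map(str, c)) for c in flat.columns]
# Move the index into a regular column so every cell is plain data
flat = flat.reset_index()

print(flat.columns.tolist())
# ['Component', 'typeA_count', 'typeA_sum', 'typeB_count']
```

The resulting frame has single-level column names and can be written cell by cell with the same loops used earlier.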

2025-01-04