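I'm trying to convert a table I extracted with BeautifulSoup into JSON.
So far I've managed to isolate all the rows, but I'm not sure how to work with the data from here. Any advice would be much appreciated.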
[<tr><td><strong>Balance</strong></td><td><strong>$18.30</strong></td></tr>,
<tr><td>Card name</td><td>Name</td></tr>,
<tr><td>Account holder</td><td>NAME</td></tr>,
<tr><td>Card number</td><td>1234</td></tr>,
<tr><td>Status</td><td>Active</td></tr>]
(Line breaks added for readability.)
Here is my attempt:
result = []
allrows = table.tbody.findAll('tr')
for row in allrows:
    result.append([])
    allcols = row.findAll('td')
    for col in allcols:
        thestrings = [unicode(s) for s in col.findAll(text=True)]
        thetext = ''.join(thestrings)
        result[-1].append(thetext)
That gives me the following result:
[
    [u'Card balance', u'$18.30'],
    [u'Card name', u'NAMEn'],
    [u'Account holder', u'NAME'],
    [u'Card number', u'1234'],
    [u'Status', u'Active']
]
Your data probably looks something like this:
html_data = """
<table>
  <tr>
    <td>Card balance</td>
    <td>$18.30</td>
  </tr>
  <tr>
    <td>Card name</td>
    <td>NAMEn</td>
  </tr>
  <tr>
    <td>Account holder</td>
    <td>NAME</td>
  </tr>
  <tr>
    <td>Card number</td>
    <td>1234</td>
  </tr>
  <tr>
    <td>Status</td>
    <td>Active</td>
  </tr>
</table>
"""
From that, we can get your result as a list of rows with the following code:
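from bs4 import BeautifulSoup

# Calling a soup or a tag like a function is shorthand for find_all().
# Naming the parser explicitly avoids the "no parser specified" warning.
table_data = [[cell.text for cell in row("td")]
              for row in BeautifulSoup(html_data, "html.parser")("tr")]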
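If the real page is messier than the sample (nested tags such as <strong>, stray whitespace, or rows that are not simple label/value pairs), a slightly more defensive version may help. This is only a sketch: the helper name table_to_pairs and the sample markup are made up for illustration, and it assumes every row you care about has exactly two <td> cells.

```python
from bs4 import BeautifulSoup


def table_to_pairs(html):
    """Return [label, value] pairs for each row with exactly two <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for row in soup.find_all("tr"):
        # get_text(strip=True) flattens nested tags and trims whitespace.
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) == 2:  # skip header rows and anything malformed
            pairs.append(cells)
    return pairs


# Hypothetical sample, loosely based on the rows in the question.
sample = """
<table>
  <tr><td><strong>Balance</strong></td><td><strong>$18.30</strong></td></tr>
  <tr><td>Card name</td><td> Name </td></tr>
  <tr><th>ignored header</th></tr>
</table>
"""

pairs = table_to_pairs(sample)
print(pairs)
```

Skipping rows that don't have two cells keeps dict(pairs) from raising on a header or spacer row, at the cost of silently dropping them, so you may prefer to log those rows instead.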
To convert the result to JSON (if you don't care about the order):
import json

# Each row is a [key, value] pair, so dict() can consume the list directly.
print(json.dumps(dict(table_data)))
Result:
{
    "Status": "Active",
    "Card name": "NAMEn",
    "Account holder": "NAME",
    "Card number": "1234",
    "Card balance": "$18.30"
}
If you need to keep the rows in their original order, use this instead:
from collections import OrderedDict
import json

# OrderedDict remembers insertion order, so the keys come out in table order.
print(json.dumps(OrderedDict(table_data)))
Which gives you:
{
    "Card balance": "$18.30",
    "Card name": "NAMEn",
    "Account holder": "NAME",
    "Card number": "1234",
    "Status": "Active"
}
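One note for anyone reading this on a newer interpreter: from Python 3.7 onward a plain dict keeps insertion order, so the ordering question mostly goes away. A minimal sketch, assuming Python 3.7+ and using the row values from the example as a hard-coded list in place of the scraped data:

```python
import json

# Row pairs as produced by the extraction step (values copied from the example).
table_data = [
    ["Card balance", "$18.30"],
    ["Card name", "NAMEn"],
    ["Account holder", "NAME"],
    ["Card number", "1234"],
    ["Status", "Active"],
]

# On Python 3.7+ a plain dict preserves insertion order, so the keys
# are serialized in table order. indent=2 pretty-prints the output.
as_json = json.dumps(dict(table_data), indent=2)
print(as_json)

# Round trip: parsing the JSON back yields the keys in the same order.
restored = json.loads(as_json)
```

On older versions, stick with OrderedDict for a guaranteed order.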