How to extract a CSS property from an inline style using BeautifulSoup

一尘不染

python

I have something like this:

<img style="background:url(/theRealImage.jpg) no-repeat 0 0; height:90px; width:92px;" src="notTheRealImage.jpg"/>

I'm parsing the HTML with BeautifulSoup. Is there a way to pull the URL out of the `background` CSS property?

2021-01-20

1 answer

一尘不染

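You have two options: the quick and dirty way, or the correct way. The quick and dirty way (which will break easily if the markup changes) looks like this: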

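>>> import re
>>> from bs4 import BeautifulSoup  # BeautifulSoup 4; `from BeautifulSoup import ...` is the legacy BeautifulSoup 3 import
>>> soup = BeautifulSoup('<html><body><img style="background:url(/theRealImage.jpg) no-repeat 0 0; height:90px; width:92px;" src="notTheRealImage.jpg"/></body></html>', 'html.parser')
>>> style = soup.find('img')['style']
>>> # The non-greedy group captures whatever sits between "url(" and the next ")"
>>> urls = re.findall(r'url\((.*?)\)', style)
>>> urls
['/theRealImage.jpg']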

Obviously, you will have to adapt this to make it work with multiple img tags.
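For multiple img tags, one possible adaptation is sketched below. It assumes BeautifulSoup 4 (`bs4`) and the standard-library `html.parser` backend; the function name `inline_style_urls` and the quote-stripping step are illustrative additions, not part of the original answer. It is still the regex approach, so it shares the same fragility.

```python
import re
from bs4 import BeautifulSoup

# Same non-greedy pattern as the single-tag version.
URL_PATTERN = re.compile(r'url\((.*?)\)')

def inline_style_urls(html):
    """Collect every url(...) target from the inline styles of <img> tags.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, 'html.parser')
    urls = []
    # style=True keeps only the <img> tags that actually carry a style attribute.
    for img in soup.find_all('img', style=True):
        for match in URL_PATTERN.findall(img['style']):
            # url("...") and url('...') are also valid CSS, so drop the quotes.
            urls.append(match.strip(' \'"'))
    return urls
```

Calling `inline_style_urls(page_html)` on a page returns a flat list of URLs in document order; img tags without a `style` attribute are skipped rather than raising a `KeyError`.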

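The correct way, since it would be bad form to suggest that someone use a regex on a CSS string :), is to use a CSS parser. cssutils, a library I just found on Google and which is available on PyPI, looks like it would do the job.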

2021-01-20