Q: Beautiful Soup HTML extraction

Stack Overflow user

Asked 2013-05-24 18:06:47

1 answer · 183 views · 0 followers · 0 votes

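I'm struggling to get the data I want, and I'm sure this is simple if you know how to use BS. I've been trying to solve it for hours, and reading the documentation got me nowhere.

Currently, my Python code outputs the following: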

[<td>0.32%</td>, <td><span class="neg color ">&gt;-0.01</span></td>, <td>0.29%</td>, <td>0.38%</td>, <td><span class="neu">0.00</span></td>]

How can I isolate the contents of the td tags that do not contain other tags?

That is, I'd like to see 0.32%, 0.29%, 0.38%.

Thanks.

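import urllib2  # Python 2 standard library
from bs4 import BeautifulSoup

# Download the FT bonds page and parse it.
fturl = 'http://markets.ft.com/research/Markets/Bonds'
ftcontent = urllib2.urlopen(fturl).read()
soup = BeautifulSoup(ftcontent)

# Find the content div, then its <td> cells (filtered with class '').
ftdata = soup.find(name="div", attrs={'class': 'wsodModuleContent'}).find_all(name="td", attrs={'class': ''})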

python

beautifulsoup

1 Answer

Stack Overflow user

Accepted answer

Answered 2013-05-24 18:16:36

Would this solution work for you:

# Sample markup copied from the question's output.
html_txt = """<td>0.32%</td>, <td><span class="neg color">
    &gt;-0.01</span></td>, <td>0.29%</td>, <td>0.38%</td>, 
    <td><span class="neu">0.00</span></td>
    """
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_txt)
# Keep only the cells whose text ends with "%" (Python 2 print statement).
print [tag.text for tag in soup.find_all('td') if tag.text.strip().endswith("%")]

The output is:

[u'0.32%', u'0.29%', u'0.38%']
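
The filter above relies on the wanted values ending in "%". As a minimal sketch of a structural alternative, closer to the question as asked, one could keep only the td cells that have no child tags at all. The helper name plain_td_texts and the table/tr wrapper around the sample cells are assumptions made for illustration, and the sketch targets Python 3 with Beautiful Soup's built-in "html.parser" backend rather than the Python 2 setup used above.

```python
from bs4 import BeautifulSoup

# Sample cells from the question, wrapped in a table for well-formed markup
# (the wrapper is an assumption; the real page structure is not shown).
html_txt = """<table><tr>
<td>0.32%</td><td><span class="neg color">&gt;-0.01</span></td>
<td>0.29%</td><td>0.38%</td><td><span class="neu">0.00</span></td>
</tr></table>"""


def plain_td_texts(markup):
    """Return the text of every <td> that contains no nested tags."""
    soup = BeautifulSoup(markup, "html.parser")
    return [
        td.get_text(strip=True)
        for td in soup.find_all("td")
        # find(True) returns the first descendant tag, or None if there is none.
        if td.find(True) is None
    ]


print(plain_td_texts(html_txt))  # ['0.32%', '0.29%', '0.38%']
```

This version does not depend on the cell text at all, so it would also pick up plain cells that hold values without a percent sign.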

Votes: 2

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/16732485
