
Beautiful Soup: extracting href from an HTML ordered list

Stack Overflow user
Asked on 2013-04-29 04:55:53

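I'm trying to extract URLs from an HTML ordered list using the BeautifulSoup module. My code returns a list of None values whose length equals the number of items in the ordered list, so I know I'm in the right place in the document. What am I doing wrong?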

The URL I'm scraping is http://www.dailykos.com/story/2013/04/27/1203495/-GunFAIL-XV, and here are 5 of the 50 list items (sorry about the length):

> `<div id="body" class="article-body">
<ol>
<li><a href="http://www.wacotrib.com/news/city_of_waco/waco_police/grandfather-s-wound-is-a-gunshot-police-say/article_aeccbf93-4f81-5c3f-a304-bc91f6ba45a8.html">WACO, TX</a>, 3/18/13: Police responding to a domestic disturbance call found a man struggling to restrain his grandson, who was agitated and holding an AR-15. The cops shot grandpa. But that would totally never happen in a crowded theater.</li>
<li><a href="http://grossepointe.patch.com/articles/grosse-pointe-park-police-make-several-arrests">GROSSE POINTE PARK, MI</a>, 4/06/13: Grosse Pointe Park police arrested a 20-year-old Detroit man April 6 after he accidentally shot a 9mm handgun into the floor of a home in the 1000 block of Beaconsfield. The man was trying to make the gun safe when it discharged.</li>
<li><a href="http://ottawaherald.com/news/041613shooting">OTTAWA, KS</a>, 4/13/13: No one was injured when a “negligent” rifle shot rang out Saturday night inside a residence in the 1600 block of South Cedar Street in Ottawa. Dylan Spencer, 22, Ottawa, was arrested by Ottawa police about 7 p.m. on suspicion of unlawfully discharging an AR-15 rifle in his apartment, according to a police report. <a href="https://www.facebook.com/OttawaHerald/posts/000000000000000?comment_id=656061&amp;reply_comment_id=656512&amp;total_comments=3">The bullet</a> exited his apartment, passed through both walls of an occupied apartment and lodged into a utility pole. But of course, Dylan didn't think the gun was loaded. So it's cool.</li>
<li><a href="http://www.kobi5.com/component/zoo/item/klamath-falls-man-dead-after-a-shooting.html">KLAMATH FALLS, OR</a>, 4/13/13: An investigation into the shooting death of Lee Roy Myers, 47, has been ruled accidental. The Klamath County Major Crimes Team was called to investigate a shooting on Saturday, April 13. An autopsy concluded the cause of death was an accidental, self-inflicted handgun wound.</li>
<li><a href="http://westhampton-hamptonbays.patch.com/groups/police-and-fire/p/accidental-weapon-discharge">SOUTHAMPTON, NY</a>, 4/13/13: The report states that the detective visited the home and interviewed the man, who legally owned the Ruger 10/22 rifle. The man said he was cleaning the rifle when it accidentally discharged into his big toe. When the rifle was pointed in a downward angle, inertia caused the firing pin to strike the primer, which caused the rifle to fire, according to the incident report. The detective advised the man on safety techniques while cleaning his rifle. (Step one: unload it.)</li>`

Here is my code:

page= urllib2.urlopen(url)
soup = BeautifulSoup(page)
li=soup.select("ol > li")
for link in li:
    print (link.get('href'))

1 Answer

Stack Overflow user

Accepted answer

Answered on 2013-04-29 05:41:43

You are iterating over the li elements, which have no href attribute. The a tags inside them do:

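import urllib2  # Python 2 only; the Python 3 equivalent is urllib.request
from bs4 import BeautifulSoup

url = "http://www.dailykos.com/story/2013/04/27/1203495/-GunFAIL-XV"

page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page)
# Select the <a> tags that are direct children of each <li>; these carry the href.
links = soup.select("ol > li > a")
for link in links:
    print(link.get('href'))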
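
To see the difference without fetching the live page, here is a minimal offline sketch. The HTML snippet and the example.com URLs are made up for illustration (they are not from the article), and the explicit "html.parser" argument is my assumption rather than part of the original answer. It shows that `get('href')` on an `li` returns None, while the same call on the nested `a` tags returns the URLs.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the page: two list items, the second with two links.
HTML = """
<div id="body" class="article-body">
<ol>
<li><a href="http://example.com/one">FIRST</a>, some text.</li>
<li><a href="http://example.com/two">SECOND</a>, text with
    <a href="http://example.com/three">another link</a>.</li>
</ol>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Tag.get() reads an attribute of the tag itself, so each <li> yields None.
li_hrefs = [li.get("href") for li in soup.select("ol > li")]

# Selecting the <a> children of each <li> yields the actual URLs.
a_hrefs = [a.get("href") for a in soup.select("ol > li > a")]

# If only one URL per list item is wanted, take the first <a> inside each <li>.
first_hrefs = [li.a.get("href") for li in soup.select("ol > li") if li.a]

print(li_hrefs)
print(a_hrefs)
print(first_hrefs)
```

Note that `ol > li > a` returns every link that is a direct child of a list item, so an item with two links (as in the OTTAWA, KS entry above) contributes two URLs; the `li.a` variant keeps only the first link per item.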
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/16267768