Q: PHP Curl to PHP DOMDocument

Stack Overflow user

Asked 2015-07-01 23:13:27

Answers: 1 | Views: 309 | Followers: 0 | Votes: 0

Below is a sample of the markup I pulled from the web page:

<div class="user-details-narrow">
            <div class="profileheadtitle">
                <span class=" headline txtBlue size15">
                    Profession
                </span>
            </div>
            <div class="profileheadcontent-narrow">
                <span class="txtGrey size15">
                    administration
                </span>
            </div>
        </div>

<div class="user-details-narrow">
            <div class="profileheadtitle">
                <span class=" headline txtBlue size15">
                    Industry
                </span>
            </div>
            <div class="profileheadcontent-narrow">
                <span class="txtGrey size15">
                    banking
                </span>
            </div>
        </div>

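What I want to achieve is to extract the data from these DIVs. For example:

Profession = administration
Industry = banking

Currently I fetch the web page with Curl, strip the HTML tags, and then run hundreds of preg_match calls and if statements. That solution works, but it uses a lot of CPU and RAM.

I have been advised to use DOMDocument, but I can't get anything to work, mainly because I don't know it well enough.

Can someone show me how to extract this data?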

php

preg-match

domdocument

php-curl

1 Answer

Stack Overflow user

Answered 2015-07-01 23:27:52

Posting my earlier comment as a possible answer, with an explanation of why I think this is how you can solve the problem:

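$dom = new DOMDocument;
$dom->loadHTML($theHtmlString);
//get all profileheadtitle nodes
//they seem to contain the first bits of info you're after
$xpath = new DOMXPath($dom);
$titles = $xpath->query('//*[@class="profileheadtitle"]');
//let's iterate over them, using the `textContent` property to get the value
foreach ($titles as $div)
{
    //each node also has a second div right next to it
    //it's on the same level and we need its content, too
    //DOMNode::$nextSibling is the obvious candidate, but with this markup it
    //is the whitespace text node between the two divs, so ask XPath for the
    //first following sibling that is a <div> instead
    $content = $xpath->query('following-sibling::div[1]', $div)->item(0);
    if ($content === null)
    {
        continue;
    }
    //textContent includes the surrounding indentation, hence the trim()
    echo trim($div->textContent) . ' = ' . trim($content->textContent) . PHP_EOL;
}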

Job done. Be sure to check the DOMNode class docs for details, and you may want to read up on the DOMXPath class, too.
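As a fuller illustration, here is a minimal sketch that collects the pairs from the question's markup into an associative array. The function name extractProfileDetails and the variable $html are made up for this example, and the sketch assumes every user-details-narrow block holds exactly one title div and one content div, as in the sample.

```php
<?php
// Hypothetical helper (not part of any library): maps each title to its value.
function extractProfileDetails(string $html): array
{
    $dom = new DOMDocument;
    // Scraped pages are often not valid HTML; collect parser warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $details = [];
    foreach ($xpath->query('//div[@class="user-details-narrow"]') as $block) {
        // Relative expressions ("./...") are evaluated against $block.
        $title = $xpath->query('./div[@class="profileheadtitle"]', $block)->item(0);
        $content = $xpath->query('./div[@class="profileheadcontent-narrow"]', $block)->item(0);
        if ($title === null || $content === null) {
            continue; // skip blocks that do not contain both parts
        }
        // textContent includes the indentation around the text, so trim it.
        $details[trim($title->textContent)] = trim($content->textContent);
    }
    return $details;
}

// The sample markup from the question.
$html = <<<'HTML'
<div class="user-details-narrow">
    <div class="profileheadtitle">
        <span class=" headline txtBlue size15">
            Profession
        </span>
    </div>
    <div class="profileheadcontent-narrow">
        <span class="txtGrey size15">
            administration
        </span>
    </div>
</div>
<div class="user-details-narrow">
    <div class="profileheadtitle">
        <span class=" headline txtBlue size15">
            Industry
        </span>
    </div>
    <div class="profileheadcontent-narrow">
        <span class="txtGrey size15">
            banking
        </span>
    </div>
</div>
HTML;

$details = extractProfileDetails($html);
foreach ($details as $label => $value) {
    echo $label . ' = ' . $value . PHP_EOL;
}
```

Querying the outer block first and then looking up both children relative to it avoids depending on sibling order or on whitespace text nodes between the divs.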

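Note that this bit of code: $xpath->query('//*[@class="profileheadtitle"]'); queries the DOM for all nodes whose class is profileheadtitle. If you want to restrict the result to <div> elements with this class, you can write:

$xpath->query('//div[@class="profileheadtitle"]');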

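Just as important: although this XPath notation works, it will not work if some (or all) of the divs have more than one class. It only returns nodes whose class attribute is exactly that one class. The more academically correct way to write it would be:

$xpath->query(
    '//div[contains(concat(" ", normalize-space(@class), " "), concat(" ", "profileheadtitle", " "))]'
);

This will also be able to handle nodes with several classes, such as: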

<div id="bar" class="foo profileheadtitle mark-red" style="border: 1px solid black;"></div>
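To compare the exact-match and the token-match expressions side by side, here is a small sketch. The three test divs, their id values, and the helper matchedIds are assumptions made for this example only. The third expression, a plain contains(@class, ...), is not from the answer and is included just to show why the padding with spaces matters.

```php
<?php
// Hypothetical test markup: one single-class div, one multi-class div,
// and one div whose class merely starts with the same letters.
$html = '<div id="foo" class="profileheadtitle"></div>'
      . '<div id="bar" class="foo profileheadtitle mark-red"></div>'
      . '<div id="baz" class="profileheadtitle-wide"></div>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Hypothetical helper: returns the id attribute of every matched node.
function matchedIds(DOMXPath $xpath, string $expression): array
{
    $ids = [];
    foreach ($xpath->query($expression) as $node) {
        $ids[] = $node->getAttribute('id');
    }
    return $ids;
}

// Exact comparison: the whole class attribute must equal the string.
$exact = matchedIds($xpath, '//div[@class="profileheadtitle"]');

// Token match: pad the class list and the class name with spaces so that
// only a whole, space-separated class name can match.
$token = matchedIds(
    $xpath,
    '//div[contains(concat(" ", normalize-space(@class), " "), " profileheadtitle ")]'
);

// Plain substring match: also catches "profileheadtitle-wide".
$substring = matchedIds($xpath, '//div[contains(@class, "profileheadtitle")]');

echo implode(',', $exact), PHP_EOL;     // foo
echo implode(',', $token), PHP_EOL;     // foo,bar
echo implode(',', $substring), PHP_EOL; // foo,bar,baz
```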

Votes: 0

Original page content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/31165253
