Q: Network.requestWillBeSent does not account for JavaScript setTimeout redirects

Stack Overflow user

Asked 2021-08-10 01:26:10

1 answer · 190 views · 0 followers · 0 votes

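I'm using Puppeteer in my Node application to collect the URLs in a redirect chain, i.e. the hops from one URL to the next. So far I've been setting up ngrok URLs that use PHP's header function to redirect users with 301 and 302 responses, and my starting URL is a page that redirects to one of those ngrok URLs after a few seconds.

However, when Network.requestWillBeSent reaches a page that redirects with JavaScript, it appears to stop there. I need it to wait and capture those pages as well.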

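An example journey of URLs:

1. Start -> https://example.com/ <-- uses setTimeout and redirects to ngrok
2. An ngrok URL that redirects with a 301 using PHP
3. Some other ngrok URL that redirects with a JS setTimeout to, for example, another https://example.com/
4. Finish -> https://example.com/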

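In this case Network.requestWillBeSent picks up 1 and 2 but finishes at 3, so it never reaches 4.

As a result, only two URLs are logged to the console instead of all four.

It's hard to create a reproduction because I can't set up all the ngrok tunnels and so on, but here is a CodeSandbox link and a GitHub link, and my code is below: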

const dayjs = require('dayjs');
const AdvancedFormat = require('dayjs/plugin/advancedFormat');
dayjs.extend(AdvancedFormat);

const puppeteer = require('puppeteer');

async function runEmulation () {

  const goToUrl = 'https://example.com/';

  // vars
  const journey = [];
  let hopDataToReturn;

  // initiate a Puppeteer instance with options and launch
  const browser = await puppeteer.launch({
    headless: false
  });

  // launch a new page
  const page = await browser.newPage();

  // initiate a new CDP session
  const client = await page.target().createCDPSession();
  await client.send('Network.enable');
  await client.on('Network.requestWillBeSent', async (e) => {

    // if not a document, skip
    if (e.type !== 'Document') return;

    console.log(`adding URL to journey: ${e.documentURL}`)

    // the journey
    journey.push({
      url: e.documentURL,
      type: e.redirectResponse ? e.redirectResponse.status : 'JS Redirection',
      duration_in_ms: 0,
      duration_in_sec: 0,
      loaded_at: dayjs().valueOf()
    });
  });

  await page.goto(goToUrl);
  await page.waitForNavigation();
  await browser.close();

  console.log('=== JOURNEY ===')
  console.log(journey)
}

// init
runEmulation()

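What am I missing with Network.requestWillBeSent, or what do I need to add, so that it also captures sites in the middle of the chain that redirect to another site with JS after a few seconds?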

javascript

node.js

puppeteer

1 Answer

Stack Overflow user

Answered 2021-08-10 02:19:50

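Because client.on("Network.requestWillBeSent", ...) takes a callback function, putting await in front of it does nothing useful here. await only waits on values that are promises, and client.on merely registers the listener and returns; it does not return a promise that settles when events arrive. Every async function, by contrast, does return a promise.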
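To illustrate the point, here is a minimal sketch that uses Node's built-in EventEmitter as a stand-in for the CDP session (an assumption made so the example runs without a browser; the 'hop' event name and the demo function are hypothetical). It shows that awaiting on() returns immediately, and that waiting for an event requires a promise, such as the one returned by events.once.

```javascript
const { EventEmitter, once } = require('events');

async function demo() {
  const emitter = new EventEmitter();
  const log = [];

  // on() only registers the listener and returns the emitter itself,
  // so awaiting it resolves right away, before any event has fired.
  await emitter.on('hop', (url) => log.push(`listener saw ${url}`));
  log.push('after await on()');

  // To actually wait for an event, wrap it in a promise.
  // events.once() does that for Node's EventEmitter.
  const nextHop = once(emitter, 'hop');
  setTimeout(() => emitter.emit('hop', 'https://example.com/'), 10);
  const [url] = await nextHop;
  log.push(`awaited ${url}`);

  return log;
}
```

Calling demo() resolves to a log in which 'after await on()' comes first, before either entry produced by the emitted event.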

When you need to wait for the callback function to finish executing, you can put the code inside the callback, like so:

client.on('Network.requestWillBeSent', async (e) => {

    // if not a document, skip
    if (e.type !== 'Document') return;

    console.log(`adding URL to journey: ${e.documentURL}`)

    // the journey
    journey.push({
      url: e.documentURL,
      type: e.redirectResponse ? e.redirectResponse.status : 'JS Redirection',
      duration_in_ms: 0,
      duration_in_sec: 0,
      loaded_at: dayjs().valueOf()
    });

    await page.goto(goToUrl);
    await page.waitForNavigation();
    await browser.close();
  
    console.log('=== JOURNEY ===')
    console.log(journey)

});
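Another option, which keeps the navigation outside the listener, is to wait until no further matching events have arrived for a while before closing the browser. The following is only a sketch under stated assumptions: waitForQuiet is a hypothetical helper, not part of Puppeteer, and it assumes the object passed in exposes on() and off() the way Node's EventEmitter does. The quiet window has to be longer than the longest setTimeout delay used by any page in the chain.

```javascript
// Hypothetical helper: resolves once `eventName` has not fired on `emitter`
// for `quietMs` milliseconds, and rejects if that has not happened within
// `timeoutMs` overall. `isRelevant` lets the caller ignore some events.
function waitForQuiet(emitter, eventName, quietMs, timeoutMs, isRelevant = () => true) {
  return new Promise((resolve, reject) => {
    let quietTimer;
    let hardTimer;

    // Stop both timers and detach the listener so nothing leaks.
    const cleanup = () => {
      clearTimeout(quietTimer);
      clearTimeout(hardTimer);
      emitter.off(eventName, onEvent);
    };

    // (Re)start the quiet window; it only completes if no event interrupts it.
    const restart = () => {
      clearTimeout(quietTimer);
      quietTimer = setTimeout(() => {
        cleanup();
        resolve();
      }, quietMs);
    };

    function onEvent(payload) {
      if (isRelevant(payload)) restart();
    }

    hardTimer = setTimeout(() => {
      cleanup();
      reject(new Error(`still receiving ${eventName} after ${timeoutMs} ms`));
    }, timeoutMs);

    emitter.on(eventName, onEvent);
    restart(); // open the first quiet window immediately
  });
}
```

In the question's script this could replace the page.waitForNavigation() call, for example await waitForQuiet(client, 'Network.requestWillBeSent', 5000, 60000, (e) => e.type === 'Document') after page.goto(goToUrl). The 5000 and 60000 values are arbitrary placeholders.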