Puppeteer is a powerful browser automation library for web scraping and integration testing. However, its asynchronous, real-time API leaves plenty of room for gotchas and antipatterns to arise.

This article is part of a series, starting with Avoiding Puppeteer Antipatterns and Puppeteer in Node.js: Common Mistakes to Avoid. In this post, we'll add another dozen antipatterns to the list. There will be no overlap with previous installments, so you may wish to start with those.

While these antipatterns aren't quite full-fledged mistakes, weeding them out of your scripts (or being judicious when employing them) will increase the reliability of your Puppeteer code.

We will assume you are familiar with ES6 JavaScript syntax, promises, the browser DOM, and Node, and have written a few Puppeteer scripts already.

At the time of writing, the version of Puppeteer used was 20.3.0.

Onward to the antipatterns!

Antipatterns to Avoid in Puppeteer for Node.js

Underusing page.goto

I often see scraping scripts automating a search by:

Navigating to a website's landing page.
Typing a search term into an input box.
Waiting for the second navigation to complete.

While this may make sense for testing, in scraping contexts these steps can often be bypassed by adding the search term to the results URL as a query parameter and using page.goto(searchResultURL) directly. Skipping the intermediate page speeds up the script, requires less code, and typically improves reliability.

The same is often true for automation involving iframes. In many cases, the frame source URL can be navigated to directly, bypassing the hassle of working with the parent document. If the frame source isn't known in advance, you can extract it and strip off the outer document with a goto:

const frame = await page.waitForSelector("iframe");
const src = await frame.…

Launching browsers is a heavy undertaking. It's a good idea to clear browser state after each run to maintain idempotency when testing, and in web applications that use Puppeteer with Express to perform tasks. But, in many cases, a browser can be reused safely, relying on pages to encapsulate tasks. When business logic and safety can accommodate it, browser reuse (or even page reuse) can provide dramatic efficiency gains.

Many web applications rely on information from JSON data, either embedded inside a <script> element or as XHR response payloads.
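As a sketch of the direct-navigation idea for search pages: rather than typing into a search box, build the results URL and pass it to page.goto. The helper name buildSearchUrl, the example.com address, and the q parameter are all hypothetical; a real site's results URL and parameter names have to be read from its address bar or network tab. Puppeteer is imported lazily here only so that the URL helper can be used without a browser installed.

```javascript
// Hypothetical helper: appends query parameters to a base URL.
// URL and URLSearchParams are Node.js built-ins and handle the encoding.
function buildSearchUrl(baseUrl, params) {
  const url = new URL(baseUrl);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value);
  }
  return url.href;
}

// Navigates straight to the results page, skipping the landing page
// and the search form entirely.
async function scrapeSearchTitle(term) {
  // Loaded lazily (an assumption of this sketch, not a requirement).
  const { default: puppeteer } = await import("puppeteer");
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // "https://www.example.com/search" and "q" are placeholders.
    const searchResultURL = buildSearchUrl(
      "https://www.example.com/search",
      { q: term }
    );
    await page.goto(searchResultURL, { waitUntil: "domcontentloaded" });
    return await page.title();
  } finally {
    await browser.close();
  }
}
```

Calling scrapeSearchTitle("puppeteer tips") would then perform a single navigation instead of a navigation, a typing step, and a second navigation.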
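For the iframe case, the idea of extracting the frame source and navigating to it could be sketched as follows. This is not necessarily the code the article originally showed: reading the src property through ElementHandle.evaluate is one assumed way to obtain the URL, and gotoFrameSource is a hypothetical helper name. It also assumes the iframe has a real src rather than srcdoc or script-injected content.

```javascript
// Hypothetical helper: finds the first iframe on the current page,
// reads its source URL, and navigates the page directly to it.
async function gotoFrameSource(page) {
  // Resolves to an ElementHandle for the <iframe> element.
  const frame = await page.waitForSelector("iframe");
  // The src property is the resolved, absolute URL of the frame.
  const src = await frame.evaluate((el) => el.src);
  if (!src) {
    throw new Error("iframe has no src to navigate to");
  }
  // Replace the outer document with the frame's own document.
  await page.goto(src);
  return src;
}
```

After this call, selectors can be run against the frame's content through the page itself, with no need to go through the parent document.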
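The browser reuse point can be illustrated with a small sketch in which one launched browser serves several tasks, each isolated in its own page. The names runTasks and main, and the example.com URLs, are hypothetical. The sketch assumes the tasks are independent and that sharing cookies and cache between them is acceptable; where it is not, a fresh browser per task remains the safer choice.

```javascript
// Runs each task in its own page of a shared browser, closing the
// page afterwards so that state tied to the page does not accumulate.
async function runTasks(browser, tasks) {
  const results = [];
  for (const task of tasks) {
    const page = await browser.newPage();
    try {
      results.push(await task(page));
    } finally {
      await page.close();
    }
  }
  return results;
}

// Example wiring: a single launch shared by two placeholder tasks.
async function main() {
  const { default: puppeteer } = await import("puppeteer");
  const browser = await puppeteer.launch();
  try {
    const titles = await runTasks(browser, [
      async (page) => {
        await page.goto("https://www.example.com");
        return page.title();
      },
      async (page) => {
        await page.goto("https://www.example.org");
        return page.title();
      },
    ]);
    console.log(titles);
  } finally {
    await browser.close();
  }
}
```

Running main() launches the browser once; the cost of the launch is then spread across every task passed to runTasks.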