What is an IFrame?
An IFrame (Inline Frame) is an HTML element that embeds another HTML document inside the current page. Websites use it to insert content from another source, for example, an advertisement or a table. An Inline Frame is specified by the <iframe> tag.
How to scrape data from an IFrame with Octoparse?
Octoparse's built-in browser can recognize IFrames automatically. You just select the information inside the IFrame and choose to extract it from the Action Tips, the same way you would when scraping a page without an IFrame.
Octoparse locates items in an IFrame by combining the IFrame XPath and the Matching XPath. If you need to modify the XPath of a data field, note that you may need to modify both the IFrame XPath and the Matching XPath. (See the tutorial on how to modify XPath.)
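To make the IFrame XPath and Matching XPath pairing more concrete, here is a minimal Python sketch of the same two-step idea. It is not how Octoparse works internally: it uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath and needs well-formed markup, and the page contents, the id "price-frame", and the helper locate_in_iframe are all hypothetical. The first expression finds the <iframe> element in the outer page; the second is evaluated only against the document that the iframe loads.

```python
import xml.etree.ElementTree as ET

# Hypothetical outer page: the data is NOT here, only an iframe pointing to it.
OUTER_PAGE = """
<html><body>
  <h1>Prices</h1>
  <iframe id="price-frame" src="/frames/prices.html"></iframe>
</body></html>
"""

# Hypothetical document served at /frames/prices.html (what the iframe loads).
FRAME_DOCS = {
    "/frames/prices.html": """
<html><body>
  <table>
    <tr><td class="item">Apple</td><td class="price">1.20</td></tr>
    <tr><td class="item">Pear</td><td class="price">0.95</td></tr>
  </table>
</body></html>
""",
}


def locate_in_iframe(outer_html, iframe_xpath, matching_xpath, frame_docs):
    """Hypothetical helper: find the iframe first, then search inside its document."""
    outer = ET.fromstring(outer_html.strip())
    # Step 1: the "IFrame XPath" selects the <iframe> element in the outer page.
    frame = outer.find(iframe_xpath)
    if frame is None:
        return []
    # Load the iframe's own document (from a dict here, instead of the network).
    inner = ET.fromstring(frame_docs[frame.get("src")].strip())
    # Step 2: the "Matching XPath" is evaluated against the iframe's document only.
    return [el.text for el in inner.findall(matching_xpath)]


print(locate_in_iframe(OUTER_PAGE, ".//iframe[@id='price-frame']",
                       ".//td[@class='price']", FRAME_DOCS))  # ['1.20', '0.95']

# The Matching XPath alone finds nothing in the outer page.
print(ET.fromstring(OUTER_PAGE.strip()).findall(".//td[@class='price']"))  # []
```

The last line shows why both expressions matter: an XPath that is correct for the data inside the IFrame matches nothing when it is applied to the outer page.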
What if Octoparse cannot recognize the IFrame automatically?
If Octoparse does not recognize the items in the IFrame, there are two workarounds for scraping the data:
1. We can first extract any item outside the IFrame, and then modify the XPath of that data field so that it targets the item inside the IFrame.
Make sure to enter both the IFrame XPath and the Matching XPath.
2. We can get the IFrame link address from the page source code and use that link as the starting URL to build a task.
If you inspect the IFrame item in Chrome, you can see that the <iframe> tag contains a link in its src attribute.
Right-click the link in the inspected tag, choose "Copy link address", and you have the link.
Building a task from the IFrame link is as easy as scraping a normal page without an IFrame.
If there are several IFrame links in the source code, make sure you copy the one that contains the data you need.
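When a page has several IFrames, it can help to list every IFrame link at once and then pick the one with your data. The sketch below does this with Python's standard html.parser and urllib.parse.urljoin; the helper names and example URLs are hypothetical. It only sees the static HTML you pass in, so IFrames that are added later by JavaScript will not show up, and in that case inspecting the page in Chrome as described above is still the way to go.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IFrameLinkParser(HTMLParser):
    """Hypothetical helper: collects the src attribute of every <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.links.append(src)


def iframe_links(page_html, page_url):
    """Return absolute URLs of all iframes in the page, in document order."""
    parser = IFrameLinkParser()
    parser.feed(page_html)
    parser.close()
    # Relative src values (e.g. "/embed/table?id=7") are resolved against the page URL.
    return [urljoin(page_url, src) for src in parser.links]


# Example page source (hypothetical): an ad frame and a frame holding a table.
PAGE = """
<html><body>
  <iframe src="https://ads.example.com/banner"></iframe>
  <iframe src="/embed/table?id=7"></iframe>
</body></html>
"""

print(iframe_links(PAGE, "https://www.example.com/products/list"))
# ['https://ads.example.com/banner', 'https://www.example.com/embed/table?id=7']
```

Resolving relative links with urljoin matters because a copied src value such as "/embed/table?id=7" is not a usable starting URL on its own.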
Can Octoparse scrape IFrame within IFrame?
No, Octoparse cannot scrape an IFrame within an IFrame. However, we can still get the link of the inner IFrame from the source code, and then use that link as the starting URL to build a task.
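Finding the inner link by hand means opening the outer IFrame's page and inspecting it again. The Python sketch below automates that lookup; it does not make Octoparse scrape nested IFrames, it only finds the URL you would then use as the starting URL. It assumes the IFrames appear in the static HTML (no JavaScript is run), and the function names, the max_depth limit, and the example URLs are hypothetical. Passing a fetch function lets the example run offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class _IFrameSrcParser(HTMLParser):
    """Collects the src values of <iframe> tags."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def fetch_html(url):
    """Download a page as text; scripts on the page are not executed."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def nested_iframe_links(start_url, fetch=fetch_html, max_depth=3):
    """Breadth-first walk of iframes inside iframes; returns (depth, url) pairs."""
    found = []
    seen = {start_url}
    frontier = deque([(0, start_url)])
    while frontier:
        depth, url = frontier.popleft()
        if depth >= max_depth:
            continue
        parser = _IFrameSrcParser()
        parser.feed(fetch(url))
        parser.close()
        for src in parser.srcs:
            child = urljoin(url, src)
            if child not in seen:  # guard against frames that embed each other
                seen.add(child)
                found.append((depth + 1, child))
                frontier.append((depth + 1, child))
    return found


# Offline example (hypothetical pages): report -> outer frame -> data table.
PAGES = {
    "https://www.example.com/report": '<iframe src="/outer-frame"></iframe>',
    "https://www.example.com/outer-frame":
        '<p>wrapper</p><iframe src="https://data.example.net/table"></iframe>',
    "https://data.example.net/table": "<table><tr><td>42</td></tr></table>",
}

print(nested_iframe_links("https://www.example.com/report", fetch=PAGES.__getitem__))
# [(1, 'https://www.example.com/outer-frame'), (2, 'https://data.example.net/table')]
```

The deepest link in the result (here https://data.example.net/table) is the one that holds the data, so that is the URL to use as the starting URL of the task.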