What is an IFrame?
An IFrame (Inline Frame) is an HTML document that is embedded inside another HTML document on a website. It allows you to include a piece of content from external sources. Essentially, it is a window on your web page looking at another piece of online content.
In code, every IFrame is written with the <iframe> HTML tag and a src (source) attribute, which gives the location of the content you want to embed.
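To make the tag and its src attribute concrete, here is a minimal Python sketch that reads a small page and lists the src of every IFrame it contains. The page markup and the example.com URL are made-up placeholders, and the helper name find_iframe_sources is only illustrative.

```python
from html.parser import HTMLParser

# Hypothetical host page; the embedded URL is a placeholder, not a real widget.
PAGE = """
<html><body>
  <h1>Product reviews</h1>
  <iframe src="https://example.com/widget/reviews" width="600" height="400"></iframe>
</body></html>
"""


class IFrameFinder(HTMLParser):
    """Collects the src attribute of every <iframe> start tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <IFRAME> matches as well.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_iframe_sources(html):
    """Return the src values of all IFrames in the given HTML string."""
    finder = IFrameFinder()
    finder.feed(html)
    return finder.sources


print(find_iframe_sources(PAGE))  # ['https://example.com/widget/reviews']
```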
While IFrames help keep a site separate from external content, they are often a roadblock for web scrapers, because the embedded content lives in a separate document from the host page.
How to scrape from an IFrame with Octoparse?
Octoparse's built-in browser detects IFrames automatically, so all you have to do is select the element inside the IFrame and extract it normally - as if IFrames did not exist!
After you extract data from within an IFrame, check the auto-generated element XPath to confirm that Octoparse has detected the IFrame.
Note, however, that Octoparse locates elements in IFrames by combining an IFrame XPath with a Matching XPath. If the auto-generated XPath is not accurate, you will have to rewrite both XPath expressions.
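The reason two expressions are needed can be illustrated outside Octoparse. The Python sketch below is not Octoparse's implementation; it only assumes that the host page and the framed page are separate documents, so one XPath finds the IFrame in the host page and a second XPath finds the element inside the framed document. The markup, the FRAME_DOCS lookup table, and both XPath values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical host page that embeds a second document through an IFrame.
HOST_PAGE = """
<html><body>
  <div id="main">
    <iframe id="reviews" src="reviews.html"></iframe>
  </div>
</body></html>
"""

# Stand-in for loading the framed document; keyed by the IFrame's src value.
FRAME_DOCS = {
    "reviews.html": """
<html><body>
  <ul>
    <li class="review">Great product</li>
    <li class="review">Works as described</li>
  </ul>
</body></html>
""",
}

IFRAME_XPATH = ".//iframe[@id='reviews']"   # step 1: find the frame in the host page
MATCHING_XPATH = ".//li[@class='review']"   # step 2: find elements inside the frame


def extract_from_iframe(host_html, iframe_xpath, matching_xpath, frame_docs):
    """Locate the IFrame first, then apply the matching XPath to its document."""
    host = ET.fromstring(host_html.strip())
    frame = host.find(iframe_xpath)
    if frame is None:
        return []
    inner = ET.fromstring(frame_docs[frame.get("src")].strip())
    return [el.text for el in inner.findall(matching_xpath)]


# The matching XPath alone finds nothing in the host page...
print(ET.fromstring(HOST_PAGE.strip()).findall(MATCHING_XPATH))  # []
# ...but it succeeds once the IFrame has been located and entered.
print(extract_from_iframe(HOST_PAGE, IFRAME_XPATH, MATCHING_XPATH, FRAME_DOCS))
```

If either expression is wrong, the lookup fails: a bad IFrame XPath never reaches the framed document, and a bad Matching XPath finds nothing inside it.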
What if Octoparse does not recognize the IFrame automatically?
Don't panic - there are two workarounds for this kind of situation.
First, extract any page element as a data field placeholder, then rewrite its XPath so that it locates the element inside the IFrame instead. Remember to enter both the IFrame XPath and the Matching XPath when you modify the XPath.
Second, get the IFrame link address from the source code and use it as the starting URL for a new task:
Press F12 or Ctrl + Shift + I to open the Developer Tools in Chrome and locate the IFrame element in the source code. If the source code contains multiple IFrames, make sure you pick the one that holds the data you need.
Right-click the value of the src attribute on the iframe tag and copy the link address to get the URL.
Use that URL to build a task; scraping it is as easy as scraping a normal page without an IFrame.
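The manual lookup above can also be sketched in Python for pages with several IFrames. This is only an illustration under stated assumptions: the page URL, the markup, and the iframe_urls helper are hypothetical. It labels each IFrame by its title or id so you can pick the right one, and it resolves relative src values against the page URL so the result can be used directly as a starting URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical page with two IFrames; only one holds the data of interest.
PAGE_URL = "https://example.com/shop/item.html"
PAGE = """
<html><body>
  <iframe id="ads" src="//ads.example.net/banner"></iframe>
  <iframe title="Price table" src="/embed/prices?id=42"></iframe>
</body></html>
"""


class IFrameCollector(HTMLParser):
    """Stores the attributes of every <iframe> start tag as a dict."""

    def __init__(self):
        super().__init__()
        self.frames = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            self.frames.append(dict(attrs))


def iframe_urls(page_url, html):
    """Return (label, absolute URL) pairs for each IFrame that has a src."""
    collector = IFrameCollector()
    collector.feed(html)
    return [
        (frame.get("title") or frame.get("id") or "", urljoin(page_url, frame["src"]))
        for frame in collector.frames
        if frame.get("src")
    ]


for label, url in iframe_urls(PAGE_URL, PAGE):
    print(label, "->", url)
# ads -> https://ads.example.net/banner
# Price table -> https://example.com/embed/prices?id=42
```

Resolving with urljoin matters because src is often relative; copying the link address in Chrome gives you the absolute form for the same reason.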
Can Octoparse scrape from an IFrame within an IFrame?
No, Octoparse cannot scrape from an IFrame nested within another IFrame. However, you can still get the inner IFrame's link from the source code in a browser first and then use it as the starting URL to build a new task.
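Finding the innermost link by hand means opening each IFrame's source in turn. The Python sketch below shows that walk under simplifying assumptions: pages are served from an in-memory PAGES dictionary instead of the network, and fetch is a hypothetical callable that you would replace with a real HTTP client. All URLs are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical site: the page embeds a frame, which embeds another frame.
PAGES = {
    "https://example.com/page": '<iframe src="/outer-frame"></iframe>',
    "https://example.com/outer-frame": '<p>wrapper</p><iframe src="/inner-frame"></iframe>',
    "https://example.com/inner-frame": "<table><tr><td>data</td></tr></table>",
}


class SrcCollector(HTMLParser):
    """Collects the src attribute of every <iframe> start tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_chain(url, fetch, seen=None):
    """Return url followed by every IFrame URL reachable from it, depth-first."""
    seen = set() if seen is None else seen
    if url in seen:  # guard against frames that embed each other
        return []
    seen.add(url)
    parser = SrcCollector()
    parser.feed(fetch(url))
    chain = [url]
    for src in parser.sources:
        chain.extend(frame_chain(urljoin(url, src), fetch, seen))
    return chain


chain = frame_chain("https://example.com/page", PAGES.__getitem__)
print(chain[-1])  # https://example.com/inner-frame
```

In this example the last URL in the chain is the innermost IFrame, which is the one to use as the starting URL of the new task.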