Octoparse supports scraping data from over 98% of websites, including those with AJAX, JavaScript, and other dynamic elements. It also makes it easy to handle forms, drop-down lists, infinite scrolling, and much more.
Generally, if information on a website can be copied and pasted, it can be scraped with Octoparse. More specifically, as long as the target data exists in the website’s HTML source code (even if it isn’t directly visible on the page), Octoparse can extract it.
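To make the "exists in the HTML source" rule concrete, here is a minimal Python sketch of the same idea outside of Octoparse. It is not how Octoparse works internally; the sample markup, the `TextCollector` class, and the `source_contains` helper are all hypothetical names used only for illustration. It shows that text hidden with CSS is still present in the source and can therefore be found.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects every non-empty text node, whether or not CSS hides it."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def source_contains(html_source, target):
    """Return True if `target` appears in any text node of the HTML source."""
    collector = TextCollector()
    collector.feed(html_source)
    return any(target in chunk for chunk in collector.chunks)


# Hypothetical sample page: the SKU is hidden from view but present in the source.
SAMPLE = '<p>Price on request</p><span style="display:none">SKU-4821</span>'

found_hidden = source_contains(SAMPLE, "SKU-4821")
found_missing = source_contains(SAMPLE, "SKU-9999")
```

If a value cannot be found this way (for example, because it only exists inside an image), it is unlikely to be extractable as text.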
1. Elements visible on the webpage:
Text
Image URL
Links (URLs)
Inner/Outer HTML code
Attribute Value
For more information, see: Extract attributes of a web element (text, URL, HTML, etc.)
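As a rough illustration of what each of these element-level fields looks like, the following Python sketch pulls them from a small, well-formed snippet using the standard library. The snippet, the example.com URLs, and the field names are assumptions made for this example; Octoparse itself exposes these options through its interface rather than through code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (ElementTree needs valid XML).
SNIPPET = """<div class="product" data-sku="A-100">
  <a href="https://example.com/item/100">Blue Mug</a>
  <img src="https://example.com/img/100.png" alt="Blue Mug"/>
</div>"""

root = ET.fromstring(SNIPPET)
link = root.find("a")
image = root.find("img")

fields = {
    "text": link.text,                         # visible text of the element
    "link_url": link.get("href"),              # link (URL)
    "image_url": image.get("src"),             # image URL
    "attribute_value": root.get("data-sku"),   # any other attribute value
    # Outer HTML: the element serialized together with its own tag.
    "outer_html": ET.tostring(link, encoding="unicode").strip(),
}
```

For a simple element like the link above, the inner HTML is just its text; for nested elements it would be the serialized children without the enclosing tag.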
2. Any information hidden in the source code, such as:
Page URL
Page Title
Metadata
HTML source code
Current Time
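The page-level fields listed above can be pictured with a short Python sketch. This is only an illustration under stated assumptions: `PageInfoParser`, `page_level_fields`, the sample HTML, and the example.com URL are hypothetical, and the code works on an HTML string rather than fetching a live page.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser


class PageInfoParser(HTMLParser):
    """Collects the <title> text and named <meta> tags from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "name" in attrs:
            self.meta[attrs["name"]] = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_level_fields(page_url, html_source):
    """Build one record holding the page-level fields for a given page."""
    parser = PageInfoParser()
    parser.feed(html_source)
    return {
        "page_url": page_url,
        "page_title": parser.title.strip(),
        "metadata": parser.meta,
        "html_source": html_source,
        # Time of extraction, recorded in UTC.
        "current_time": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical sample page.
SAMPLE_HTML = """<html><head><title>Blue Mug | Example Shop</title>
<meta name="description" content="A sturdy blue mug."></head>
<body><p>In stock</p></body></html>"""

record = page_level_fields("https://example.com/item/100", SAMPLE_HTML)
```

None of these values is tied to a specific element on the page, which is why they are available even when nothing is selected.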
3. What types of websites can't Octoparse scrape?
Octoparse cannot currently scrape data from:
XML sitemaps
PDF files
If you find it time-consuming to scrape data from complex websites or just want to concentrate on running your business to its full potential, please feel free to reach out to us for our Data Service.