
What types of websites/data can Octoparse scrape?

Updated over 2 weeks ago

Octoparse supports scraping data from over 98% of websites, including those with AJAX, JavaScript, and other dynamic elements. It also makes it easy to handle forms, drop-down lists, infinite scrolling, and much more.

Generally, if information on a website can be copied and pasted, it can be scraped with Octoparse. More specifically, as long as the target data exists in the website’s HTML source code (even if it isn’t directly visible on the page), Octoparse can extract it.
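To illustrate the point about data that is in the HTML source but not visible on the page, here is a minimal sketch using only Python's standard library. It is not how Octoparse works internally; the page snippet, the `sku` class name, and the `extract_by_class` helper are all made up for this example.

```python
from html.parser import HTMLParser

# Hypothetical page source: the SKU is in the HTML but hidden from view.
PAGE = """
<html><body>
  <h1>Blue Mug</h1>
  <span class="sku" style="display:none">SKU-12345</span>
</body></html>
"""


class ClassTextCollector(HTMLParser):
    """Collect the text of every element that carries a given class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self._depth = 0  # greater than 0 while inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            self._depth += 1
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.results[-1] += data


def extract_by_class(html, target_class):
    """Return the stripped text of each element with the given class."""
    parser = ClassTextCollector(target_class)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.results]


print(extract_by_class(PAGE, "sku"))
```

The hidden `span` is found because the parser reads the source, not the rendered page. The depth counter is deliberately simple and assumes matching elements contain no void tags such as `<br>`.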


1. Elements visible on the webpage:

  • Text

  • Image URL

  • Links (URLs)

  • Inner/Outer HTML code

  • Attribute Value

For more information, see: Extract attributes of a web element (text, URL, HTML, etc.)
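As a rough illustration of what these element fields look like in practice, the sketch below pulls the text, link URL, image URL, and one attribute value out of a small snippet with Python's standard library. The snippet, the `data-id` attribute, and the `extract_fields` helper are assumptions made for this example and are not part of Octoparse.

```python
from html.parser import HTMLParser

# Hypothetical snippet standing in for one product card on a page.
SNIPPET = (
    '<a href="/item/42" data-id="42">'
    '<img src="/img/42.png" alt="Mug">Blue Mug</a>'
)


class ElementFields(HTMLParser):
    """Gather the common field types from the elements in a snippet."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.image_urls = []
        self.data_ids = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])  # link URL
        if tag == "img" and attrs.get("src"):
            self.image_urls.append(attrs["src"])  # image URL
        if "data-id" in attrs:
            self.data_ids.append(attrs["data-id"])  # arbitrary attribute value

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())  # visible text


def extract_fields(html):
    """Return the text, links, image URLs, and data-id values found."""
    parser = ElementFields()
    parser.feed(html)
    parser.close()
    return {
        "text": " ".join(parser.text_parts),
        "links": parser.links,
        "image_urls": parser.image_urls,
        "data_ids": parser.data_ids,
    }


fields = extract_fields(SNIPPET)
print(fields)
```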


2. Any information hidden in the source code, such as:

  • Page URL

  • Page Title

  • Metadata

  • HTML source code

  • Current Time
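For a sense of where page-level fields such as the title and metadata sit in the source, here is a small standard-library sketch. It only illustrates the idea and does not reflect Octoparse's implementation; the sample document and the `read_head` helper are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page source with a title and two meta tags.
DOCUMENT = """<html><head>
<title>Blue Mug | Example Shop</title>
<meta name="description" content="A sturdy blue mug.">
<meta property="og:type" content="product">
</head><body><h1>Blue Mug</h1></body></html>"""


class HeadInfo(HTMLParser):
    """Collect the page title and name/property meta tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Meta tags are keyed by either "name" or "property".
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def read_head(html):
    """Return (title, metadata dict) for an HTML document."""
    parser = HeadInfo()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.meta


page_title, page_meta = read_head(DOCUMENT)
print(page_title, page_meta)
```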

For more details, see:


3. What types of websites can't Octoparse scrape?

Currently, Octoparse cannot scrape data from:

  • XML sitemaps

  • PDF files


If you find it time-consuming to scrape data from complex websites or just want to concentrate on running your business to its full potential, please feel free to reach out to us for our Data Service.
