You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend you upgrade because it is faster, easier and more robust! Download and upgrade here if you haven't already done so!

A cryptocurrency is a digital currency designed to work as a medium of exchange through a computer network that is not reliant on any central authority to uphold or maintain it.

This tutorial shows you how to scrape cryptocurrency prices from CoinGecko, a platform that helps cryptocurrency traders and investors monitor the crypto market.

example.jpg

To follow along with the tutorial, use the URL below:

https://www.coingecko.com/

Here are the main steps of this tutorial: [Download task file here]

  1. Create a Go to Web Page - to open the target webpage

  2. Auto-detect the webpage - to create a workflow

  3. Modify the XPath of Pagination - to fix endless scraping

  4. Run the task - to get your desired data

1. Create a Go to Web Page - to open the target webpage

  • Enter the target URL into the search bar on the home screen and click Start.

go_to_webpage.jpg

2. Auto-detect the webpage - to create a workflow

Octoparse's auto-detection function can quickly create a workflow based on the structure of the target web page.

  • Click Auto-detect web page data in Tips and wait for the detection to complete

auto_detect.jpg

  • Check the data fields in Data preview, then delete or rename fields as needed

refine_field.jpg

Tip: When Octoparse auto-detects data on a web page, it scans the whole page and extracts one or more sets of data using its machine learning algorithm. If your target data is not detected on the first attempt, you can switch to the next data set by clicking Switch auto-detect results.

switch.jpg

  • Uncheck Add a page scroll

  • Click Create workflow

create_webpage.jpg

You will see a workflow created like the one below:

workflow_crypto.jpg

3. Modify the XPath of Pagination - to fix endless scraping

The auto-generated XPath of Pagination needs to be modified. Otherwise, Octoparse keeps clicking the Next button after it reaches the last page, and the task never stops on its own.

  • Click Pagination to open its settings

  • Input the Matching XPath as: //a[contains(text(), 'Next') and not (@href='#')]

  • Click Apply to save the change

modify_xpath.jpg
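
To see why this XPath stops the loop, here is a minimal Python sketch of its logic. It is an illustration only, not how Octoparse works internally. The page markup is hypothetical (it is not CoinGecko's actual HTML), and it assumes that the Next link on the last page points to "#", which is what the XPath above implies. Python's standard library does not evaluate contains() or not() in XPath, so the predicate is restated as a plain filter that approximates it; find_next_link and run_pagination are made-up helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination markup for three pages (assumption for illustration).
# The last page's Next link is assumed to have href="#".
PAGES = [
    '<nav><a href="/?page=2">Next</a></nav>',
    '<nav><a href="/?page=1">Prev</a><a href="/?page=3">Next</a></nav>',
    '<nav><a href="/?page=2">Prev</a><a href="#">Next</a></nav>',
]


def find_next_link(page_markup, skip_placeholder=True):
    """Return the first <a> accepted by the predicate, or None.

    Approximates //a[contains(text(), 'Next') and not (@href='#')].
    With skip_placeholder=False it mimics a looser XPath that only
    checks the link text.
    """
    root = ET.fromstring(page_markup)
    for link in root.iter("a"):
        if "Next" not in (link.text or ""):
            continue  # link text does not contain 'Next'
        if skip_placeholder and link.get("href") == "#":
            continue  # href is '#', so the stricter predicate rejects it
        return link
    return None


def run_pagination(pages, skip_placeholder, max_clicks=10):
    """Count clicks on Next; clicking a '#' link stays on the same page."""
    index, clicks = 0, 0
    while clicks < max_clicks:  # safety cap so the demo always terminates
        link = find_next_link(pages[index], skip_placeholder)
        if link is None:
            break  # no matching Next link: pagination ends
        clicks += 1
        if link.get("href") != "#":
            index += 1
    return clicks


print(run_pagination(PAGES, skip_placeholder=True))   # 2
print(run_pagination(PAGES, skip_placeholder=False))  # 10 (only the cap stops it)
```

With the stricter predicate, the loop ends after two clicks because nothing matches on the last page. With the looser one, the "#" link keeps matching, and only the safety cap ends the run.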

4. Run the task - to get your desired data

  • Click Save on the upper right to save your task

  • Click Run next to it and wait for a Run Task window to pop up

  • Select Run on your device to run the task on your local device

  • Wait for the task to complete

mceclip0.png

Here is a sample output from a local run:

coin_data.jpg