You are browsing a tutorial guide for the latest version of Octoparse. If you are running an older version, we strongly recommend upgrading, because the latest version is faster, easier to use, and more robust. Download and upgrade here if you haven't already done so!

A cryptocurrency is a digital or virtual currency that is secured by cryptography, which makes it nearly impossible to counterfeit or double-spend. Many cryptocurrencies are decentralized networks based on blockchain technology—a distributed ledger enforced by a disparate network of computers.

Cryptocurrency traders need to monitor price fluctuations closely, since prices can change within seconds. Octoparse can schedule scraping runs at frequent intervals to keep the information up to date.

In this tutorial, we are going to show you how to scrape cryptocurrency info from Yahoo Finance.

For Yahoo Finance, you can also use the ready-made "Task Template" on the main screen of Octoparse. Just enter a few parameters and the task is ready to go. For further details, see: Task Templates

To follow along, you can use this URL:

https://finance.yahoo.com/cryptocurrencies?count=50&offset=0
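
The `count` and `offset` query parameters in this URL appear to set the number of rows per page and the starting row. This tutorial paginates by clicking Next instead, but if `offset` does advance by `count` for each page (an assumption, not verified here), the listing URLs could be generated as in this minimal sketch; `page_urls` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE = "https://finance.yahoo.com/cryptocurrencies"

def page_urls(pages, count=50):
    """Build listing URLs, assuming offset advances by `count` per page (hypothetical)."""
    return [f"{BASE}?{urlencode({'count': count, 'offset': i * count})}" for i in range(pages)]

for url in page_urls(3):
    print(url)
# https://finance.yahoo.com/cryptocurrencies?count=50&offset=0
# https://finance.yahoo.com/cryptocurrencies?count=50&offset=50
# https://finance.yahoo.com/cryptocurrencies?count=50&offset=100
```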


We will scrape data such as the Symbol, Name, Price, and Market Cap from the cryptocurrency table with Octoparse.


Here are the main steps in this tutorial: [Download task file here]

  1. Go to Web Page - to open the targeted web page

  2. Auto-detect web page data - to create the workflow

  3. Extract data - to refine the data fields

  4. Modify XPath of Pagination - to fix endless scraping

  5. Start extraction - to run the task and get data


1. Go to Web Page - to open the targeted web page

  • Enter the page URL on the home screen and click Start to create a new task


2. Auto-detect web page data - to create the workflow

  • Choose Auto-detect web page data and wait for detection to complete

  • Click Switch auto-detect results in the Tips panel until the table data is selected

  • Uncheck Add a page scroll

  • Click Create workflow

  • Click the Click to Paginate action

  • Increase the AJAX timeout to 7-10 seconds

  • Click Apply to save
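
The AJAX timeout is how long Octoparse waits for new rows to load after clicking Next before it moves on. The idea can be sketched as a polling loop; `wait_for` and the simulated page below are hypothetical illustrations, not Octoparse internals.

```python
import time

def wait_for(condition, timeout=8.0, interval=0.5):
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Hypothetical helper illustrating an AJAX wait: returns True if the
    condition was met in time, False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Simulated page whose new rows "arrive" about 0.3 s after the click.
loaded_at = time.monotonic() + 0.3
print(wait_for(lambda: time.monotonic() >= loaded_at, timeout=2, interval=0.05))  # True
print(wait_for(lambda: False, timeout=0.2, interval=0.05))  # False
```

If the timeout is shorter than the page's actual load time, the wait ends before the new rows appear, which is why a longer window such as 7-10 seconds is safer for a slow-loading table.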


3. Extract data - to refine the data fields

  • Switch to vertical view

  • Rename fields by double-clicking each field name

  • Delete any fields you don't need by clicking the delete icon next to each field

TIP: A field name can only include letters, numbers, and "_", and it cannot start with a number.
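
A field-naming rule like this can be checked with a regular expression. This is a minimal sketch assuming ASCII letters and that a name may not begin with a digit; Octoparse's own validation may differ, and `is_valid_field_name` is a hypothetical helper.

```python
import re

# Assumed rule: ASCII letters, digits and "_" only; the first character
# must be a letter or "_" (never a digit). Octoparse's real check may differ.
FIELD_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_valid_field_name(name: str) -> bool:
    return FIELD_NAME.fullmatch(name) is not None

for name in ["Symbol", "Market_cap", "24h_change", "Price (Intraday)"]:
    print(name, is_valid_field_name(name))
# Symbol True
# Market_cap True
# 24h_change False
# Price (Intraday) False
```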

We need to modify the XPath for some fields so that they capture the data more precisely.

  • Price: //fin-streamer[@data-field="regularMarketPrice"]

  • Marketcap: //fin-streamer[@data-field="marketCap"]

  • Click the ... button of the field -> Customize XPath

  • Paste the XPath provided above and click Apply to save
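
To see what the two field XPaths select, here is a sketch using Python's built-in `xml.etree.ElementTree` on a simplified, hypothetical table row. Yahoo Finance's real markup is larger and may have changed since this tutorial was written, and the values are placeholders. ElementTree supports only a subset of XPath, so the leading `//` is written as `.//`.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical markup for one table row; values are placeholders.
ROW = """
<tr>
  <td>BTC-USD</td>
  <td><fin-streamer data-field="regularMarketPrice">100.00</fin-streamer></td>
  <td><fin-streamer data-field="regularMarketChange">+1.00</fin-streamer></td>
  <td><fin-streamer data-field="marketCap">1.5T</fin-streamer></td>
</tr>
"""

row = ET.fromstring(ROW.strip())

# ".//" searches all descendants of the current element, like "//" in full XPath.
price = row.find('.//fin-streamer[@data-field="regularMarketPrice"]')
market_cap = row.find('.//fin-streamer[@data-field="marketCap"]')

print(price.text)       # 100.00
print(market_cap.text)  # 1.5T
```

The `[@data-field=...]` predicate is what makes the selection precise: the row contains several `fin-streamer` elements, and only the one whose `data-field` matches is returned.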


4. Modify XPath of Pagination - to fix endless scraping

The auto-generated pagination XPath needs to be modified; otherwise, the task never stops, because Octoparse keeps scraping the last page over and over. Check out details about this issue here.

  • Click on Pagination

  • Enter the new XPath: //button[not(@disabled)]//span[text()="Next"]

  • Click Apply to confirm
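
The new XPath matches the Next button only while the button has no `disabled` attribute, so once the last page is reached nothing matches and pagination ends. Python's ElementTree cannot evaluate `not(@disabled)`, so this sketch applies the same test by hand to two simplified, hypothetical pagination snippets; `has_enabled_next` is a hypothetical helper, and `disabled="disabled"` is used because XML requires attribute values.

```python
import xml.etree.ElementTree as ET

def has_enabled_next(html: str) -> bool:
    """Rough equivalent of //button[not(@disabled)]//span[text()="Next"]."""
    root = ET.fromstring(html)
    for button in root.iter("button"):
        if "disabled" in button.attrib:
            continue  # not(@disabled) is false for this button
        if any(span.text == "Next" for span in button.iter("span")):
            return True
    return False

# Hypothetical markup: a middle page and the last page.
middle_page = '<div><button><span>Prev</span></button><button><span>Next</span></button></div>'
last_page = '<div><button><span>Prev</span></button><button disabled="disabled"><span>Next</span></button></div>'

print(has_enabled_next(middle_page))  # True
print(has_enabled_next(last_page))    # False
```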


5. Start extraction - to run the task and get the data

  • Click Save

  • Click Run in the upper-left corner

  • Select Run on your device to run the task on your computer, or select Run in the Cloud to run the task in the Cloud (premium users only). You can also schedule the task to run regularly and keep the data up to date


You can export the result data in formats such as Excel, CSV, and JSON, or export it to your database.
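
Exported data can then be post-processed outside Octoparse. A minimal sketch for a CSV export, assuming the columns are named after the fields set in step 3 (the names `Symbol` and `Price` here are assumptions) and that prices may contain thousands separators; `load_prices` is a hypothetical helper and the rows are made up.

```python
import csv
import io

def load_prices(csv_text):
    """Map each Symbol to its Price as a float (hypothetical column names)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Symbol"]: float(row["Price"].replace(",", "")) for row in reader}

# Made-up rows standing in for a real export file.
sample = "Symbol,Name,Price,Marketcap\nAAA-USD,Coin A,\"1,234.50\",1.2B\nBBB-USD,Coin B,0.25,3.4M\n"
print(load_prices(sample))  # {'AAA-USD': 1234.5, 'BBB-USD': 0.25}
```

For a real file, read it with `open(path, newline="", encoding="utf-8")` and pass the contents to `load_prices`; if the file starts with a byte-order mark, use `encoding="utf-8-sig"` instead so the first column name is read correctly.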

Here is the sample output.

[Image: sample output]