You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend you upgrade because it is faster, easier and more robust! Download and upgrade here if you haven't already done so!

Quora is a place to gain and share knowledge: a platform for asking questions and connecting with people who contribute unique insights and quality answers. Its users come from many different countries and careers.

This tutorial will show you how to scrape answers from Quora with question URLs. If you have no question URLs at hand, you can follow this tutorial first: Scrape questions from Quora


To follow along with the tutorial, you can use the URL below:

https://www.quora.com/What-are-some-of-the-best-web-data-scraping-tools

Here are the main steps in this tutorial: [Download task file here]

  1. Enter the URL on the home page - to open the target website

  2. Set up a Page scroll - to load more data

  3. Create a Loop - to capture the list of answers from the webpage

  4. Set up a Branch - to expand the full content of each answer

  5. Create an Extract Data Step - to extract data you need

  6. Modify the XPath - to locate data accurately

  7. Run the task - to get the target data


1. Enter the URL on the home page - to open the target website

To start, enter the URL of the target web page.

  • Enter the URL into the search box at the center of the home screen, then click Start to create a new task in Advanced Mode


2. Set up a Page Scroll - to load more data

  • Click on "+" under Go to Web Page Step

  • Click on Loop

  • Click Loop Item Iframe

  • Set Loop Mode to Scroll Page

  • Tick the option to scroll for one screen

  • Set Repeats to 100 times

  • Click Apply


Note: For more about Page Scroll settings, see this article: Set up a page scroll
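To see why the repeat count matters, here is a minimal Python sketch of a page that loads a fixed batch of answers on each one-screen scroll. It is only an illustration, not how Octoparse works internally: the function name `answers_loaded`, the batch size, and the answer totals are all assumptions made up for the example.

```python
def answers_loaded(total_answers, per_scroll, repeats=100, initial=0):
    """Return how many answers are on the page after `repeats` scrolls.

    Assumption: every one-screen scroll loads `per_scroll` more answers
    until the page runs out. Real pages load irregular batches.
    """
    loaded = min(initial, total_answers)
    for _ in range(repeats):
        loaded = min(total_answers, loaded + per_scroll)
    return loaded


# A question with 250 answers is fully loaded well before 100 scrolls.
print(answers_loaded(total_answers=250, per_scroll=5, repeats=100))   # 250
# A question with 1000 answers is only half loaded after 100 scrolls.
print(answers_loaded(total_answers=1000, per_scroll=5, repeats=100))  # 500
```

Under these assumptions, a question with very many answers may need a higher repeat count, while a short one finishes loading early and the remaining scrolls do nothing.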


3. Create a Loop - to capture the list of answers from the webpage

  • Click on "+" to add a step inside the scroll page loop

  • Click Loop

  • Select Loop Mode as Variable List

  • Put the XPath in Matching XPath: //div[contains(@class,'question_answer_item')]

  • Click Apply to apply the settings

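The Matching XPath above selects every div whose class attribute contains the substring question_answer_item, even when other class names sit beside it. The Python sketch below mimics that predicate on a small invented HTML fragment. The standard library's ElementTree has no contains() function, so the check is written as a plain substring test. The sample markup and the helper name `find_answer_items` are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Invented markup; real Quora pages are far more complex.
SAMPLE = """
<html><body>
  <div class="question_answer_item first">Answer A</div>
  <div class="sidebar">Not an answer</div>
  <div class="pagedlist question_answer_item">Answer B</div>
</body></html>
""".strip()


def find_answer_items(root, token="question_answer_item"):
    """Emulate //div[contains(@class, token)] with a substring test."""
    return [div for div in root.iter("div") if token in div.get("class", "")]


root = ET.fromstring(SAMPLE)
items = find_answer_items(root)
print([item.text for item in items])  # ['Answer A', 'Answer B']
```

Each matched element corresponds to one loop item, which is why the later steps are configured relative to the loop item.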

4. Set up a Branch - to expand the full content of each answer

Some answers are folded when they are too long, so we need to click "Continue Reading" on the page to expand the full answer. Other answers are short enough that they need no expanding. A branch lets Octoparse decide, for each answer, whether to click "Continue Reading" or not.

  • Click on "+" button inside Loop Item to set a Branch in the workflow

  • Click Branch Conditions

  • Choose the left branch box

  • Tick Execute if the current Loop contains a specific element

  • Enter this XPath in the Matching XPath box: //div[contains(text(),'Continue Reading')]

  • Click Apply to apply the settings

  • Click "+" in the left branch to add a Click step inside

  • Click on the Click Item

  • Choose Relative XPath to the Loop Item

  • Set up the XPath for the Click Item as //div[contains(text(),'Continue Reading')]

  • Click on Options

  • Set up AJAX Load as 5s


Taken together, the branch settings execute the click only if the current answer has a "Continue Reading" button.

Note: For more details on Branch settings, see this article: Branch Conditions
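The branch logic can be sketched in Python as a per-item condition followed by an optional click. This is only an illustration of the decision, not Octoparse's implementation: the sample markup, the helper names `needs_expanding` and `run_branch`, and the stand-in `click` callback are all assumptions. The condition also simplifies contains(text(), ...) by looking only at the text at the start of each div.

```python
import xml.etree.ElementTree as ET

# Invented markup: one short answer, one folded answer.
SAMPLE = """
<body>
  <div class="question_answer_item"><span>A short answer.</span></div>
  <div class="question_answer_item"><span>A long answer, cut off</span><div>Continue Reading</div></div>
</body>
""".strip()


def needs_expanding(item, label="Continue Reading"):
    """Approximate a relative //div[contains(text(), label)] check."""
    return any(label in (div.text or "") for div in item.iter("div"))


def run_branch(item, click):
    """Left branch: click if the condition holds. Otherwise do nothing."""
    if needs_expanding(item):
        click(item)
        return True
    return False


items = list(ET.fromstring(SAMPLE))
clicked = []  # stand-in for the real click action
results = [run_branch(item, clicked.append) for item in items]
print(results)  # [False, True]
```

Only the folded answer triggers the click, so short answers pass straight through to the extraction step.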


5. Create an Extract Data Step - to extract data you need

After the branch has been set up, we need to add an Extract Data step for the final extraction. Make sure the step is placed inside the loop.

  • Click "+" under the Branch box

  • Click Extract Data

  • Select the data field you need

  • Double-click the data fields to rename them if needed

  • Click Extract data in the Tips


6. Modify the XPath - to locate data accurately

To locate the data we want accurately, the XPath for the fields needs to be modified.

  • Switch Data Preview to Vertical View

  • Enter the XPath below for each field in Field Settings:

user: //div[@class="q-inlineFlex qu-alignItems--center qu-wordBreak--break-word"]/span

career: //div[contains(@class,"truncateLines")]/span[1]/span[2]

date: //a[contains(@class,'answer_timestamp')]

views: //div[contains(@class,'gray_light')]//span[contains(string(),'views')]

answer: //span[@class="q-box qu-userSelect--text"]

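The field XPaths above mix two matching styles: an exact match on the whole class attribute (user, answer) and a substring match with contains() (date, views). The Python sketch below applies both styles to one invented answer element using only the standard library. The sample markup and the helper names `text_of` and `extract_fields` are assumptions for illustration, the contains() predicates are emulated with substring tests, and the career field is left out to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# Invented markup that reuses the class names from the XPaths above.
SAMPLE = """
<div class="question_answer_item">
  <div class="q-inlineFlex qu-alignItems--center qu-wordBreak--break-word"><span>Example User</span></div>
  <a class="answer_timestamp link">Answered Jan 1</a>
  <div class="gray_light footer"><span><b>1.2k</b> views</span></div>
  <span class="q-box qu-userSelect--text">Example answer text.</span>
</div>
""".strip()


def text_of(node):
    """Return all text inside a node, or None if the node was not found."""
    return "".join(node.itertext()).strip() if node is not None else None


def extract_fields(item):
    # Exact match: the class attribute must equal the whole string.
    user = item.find(
        ".//div[@class='q-inlineFlex qu-alignItems--center qu-wordBreak--break-word']/span"
    )
    answer = item.find(".//span[@class='q-box qu-userSelect--text']")
    # Substring match, emulating contains(@class, 'answer_timestamp').
    date = next(
        (a for a in item.iter("a") if "answer_timestamp" in a.get("class", "")),
        None,
    )
    # Emulates contains(string(), 'views'): string() covers nested text too.
    views = next(
        (
            span
            for div in item.iter("div")
            if "gray_light" in div.get("class", "")
            for span in div.iter("span")
            if "views" in "".join(span.itertext())
        ),
        None,
    )
    return {
        "user": text_of(user),
        "date": text_of(date),
        "views": text_of(views),
        "answer": text_of(answer),
    }


print(extract_fields(ET.fromstring(SAMPLE)))
```

Exact class matches are stricter: if the site adds or reorders a class name, they stop matching, whereas the contains() style keeps working as long as the substring is still present.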

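The final workflow is nested as follows: the Go to Web Page step comes first, followed by the Scroll Page loop, which contains the Variable List loop. Inside that loop sit the Branch (with the Click step in its left branch) and, after it, the Extract Data step.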

7. Run the task - to get the target data

  • Click the Save button first to save all the settings you have made

  • Then click Run to run your task either locally or in the cloud

  • Select Run on your device and click Run Now to run the task on your local device

  • Wait for the task to complete

Once the local run completes, the extracted data can be exported in Excel, CSV, HTML, or JSON format.
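To give a feel for the CSV and JSON export formats, here is a small Python sketch that serializes rows with the same five field names used in this tutorial. The row values are placeholders invented for the example, not real scraped data, and the helper names `to_csv` and `to_json` are assumptions. The exact layout of Octoparse's own export files may differ.

```python
import csv
import io
import json

FIELDS = ["user", "career", "date", "views", "answer"]

# Placeholder row, not real scraped data.
rows = [
    {
        "user": "Example User",
        "career": "Example job title",
        "date": "Answered Jan 1",
        "views": "1.2k views",
        "answer": "Example answer text.",
    },
]


def to_csv(rows, fieldnames=FIELDS):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize rows to a JSON array, keeping non-ASCII text readable."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


print(to_csv(rows))
print(to_json(rows))
```

CSV suits spreadsheet work, while JSON keeps each answer as a self-contained record that is easier to feed into other programs.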