You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend upgrading, because the latest version is faster, easier to use, and more robust. Download and upgrade here if you haven't already done so!

Groupon is a website that offers deals on professional and personal services, including classes, photography, and other local services.

In this tutorial, we are going to show you how to scrape information about photography services from Groupon.com.

DADA.png

To follow along, you can use this URL:

https://www.groupon.com/browse/chicago?category=personal-services&category2=photography
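The URL itself encodes what will be scraped. As a minimal sketch using only the Python standard library, the snippet below splits the example URL into its city and category parts. The variable names are illustrative, and reading the last path segment as the city is an assumption based on this one URL, not a documented Groupon rule.

```python
from urllib.parse import urlsplit, parse_qs

# The example URL used in this tutorial.
url = ("https://www.groupon.com/browse/chicago"
       "?category=personal-services&category2=photography")

parts = urlsplit(url)

# Assumption: the last path segment names the city being browsed.
city = parts.path.rstrip("/").split("/")[-1]

# parse_qs returns a dict that maps each query key to a list of values.
params = parse_qs(parts.query)

print(city)                    # chicago
print(params["category"][0])   # personal-services
print(params["category2"][0])  # photography
```

Checking these parts before entering the URL in Octoparse is a quick way to confirm that the task starts from the intended listing page.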

Here are the main steps in this tutorial: [Download task file here]

  1. Go to Web Page - open the target web page

  2. Click "x" - close the ad

  3. Start Auto-detect - generate a workflow and follow the links to the detail pages

  4. Extract Data - select the data to scrape

  5. Rewrite the XPath - to locate the element accurately

  6. Run the task - to get the target data


1. Go To Web Page - open the target web page

  • Enter the URL on the home page and click Start

start.png

2. Click "x" - close the ad

  • Click "x" in the upper-right corner of the ad

  • Click "Click element" on the tips panel

CLOSE.png

3. Start Auto-detect - generate a workflow

  • Click Auto-detect web page data on the tips panel

  • Wait for the detection to complete

AUTODETACT.png
  • Double-click a data field to rename it, or delete unwanted fields

RENAME_FILE.png
  • Untick Add a page scroll

  • Click Create workflow

GENERATE.png
  • Choose Click on link(s) to scrape the linked page(s)

Click_on_link.png
  • Select the Title URL field

  • Click on Confirm

confirm_link.png

4. Extract Data - select the data to scrape

  • Click on the data you want to extract

  • After all the selected data turns green, click Extract data on the tips panel

DATA.png
  • Edit a field name by double-clicking it

mceclip0.png

The final workflow will look like this:

mceclip0.png

5. Rewrite the XPath - to locate the element accurately.

To locate the target data accurately and avoid missing data, the XPaths for the star and rating fields need to be modified.

Switch the data preview to the vertical view and enter the XPaths below:

  • //span[@id="numerical-rating"] >> star field

  • //span[@class="star-rating-text"] >> rating field

field.png
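To see what these two XPaths select, here is a minimal sketch that uses Python's standard xml.etree.ElementTree module. The SAMPLE markup and its values are hypothetical and heavily simplified; the real Groupon page is more complex and may change. ElementTree supports only a subset of XPath and needs a relative path, so the expressions start with ".//" instead of "//".

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for a deal detail page.
SAMPLE = """
<div>
  <span id="numerical-rating">4.8</span>
  <span class="star-rating-text">1,234 ratings</span>
</div>
"""

root = ET.fromstring(SAMPLE)


def first_text(node, xpath):
    """Return the stripped text of the first match, or None if nothing matches."""
    match = node.find(xpath)
    if match is None or match.text is None:
        return None
    return match.text.strip()


# Same predicates as the XPaths entered in Octoparse.
star = first_text(root, ".//span[@id='numerical-rating']")
rating = first_text(root, ".//span[@class='star-rating-text']")

print(star)    # 4.8
print(rating)  # 1,234 ratings
```

Note that a predicate such as @class="star-rating-text" matches only when the class attribute is exactly that string. If the element carries additional classes, the XPath will not match and the field will come back empty.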

6. Run the task - to get the target data

  • Click the Save button first to save all the settings you have made

  • Then click Run to run your task either locally or in the cloud

mceclip8.png
  • Select Run on your device and click Run Now to run the task on your local device

  • Wait for the task to complete

Below is sample data from a local run. The data can be exported in Excel, CSV, HTML, and JSON formats.

data.png
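Once the data is exported, it can be checked with a few lines of code. The sketch below reads a CSV export with Python's standard csv module and flags rows where the star value is missing. The column names (Title, star, rating) and the rows are hypothetical; a real export uses whatever field names were set in the task.

```python
import csv
import io

# Hypothetical export contents. With a real file, replace io.StringIO(EXPORT)
# with open("your_export.csv", newline="", encoding="utf-8").
EXPORT = """Title,star,rating
Sample Photo Studio,4.8,"1,234 ratings"
Another Studio,,
"""


def parse_star(value):
    """Convert a star value to float, or return None when the cell is empty."""
    value = value.strip()
    return float(value) if value else None


rows = list(csv.DictReader(io.StringIO(EXPORT)))
stars = [parse_star(row["star"]) for row in rows]

# Titles of listings that actually have a star value.
rated = [row["Title"] for row, s in zip(rows, stars) if s is not None]

print(stars)  # [4.8, None]
print(rated)  # ['Sample Photo Studio']
```

Empty star cells are not necessarily errors, since some listings may have no reviews yet. If many rows are empty, however, the XPath for that field is worth rechecking.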