In this tutorial, we are going to show you how to scrape news from DigitalJournal.com.
To follow along, you may want to use the URL from this tutorial:
Here are the main steps in this tutorial: [Download demo task file here]
- "Go To Web Page" - open the target web page
- Create a pagination loop - scrape all the results from multiple pages
- Create a "Loop Item" - loop click into each item on each list
- Extract data - select the data for extraction
- Start extraction - run the task and get data
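For readers who prefer code, the workflow above can be sketched in plain Python. This is only an illustration of the logic (pagination loop, loop click, field extraction), not Octoparse's implementation; the HTML samples, URLs, and field names are hypothetical, and the pages are inlined so the example is self-contained.

```python
# Minimal sketch of the tutorial's workflow: loop over list pages,
# follow ("click") each item link, and extract fields from each detail page.
# All markup below is a made-up stand-in, not DigitalJournal's real HTML.
import xml.etree.ElementTree as ET

# Hypothetical "site": two list pages and the detail page behind each link.
LIST_PAGES = {
    1: '<ul><li><a href="/a1">Story one</a></li><li><a href="/a2">Story two</a></li></ul>',
    2: '<ul><li><a href="/a3">Story three</a></li></ul>',
}
DETAIL_PAGES = {
    "/a1": "<article><h1>Story one</h1><time>2024-01-01</time></article>",
    "/a2": "<article><h1>Story two</h1><time>2024-01-02</time></article>",
    "/a3": "<article><h1>Story three</h1><time>2024-01-03</time></article>",
}

def scrape():
    rows = []
    for page in sorted(LIST_PAGES):              # pagination loop
        listing = ET.fromstring(LIST_PAGES[page])
        for link in listing.iter("a"):           # "loop click" each item
            detail = ET.fromstring(DETAIL_PAGES[link.get("href")])
            rows.append({                        # extract the data fields
                "title": detail.find("h1").text,
                "date": detail.find("time").text,
            })
    return rows

print(scrape())
```

On a real site you would fetch each page over HTTP instead of reading it from a dictionary, but the loop structure is the same one Octoparse builds for you visually.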
- Click "+ Task" to start a new task with Advanced Mode
- Paste the URL into the "Extraction URL" box and click "Save URL" to move on
- Scroll down to the bottom of the page, click the "Next" button
- Click "Loop click the selected link" on the "Action Tips" panel
- Set the AJAX timeout to 5s (optional; adjust it according to your local network conditions)
- Click "OK" to save
An AJAX timeout can often serve as a page-load timeout for a Click action. For example, when a page takes a very long time to finish loading even though the data you need has already loaded, you can set an AJAX timeout to tell Octoparse to move on to the next action once the set time is reached. Check this video if you want to know more about AJAX.
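The idea behind the timeout can be sketched as a small polling helper. This is a hypothetical illustration, not Octoparse's API: keep checking whether the content has loaded, but give up and move on once the time limit is reached.

```python
# Sketch of an AJAX-style timeout: poll a condition until it holds,
# or return False once the deadline passes so the caller can move on.
import time

def wait_for(condition, timeout=5.0, interval=0.1):
    """Return True as soon as condition() holds, False once timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

# Usage (hypothetical): proceed after the data loads, or after 5 seconds.
# loaded = wait_for(page_has_data, timeout=5.0)
```

Either way the scraper never stalls indefinitely on a slow page, which is exactly what the 5s setting above achieves in Octoparse.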
- Click on any article title on the page
After you click the first and second titles consecutively, Octoparse will recognize the pattern and highlight all the titles on the page in green.
- Click "Loop click each element"
When a Loop Click action is added, Octoparse will click through each link captured in the Loop Item and open the article detail pages one by one.
After you click "Loop click each element", Octoparse will open the detail page of the first title.
- Click on the data you need on the page
- Select "Extract text of the selected element" on the "Action Tips" panel
- Rename the fields by selecting from the predefined list or entering your own names
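Renaming fields amounts to mapping the auto-generated field names onto the names you want in the output. A minimal sketch, with hypothetical field names:

```python
# Map auto-generated field names (e.g. "Field1") to your own output names.
RENAME = {"Field1": "title", "Field2": "publish_date"}

def rename_fields(row, mapping):
    """Return a copy of row with keys renamed; unmapped keys are kept as-is."""
    return {mapping.get(key, key): value for key, value in row.items()}

row = {"Field1": "Story one", "Field2": "2024-01-01"}
print(rename_fields(row, RENAME))
```

In Octoparse this mapping is edited visually in the field list rather than in code.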
- Click "Start Extraction" on the upper left side
- Select "Local Extraction" to run the task on your computer, or select "Cloud Extraction" to run the task in the Cloud (for premium users only)
Here's the data we extracted.
Happy data hunting!
Was this article helpful? Contact us at any time if you need our help!