Scraping news from Digital Journal.com
You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend that you upgrade, because the latest version is faster, easier to use, and more robust. Download and upgrade here if you haven't already done so!
Digital Journal is a website that provides world news, including but not limited to tech & science, social media, and business.
In this tutorial, we are going to show you how to scrape search results from Digital Journal.com, covering both the listing page and the detail pages.
To follow along, you may want to use the URL in this tutorial:
https://www.digitaljournal.com/?s=covid
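If you want to scrape results for other keywords, the starting URL can be generated rather than typed by hand. The sketch below assumes, based only on the example URL above, that the search keyword is passed in the `s` query parameter. `build_search_url` is a hypothetical helper for illustration and is not part of Octoparse.

```python
from urllib.parse import urlencode

# Base address taken from the example URL in this tutorial.
BASE_URL = "https://www.digitaljournal.com/"

def build_search_url(keyword):
    """Return a search-results URL, assuming the keyword goes in the `s` parameter."""
    # urlencode escapes spaces and special characters in the keyword.
    return BASE_URL + "?" + urlencode({"s": keyword})

print(build_search_url("covid"))
```

The resulting URL can be pasted into Octoparse's home page in step 1.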
Note: The task's .otd file is attached at the bottom of this tutorial. You can import it into Octoparse to inspect it or use it directly.
Here are the main steps in this tutorial:
- Go To Web Page - open the target web page
- Start auto-detection - to generate a workflow
- Click on link(s) to scrape the linked pages - loop click into each item on each list
- Extract Data - select the data to scrape
- Run the task - to get the desired data
1. Go to Web Page - open the target web page
- Enter the URL on the home page and click Start
2. Start auto-detection - to generate a workflow
Octoparse's built-in auto-detect function can generate a workflow automatically. You can then modify the generated workflow as needed.
- Click on Auto-detect web page data and wait for the detection to complete
- Check the data fields in Data Preview, then delete unwanted fields or rename them if needed
- Untick Add a page scroll
- Click Create workflow
3. Click on link(s) to scrape the linked pages - loop click into each item on each list
- Click "Click on link(s) to scrape the linked pages" in the tips panel
- Choose Title_URL in the drop-down box under Click on an extracted data field
- Click Confirm
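For readers curious about what this step amounts to, here is a rough conceptual sketch: gather the link behind each title on the listing page, then visit each link in turn. It is not how Octoparse is implemented. The sample markup and the assumption that title links sit inside `<h2>` headings are hypothetical; the real page structure may differ.

```python
from html.parser import HTMLParser

class TitleLinkCollector(HTMLParser):
    """Collect href values of <a> tags nested inside <h2> headings."""

    def __init__(self):
        super().__init__()
        self._in_heading = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_heading = True
        elif tag == "a" and self._in_heading:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_heading = False

def collect_title_links(listing_html):
    """Return the title links found in a listing page's HTML."""
    parser = TitleLinkCollector()
    parser.feed(listing_html)
    return parser.links

# Hypothetical listing markup, used only to exercise the sketch.
sample_listing = (
    '<h2><a href="/story-1">Story one</a></h2>'
    '<p><a href="/about">About</a></p>'
    '<h2><a href="/story-2">Story two</a></h2>'
)

for link in collect_title_links(sample_listing):
    # A loop-click step would open each of these pages and extract data there.
    print(link)
```

In Octoparse, the Title_URL field plays the role of the collected links, and the loop is built for you by the workflow.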
4. Extract Data - select the data to scrape
- Click on the data you want to extract
- After all the selected data turns green, click Extract data in the tips box
The final workflow will look like:
5. Run the task - to get the desired data
- Click the Save button first to save all the settings you have made
- Then click Run to run your task either locally or in the cloud
- Select Run on your device and click Run Now to run the task on your local device
- Wait for the task to complete
Below is sample data from a local run. Excel, CSV, HTML, and JSON formats are available for export.
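Once the data is exported, a short script can tidy it up. The sketch below reads a CSV export and keeps one row per article URL. It assumes the export has a Title_URL column, as named in step 3; the Title column, the sample rows, and the `load_unique_articles` helper are hypothetical, so adjust the field names to match your own export.

```python
import csv
import io

def load_unique_articles(csv_file, url_field="Title_URL"):
    """Return rows from an exported CSV, keeping the first row per article URL."""
    seen = set()
    unique_rows = []
    for row in csv.DictReader(csv_file):
        url = (row.get(url_field) or "").strip()
        # Skip rows with no URL and rows whose URL was already seen.
        if not url or url in seen:
            continue
        seen.add(url)
        unique_rows.append(row)
    return unique_rows

# Placeholder data standing in for a real export file.
sample = io.StringIO(
    "Title,Title_URL\n"
    "First story,https://example.com/story-1\n"
    "First story,https://example.com/story-1\n"
    "Second story,https://example.com/story-2\n"
)

articles = load_unique_articles(sample)
print(len(articles))
```

To use it on a real export, pass an open file instead of the sample, for example `open("export.csv", newline="", encoding="utf-8")`, where the file name is whatever you chose when exporting.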
If you have further issues with the task or have any suggestions, we’d love to hear about them. Submit a request here.
Contact us at any time if you need our help!
Writer: Emma
Editor: Yina