Spent a long time building workflows on your own? Still can't manage to get the data by yourself? Here's a new solution for beginners: auto-detection of web page data.
Auto-detection is one of the newest features in Octoparse version 8. It lets users start building a task with a single click. It works on web pages with many different layouts, including listings, tables, infinite scrolling, and "Load More" buttons. Now it's time to introduce this powerful function to our users.
How to use the function
1. Enter the URL on the home page
Enter the example URL "https://www.ebay.com/b/Laptops-Netbooks/175672/bn_1648276" into the search box at the center of the home page. Click "Start" to create a new task in Advanced Mode.
2. Start the auto-detection
Click "Auto-detect web page data" to start the detection, then wait for it to complete.
3. Modify the settings
- Remove unwanted data
Click the icon in the "Data preview" panel to remove data fields you don't need.
- Rename your data
Rename data fields by clicking the icon.
- Confirm the settings on "Tips"
The "Tips" panel lists options such as "extract list", "paginate", and "page scroll":
- Extract the data in the list - This option is selected by default and scrapes the list of data on the page.
- Paginate to scrape more pages - This option locates the "Next page" button so that data can be collected from multiple pages.
- Add a page scroll - This option scrolls down the page after it loads, which helps on pages that load more content as you scroll.
You can check, modify, or uncheck each of these settings.
a) Check the settings
Click "Check" under "Paginate to scrape more pages" and the pagination button will be highlighted on the page.
b) Modify the settings
Click the "Edit" button under an option to modify its settings.
c) Uncheck the settings
If you don't need an option, uncheck the box in front of it.
- Click "Create workflow"
After confirming the options, click "Create workflow" to generate the actions.
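Conceptually, the workflow generated from these settings is a loop. It extracts the items listed on the current page, then follows the "Next page" button until there isn't one. The sketch below models that idea in plain Python over hypothetical in-memory pages, so it runs without a network connection. It is not how Octoparse is implemented. The page data, the field names, and the `DROPPED_FIELDS`/`FIELD_RENAMES` settings are illustrative assumptions that stand in for the choices made in "Data preview".

```python
from typing import Optional

# Hypothetical in-memory "site": each page holds a list of items and an
# optional key for the next page (standing in for a "Next page" button).
PAGES = {
    "page1": {
        "items": [
            {"title": "Laptop A", "price": "$499", "shipping": "Free"},
            {"title": "Laptop B", "price": "$650", "shipping": "$10"},
        ],
        "next": "page2",
    },
    "page2": {
        "items": [{"title": "Laptop C", "price": "$720", "shipping": "Free"}],
        "next": None,
    },
}

# Hypothetical equivalents of the choices made in "Data preview".
DROPPED_FIELDS = {"shipping"}
FIELD_RENAMES = {"title": "Name"}


def clean_row(row: dict) -> dict:
    """Drop unwanted fields and apply renames to one extracted row."""
    return {FIELD_RENAMES.get(k, k): v for k, v in row.items() if k not in DROPPED_FIELDS}


def scrape(start: str, max_pages: int = 10) -> list[dict]:
    """Extract the list on each page, then follow "next" until none remains."""
    rows: list[dict] = []
    page_key: Optional[str] = start
    visited = 0
    while page_key is not None and visited < max_pages:
        page = PAGES[page_key]
        # "Extract the data in the list"
        rows.extend(clean_row(item) for item in page["items"])
        # "Paginate to scrape more pages"
        page_key = page["next"]
        visited += 1
    return rows


if __name__ == "__main__":
    for row in scrape("page1"):
        print(row)
```

The `max_pages` cap is an assumption added so that a "Next page" link that loops back on itself cannot keep the loop running forever.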
4. More scraping actions
Auto-detection configures a basic workflow that extracts data and paginates. If you'd like to click each link to get more information, or click a "Load More" button, select the corresponding option on the "Tips" panel to set up those actions.
- Click on a "Load More" button - If the web page has a "Load More" button, choose this option, select the button on the page, and set the click times (how many times to click). The scraper then clicks the button automatically to load more data for scraping.
- Click on link(s) to scrape the linked pages - If you want to click the detected links and extract more information from the detail pages, choose this option and select a link you want to click.
To confirm that these are the links you want to click through, click "Check" to highlight them on the web page.
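The two options above can be modeled in the same way. This is a hypothetical sketch in plain Python, not Octoparse's internals. A listing shows three items at first and two more for each "Load More" click. The `click_times` argument plays the role of the click times you set, and the loop stops early once everything is shown, much as the button would disappear. Each loaded item then links to a hypothetical detail page, whose fields are merged into the row. All names and data here are assumptions for illustration.

```python
# Hypothetical listing of 7 items; each links to a detail page.
ALL_ITEMS = [{"title": f"Laptop {i}", "link": f"/item/{i}"} for i in range(1, 8)]
DETAIL_PAGES = {f"/item/{i}": {"seller": f"seller{i}"} for i in range(1, 8)}

INITIAL_ITEMS = 3    # items visible before any click (assumption)
ITEMS_PER_CLICK = 2  # items revealed by each "Load More" click (assumption)


def visible_items(clicks: int) -> list[dict]:
    """Items shown on the page after the given number of clicks."""
    return ALL_ITEMS[: INITIAL_ITEMS + clicks * ITEMS_PER_CLICK]


def load_more(click_times: int) -> list[dict]:
    """Click "Load More" up to click_times times, stopping once all items are shown."""
    clicks = 0
    while clicks < click_times and len(visible_items(clicks)) < len(ALL_ITEMS):
        clicks += 1
    return visible_items(clicks)


def scrape_with_details(click_times: int) -> list[dict]:
    """Load the list, then "click" each item's link and merge the detail fields."""
    rows = []
    for item in load_more(click_times):
        detail = DETAIL_PAGES[item["link"]]
        rows.append({**item, **detail})
    return rows


if __name__ == "__main__":
    print(len(scrape_with_details(click_times=1)))  # 3 initial + 2 loaded = 5
```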
5. Add missing data manually
Auto-detection may occasionally miss some data fields, and you will need to add them manually. Just select the information on the web page and choose "Extract the text of the element".
6. Save settings and start extraction
Click the Save button first to save all your settings, then click Run to run the task either locally or in the cloud.