Having data auto-detected is cool, but no algorithm is perfect, and there will be occasions when the data you need is not detected accurately. In this lesson, we'll go over some easy fixes you can apply to optimize your scraping task.
1. If the data you need is not getting detected
When Octoparse detects the data on a web page, it scans the whole page and fetches one or more sets of data using its machine learning algorithm. If you don't see your target data on the first attempt, you can switch to the next set of data by clicking "Switch auto-detect results". The fraction displayed (for example, 1/3) means Octoparse has detected 3 sets of data and you are looking at the first one.
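To make the fraction concrete, here is a minimal Python sketch of stepping through detected result sets. It is only a conceptual model, not how Octoparse is implemented: the `DetectionResults` class, the sample data, and the wrap-around after the last set are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DetectionResults:
    """Hypothetical model of the candidate data sets found on a page."""
    candidates: list  # each entry is one detected set of data
    current: int = 0  # index of the set currently shown

    def label(self) -> str:
        # The fraction: position of the current set / total number of sets
        return f"{self.current + 1}/{len(self.candidates)}"

    def switch(self) -> list:
        # Move to the next set; wrapping to the first set is an assumption
        self.current = (self.current + 1) % len(self.candidates)
        return self.candidates[self.current]


results = DetectionResults(
    candidates=[["menu links"], ["product cards"], ["footer links"]]
)
print(results.label())  # 1/3
results.switch()
print(results.label())  # 2/3
```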
2. If the auto-detected Next Page button is not right
If auto-detection fails to locate the Next Page button correctly, you can fix it by clicking "Edit" and then following the instructions under "Tips" to re-select the correct Next Page button.
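Why the Next Page button matters can be seen in a small Python sketch of a pagination loop. This is a simplified model under stated assumptions, not Octoparse's code: the `site` dictionary stands in for a real website, and `scrape_all_pages` and `max_pages` are hypothetical names.

```python
def scrape_all_pages(pages, start, max_pages=100):
    """Collect rows page by page, following each page's "next" pointer.

    `pages` is a stand-in for a website: it maps a page id to a dict with
    that page's "rows" and the id of the "next" page (None on the last page).
    """
    collected = []
    current = start
    visited = 0
    while current is not None and visited < max_pages:
        page = pages[current]
        collected.extend(page["rows"])  # extract the data on this page
        current = page["next"]          # "click" the Next Page button
        visited += 1
    return collected


site = {
    "p1": {"rows": ["a", "b"], "next": "p2"},
    "p2": {"rows": ["c"], "next": "p3"},
    "p3": {"rows": ["d", "e"], "next": None},  # last page: no Next button
}
print(scrape_all_pages(site, "p1"))  # ['a', 'b', 'c', 'd', 'e']
```

If the wrong element is treated as the Next Page button, a loop like this either stops early or follows the wrong links, which is why re-selecting the button fixes missing pages.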
3. If you need to scroll down the page more in order to load more data
Whenever a web page is detected with infinite scroll, Octoparse automatically specifies the number of times to scroll down the page. If you prefer to scroll more before capturing the data, you can adjust the number of scroll times by clicking "Edit" and then completing the settings.
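The effect of the scroll count can be illustrated with a short Python sketch. It simulates a page that loads one more batch of items per scroll; the function name and the batch sizes (`initial`, `per_scroll`) are made-up values for illustration, not Octoparse settings.

```python
def load_with_scrolling(all_items, initial=10, per_scroll=10, scroll_times=3):
    """Return the items visible after scrolling `scroll_times` times.

    `all_items` stands in for everything the site could eventually load.
    """
    visible = initial
    for _ in range(scroll_times):
        if visible >= len(all_items):
            break  # nothing left to load, so further scrolling is wasted
        visible += per_scroll  # each scroll loads one more batch
    return all_items[:visible]


items = [f"item {i}" for i in range(1, 101)]
print(len(load_with_scrolling(items, scroll_times=3)))  # 40
print(len(load_with_scrolling(items, scroll_times=8)))  # 90
```

In this model, too few scrolls leave data unloaded and therefore uncaptured, while scrolling past the end of the list adds nothing.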
4. Working with the workflow directly
When you build a scraping task in Octoparse, it simulates real human browsing actions, such as opening a web page and clicking on a page element or button, to extract data automatically. The whole extraction process is defined in a workflow, with each individual step/action representing a particular instruction in the scraping task.
Though Octoparse tries to make things easier for you by auto-generating the workflow through auto-detection, you can also build the workflow from scratch or edit the auto-generated workflow to ensure the task does what you need it to do.
There are many different types of actions you can add to the workflow. Each step/action has various settings that you can modify to fine-tune your scraping task.
1. Rearrange the steps of the workflow by dragging and dropping them into the right spot.
2. Hover over a step to check its settings.
3. Modify action settings by clicking on the setting icon.
4. To add an extra step to the workflow, place your mouse where you'd like to insert the step. Wait until you see the sign show up, click on it, and select the action you'd like to add.
5. Rename, copy, or delete a step by clicking the "more" button that appears.
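Conceptually, the five operations above are edits to an ordered list of steps. The Python sketch below models that idea only; the step names, action types, and settings keys are hypothetical and do not reflect Octoparse's actual task format, and the nesting that real workflows can have (such as loops containing steps) is left out.

```python
import copy

# Hypothetical workflow: an ordered list of steps, each with a name,
# an action type, and its settings.
workflow = [
    {"name": "Go to Web Page", "action": "open_page", "settings": {"timeout_s": 30}},
    {"name": "Extract Data", "action": "extract", "settings": {"fields": ["title"]}},
    {"name": "Scroll Page", "action": "scroll", "settings": {"times": 3}},
]

# 1. Rearrange: move "Scroll Page" so it runs before "Extract Data"
workflow.insert(1, workflow.pop(2))

# 2-3. Check a step's settings, then modify them
print(workflow[1]["settings"])
workflow[1]["settings"]["times"] = 5

# 4. Add an extra step at a chosen position
workflow.insert(1, {"name": "Click Item", "action": "click", "settings": {}})

# 5. Rename, copy, or delete a step
workflow[2]["name"] = "Scroll to load more"
duplicate = copy.deepcopy(workflow[3])  # deep copy so settings are not shared
duplicate["name"] = "Extract Data (copy)"
workflow.append(duplicate)
del workflow[1]  # remove "Click Item" again

print([step["name"] for step in workflow])
```

Because the order of the list is the order of execution in this model, moving or inserting a step changes what the task does, not just how it looks.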
If you'd like to further optimize your scraping task, see more task-building techniques here.
Spanish version of this article: Lección 2: Optimiza tu tarea
You can also read web scraping articles on the official website