Scrape restaurant info from Grubhub
In this tutorial, we will show you how to scrape restaurant information from Grubhub.
We will use Octoparse to scrape data such as the title, address, and phone number from each restaurant's details page.
Here are the main steps in the tutorial:
- "Go To Web Page" - to open the targeted web page
- Create a pagination loop - to scrape all the results from multiple pages
- Create a "Loop Item" - to loop click into each restaurant on every page
- Extract data - to select data you need to scrape
- Save and start extraction - to run your task and get data
1. "Go To Web Page" - to open the targeted web page
- Create the task with "Advanced Mode"
- Paste the URL into the "Website" box
- Click "Save URL" to move on
2. Create a pagination loop - to scrape all the results from multiple pages
- Scroll down and click the ">>" button on the web page
- Click "Loop click single element" on "Action Tips"
- Uncheck "Auto-Retry"
- Check "AJAX Load" and set "AJAX Timeout" to 7 seconds
Make the timeout long enough for the page to load under your network conditions.
- Click "Save"
Tips! The AJAX timeout can also serve as a page timeout for a Click action. For example, when a page keeps loading long after the data you need has appeared, the AJAX timeout tells Octoparse to move on to the next action once the set time is reached. To learn more about AJAX, see the Octoparse video tutorial on the topic.
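The idea behind the tip can be sketched in a few lines of Python. This is not how Octoparse is implemented; it is only an illustration of "wait for the data, but give up after a set time and move on". The names `wait_for_data` and `data_ready` are hypothetical, and the page is simulated with a simple counter.

```python
import time
from typing import Callable


def wait_for_data(data_ready: Callable[[], bool],
                  timeout_s: float,
                  poll_s: float = 0.05) -> bool:
    """Poll until data_ready() is true or timeout_s has elapsed.

    Returns True if the data appeared in time and False if the timeout
    was reached. In both cases the caller continues with the next action.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if data_ready():
            return True
        time.sleep(poll_s)
    return data_ready()  # one last check at the deadline


# Simulated page: the data shows up on the third poll.
polls = {"count": 0}


def data_ready() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


loaded = wait_for_data(data_ready, timeout_s=7.0)    # 7 s, as in the step above
never = wait_for_data(lambda: False, timeout_s=0.2)  # gives up after 0.2 s
print(loaded, never)
```

A longer timeout only costs time when the data never arrives, which is why a generous value is usually the safer choice on a slow connection.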


3. Create a "Loop Item" - to loop click into each restaurant on every page
- Click "Go To Web Page" in the workflow
- Select the pagination loop in the workflow
- Click the first restaurant item; Octoparse will automatically identify the similar URLs on the page
- Click "Select All" on the "Action Tips"
- Select "Loop click each element"
- Uncheck "Auto-Retry"
- Uncheck "Open the link in new tab"
- Check "AJAX Load" and set "AJAX Timeout" to 7 seconds (optional)
- Click "Loop Item" and set a wait time of 10 seconds so that the web page loads completely (optional)
- Click "Save"
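The workflow built so far is a loop inside a loop: the pagination loop moves from one results page to the next, and the "Loop Item" visits every restaurant on the current page. A minimal Python sketch of that structure follows. It uses in-memory sample data instead of a real website, and `PAGES`, `iter_pages`, and `visit_all` are hypothetical names chosen for illustration.

```python
from typing import Dict, Iterator, List

# Hypothetical stand-in for the site: each inner list is one results page.
PAGES: List[List[str]] = [
    ["Restaurant A", "Restaurant B"],
    ["Restaurant C"],
]


def iter_pages(pages: List[List[str]]) -> Iterator[List[str]]:
    """Outer loop: yield one results page at a time, like clicking ">>"."""
    for page in pages:
        yield page


def visit_all(pages: List[List[str]]) -> List[Dict[str, object]]:
    """Inner loop: visit every restaurant on every page, in order."""
    visited: List[Dict[str, object]] = []
    for page_no, page in enumerate(iter_pages(pages), start=1):
        for name in page:
            visited.append({"page": page_no, "name": name})
    return visited


rows = visit_all(PAGES)
for row in rows:
    print(row)
```

Placing the item loop inside the pagination loop is what makes the task cover every page; with the two loops side by side, only the items on the first page would be visited.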


4. Extract data - to select data you need to scrape
- Select the data you need to scrape on the item page, such as the restaurant name, address, opening hours, and phone number
- Select "Extract text of the selected element" and rename the "Field name" column if necessary.
- Click "OK" to save the result
- Drop a "Click Item" action into the workflow designer
- Click "Customize" and "Customize XPath"
- Set the XPath "//BUTTON[contains(@class,'returnToSearch')]" to locate the "<" button, which returns to the list page
- Uncheck "Auto retry when no response"
- Check "Load the page with AJAX" and set the timeout
- Click "Save" to move on
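To make the last two groups of steps more concrete, here is a small Python sketch of what "extract text of the selected element" and an XPath with `contains(@class, ...)` amount to. It runs on a made-up, simplified page snippet, not on real Grubhub markup; the class names other than `returnToSearch`, the sample values, and the helper names `text_of` and `find_by_class_substring` are all assumptions. The standard-library `xml.etree.ElementTree` does not support the XPath `contains()` function, so the substring test is written out by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified details page. Real Grubhub markup differs.
PAGE = """
<div>
  <button class="s-btn returnToSearch-button">&lt;</button>
  <h1 class="name">  Sample   Diner </h1>
  <p class="address">123 Example St, <span>Springfield</span></p>
  <p class="phone">(555) 010-0000</p>
</div>
""".strip()

root = ET.fromstring(PAGE)


def text_of(element: ET.Element) -> str:
    """Text of an element and its descendants, with whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())


def find_by_class_substring(tree: ET.Element, tag: str, needle: str) -> list:
    """Elements named `tag` whose class attribute contains `needle`.

    Mirrors the meaning of //tag[contains(@class, 'needle')].
    ElementTree is case-sensitive, so the lowercase tag of the snippet is used.
    """
    return [el for el in tree.iter(tag) if needle in el.get("class", "")]


# One record per details page, with renamed field names.
record = {
    "name": text_of(root.find(".//h1[@class='name']")),
    "address": text_of(root.find(".//p[@class='address']")),
    "phone": text_of(root.find(".//p[@class='phone']")),
}
buttons = find_by_class_substring(root, "button", "returnToSearch")

print(record)
print(len(buttons), text_of(buttons[0]))
```

Matching on a substring of the class, rather than the full class string, keeps the locator working when the button carries additional classes.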

5. Save and start extraction - to run your task and get data
- Click "Save"
- Click "Start Extraction" on the upper left side
- Select "Local Extraction" to run the task on your computer, or select "Cloud Extraction" to run the task in the Cloud (for premium users only)



