How to scrape the full image URLs instead of the thumbnails?
Sometimes we need to scrape an image URL from a website, but all we get is the URL of a thumbnail instead of the full-size picture.
Here is a picture scraped from Amazon. As you can see, it is too small to make out any detail.
To get full-size images, all we need to do is modify the image URL we already have, using the following steps:
1. Observe the difference between full image URL and thumbnail URL.
In most cases, the URLs for different sizes of the same image differ only slightly. We need to find that difference and then use the Octoparse "Refine extracted data" function to reformat the thumbnail URL into the full image URL.
For example, a thumbnail URL on Amazon looks like this:
https://images-na.ssl-images-amazon.com/images/I/51Icrvma7ZL._SR38,50_.jpg
And the full image URL is:
https://images-na.ssl-images-amazon.com/images/I/51Icrvma7ZL.__.jpg
You can see that the thumbnail URL contains 'SR38,50', so we just need to delete that part of the URL.
2. Select the data field with image URL and click "Customize data field"
3. Click "Refine extracted data"
4. Click "Add step" and then click "Replace"
5. Enter the text that appears between "._" and "_." into the "Replace" box.
For this example, the URL is 'https://images-na.ssl-images-amazon.com/images/I/51Icrvma7ZL._SR38,50_.jpg'. Type SR38,50 into the "Replace" box and click "ok" to save.
Then you can get the full image URLs you need.
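If you post-process the scraped URLs outside Octoparse, the same replacement can be sketched in Python. This is only an illustration, not part of Octoparse: the function names are hypothetical, and the second function assumes the size token always sits between "._" and "_." as it does in the Amazon example above.

```python
import re

# Example thumbnail URL taken from the steps above.
THUMBNAIL_URL = (
    "https://images-na.ssl-images-amazon.com/images/I/51Icrvma7ZL._SR38,50_.jpg"
)


def strip_size_token(url: str, token: str) -> str:
    """Delete a known size token, mirroring the "Replace" step."""
    return url.replace(token, "")


# Assumption: the size token is whatever sits between "._" and "_."
# within the file name (so it never contains a "/").
SIZE_TOKEN = re.compile(r"(?<=\._)[^/]*?(?=_\.)")


def strip_any_size_token(url: str) -> str:
    """Delete the size token without knowing its exact value in advance."""
    return SIZE_TOKEN.sub("", url, count=1)


if __name__ == "__main__":
    print(strip_size_token(THUMBNAIL_URL, "SR38,50"))
    print(strip_any_size_token(THUMBNAIL_URL))
```

The first function fits when every thumbnail carries the same token. The pattern-based version may be handier when the size values vary from image to image, but check it against a few real URLs from the target site before relying on it.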
Author: Eric
Editor: Kara