
Sometimes we need to scrape an image URL from a website, but what we get is the URL of a thumbnail instead of the full-size picture.

Here is a picture scraped from Amazon. As you can see, the image is too small to make out any detail.

mceclip3.png

To get the full-size images, modify the image URL you already have by following the steps below:

  • Observe the difference between the full image URL and thumbnail URL

In most cases, the URLs for different image sizes differ only slightly. Find that difference, then use the Octoparse Clean Data function to reformat the thumbnail URL into the full-size URL.

For example, a thumbnail URL on Amazon looks like this:

https://images-na.ssl-images-amazon.com/images/I/51Icrvma7ZL._SR38,50_.jpg

The full-size image URL is:

https://images-na.ssl-images-amazon.com/images/I/51Icrvma7ZL.__.jpg

The only difference is that the thumbnail URL contains 'SR38,50'. Deleting that string from the thumbnail URL gives the full-size image URL.
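If the difference is hard to spot by eye, a short script can find it. The following is a minimal Python sketch, not part of Octoparse: the helper name `differing_segment` is hypothetical, and it assumes the two URLs share a common beginning and end and differ in a single stretch in the middle, as the Amazon example does.

```python
def differing_segment(thumbnail_url: str, full_url: str) -> str:
    """Return the part of thumbnail_url that sits between the prefix
    and suffix it shares with full_url (hypothetical helper)."""
    limit = min(len(thumbnail_url), len(full_url))

    # Count how many leading characters the two URLs have in common.
    prefix = 0
    while prefix < limit and thumbnail_url[prefix] == full_url[prefix]:
        prefix += 1

    # Count the common trailing characters, without overlapping the prefix.
    suffix = 0
    while (suffix < limit - prefix
           and thumbnail_url[-1 - suffix] == full_url[-1 - suffix]):
        suffix += 1

    return thumbnail_url[prefix:len(thumbnail_url) - suffix]


thumbnail = "https://images-na.ssl-images-amazon.com/images/I/51Icrvma7ZL._SR38,50_.jpg"
full = "https://images-na.ssl-images-amazon.com/images/I/51Icrvma7ZL.__.jpg"

segment = differing_segment(thumbnail, full)
print(segment)  # prints SR38,50
```

The printed string is the text to remove from the thumbnail URL. Other sites may encode the image size differently, so compare one thumbnail and full-size pair for each site first.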

  • Click the More (...) button, then click Clean data

1.png
  • Add a Replace step

2.png
  • Type SR38,50 into the Replace box and click Evaluate to check the result

If the result is the URL you need, click "Confirm" to save.

3.png
  • Click Apply to save the setting

4.png

You now have the full image URL you need.

5.png
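
For readers who would rather fix URLs in data that has already been exported, the same replacement can be sketched in Python. This illustrates the idea behind the Replace step and is not Octoparse's own implementation. The function name `to_full_image_url` and its `size_token` parameter are hypothetical, and the sketch assumes the size token is replaced with an empty string.

```python
def to_full_image_url(thumbnail_url: str, size_token: str = "SR38,50") -> str:
    """Remove the size token from a thumbnail URL (hypothetical helper).

    Assumes the token appears literally in the URL, as in the Amazon
    example above, and that removing it yields the full-size URL.
    """
    return thumbnail_url.replace(size_token, "")


# The thumbnail URL from the example above.
thumbnail_urls = [
    "https://images-na.ssl-images-amazon.com/images/I/51Icrvma7ZL._SR38,50_.jpg",
]

full_urls = [to_full_image_url(url) for url in thumbnail_urls]
for url in full_urls:
    print(url)
# prints https://images-na.ssl-images-amazon.com/images/I/51Icrvma7ZL.__.jpg
```

A literal replacement only works when every thumbnail uses the same size token. If the sizes vary between rows, a different rule is needed.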