You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend upgrading, as the latest version is faster, easier to use, and more robust. Download and upgrade if you haven't already done so!

The "Merge multiple rows" feature combines data from several rows into ONE single row.

Suppose you need to extract an article from a blog. Because the article is split into separate paragraphs, you may not be able to select the entire article as a single element. You still want all the paragraphs in one row, rather than each paragraph in its own row.

This is the perfect time to use the "Merge multiple rows" feature to combine the extracted data into a single row. Let's walk through an example.

Here we use blog content from https://philipyancey.com/a-view-from-abroad to demonstrate.

There are two steps to merging the rows:

  1. Select the desired data to extract

  2. Merge the extracted data

1. Select the desired data to extract

  • Click on the first paragraph of the article and choose Select all in the Tips panel. A Loop Item will be created to extract every paragraph of the post.

  • Select Extract text of the selected elements

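It can help to model why the paragraphs end up in separate rows. The Python sketch below is only an illustration, not how Octoparse works internally. It assumes each paragraph is a `<p>` element, and it uses the standard-library `html.parser` to emit one row per paragraph, much like a Loop Item produces one row per matched element. The class name `ParagraphLoop` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class ParagraphLoop(HTMLParser):
    """Collect the text of every <p> element, one row per paragraph.

    Illustration only: models the row-per-element output of a Loop Item,
    assuming the article paragraphs are <p> tags.
    """

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[dict[str, str]] = []
        self._in_p = False
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            # Each matched paragraph becomes its own row, like one loop iteration.
            self.rows.append({"Field 1": "".join(self._buf).strip()})
            self._in_p = False

    def handle_data(self, data):
        # Only text inside a paragraph is captured.
        if self._in_p:
            self._buf.append(data)


html = "<article><p>First paragraph.</p><p>Second paragraph.</p></article>"
parser = ParagraphLoop()
parser.feed(html)
parser.close()
print(parser.rows)
# [{'Field 1': 'First paragraph.'}, {'Field 1': 'Second paragraph.'}]
```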

2. Merge the extracted data

  • Click on the Extract Data box and go to the Data Preview panel

  • Click on the More button, and select Merge multiple rows of data into one


You are all set! Let's run the task and see what the actual exported data looks like. You can see that paragraphs captured in Field 1 are now merged into a single row as one big chunk.
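Conceptually, the merge takes every row's value for a field and joins them into a single cell. The Python sketch below models that idea. The `merge_rows` helper and the newline separator are assumptions for illustration, not Octoparse's documented behavior.

```python
def merge_rows(rows: list[dict[str, str]], field: str, sep: str = "\n") -> dict[str, str]:
    """Join one field's values from all rows into a single row (hypothetical helper).

    The newline separator is an assumption, not Octoparse's documented choice.
    """
    # Keep the original row order so the paragraphs stay in reading order.
    return {field: sep.join(row.get(field, "") for row in rows)}


rows = [{"Field 1": "First paragraph."}, {"Field 1": "Second paragraph."}]
merged = merge_rows(rows, "Field 1")
print(merged)
# {'Field 1': 'First paragraph.\nSecond paragraph.'}
```

Two rows go in and one row comes out, which matches the single row of Field 1 you see in the exported data.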


Tips:

  1. Merging multiple rows of data into one is especially useful for extracting articles from any website. You get the article as one complete chunk, without other elements such as blank lines, comments, or images.

  2. Once the data are merged into one chunk, you can use the Data reformat tools to add a prefix or suffix, such as "|" or "\", to the merged data.

  3. If there are multiple fields to extract, you need to set up "Merge multiple rows of data into one" for every field.

  4. This feature can also merge two fields. Add two Extract Data actions to the workflow, each extracting one field. Give both fields the same name, then set "Merge multiple rows of data into one" for that field. The data scraped in the two fields will then be merged into one cell.
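Tips 2 to 4 can be modeled the same way. The Python sketch below uses hypothetical helper names (`merge_field`, `add_prefix_suffix`) and assumes a newline separator, for illustration only. It is not Octoparse's implementation. Each call merges one named field on its own, mirroring Tip 3's point that the setting applies per field. Values from two actions that share a field name land in one cell (Tip 4), and a prefix or suffix is then applied to the whole merged value (Tip 2).

```python
def merge_field(rows: list[dict[str, str]], field: str, sep: str = "\n") -> str:
    """Join the values of one field across rows; each field is merged separately."""
    return sep.join(row[field] for row in rows if field in row)


def add_prefix_suffix(value: str, prefix: str = "", suffix: str = "") -> str:
    """Hypothetical stand-in for a prefix/suffix reformat step."""
    return f"{prefix}{value}{suffix}"


# Output of two hypothetical Extract Data actions whose fields were both named "Body".
first_action = [{"Body": "Text from the first action."}]
second_action = [{"Body": "Text from the second action."}]

# Because the field names match, both actions' values end up in the same merged cell.
body = merge_field(first_action + second_action, "Body")
print(repr(body))
# 'Text from the first action.\nText from the second action.'

# Wrap the merged value with a prefix and suffix, e.g. "|".
print(add_prefix_suffix(body, prefix="|", suffix="|"))
```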
