Downloading pages according to keyword presence

Comments

3 comments

  • Official comment
    Kara

    Hi user357,

    Thank you for reaching out. 

    Could you please first send over the URL you want to scrape? And what are the keywords you want to search for? A screenshot showing where your desired data is located on the webpage would also be super helpful. Then we can evaluate its feasibility.

    Looking forward to your reply.

    Best regards,

  • user357

    Hi Kara, thank you so much for answering, it means a lot to me. There are a lot of URLs I want to scrape, but as an example, let's take this one: https://elretratodehoy.com.ar/ (it's a digital newspaper). The keywords I want to look for are "consumo de alcohol" and "intoxicacion".

    As for what data I want to extract, there are two ways I can go. The first would be the scraper finding and downloading all pages that contain the keywords. In this case the data I want is the whole page, images included (for example, downloading this whole page: https://elretratodehoy.com.ar/2018/12/25/diez-personas-debieron-ser-asistidas-intoxicadas-por-consumo-de-alcohol/ as a PDF, PNG, or any other mainstream format, I guess).

    The second would be the scraper only taking the URLs of the news articles where these keywords appear. In this case, those URLs would be the only desired information. I think option 2 is probably more feasible and easier, but I am OK with either.

    thanks for your help!

  • Kara

    Hi user357,

    Thank you for the detailed explanation.

    You can try this by adding branch judgement to the workflow. To learn how, please check:

    Condition-based scraping using branch judgement

    If you want to get the URLs of the pages that fall under the "Yes" branch, just click "Add predefined field".

    As for option 2, I'm afraid it's hard for the system to define where the scraping should start and where it should end, since that isn't a fixed value.
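
    For readers who would rather script the same keyword-based branching outside the tool, here is a minimal Python sketch of the idea: keep a page's URL only when one of the keywords appears in its text. The function names (`normalize`, `matching_keywords`, `urls_with_keywords`) and the sample pages are made up for illustration, and the sketch assumes the page text has already been fetched and extracted by some other means. It strips accents before comparing, on the assumption that "intoxicacion" should also match "intoxicación".

```python
import unicodedata


def normalize(text: str) -> str:
    """Lowercase the text and drop accent marks (combining characters)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def matching_keywords(page_text: str, keywords: list[str]) -> list[str]:
    """Return the keywords found in the page text, ignoring case and accents."""
    haystack = normalize(page_text)
    return [kw for kw in keywords if normalize(kw) in haystack]


def urls_with_keywords(pages: dict[str, str], keywords: list[str]) -> list[str]:
    """The "Yes" branch: keep only URLs whose text contains at least one keyword."""
    return [url for url, text in pages.items() if matching_keywords(text, keywords)]


# Keywords taken from the question above.
KEYWORDS = ["consumo de alcohol", "intoxicacion"]

# Hypothetical, already-downloaded page texts keyed by placeholder URLs.
sample_pages = {
    "https://example.com/news/1": "Diez personas asistidas por consumo de alcohol",
    "https://example.com/news/2": "Resultados del torneo de fútbol local",
    "https://example.com/news/3": "Un caso de intoxicación en la ciudad",
}

matched = urls_with_keywords(sample_pages, KEYWORDS)
for url in matched:
    print(url)
```

    Simple substring matching is used here for brevity; it would also match keywords inside longer words, so a stricter word-boundary check may be preferable for real pages.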

    Best regards,

