Scrape From Website

To extract data from websites such as BBC, Yahoo, or Wikipedia, you can use our web scraping feature. It scrapes HTML from the websites you specify and converts it into structured datasets.

To scrape data, you can either enter the URLs manually or upload a CSV file containing the URLs you would like to scrape.

Input Manually

If you enter the websites manually, be sure to include https:// at the start of each URL (this happens automatically if you copy the address from the Chrome browser).
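
If you are preparing a list of URLs outside the tool, a small script can add the missing prefix before you paste them in. The following is a minimal sketch using only the Python standard library; the function name ensure_scheme is hypothetical and is not part of the product.

```python
from urllib.parse import urlparse


def ensure_scheme(url: str, default_scheme: str = "https") -> str:
    """Return the URL unchanged if it already starts with http or https,
    otherwise prepend the default scheme (an assumption: https)."""
    url = url.strip()
    # urlparse lowercases the scheme, so "HTTPS://..." is also recognized.
    if urlparse(url).scheme in ("http", "https"):
        return url
    return f"{default_scheme}://{url}"


# Hypothetical input list mixing URLs with and without the prefix.
raw_urls = ["www.bbc.com", "https://en.wikipedia.org/wiki/Web_scraping"]
clean_urls = [ensure_scheme(u) for u in raw_urls]
```

Only the first entry is changed here; the second already carries https:// and passes through as-is.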

From CSV

If you upload a CSV, you only need to select the column that contains the URLs of the websites you would like to scrape.
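
To check a CSV before uploading it, you can preview the URL column locally. This is a rough sketch with Python's csv module; the column name url, the sample data, and the helper urls_from_csv are assumptions made for illustration, not names the product requires.

```python
import csv
import io


def urls_from_csv(csv_text: str, column: str) -> list[str]:
    """Collect the non-empty values of one named column from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    urls = []
    for row in reader:
        # Short rows yield None for missing cells, so fall back to "".
        value = (row.get(column) or "").strip()
        if value:
            urls.append(value)
    return urls


# Hypothetical CSV content; in practice, read the text from your own file.
sample = "name,url\nBBC,https://www.bbc.com\nWikipedia,https://www.wikipedia.org\n"
urls = urls_from_csv(sample, "url")
```

Rows with an empty URL cell are skipped, which makes it easy to spot how many usable URLs the file actually contains.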

Once the scrape finishes, the content of the websites should appear on the Customize tab. If the per-line decomposer is selected, you will see each line of the website parsed separately.

If the per-document decomposer is selected, the entire contents of each URL are grabbed as a single document. All of the text at the URL will be present when you annotate.
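
The difference between the two decomposers can be illustrated with a short sketch. It is not the product's implementation: the function decompose, the mode names per_line and per_document, and the record layout are all hypothetical, and the page text is assumed to have been extracted already.

```python
def decompose(pages: dict[str, str], mode: str) -> list[dict[str, str]]:
    """Split scraped page text into records, either one per non-empty
    line or one per whole document (hypothetical mode names)."""
    records = []
    for url, text in pages.items():
        if mode == "per_line":
            # One record for each non-empty line of the page.
            for line in text.splitlines():
                if line.strip():
                    records.append({"url": url, "text": line.strip()})
        elif mode == "per_document":
            # One record holding the full text of the page.
            records.append({"url": url, "text": text})
        else:
            raise ValueError(f"unknown mode: {mode}")
    return records


# Hypothetical scraped text for a single URL.
pages = {"https://example.com": "Title\n\nFirst paragraph.\nSecond paragraph."}
per_line = decompose(pages, "per_line")
per_document = decompose(pages, "per_document")
```

With this sample, the per-line mode yields three records for the one URL, while the per-document mode yields a single record containing all of the text.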
