Sunday, May 3, 2020

Web Scraping

From the evolution of WWW, the scenario of internet user and data exchange is fastly changes. As common people join the internet and start to use it, lots of new techniques are promoted to boost up the network. At the same time, to enhance computers and network facility new technologies were introduces which results into automatically decreasing in cost of hardware and website’s related costs. Due to all these changes, large number of users are joined and use the internet facilities. Daily use of internet cose in to a tremendous data is available on internet. Business, academician,  researchers all are share their advertisements, information on internet so that they can be connected to people fastly and easily. 

As a result of exchange, share and store data on internet, a new problem is arise that how to handle such data overload and how the user will get or access the best information in least efforts. To solve this issues, researcher spotout new technique called Web Scraping. Web scraping is very imperative technique which is used to generate structured data on the basis of available unstructured data on the web.
https://www.edureka.co/blog/web-scraping-with-python/ :image source


Scaping generated structured data then stored in central database and analyze in spreadsheets. Traditional copy-and-paste, Text grapping and regular expression matching, HTTP programming, HTML parsing, DOM parsing, Webscraping software, Vertical aggregation platforms, Semantic annotation recognizing and Computer vision web-page analyzers are some of the common techniques used for data scraping. Previously most user uses the common copy-pest technique for gathering and analyzing data on the internet, but it is a tedious technique where lot of data copied by the user and store on computer files. 

As compared to this technique web scraping software is easiest scraping technique. Now a days, there are lots of software are available in the market for web scraping. Our paper is focused on the overview on the information extraction technique i.e. web scraping, different techniques of web scraping and some of the recent tools used for a web scraping. 

Keywords- Web mining, information extraction, web scraping


What is web scraping?


If you’ve ever copy and pasted information from a website, you’ve performed the same function as any web scraper, only on a microscopic, manual scale.

Web scraping, also known as web data extraction, is the process of retrieving or “scraping” data from a website. Unlike the mundane, mind-numbing process of manually extracting data, web scraping uses intelligent automation to retrieve hundreds, millions, or even billions of data points from the internet’s seemingly endless frontier.

More than a modern convenience, the true power of web scraping lies in its ability to build and power some of the world’s most revolutionary business applications. ‘Transformative’ doesn’t even begin to describe the way some companies use web scraped data to enhance their operations, informing executive decisions all the way down to individual customer service experiences. 




Why Web Scraping?

Web scraping is used to collect large information from websites. But why does someone have to collect such large data from websites? To know about this, let’s look at the applications of web scraping:

Price Comparison: Services such as ParseHub use web scraping to collect data from online shopping websites and use it to compare the prices of products.

Email address gathering: Many companies that use email as a medium for marketing, use web scraping to collect email ID and then send bulk emails.

Social Media Scraping: Web scraping is used to collect data from Social Media websites such as Twitter to find out what’s trending.

Research and Development: Web scraping is used to collect a large set of data (Statistics, General Information, Temperature, etc.) from websites, which are analyzed and used to carry out Surveys or for R&D.

Job listings: Details regarding job openings, interviews are collected from different websites and then listed in one place so that it is easily accessible to the user.


How does Web Scraping work?

When you run the code for web scraping, a request is sent to the URL that you have mentioned. As a response to the request, the server sends the data and allows you to read the HTML or XML page. The code then, parses the HTML or XML page, finds the data and extracts it.

To extract data using web scraping with python, you need to follow these basic steps:

  • Find the URL that you want to scrape
  • Inspecting the Page
  • Find the data you want to extract
  • Write the code
  • Run the code and extract the data
  • Store the data in the required format 

What are the different scraping technique


Manual scraping
  •     Copy-pasting

Automated Scraping
  •     HTML Parsing
  •     DOM Parsing
  •     Vertical Aggregation
  •     XPath
  •     Google Sheets
  •     Text Pattern Matching


Sources / References:

http://www.ijfrcsce.org/download/browse/Volume_4/April_18_Volume_4_Issue_4/1524638955_25-04-2018.pdf

https://scrapinghub.com/what-is-web-scraping

https://www.edureka.co/blog/web-scraping-with-python/

Web Scraping PPThttps://www.slideshare.net/SelectoCompany/web-scraping-76200621








No comments:

Post a Comment