(317) 456-C2IT (2248)
Toll Free: (866) 217-7478
info@c2itconsulting.net


Web Page Scraping

For years, one of the ways we've gathered data has been "screen scraping": you log on to a program, find specific text in specific portions of the screen, and "scrape" it off for your own application's needs.

This has become a whole new animal with the explosion of the Internet. So much more information is available now, but it's constantly changing as well! We are able to use RSS, blogs, XML, Web Services and other types of feeds to obtain much of the information you may need on your site, but every once in a while we still have to get out there and scrape. The examples below demonstrate how this works.

Original Web Page

The frame below shows you a "stats" page for our website. Notice the table about halfway down that shows the percentage of hits from each country.

A bit hard to read, isn't it? Imagine all you really wanted to see was this chart. Normally you'd have to display the page in an IFRAME HTML element (like this), use a FRAMESET, or send the user off-site to view the page. Who wants that, when you could deliver THIS to them:




Num   Perc.    Country              Name
462   92.40%   United States        United States
 13    2.60%   Canada               Canada
  4    0.80%   United Kingdom       United Kingdom
  4    0.80%   India                India
  3    0.60%   Denmark              Denmark
  2    0.40%   Russian Federation   Russian Federation
  2    0.40%
  1    0.20%   Sweden               Sweden
  1    0.20%   South Africa         South Africa
  1    0.20%   Australia            Australia
  1    0.20%   Germany              Germany
  1    0.20%   France               France
  1    0.20%   Italy                Italy
  1    0.20%   Saudi Arabia         Saudi Arabia
  1    0.20%   Indonesia            Indonesia
  1    0.20%   Pakistan             Pakistan
  1    0.20%   Uganda               Uganda
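As a rough illustration of how a table like the one above can be pulled out of a page, here is a minimal sketch in Python using only the standard library. It is not the scraper described on this page: the class name TableRows and the SAMPLE markup are hypothetical, and a real stats page would first have to be downloaded (for example with urllib.request.urlopen) and would likely need extra logic to pick the right table.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collects the text of every table cell, grouped by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical markup modeled on the country table; not the real stats page.
SAMPLE = """
<p>Lots of other statistics the visitor does not care about...</p>
<table>
<tr><th>Num</th><th>Perc.</th><th>Name</th></tr>
<tr><td>462</td><td>92.40%</td><td>United States</td></tr>
<tr><td>13</td><td>2.60%</td><td>Canada</td></tr>
</table>
"""

parser = TableRows()
parser.feed(SAMPLE)
parser.close()

for row in parser.rows:
    print(row)
```

Once the rows are plain lists of strings, they can be rendered in whatever layout suits your own site.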

Conclusion

So what would you rather deliver on your site: a link or an IFRAME displaying someone else's site, or just the information you want, scraped from their site and presented as part of your own? We aren't advocating plagiarism or stealing copyrighted information, and we encourage you to give credit where credit is due, but isn't this the way to go?

We can build a site for you that contains pages with this type of functionality, or we can provide you with development tools and controls to do it on your own!

Our scraper offers the following features

Here is another example of scraping functionality, this time with the images completely removed from the scraped page.



Num   Perc.    Country   Name
462   92.40%             United States
 13    2.60%             Canada
  4    0.80%             United Kingdom
  4    0.80%             India
  3    0.60%             Denmark
  2    0.40%             Russian Federation
  2    0.40%
  1    0.20%             Sweden
  1    0.20%             South Africa
  1    0.20%             Australia
  1    0.20%             Germany
  1    0.20%             France
  1    0.20%             Italy
  1    0.20%             Saudi Arabia
  1    0.20%             Indonesia
  1    0.20%             Pakistan
  1    0.20%             Uganda
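One way image removal like this could work is to re-emit the scraped markup while skipping every img tag. The sketch below uses Python's standard html.parser module; ImageStripper and strip_images are hypothetical names, not part of our scraper, and the approach assumes ordinary markup (text inside script or style elements would be escaped, which a production tool would need to handle).

```python
from html import escape
from html.parser import HTMLParser


class ImageStripper(HTMLParser):
    """Rebuilds the HTML it is fed, leaving out <img> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            # Re-emit the start tag exactly as it appeared in the source.
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> or <img ... />.
        if tag != "img":
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "img":
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser, so escape them again.
        self.parts.append(escape(data, quote=False))


def strip_images(html_text):
    """Return html_text with all <img> tags removed."""
    parser = ImageStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


# Hypothetical table row with a flag image, modeled on the example above.
row = '<tr><td>13</td><td><img src="ca.png" alt="Canada"></td><td>Canada</td></tr>'
print(strip_images(row))
```

The cell that held the image is kept but left empty, which matches the empty Country column in the table above.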

Bible Reading Scraper

One of our most recent projects involved this scraping technology. We combined a database of Bible passages with BibleGateway.com's search engine to create a mashup page of daily readings that takes you through the entire Bible in a year. You can check it out here.
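To give a feel for the first step of a mashup like this, the sketch below looks up a day's passage and builds a search link for it. Everything here is an assumption made for illustration: the READING_PLAN entries stand in for the passage database, and the URL path and the search parameter name are our guess at how a passage lookup on BibleGateway.com is addressed, not a documented API. The page behind the link would then be scraped in the same way as the earlier examples.

```python
from urllib.parse import urlencode

# Hypothetical stand-in for the passage database: day of the year -> passage.
READING_PLAN = {
    1: "Genesis 1-3",
    2: "Genesis 4-7",
    3: "Genesis 8-11",
}

# Assumed search endpoint; verify against the live site before relying on it.
SEARCH_BASE = "https://www.biblegateway.com/passage/"


def reading_url(passage):
    """Build a search URL for one passage (URL format is an assumption)."""
    return SEARCH_BASE + "?" + urlencode({"search": passage})


def url_for_day(day_of_year):
    """Return the search URL for a given day, or None if the plan has no entry."""
    passage = READING_PLAN.get(day_of_year)
    return reading_url(passage) if passage is not None else None


print(url_for_day(1))
```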