<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>SmokeDoc Web Screen Scraping Tool</title>
	<atom:link href="http://smokedoc.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://smokedoc.org</link>
	<description>A simple tool for parsing all types of text-based documents</description>
	<lastBuildDate>Thu, 26 May 2011 13:10:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>What is web scraping?</title>
		<link>http://smokedoc.org/news/en/what-is-web-scraping/</link>
		<comments>http://smokedoc.org/news/en/what-is-web-scraping/#comments</comments>
		<pubDate>Thu, 26 May 2011 13:10:42 +0000</pubDate>
		<dc:creator>rebbort</dc:creator>
				<category><![CDATA[English]]></category>

		<guid isPermaLink="false">http://smokedoc.org/?p=1297</guid>
		<description><![CDATA[Web scraping is the process of extracting structured information from unstructured or semi-structured web data sources. Web extraction is also referred to as web data mining or web scraping. Web scraping/extraction is done by writing a program or script, in any programming language, that processes the unstructured or semi-structured HTML web pages of a target web site or [...]]]></description>
			<content:encoded><![CDATA[<p>Web scraping is the process of extracting structured information from unstructured or semi-structured web data sources. Web extraction is also referred to as web data mining or web scraping.</p>
<p>Web scraping/extraction is done by writing a program or script, in any programming language, that processes the unstructured or semi-structured HTML web pages of a target web site, or other text-based web documents, and converts the information they contain into a structured format. With web extraction you can connect to a website and request pages exactly as your browser would. The web server sends back the HTML web page, from which you can then extract the specific information you need.</p>
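<p>The request-then-extract cycle described above can be sketched in Python. This is only a minimal illustration using the standard library, not SmokeDoc's own implementation: the names LinkExtractor, extract_links and fetch are hypothetical, and collecting links is just one assumed example of the "specific information" a scraper might pull from a page.</p>

```python
# Hypothetical helper names; a sketch of the idea, not SmokeDoc's API.
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects (href, text) pairs for every anchor element in a page."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) records
        self._href = None    # href of the anchor currently open, if any
        self._text = []      # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        # Start recording when an anchor element opens.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Keep text only while inside an anchor that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Close the record when the anchor element ends.
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Turn semi-structured HTML into a structured list of (href, text)."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url):
    """Request a page as a browser would and return its HTML as text."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

<p>Under these assumptions, passing the result of fetch(url) to extract_links gives a list of (href, text) tuples, which is the structured form of the links on the page.</p>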
<p>Web data mining is also known as web content mining or web text mining, because content, or text, is the most widely researched area of the internet. Extracting data from HTML web pages is an instance of web data mining. Web data mining tasks fall into three main types: web content mining, web structure mining, and web usage mining.</p>
]]></content:encoded>
			<wfw:commentRss>http://smokedoc.org/news/en/what-is-web-scraping/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
