Overview
SmokeDoc is a powerful web scraping tool that extracts data from web pages and other kinds of text-based documents (XML, CSV, SQL, TXT, HTML, XHTML, etc.). SmokeDoc lets users automate the whole process of extracting and storing information from web sites. You can capture large quantities of poorly structured data in minutes, at any time and in any place, and save the results in any format. Our customers use SmokeDoc to collect and analyze the wide range of data on the Internet that relates to their industry.

Here you will find the most significant features of our Web Content Scraper.
- Our web extractor has a Visual Script Generator that detects the patterns needed to export data to your applications. A single mouse click on the data area you want to collect is enough; no coding is required.
- Our web screen scraper has its own scripting language for data extraction, which lets you harvest data even from hard-to-reach web-page fields that you cannot click on.
- Our web content extractor can crawl entire web sites automatically and harvest whole content structures such as catalogues or search results.
- Our web content scraper can navigate a website on its own, just as you would with an internet browser, with AJAX fully supported.
- SmokeDoc includes a fast multi-threaded data harvester for collecting data from web sites that do not rely on AJAX.
- You can repeat form submission for all possible combinations of input values in drop-down boxes, or provide a list of input values manually.
- You can take input data from a database, such as form input values or URLs to visit.
- Using our web content extractor, you can collect website information from most framesets and iframes.
- You can gather website data even from sites that require authorization.
- Our software can re-extract only the data that has been updated.
- You can schedule content extraction to keep data up to date.
- Notifications can be sent when the tracked web-page data has been updated.
- Our web screen scraping tool can collect data from web pages with an unstructured content flow, which most scraping tools are unable to do.
- The content you can harvest with our web content grabber is highly varied: text, images, files, links, meta tags, tag attributes and the like.
- Our web scraping tool supports AJAX, so you can collect content from AJAX-enabled web sites.
- You can save collected website data to spreadsheets, databases, XML and CSV files. Extracted data can also be retrieved through the API.
- Built-in and custom filters transform content as it is extracted. You can also use built-in or custom modifiers to post-process data after extraction.
- Our web scraping tool includes a simple but practical API. You can use the API to retrieve and post-process the extracted data from within your own applications.
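
To make the idea of repeating a form submission for every combination of drop-down values more concrete, here is a minimal sketch in plain Python. It is not SmokeDoc's scripting language or API; the function name `form_combinations` and the field names and options are hypothetical, chosen only for illustration.

```python
from itertools import product


def form_combinations(fields):
    """Yield one dict of field values per combination of drop-down options."""
    names = list(fields)
    # product() walks the Cartesian product of all option lists
    for values in product(*(fields[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical drop-down fields and their options (not taken from SmokeDoc)
fields = {
    "category": ["books", "music"],
    "sort": ["price", "date", "rating"],
}

submissions = list(form_combinations(fields))
print(len(submissions))   # 2 options x 3 options = 6 submissions
print(submissions[0])     # {'category': 'books', 'sort': 'price'}
```

Each resulting dict corresponds to one form submission. A manually provided list of input values would simply replace the generated list.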

