Web scraping with the WebBrowser control (.NET WinForms)


In this post, I am going to show how easy it is to extract and change web-page content through the WebBrowser control. WebBrowser is a Windows Forms control from Microsoft and is easy to use. This post explains how you can put it to work for your own needs.

The WebBrowser control displays web pages inside your application. It hosts the Internet Explorer rendering engine and lets you manipulate the page's HTML, interact with its JavaScript, automate web scraping, and much more.

You can find more about the WebBrowser class in the MSDN library. The following steps show how to use it.

1.       First, create a Windows Forms Application project.

2.       Add a WebBrowser control and two Button controls to the form.
3.       Set the control's “Url” property in the Properties window, or navigate from code with the snippet below (webBrowser is the name of the WebBrowser control):
webBrowser.Navigate("https://www.google.com");
4.       When you run the application, the screen will look like the screenshot below.

5.       To set values on the page's elements, and even fire their events, use the code below. Here, I set the value of the search box (for this example, my name “Mohd Azharuddin Ansari”) and then fire the search button's “Click” event programmatically. If everything goes as planned, Google shows the results for that search. Note that Google changes its page markup from time to time, so the element identifiers “q” and “btnK” may need updating.
Code (this goes in the first button's Click event handler)
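            HtmlDocument doc = webBrowser.Document;
            // GetElementById returns null when no matching element exists
            HtmlElement searchBox = doc?.GetElementById("q");
            HtmlElement button = doc?.GetElementById("btnK");
            if (searchBox != null && button != null)
            {
                searchBox.SetAttribute("value", "Mohd Azharuddin Ansari"); // fill in the search box
                button.InvokeMember("click");                              // click Search programmatically
            }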
Result

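6.       Once the results page has loaded, you can extract the results for use elsewhere in your code. The code below collects the text of every h3 element, assuming Google renders each result title as an h3 heading.
Code (this goes in the second button's Click event handler)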
            // Collect the text of every <h3> element (the result titles)
            var searchResultText = new System.Text.StringBuilder();
            HtmlElementCollection searchResult = webBrowser.Document.GetElementsByTagName("h3");

            foreach (HtmlElement he in searchResult)
            {
                searchResultText.AppendLine(he.InnerText); // one title per line
            }

            MessageBox.Show(searchResultText.ToString());
Result

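The Navigate call in step 3 returns before the page has finished loading, so webBrowser.Document is not ready to be queried right away. Below is a minimal sketch that keeps both buttons disabled until the DocumentCompleted event reports the page is complete. It assumes the controls are built in code, and the names MainForm, searchButton and extractButton are hypothetical stand-ins for the designer-generated controls from step 2. Because it is a standalone file with its own Main, it would replace the template's Program.cs rather than sit alongside it.

```csharp
using System;
using System.Windows.Forms;

public class MainForm : Form
{
    // Hypothetical names standing in for the controls added in the designer.
    private readonly WebBrowser webBrowser = new WebBrowser();
    private readonly Button searchButton = new Button();
    private readonly Button extractButton = new Button();

    public MainForm()
    {
        Text = "WebBrowser scraping demo";
        Width = 1000;
        Height = 700;

        searchButton.Text = "Search";
        searchButton.Dock = DockStyle.Top;
        searchButton.Enabled = false;   // disabled until the page has loaded

        extractButton.Text = "Extract";
        extractButton.Dock = DockStyle.Top;
        extractButton.Enabled = false;

        webBrowser.Dock = DockStyle.Fill;
        webBrowser.ScriptErrorsSuppressed = true; // hide script error dialogs raised by the page
        webBrowser.DocumentCompleted += OnDocumentCompleted;

        // Add the Fill control first so the buttons docked to the top keep their space.
        Controls.Add(webBrowser);
        Controls.Add(extractButton);
        Controls.Add(searchButton);

        webBrowser.Navigate("https://www.google.com");
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // The event can fire once per frame; only react when the whole page is complete.
        if (webBrowser.ReadyState != WebBrowserReadyState.Complete)
            return;

        searchButton.Enabled = true;
        extractButton.Enabled = true;
    }

    [STAThread] // the WebBrowser control requires a single-threaded apartment
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new MainForm());
    }
}
```

The step 5 and step 6 code would then go in Click handlers attached to searchButton and extractButton.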

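In step 5, InvokeMember("click") only starts the navigation to the results page, so the h3 elements that step 6 reads are not in the document yet when it returns. A sketch of a hypothetical helper, ClickAndWait (not part of the WebBrowser API), that subscribes a one-shot DocumentCompleted handler before clicking and runs a callback once the new page is complete:

```csharp
using System;
using System.Windows.Forms;

public static class BrowserHelpers
{
    // Hypothetical helper: clicks an element and runs onLoaded once, after the
    // page that the click navigates to has finished loading.
    public static void ClickAndWait(WebBrowser browser, HtmlElement element, Action onLoaded)
    {
        if (browser == null) throw new ArgumentNullException(nameof(browser));
        if (element == null) throw new ArgumentNullException(nameof(element));
        if (onLoaded == null) throw new ArgumentNullException(nameof(onLoaded));

        WebBrowserDocumentCompletedEventHandler handler = null;
        handler = (sender, e) =>
        {
            // Ignore completions for individual frames; wait for the whole document.
            if (browser.ReadyState != WebBrowserReadyState.Complete)
                return;

            browser.DocumentCompleted -= handler; // one-shot: unsubscribe before the callback
            onLoaded();
        };

        // Subscribe before clicking so a fast navigation cannot be missed.
        browser.DocumentCompleted += handler;
        element.InvokeMember("click");
    }
}
```

For example, the first button's handler could call `BrowserHelpers.ClickAndWait(webBrowser, button, () => MessageBox.Show("Results loaded"));`. If the page updates its results with script instead of a full navigation, DocumentCompleted may never fire, so treat this as a starting point rather than a guarantee.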

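Step 6 joins the headings into one string for a MessageBox. If you want to use the results elsewhere in your code, a list is often more convenient. A minimal sketch, where ScrapeHelpers and GetTextsByTag are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows.Forms;

public static class ScrapeHelpers
{
    // Returns the trimmed inner text of every element with the given tag name,
    // skipping elements with no text (InnerText may be null for them).
    public static List<string> GetTextsByTag(HtmlDocument document, string tagName)
    {
        if (document == null) throw new ArgumentNullException(nameof(document));

        return document.GetElementsByTagName(tagName)
            .Cast<HtmlElement>()                    // HtmlElementCollection is non-generic
            .Select(element => element.InnerText)
            .Where(text => !string.IsNullOrWhiteSpace(text))
            .Select(text => text.Trim())
            .ToList();
    }
}
```

Usage from the second button's handler: `List<string> titles = ScrapeHelpers.GetTextsByTag(webBrowser.Document, "h3");`.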