Friday, May 1, 2015

How To Change TOR Browser IP or Identity with C# Without Password

In our previous post here we described what Tor is and how to change identity using Python.

Here we will discuss how to change the Tor identity in a C# environment.

First of all, if you write the authentication step in C# yourself, you have to set a password for accessing Tor, hash that password, and put the hash in the torrc-defaults file. That is sometimes not possible for various reasons.

We can change the Tor identity another way, which was described in the previous post. We can write that simple Python script, save it somewhere on our computer, and then invoke it through a command shell such as cmd.exe.

To do this, we have to install Python first and then make some changes to the PATH variable; after that we are good to go.

1. Download and install Python. Here you can get Python 2.7.9, which is the latest Python 2 release at the time of writing.
2. After the installation completes, we have to add the Python directories to the PATH variable. You can find tutorials on this with some googling. If you have installed 2.7.9, add "C:\Python27;C:\Python27\Lib\site-packages\;C:\Python27\Scripts\;" to the PATH environment variable.
3. This enables running Python scripts by just typing python "file-name".
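
If you want to double-check step 2, a small script can report which of those directories are still missing from PATH. This is only a sketch: the function name missing_path_entries and the REQUIRED_DIRS list are made up for illustration, and the directories assume the default Python 2.7 install location mentioned above.

```python
import os

# Directories from step 2; assumes the default C:\Python27 install location.
REQUIRED_DIRS = [
    "C:\\Python27",
    "C:\\Python27\\Lib\\site-packages",
    "C:\\Python27\\Scripts",
]


def missing_path_entries(path_value, required=REQUIRED_DIRS, sep=";"):
    """Return the required directories that do not appear in path_value."""
    def normalize(entry):
        # Windows paths are case-insensitive and may end with a backslash.
        return entry.strip().rstrip("\\").lower()

    present = set(normalize(e) for e in path_value.split(sep) if e.strip())
    return [d for d in required if normalize(d) not in present]


if __name__ == "__main__":
    # An empty list means all three directories are already on PATH.
    print(missing_path_entries(os.environ.get("PATH", "")))
```

Open a new command prompt after editing PATH, since consoles that are already open keep the old value.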

Then here is the C#/.NET part that calls cmd.exe asynchronously, so it will not block our own process.

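// Launch cmd.exe in a hidden window; Start() returns right away, so the caller is not blocked.
System.Diagnostics.Process process = new System.Diagnostics.Process();
System.Diagnostics.ProcessStartInfo startInfo = new System.Diagnostics.ProcessStartInfo();
startInfo.WindowStyle = System.Diagnostics.ProcessWindowStyle.Hidden;
startInfo.FileName = "cmd.exe";
// Verbatim string (@) so that "\r" in the path is not read as a carriage-return escape.
startInfo.Arguments = @"/C python c:\renew_tor.py";
process.StartInfo = startInfo;
process.Start();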

Here, the renew_tor.py file contains the Python code that refreshes the Tor identity. You can find it here.

This way you don't have to handle Tor authentication in your C# code.

Mehmet

How To Change TOR Browser IP or Identity using Python

Tor is a large and well-regarded organization that stands on its own with contributions from all over the world. Its aim is to meet people's need for anonymity for free. It accepts donations; you can even just donate a server, and they will run Tor exit nodes on it. That's how they are still around.

When it comes to scraping, anonymity may not be the first thing you need. You have to deal with other things first, such as getting around anti-scraping measures. The best known of these is the IP ban, which can become annoying and unsolvable without proxies.

Many companies offer proxies on the internet; I haven't bought one and probably never will. That's because of Tor, which I am happy to use.

In many scenarios, when you are banned and can no longer access a site, you have to change identity and move on. Normally this requires running a script manually or clicking in the browser. If we are building non-stop, fully automatic bots, then we have to find out how to change the IP automatically as well.

Our sample here is written in Python.

from stem import Signal
from stem.control import Controller

with Controller.from_port(port = 9051) as controller:
  controller.authenticate()
  controller.signal(Signal.NEWNYM)

If we take a closer look, it connects to Tor's control port, which is 9051 in this case but can be 9151 in some cases. You can even change these settings in the config files. Then we send the AUTHENTICATE keyword to shake hands. After that we just send our command, NEWNYM, which requests a new identity. If the status is not "250", then there was an error and the attempt was not successful.

To be able to use stem, you have to install it with pip install stem.
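
To make that exchange visible, here is a rough sketch of the same conversation done by hand over a plain socket, without stem. It is not how stem is implemented; the function names reply_ok, send_command and renew_identity are made up for illustration. It assumes Python 3, a control port that accepts AUTHENTICATE without a password or cookie, and replies short enough to arrive in a single read.

```python
import socket


def reply_ok(reply):
    """True when a control-port reply starts with status code 250."""
    return reply.strip().startswith("250")


def send_command(sock, command):
    """Send one control-port command and return the decoded reply."""
    sock.sendall((command + "\r\n").encode("ascii"))
    # Assumption: the whole reply fits in one recv call.
    return sock.recv(1024).decode("ascii", "replace")


def renew_identity(host="127.0.0.1", port=9051):
    """Ask Tor for a new identity; True only if both steps answered 250."""
    with socket.create_connection((host, port), timeout=10) as sock:
        # Assumption: no password or cookie is required on this port.
        if not reply_ok(send_command(sock, "AUTHENTICATE")):
            return False
        return reply_ok(send_command(sock, "SIGNAL NEWNYM"))
```

Calling renew_identity(port=9151) would try the other port mentioned above. In practice stem is the safer choice, because its authenticate() also handles the other authentication methods.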

I hope this helps.

Mehmet

Wednesday, April 15, 2015

Selenium vs Watin using .Net C#

There are many other posts on this topic, but in the end the reader has to decide which one to use. Both libraries were developed for test automation, and both have pros and cons to discuss. We will look at this from the point of view of scraping, as I have never used them for testing purposes.

I will just list what I experienced with both libraries, and then try to make a point. Let's start with the Watin project.

Watin Cons

  • You can start here to learn about Watin. It has not been updated for a long time, so you have to dig through Google for answers.
  • It supports IE only. (Firefox support stopped at v4 or so, so it is not stable to run.)
  • IE works fine on version 9, but support for version 9 ended on Windows 8 machines, so there you have to use IE 11 with Watin. This causes some unpredictable states in the program; e.g. if you touch the IE 11 window while Watin is still managing it, Watin will lose control over it.
  • Again, IE 9 may be deprecated soon, so it is not a good thing to rely on.
  • You cannot easily apply a proxy with Watin, since you have to change Internet Options, which is very tricky.

Watin Pros

  • Watin controls IE natively, so its performance is much better than Selenium's IE driver.
  • IE 9 does not crash easily, and you can run many simultaneous windows.

Selenium Cons

  • It uses WebDriver to control browsers, which adds another layer, so its performance is slower than Watin's.
  • The native Firefox WebDriver sometimes crashes, and there is no fix for it. It says "Directory is not empty", and you have to kill all Firefox processes and restart.

Selenium Pros

  • You can find Selenium headquarters here; it gets an update about once a month. It has a support forum that is alive, and you can find a fix for almost any problem.
  • It supports almost every browser you can think of: Firefox, Chrome, Safari, etc.
  • Firefox is supported natively by Selenium, so you don't need an external WebDriver for it.
  • HtmlUnit is a headless browser, and even that one has WebDriver support for Selenium.
  • Proxies can easily be applied to each WebDriver instance, so you can run two browsers at the same time, one with a proxy and the other without.
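
As an illustration of the per-instance proxy point, the sketch below only builds the Firefox proxy preferences for one browser instance; it does not start a browser. It is written in Python rather than C#, the function name firefox_socks_prefs is made up, and the default host and port assume a local Tor SOCKS proxy on 127.0.0.1:9050.

```python
def firefox_socks_prefs(host="127.0.0.1", port=9050):
    """Firefox preferences that send one browser instance through a SOCKS proxy."""
    return {
        "network.proxy.type": 1,  # 1 = manual proxy configuration
        "network.proxy.socks": host,
        "network.proxy.socks_port": port,
        # Resolve host names through the proxy as well.
        "network.proxy.socks_remote_dns": True,
    }


if __name__ == "__main__":
    # One instance would get these preferences, the other none at all.
    for name, value in sorted(firefox_socks_prefs().items()):
        print(name, value)
```

With Selenium's Python bindings, each pair could be passed to set_preference on a FirefoxOptions object before creating the driver; a second driver created without these preferences would then browse directly.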

To come to the point, Watin has lost its support, and this may become a problem in a year or two, since it runs well on Windows 7 but not on Windows 8+. This is a huge issue if you are going to invest for more than a year. You can choose Watin for short-term projects that need performance.
On the other hand, Selenium has good support and, despite the performance issues, it is reliable. It may lack some performance compared to Watin, but I think it may get better later.

My vote goes for Selenium, as a former Watin-eer.

Mehmet





Tuesday, October 16, 2012

Web Scraping - Open Source .Net Libraries

Nowadays, everyone may need some data from the internet for different purposes. Some may need listings of businesses they have to work with, some may need a list of books to sell on their website, some may need the entire data of a website, and this may go beyond your imagination. You never know who will need what.

This becomes a major problem, and companies or individuals look for solutions. You might think of manual data entry as a solution; indeed it may be one, but as the problem gets bigger you have to spend that much more time on manual data entry.
There is an alternative solution, which is called Web Scraping. Scraping is done with automated software. In my case, I use .Net and have tried several different libraries: Watin, WebZinc, HtmlAgilityPack and HtmlAgilityPack's wrapper Fizzler. I will try to explain their differences.

Let me start with WebZinc. I actually found it when I needed it. A former client requested that it be used as the main component of a scraper application. It is not totally open source, but you can use it for free or just buy it for $99. When you use it for free, it pops up an alert window that requires you to click "OK" each time the application runs. WebZinc is capable of visual browsing, which means that it starts a browser instance and you can see what is going on. It also has non-visual methods, which is good for applications that will run simultaneously or on a web server.

Watin is another option for web scraping, and the one I usually use. It requires you to choose either Firefox or IE as the visual browser. It basically manages that browser instance, so you can check the status of web pages and see what is going on. You should choose IE, which is the better option since the Firefox version Watin supports is 3.6.28, which is very old. Watin has better documentation than WebZinc, since it is very hard to find anything on Google about WebZinc. Watin's conventions are simple: it has objects for almost every Html tag, like Table, TableRow, TableCell, Form, Div, Button, Image, Para (which is actually the p tag), List (ul or ol element), ListItem (li element), etc. Each of these objects has almost the same actions; for example, you can simply call the .Click() method to click on that element.

HtmlAgilityPack is the last option that you may use. It is not as functional as Watin; for example, you have to write a lot of code just to mimic a button click. It is good for static Html text where you do not need to post or get anything via buttons or javascript. It is based on WebClient, so you can use proxies easily and just parse the acquired html. It relies on XPath to select Html elements. If you have to use HtmlAgilityPack, then you need to use something on top of it, or you have to write your own wrapper library.

Fizzler uses HtmlAgilityPack at the basic level and adds some functionality on top of it, but it is not much more functional than HtmlAgilityPack. It adds css selector support, and you will adapt easily if you are familiar with javascript.

In conclusion, I prefer Watin, but in some cases it is not enough or not suitable for web scraping. I will write more about Watin in later posts.

Let me know if anything is missing or if there is a mistake.
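
To show what selecting elements with XPath looks like, here is a tiny sketch using Python's standard library instead of HtmlAgilityPack, so it is an analogy rather than that library's API. The sample markup, the "books" table and the function name extract_titles are all invented for the example, and the parser used here needs well-formed markup, which real-world Html often is not.

```python
import xml.etree.ElementTree as ET

# Invented sample page; a real scraper would download this text first.
PAGE = """
<html><body>
  <table id="books">
    <tr><td class="title">Dune</td><td class="price">9.99</td></tr>
    <tr><td class="title">Emma</td><td class="price">4.50</td></tr>
  </table>
</body></html>
"""


def extract_titles(markup):
    """Return the text of every title cell in the 'books' table."""
    root = ET.fromstring(markup.strip())
    # XPath: any table with id="books", then its rows, then the title cells.
    cells = root.findall(".//table[@id='books']/tr/td[@class='title']")
    return [cell.text for cell in cells]


if __name__ == "__main__":
    print(extract_titles(PAGE))
```

The query string is the whole selection logic, which is why this style suits static pages and why anything interactive, such as clicking a button, needs extra code on top.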
Mehmet.

Friday, September 21, 2012

DropDownListFor vs ListBoxFor in Asp.Net MVC

Our task was as simple as changing DropDownListFor into ListBoxFor. You do not get any compile-time errors when you just change DropDownListFor into ListBoxFor, but it is not that simple.

We used a MultiSelectList to populate our DropDownList, and it was working just fine until we decided to change it into ListBoxFor. Then we realized that we had missed something. Our DropDownListFor was in the following format:

          <%:Html.DropDownListFor(
                    x => x.ContactId,
                    new MultiSelectList(Model.Contacts, "ContactId", "ContactName"))%>

It is just a simple dropdown, which uses ContactId as the element id and ContactName as the text. The populating IEnumerable was Model.Contacts. So far everything is fine and working well.

Then we needed to change this into ListBoxFor, since the user has to select multiple elements and a dropdown is just not enough for that.

When you just change the dropdown into a listbox, it seems fine until you run the page. We got a "source is null" error and at first glance had no idea what caused it. After some Google searches we found a solution, which simply indicates that ListBoxFor needs the list of selected elements as its first argument, not the id of the select list (in this case ContactId).

We modified our model with a new list that contains just the items selected in the ListBoxFor. So our ListBoxFor turned into the following:

          <%:Html.ListBoxFor(
                    x => x.SelectedContacts,
                    new MultiSelectList(Model.Contacts, "ContactId", "ContactName"))%>

You simply have to pass the selected items as a list, since a list box lets the user select multiple items.

Hope it helps,
Have a nice day.

Mehmet