Friday, May 1, 2015

How To Change TOR Browser IP or Identity with C# Without Password

In our previous post here we described what Tor is and how to change identity using Python.

Here we will discuss how to change the Tor identity in a C# environment.

First of all, if you write the authentication method in C#, you have to set a password to access Tor's control port. You then have to hash that password and put the hash in the torrc-defaults file, which is sometimes not possible for various reasons.

There is another way to change the Tor identity, which was described in the previous post. We can write that simple Python script, put it somewhere on our computer, and then invoke it through the command shell, cmd.exe.

To do this, we have to install Python first and then make some changes to the PATH variable. After that we are good to go.

1. Download and install Python. Here you can get Python 2.7.9, which is the latest Python 2 release at the time of writing.
2. After the installation completes, we have to add the Python directories to the PATH variable. You can find tutorials on this with some googling. If you have installed 2.7.9, add "C:\Python27;C:\Python27\Lib\site-packages\;C:\Python27\Scripts\;" to the PATH environment variable.
3. This enables running Python scripts by just typing python "file-name".
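As a quick sanity check for step 2, here is a minimal Python sketch that reports which of the directories above are still missing from a PATH string. The helper names `missing_path_entries` and `python_on_path` are made up for this illustration, and the directory list assumes the default Python 2.7 install location mentioned above.

```python
import shutil

# Directories from step 2; assumes the default C:\Python27 install location.
REQUIRED_DIRS = (
    r"C:\Python27",
    r"C:\Python27\Lib\site-packages",
    r"C:\Python27\Scripts",
)


def missing_path_entries(path_value, required=REQUIRED_DIRS, sep=";"):
    """Return the required directories that are absent from a PATH string.

    The comparison ignores case and trailing backslashes, because Windows
    treats "C:\\Python27\\" and "c:\\python27" as the same directory.
    """
    present = {
        entry.strip().rstrip("\\").lower()
        for entry in path_value.split(sep)
        if entry.strip()
    }
    return [d for d in required if d.rstrip("\\").lower() not in present]


def python_on_path():
    """Return the full path that typing `python` resolves to, or None."""
    return shutil.which("python")
```

On the target machine you could call `missing_path_entries(os.environ["PATH"])` after `import os`; an empty list means all three entries are in place.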

Then here is the C# .NET part, which calls cmd.exe asynchronously so that our own process is not interrupted.

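// Launch cmd.exe in a hidden window. Start() returns immediately, so our own process is not blocked.
System.Diagnostics.Process process = new System.Diagnostics.Process();
System.Diagnostics.ProcessStartInfo startInfo = new System.Diagnostics.ProcessStartInfo();
startInfo.WindowStyle = System.Diagnostics.ProcessWindowStyle.Hidden;
startInfo.FileName = "cmd.exe";
// Verbatim string (@), so the "\r" in the path is not read as a carriage return.
startInfo.Arguments = @"/C python c:\renew_tor.py";
process.StartInfo = startInfo;
process.Start();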

Here the renew_tor.py file contains the Python code that refreshes Tor. You can find it here.

This way you don't have to set up password authentication with Tor in your C# code.

Mehmet

How To Change TOR Browser IP or Identity using Python

Tor is a good and large organization that stands on its own with contributions from all over the world. Their aim is to meet people's need for anonymity for free. They accept donations; you can even donate a server, and they will run Tor exit nodes on it. That's how they are still around.

When it comes to scraping, anonymity may not be your first concern. You have to deal with other things before that, such as getting around anti-scraping measures. The best known one is the IP ban, which can become annoying and unsolvable without proxies.

Many companies offer proxies on the internet; I haven't bought one and probably never will. That's because of Tor, which I'm happy to use.

In many scenarios, when you are banned and can no longer access a site, you have to change identity and move on. Normally this requires running a script manually or clicking in the browser. If we are building non-stop, fully automatic bots, then we have to find out how to change the IP automatically as well.

Our sample here is written in Python.

from stem import Signal
from stem.control import Controller

with Controller.from_port(port = 9051) as controller:
  controller.authenticate()
  controller.signal(Signal.NEWNYM)

If we take a closer look, it connects to Tor's control port, which is 9051 in this case but can be 9151 in some cases (for example, with the Tor bundled in Tor Browser). You can also change these settings in the config files. Then we send the AUTHENTICATE command to shake hands. After that we just send our command, NEWNYM, which requests a new identity. If the status is not "250", an error occurred and the attempt was not successful.

To be able to use stem, you have to install it with pip install stem.
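To show what happens on the wire, here is a rough sketch of the same exchange using only Python's standard socket module instead of stem. It assumes the control port accepts a bare AUTHENTICATE without credentials; with cookie or password authentication it will simply return False, and stem remains the more robust choice. The function names `reply_ok` and `renew_identity` are invented for this example.

```python
import socket


def reply_ok(reply):
    """Return True when a control-port reply line carries status 250."""
    return reply[:3] == "250"


def renew_identity(host="127.0.0.1", port=9051, timeout=10.0):
    """Send AUTHENTICATE and SIGNAL NEWNYM; return True if both got 250.

    Assumption: the control port needs no credentials. Use port 9151
    when talking to the Tor that ships with Tor Browser.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rwb") as stream:
            for command in ("AUTHENTICATE", "SIGNAL NEWNYM"):
                # Every control command is one line terminated by CRLF.
                stream.write(command.encode("ascii") + b"\r\n")
                stream.flush()
                reply = stream.readline().decode("ascii", "replace")
                if not reply_ok(reply):
                    return False
    return True
```

Calling `renew_identity()` raises an OSError when nothing is listening on the port, so a bot that runs non-stop would want to catch that and retry later.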

I hope this helps.

Mehmet

Wednesday, April 15, 2015

Selenium vs Watin using .Net C#

There are many other posts on this topic, but in the end the reader has to decide which one to use. Both libraries were developed for test automation, and both have pros and cons to discuss. We will look at the matter from the point of view of scraping, as I have never used them for testing purposes.

I will just list what I experienced with both libraries, and then try to make a point. Let's start with the Watin project.

Watin Cons

  • You can start here to learn about Watin. It has not been updated for a long time, so you have to dig through Google.
  • Supports IE only. (Firefox support stopped at around version 4, so it is not stable to run.)
  • It works fine with IE 9, but IE 9 is not supported on Windows 8 machines, so there you have to use IE 11 with Watin. This causes some unpredictable program states. E.g. if you touch the IE 11 window while Watin is still managing it, Watin will lose control over it.
  • Again, IE 9 may be deprecated soon, so it is not a good thing to rely on.
  • You cannot easily apply a proxy in Watin, since you have to change Internet Options, which is very tricky.

Watin Pros

  • Watin controls IE natively, so its performance is much better than that of Selenium's IE driver.
  • IE 9 does not crash easily, and you can run many simultaneous windows.

Selenium Cons

  • Uses WebDriver to control browsers, which adds another layer, so its performance is slower than Watin's.
  • The native Firefox WebDriver sometimes crashes with the message "Directory is not empty", and I have not found a fix for it. You have to kill all Firefox processes and restart.

Selenium Pros

  • You can find Selenium headquarters here. It gets an update about once a month. It has a support forum which is alive, and you can find a fix for almost any problem.
  • Has support for almost every browser you can think of. Firefox, Chrome, Safari, etc.
  • Firefox is supported natively by Selenium, so you don't need an external WebDriver for it.
  • HtmlUnit is a headless browser, and even that one has WebDriver support in Selenium.
  • Proxies can be easily applied to each WebDriver instance, so you can run two browsers at the same time, one with a proxy and the other without.
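
To illustrate the last point, here is a small sketch of per-instance proxies using Selenium's Python bindings rather than C#. It assumes Chrome with a matching chromedriver installed and uses Chrome's `--proxy-server` switch; `proxy_argument` and `start_chrome` are names made up for this example, and 9050 is Tor's default SOCKS port.

```python
def proxy_argument(host, port, scheme="socks5"):
    """Build Chrome's --proxy-server switch for one browser instance."""
    return "--proxy-server={}://{}:{}".format(scheme, host, port)


def start_chrome(proxy=None):
    """Start a Chrome instance, optionally behind its own proxy.

    `proxy` is a (host, port) tuple or None. Assumes the third-party
    selenium package and a chromedriver binary are installed.
    """
    from selenium import webdriver  # imported lazily; not in the stdlib

    options = webdriver.ChromeOptions()
    if proxy is not None:
        # The switch only affects the browser started with these options.
        options.add_argument(proxy_argument(*proxy))
    return webdriver.Chrome(options=options)
```

With this, `start_chrome()` and `start_chrome(("127.0.0.1", 9050))` would give two browsers running side by side, one direct and one routed through Tor.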

To come to the point, Watin has lost its support, and this may be a problem in a year or two, since it runs well on Windows 7 but not on Windows 8+. This is a huge issue if you are going to invest for more than a year. You can choose Watin for short-term projects which need performance.
On the other hand, Selenium has good support and, despite the performance issues, it is reliable. It may lack some performance compared to Watin, but I think it may get better later.

My vote goes for Selenium, as a former Watin-eer.

Mehmet