Mehmet Kurtipek: September 2012

Our task was as simple as changing DropDownListFor into ListBoxFor. The code still compiles without errors when you just swap one for the other, but it is not that simple.

We used a MultiSelectList to populate our DropDownList, and it worked just fine until we decided to change it into a ListBoxFor. Then we realized that we had missed something. Our DropDownListFor was in the following format:

<%:Html.DropDownListFor(
      x => x.ContactId,
      new MultiSelectList(Model.Contacts, "ContactId", "ContactName"))%>

It is just a simple dropdown whose elements have ContactId as their value and ContactName as their text. The IEnumerable that populates it is Model.Contacts. So far, so good.

Then we needed to change this into a ListBoxFor, since the user has to select multiple elements and a dropdown cannot do that.

When you just change the dropdown into a list box, it seems fine until you run the page. We got a "source is null" error and at first glance had no idea what caused it. After some Google searches we found the solution: ListBoxFor needs the list of selected elements as its first argument, not a single id (in this case ContactId).

We added a new list to our model that contains just the items selected in the list box. So our ListBoxFor turned into the following:

<%:Html.ListBoxFor(
      x => x.SelectedContacts,
      new MultiSelectList(Model.Contacts, "ContactId", "ContactName"))%>

You simply have to bind the selected items as a list, since a list box can select multiple items.
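The post does not show the model itself, so here is a minimal sketch of what it could look like. The class names Contact and ContactListModel, the GetSelectedContacts helper, and the choice of int ids are my assumptions for illustration; only Contacts, ContactId, ContactName and SelectedContacts come from the views above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration; the real model is not shown in the post.
public class Contact
{
    public int ContactId { get; set; }      // assumed to be an int
    public string ContactName { get; set; }
}

public class ContactListModel
{
    // All options rendered in the list box.
    public List<Contact> Contacts { get; set; }

    // The property ListBoxFor binds to: one id per selected option.
    public int[] SelectedContacts { get; set; }

    public ContactListModel()
    {
        // Start with empty collections so neither property is ever null.
        Contacts = new List<Contact>();
        SelectedContacts = new int[0];
    }

    // Hypothetical helper: resolves the selected ids back to Contact objects.
    public List<Contact> GetSelectedContacts()
    {
        return Contacts
            .Where(c => SelectedContacts.Contains(c.ContactId))
            .ToList();
    }
}
```

After the form is posted, SelectedContacts should hold the ids the user picked, and a helper like GetSelectedContacts can turn them back into contacts.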

Hope it helps,
Have a nice day.

Mehmet

Before starting with MODI.dll, I have to explain what OCR is. Converting images into text is called OCR (Optical Character Recognition). For example, PDF files that are screenshots of a plain text page can be converted into Microsoft Word files; this requires OCR, and it can be done in many ways.

There are lots of solutions that offer OCR in similar ways. Some are multilingual, and others are better at recognizing only a certain set of fonts. Some are open source, and others require a licence.

If you need an open source library, I suggest you start with Google's Tesseract. You can find several C# wrappers such as TessNet2, but it currently uses Tesseract 2, and I suggest you wait for a Tesseract 3 wrapper.

I dug into Tesseract and at some point was training it on my own font to improve the results, but training is a little bit tricky, so I decided to try other options. I will share the training part later. Then I found MODI, and it does almost perfect OCR, except that it requires a Microsoft Office licence (I guess).

MODI is the built-in OCR DLL that shipped with Microsoft Office 2003 and 2007 but not directly with 2010. They moved MODI into OneNote 2010, but that is a different world.

If you have Office 2003 or 2007, you just have to include Microsoft Office Document Imaging in your Office installation (Software -> Edit Microsoft Office Installation -> Add/Remove Components -> Office Tools -> Microsoft Office Document Imaging has to be installed).

If you do not have Office 2003 or 2007, like me, you have to get it from the Microsoft website. The simplest way is to download this, and when you launch the setup, click custom install, disable all components, and under Office Tools enable just Microsoft Office Document Imaging. Then complete the install, and you should be able to add MODI as a reference in your Visual Studio solution. Other than that, in the Start menu you can find an application for MODI under Microsoft Office Tools and try out how good it is.

At this point, you have all you need to start OCR with MODI. For best results, I suggest you save your images in TIF format and then run OCR with MODI. Let me share my sample code, then you can develop your own methods.

I usually save the image to a location on disk first and then run OCR on it. You can fetch the image with a web request:

byte[] imageBytes;

HttpWebRequest imageRequest = (HttpWebRequest)WebRequest.Create(imageUrl);
using (WebResponse imageResponse = imageRequest.GetResponse())
using (Stream responseStream = imageResponse.GetResponseStream())
using (MemoryStream buffer = new MemoryStream())
{
    // Copy the whole response. A fixed ReadBytes(1500000) would silently
    // truncate any image larger than that.
    responseStream.CopyTo(buffer);
    imageBytes = buffer.ToArray();
}

// Write the downloaded bytes to disk.
File.WriteAllBytes(saveLocation, imageBytes);
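If you do not need the bytes in memory, the download-and-save step can be shorter. The following is only a sketch of an alternative using WebClient.DownloadFile; the ImageDownloader class and its Download method are names I made up for illustration.

```csharp
using System.Net;

// Hypothetical helper for illustration.
public static class ImageDownloader
{
    // Downloads the resource at imageUrl and writes it to saveLocation,
    // overwriting the file if it already exists.
    public static void Download(string imageUrl, string saveLocation)
    {
        using (WebClient client = new WebClient())
        {
            client.DownloadFile(imageUrl, saveLocation);
        }
    }
}
```

Usage would be a single call such as ImageDownloader.Download(imageUrl, saveLocation), with the same two variables as in the code above.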

You can save the downloaded image directly as TIF or PNG, but most OCR libraries require an uncompressed TIF, so we have to convert the saved image with the following code.

First we load the saved image into a Bitmap. Then you do whatever manipulations are necessary. For example, I needed to enlarge the image by a factor of 4, and I chose the interpolation mode, smoothing mode and compositing quality to get better image quality for OCR.

Then you pass the TIFF encoder info, and in the encoder parameters you set the save flag to last frame and the compression to none. A single TIFF file can actually hold several images, so if you set the save flag to multi-frame instead of last frame, you can add more images to it. I should write about that another day.

// Load the image that was saved earlier.
Bitmap bmp = new Bitmap("savedImage.png");

// Destination bitmap, 4 times larger in each dimension.
Bitmap dst = new Bitmap(bmp.Width * 4, bmp.Height * 4);
using (Graphics g = Graphics.FromImage(dst))
{
    Rectangle srcRect = new Rectangle(0, 0, bmp.Width, bmp.Height);
    Rectangle dstRect = new Rectangle(0, 0, dst.Width, dst.Height);

    g.InterpolationMode = InterpolationMode.HighQualityBilinear;
    g.SmoothingMode = SmoothingMode.AntiAlias;
    g.CompositingQuality = CompositingQuality.GammaCorrected;

    // Draw the source image scaled up onto the destination.
    g.DrawImage(bmp, dstRect, srcRect, GraphicsUnit.Pixel);
}

ImageCodecInfo encoderInfo = ImageCodecInfo.GetImageEncoders().First(i => i.MimeType == "image/tiff");

// Uncompressed TIFF, written as a single (last) frame.
EncoderParameters encoderParams = new EncoderParameters(2);
encoderParams.Param[0] = new EncoderParameter(System.Drawing.Imaging.Encoder.Compression, (long)EncoderValue.CompressionNone);
encoderParams.Param[1] = new EncoderParameter(System.Drawing.Imaging.Encoder.SaveFlag, (long)EncoderValue.LastFrame);

// Use the same file name that the MODI code opens.
dst.Save("savedImage.tif", encoderInfo, encoderParams);

dst.Dispose();
bmp.Dispose();
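As for putting several images into one TIFF file, here is a minimal sketch of how it can be done with the same System.Drawing encoder API. The TiffWriter class and the SaveMultiPage method are hypothetical names for illustration, and I have not run the result through MODI, so treat it as a starting point.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Linq;

// Hypothetical helper for illustration.
public static class TiffWriter
{
    // Saves all given pages into one uncompressed multi-page TIFF file.
    public static void SaveMultiPage(string path, params Image[] pages)
    {
        if (pages == null || pages.Length == 0)
            throw new ArgumentException("At least one page is required.", "pages");

        ImageCodecInfo tiffCodec = ImageCodecInfo.GetImageEncoders()
            .First(c => c.MimeType == "image/tiff");

        using (EncoderParameters pageParams = new EncoderParameters(2))
        using (EncoderParameters flushParams = new EncoderParameters(1))
        {
            pageParams.Param[0] = new EncoderParameter(
                System.Drawing.Imaging.Encoder.Compression, (long)EncoderValue.CompressionNone);

            // First page: create the file in multi-frame mode.
            pageParams.Param[1] = new EncoderParameter(
                System.Drawing.Imaging.Encoder.SaveFlag, (long)EncoderValue.MultiFrame);
            pages[0].Save(path, tiffCodec, pageParams);

            // Remaining pages: append each one as a new page.
            pageParams.Param[1] = new EncoderParameter(
                System.Drawing.Imaging.Encoder.SaveFlag, (long)EncoderValue.FrameDimensionPage);
            for (int i = 1; i < pages.Length; i++)
                pages[0].SaveAdd(pages[i], pageParams);

            // Flush finishes the file.
            flushParams.Param[0] = new EncoderParameter(
                System.Drawing.Imaging.Encoder.SaveFlag, (long)EncoderValue.Flush);
            pages[0].SaveAdd(flushParams);
        }
    }
}
```

The first page is saved with MultiFrame, every further page is appended through SaveAdd on that same first image, and the final Flush call closes the file.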

Finally, we save our image file with the .tif extension. Here comes the best part, where we do the OCR. Initialize MODI, give it the path of our TIF file, and we are good to go.

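MODI.Document doc = new MODI.Document();
doc.Create("savedImage.tif");

// Run OCR in English; the two flags ask MODI to auto-orient and straighten the image.
doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

// The recognized text of the first page.
MODI.Image img = (MODI.Image)doc.Images[0];
MODI.Layout layout = img.Layout;
string text = layout.Text;

// Close the document without saving changes so the file is released.
doc.Close(false);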

Now the text variable contains the text recognized in our image, and you can use it as you wish.

Have a nice day.

Mehmet.

Mehmet Kurtipek

Friday, September 21, 2012

DropDownListFor vs ListBoxFor in Asp.Net MVC

Sunday, September 16, 2012

OCR with MODI.dll (Microsoft Office Document Imaging)