Fun with weak CAPTCHAs

While reading Andres Riancho’s blog post about weak CAPTCHAs yesterday, I quickly realized that it would be ridiculously simple to adjust his script to defeat an unofficial CAPTCHA solution for BlogEngine.NET that I myself used on my blog for a few days.

[Sample CAPTCHA images: input1 and input2]

A quick Google search shows that the image-generation procedure used for this CAPTCHA, a gradient technique mixing blue and red, is described in a few places.

So here is what we have:

  • Only letters and numbers are used.
  • The letters and numbers are not rotated.
  • The letters and numbers are pretty clean.
  • All letters and numbers have the same height.
  • The letters and numbers do not all have the same color, but given how the colors are used, this should not pose any major problems.
  • Given how the background noise is applied, we should be able to get rid of it pretty quickly.
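The point about colors can be made concrete: once the image is converted to grayscale, every pixel collapses to a single brightness value, whatever its hue. The sketch below is pure Python and assumes PIL's documented ITU-R 601-2 luma transform for `convert('L')` (L = R·299/1000 + G·587/1000 + B·114/1000); the round-to-nearest step and the helper names `to_gray` and `image_to_gray` are my own assumptions, not PIL internals.

```python
def to_gray(rgb):
    """Map an (R, G, B) tuple to one 0-255 gray level using the ITU-R 601-2
    luma weights. Rounding to the nearest integer is an assumption and may
    differ slightly from PIL's own integer arithmetic."""
    r, g, b = rgb
    return (r * 299 + g * 587 + b * 114 + 500) // 1000


def image_to_gray(pixels):
    """Convert a 2D list of RGB tuples (a list of pixel rows) to gray levels."""
    return [[to_gray(p) for p in row] for row in pixels]


if __name__ == "__main__":
    # Very different hues end up on one brightness scale.
    for name, color in [("blue", (0, 0, 255)),
                        ("red", (255, 0, 0)),
                        ("white", (255, 255, 255))]:
        print(name, to_gray(color))
```

With these weights pure blue maps to 29, pure red to 76 and white to 255, so after this step the rest of the pipeline only has to separate dark pixels from light ones.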

We just need to add a few lines to Andres Riancho’s script.

1. First, convert the image to grayscale.

[input1 and input2 after grayscale conversion]

2. Then adjust the contrast a little to get black letters and numbers with some gray background noise.

[input1 and input2 after the contrast adjustment]

3. By now the letters and numbers should be black, so we can use the original script to filter out the background noise: every pixel that is not black is set to white.

[input1 and input2 after removing the background noise]

4. The results:



I tested 18 different CAPTCHA images of this kind and it failed on just 3 (for example, it confused O with 0). YMMV.
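The contrast and cleanup steps (2 and 3) can be sketched in pure Python on a single row of gray values. As far as I know, PIL's `ImageEnhance.Contrast` blends the image with a flat image filled with its mean gray level, so a factor above 1.0 pushes dark pixels darker and light pixels lighter, clamped to 0-255; `enhance_contrast` below is a hypothetical approximation of that idea whose exact rounding may differ from PIL's, and `clean_noise` mirrors the "if color != black, then set to white" loop from the script. The sample row is made up for illustration.

```python
def enhance_contrast(gray, factor):
    """Approximate a contrast enhancement on a flat list of gray levels:
    move each pixel away from the mean gray by `factor`, then clamp to 0-255."""
    mean = round(sum(gray) / len(gray))
    out = []
    for v in gray:
        stretched = round(mean + factor * (v - mean))
        out.append(max(0, min(255, stretched)))
    return out


def clean_noise(gray):
    """Keep only pure black pixels (0); everything else becomes white (255)."""
    return [0 if v == 0 else 255 for v in gray]


if __name__ == "__main__":
    # Hypothetical row: two dark character pixels, then gray noise and background.
    row = [20, 35, 120, 140, 230]
    boosted = enhance_contrast(row, 1.9)  # 1.9 is the factor used in the script
    print(boosted)
    print(clean_noise(boosted))
```

The cleanup only keeps pixels that are exactly black, which is why the contrast step matters: it has to drive the character pixels all the way down to 0 while leaving the noise at some non-zero gray that then gets wiped out.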

from PIL import Image
from PIL import ImageEnhance
from pytesser import image_to_string  # pytesser wraps the Tesseract OCR engine

# Step 1: convert the CAPTCHA to grayscale
img = Image.open('input2.gif')
img = img.convert('L')
img.save("1.gif", "GIF")

# Step 2: adjust the contrast a little bit so the characters turn pure black
test = Image.open('1.gif')
enhancer = ImageEnhance.Contrast(test)
enhancer.enhance(1.9).save("2.gif", "GIF")

img = Image.open('2.gif')
img = img.convert("RGBA")

pixdata = img.load()

# Step 3: clean the background noise; if color != black, then set to white.
for y in range(img.size[1]):
    for x in range(img.size[0]):
        if pixdata[x, y] != (0, 0, 0, 255):
            pixdata[x, y] = (255, 255, 255, 255)
img.save("3.gif", "GIF")

# Make the image bigger (needed for OCR); NEAREST keeps it strictly black and white
im_orig = Image.open('3.gif')
big = im_orig.resize((116, 56), Image.NEAREST)

ext = ".tif"
big.save("4" + ext)

# Perform OCR using the pytesser library
image = Image.open('4.tif')
print(image_to_string(image))
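The resize step uses `Image.NEAREST`, and the choice matters. Nearest-neighbour sampling only copies existing pixels, so the enlarged image stays strictly black and white, whereas a smoothing filter such as bilinear would reintroduce gray edge pixels. Below is a minimal pure-Python sketch of the idea; `resize_nearest` is a hypothetical helper, and its simple index mapping is an assumption that matches PIL for whole-number scale factors but may pick different source pixels for fractional ones. The script hardcodes 116x56; the example below just doubles a tiny 2x2 image.

```python
def resize_nearest(pixels, new_w, new_h):
    """Resize a 2D list of pixel values with nearest-neighbour sampling:
    each output pixel copies the source pixel its position maps back to,
    so no new values (for example gray edges) are ever introduced."""
    old_h = len(pixels)
    old_w = len(pixels[0])
    return [
        [pixels[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


if __name__ == "__main__":
    tiny = [[0, 255],
            [255, 0]]
    for row in resize_nearest(tiny, 4, 4):
        print(row)
```

Doubling the size turns every source pixel into a 2x2 block, which presumably gives Tesseract larger glyphs to work with while keeping the clean black-on-white result of the previous steps.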

Comments (2)

  • Nice! I liked the ideas of converting to grayscale and enhancing the contrast :)
    • The quickest way of doing it that came to mind. :)

      BTW, great script you've made, it can be adjusted with ease to solve various CAPTCHAs.