
Simple Captcha Reader

A simple captcha-reading mechanism using the Gaussian Naive Bayes algorithm

Use case

Let's say you have to read the text written on a captcha:

import numpy as np
from skimage import io
from sklearn.naive_bayes import GaussianNB

# Load the captcha as a grayscale image (older scikit-image releases spelled this flatten=True)
image = io.imread('captcha.jpg', as_gray=True)

The first step is to convert the captcha image into a matrix. This gives us a two-dimensional array that looks something like this:

[
    [
        1.         0.98039216 0.99607843 0.98039216 0.99607843 1.
        0.99215686 0.99607843 1.         1.         1.         0.98039216
        1.         0.97647059 1.         0.99215686 0.99607843 0.91372549
        1.         0.97647059 1.         0.9372549  0.76470588 0.96862745
        0.96078431 0.97647059 0.98823529 0.96862745 0.99607843 1.
        0.96470588 0.99607843 1.         0.69803922 0.79607843 0.75294118
        0.73333333 0.76470588 0.74509804 0.74117647 0.75686275 0.76862745
        0.7372549  0.72941176 0.78823529 0.70588235 1.         1.
        0.94117647 1.         0.98039216 1.         1.         0.96862745
        0.98823529 0.97254902 0.96078431 1.         0.94901961 0.98431373
    ]
    [
        1.         0.99215686 1.         1.         1.         1.
        0.99607843 1.         0.97254902 1.         0.98039216 0.99215686
        0.98039216 0.98431373 0.98039216 0.98039216 1.         1.
        0.94901961 0.97647059 1.         1.         0.73333333 0.98823529
        1.         1.         1.         0.96078431 0.97647059 1.
        1.         1.         1.         0.76862745 0.96078431 0.97647059
        0.96862745 1.         0.98039216 0.98039216 0.97647059 0.99607843
        0.98823529 0.98823529 0.98823529 0.79607843 1.         0.96862745
        1.         0.97254902 0.96862745 0.96078431 0.98431373 0.98039216
        1.         1.         1.         0.98823529 0.97254902 0.95294118
    ]
    ..................................................................
    [
        0.98431373 0.96862745 0.98431373 0.98039216 0.98823529 1.
        0.94117647 1.         0.97254902 1.         0.99215686 1.
        0.98431373 1.         0.99215686 1.         0.92941176 1.
        0.97647059 0.98431373 0.98431373 0.98431373 0.70196078 1.
        0.99607843 0.99215686 0.98431373 0.99215686 0.99215686 0.98039216
        0.98431373 0.99215686 0.95294118 0.75294118 0.99607843 1.
        1.         1.         1.         1.         1.         1.
        1.         1.         0.99607843 0.76470588 0.98431373 0.97647059
        0.94117647 0.7254902  0.81176471 0.74901961 0.76470588 0.75294118
        0.76470588 0.74509804 0.72156863 0.76470588 1.         0.98823529
    ]

Each row corresponds to one horizontal line of pixels (a Y coordinate), and each column to one X position within that line. As you can see, the values in the array range from 0 to 1, where 0 is the darkest color (black) and 1 is the lightest color (white) in the image. Ideally, we would simplify the array so that it contains only the numbers 0 and 1.
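To make the indexing convention concrete, here is a minimal sketch that uses a tiny hand-made array in place of the real captcha. The values and the name `tiny` are made up for illustration. NumPy addresses a grayscale image as `image[row, column]`, which is `image[y, x]`.

```python
import numpy as np

# A made-up 2x3 grayscale "image": 2 rows (height), 3 columns (width).
tiny = np.array([
    [1.0, 0.5, 0.0],
    [0.9, 0.2, 1.0],
])

height, width = tiny.shape   # shape is (rows, columns), i.e. (y, x)
darkest = tiny.min()         # the smallest value is the darkest pixel
pixel = tiny[1, 2]           # row 1 (y = 1), column 2 (x = 2)

print(height, width)
print(darkest, pixel)
```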

image = np.where(image > 0.4, 1, 0)

The captcha text is darker than the background. We can therefore rewrite the array so that every value greater than 0.4 becomes 1 and everything else becomes 0. The array should now look something like this:

[
    [ 1 0 0 1 1 1 1 0 0 1 1 1 1 0 0 0 0 1 1 ... ]
    [ 1 0 0 1 1 1 1 0 1 1 1 1 0 0 1 1 0 0 1 ... ]
    [ 1 0 0 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 ... ]
    [ 1 1 0 1 1 1 0 0 1 1 1 1 0 0 1 1 1 1 1 ... ]
    [ 1 1 0 0 1 1 0 0 1 1 1 1 0 0 0 0 0 1 1 ... ]
    [ 1 1 0 0 1 1 0 1 1 1 1 1 1 1 0 0 0 0 1 ... ]
    [ 1 1 0 0 1 0 0 1 1 1 1 1 1 1 1 1 0 0 1 ... ]
    [ 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 1 ... ]
    [ 1 1 1 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 1 ... ]
    [ 1 1 1 0 0 0 1 1 1 1 1 1 0 0 0 0 0 1 1 ... ]
]
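The thresholding step can be tried on a small array without loading any image. This is only a sketch: the `gray` values are invented, and the 0.4 cut-off is simply the one used in this article, not a universally good choice.

```python
import numpy as np

# Invented grayscale values: mostly light background with two dark pixels.
gray = np.array([
    [1.00, 0.98, 0.35, 0.99],
    [0.97, 0.10, 0.76, 1.00],
])

THRESHOLD = 0.4  # same cut-off as in the article

# Values above the threshold become background (1), the rest become ink (0).
binary = np.where(gray > THRESHOLD, 1, 0)

# Share of pixels that ended up as ink.
ink_ratio = np.mean(binary == 0)

print(binary)
print(ink_ratio)
```

Note that the mid-gray value 0.76 ends up as background here. With a higher threshold it would count as ink, so the cut-off probably needs tuning for each captcha style.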

Once all of the 1s are removed, the symbols we are looking for stand out clearly. In our case they are "VS".

[
    [   0 0         0 0         0 0 0 0     ... ]
    [   0 0         0         0 0     0 0   ... ]
    [   0 0       0 0         0             ... ]
    [     0       0 0         0 0           ... ]
    [     0 0     0 0         0 0 0 0 0     ... ]
    [     0 0     0               0 0 0 0   ... ]
    [     0 0   0 0                   0 0   ... ]
    [       0 0 0 0                   0 0   ... ]
    [       0 0 0           0 0 0 0 0 0 0   ... ]
    [       0 0 0             0 0 0 0 0     ... ]
]
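A view like the one above can be produced with a few lines of Python. The helper name `render` is hypothetical and not part of any library; it assumes ink is stored as 0 and background as 1.

```python
import numpy as np

def render(binary):
    """Return the matrix as text: '0' for ink, a space for background."""
    return "\n".join(
        " ".join("0" if value == 0 else " " for value in row)
        for row in binary
    )

# A made-up 3x3 binary patch.
sample = np.array([
    [0, 1, 0],
    [0, 1, 0],
    [1, 0, 1],
])

print(render(sample))
```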

The next step is to extract each symbol you need into its own, smaller array:

symbol1 = image[11:21,5:13]
symbol2 = image[11:21,14:22]
...
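The slices above use fixed coordinates found by hand. As an alternative, the column ranges could be located automatically by looking for columns that contain ink. The following is a rough sketch of that idea, not the approach used in this article: `find_symbol_columns` is a hypothetical helper, and it assumes that ink is 0, background is 1, and that symbols are separated by at least one completely blank column.

```python
import numpy as np

def find_symbol_columns(binary):
    """Return (start, stop) column ranges that contain at least one ink pixel."""
    has_ink = (binary == 0).any(axis=0)    # one boolean per column
    ranges, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                       # a symbol begins here
        elif not ink and start is not None:
            ranges.append((start, x))       # the symbol ended before column x
            start = None
    if start is not None:                   # symbol touching the right edge
        ranges.append((start, len(has_ink)))
    return ranges

# A made-up binary strip with two "symbols".
demo = np.array([
    [1, 0, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 1, 0, 0],
])

columns = find_symbol_columns(demo)
symbols = [demo[:, start:stop] for start, stop in columns]
print(columns)
```

Symbols that touch each other, or noise lines crossing the text, would break this simple scan. The classifier also expects every symbol to have the same size, so automatically found crops would still need to be padded or trimmed to the template dimensions.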

On the other side, you have to create a model for each character. Let's take Z as an example:

# 10 rows x 8 columns, flattened row by row (0 = ink, 1 = background)
Z = np.array(
    [
        0,0,0,0,0,0,0,1,
        1,1,1,1,1,0,0,1,
        1,1,1,1,1,0,0,1,
        1,1,1,1,0,0,1,1,
        1,1,1,0,0,1,1,1,
        1,1,0,0,1,1,1,1,
        1,0,0,1,1,1,1,1,
        0,0,1,1,1,1,1,1,
        0,0,1,1,1,1,1,1,
        0,0,0,0,0,0,0,1
    ]
)
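Typing 80 digits per character by hand is error-prone. One possible convenience, shown here as a sketch, is to draw each character as text and convert it. The helper `template_from_text` and the `#`/`.` notation are assumptions made for this example, not part of the original code.

```python
import numpy as np

def template_from_text(drawing):
    """Turn a multi-line drawing ('#' = ink, '.' = background) into a flat 0/1 vector."""
    rows = [line.strip() for line in drawing.strip().splitlines()]
    grid = np.array([[0 if ch == "#" else 1 for ch in row] for row in rows])
    return grid.ravel()

# The same Z as above, drawn as 10 rows of 8 characters.
Z_DRAWING = """
#######.
.....##.
.....##.
....##..
...##...
..##....
.##.....
##......
##......
#######.
"""

Z_template = template_from_text(Z_DRAWING)
print(Z_template.shape)
```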

Now it is time to train our model:

clf = GaussianNB()
clf.fit(
    [
        N0,N1,N2,N3,N4,N5,N6,N7,N8,N9,
        A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,
        P,Q,R,S,T,U,V,W,X,Y,Z
    ],
    [
        '0','1','2','3','4','5','6','7',
        '8','9','A','B','C','D','E','F',
        'G','H','I','J','K','L','M','N',
        'O','P','Q','R','S','T','U','V',
        'W','X','Y','Z'
    ]
)

predicted = clf.predict(
    [
        symbol1.ravel(), 
        symbol2.ravel(), 
        symbol3.ravel(), 
        symbol4.ravel(), 
        symbol5.ravel()
    ]
)

print(predicted)
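As a side note, the same fit-and-predict flow can be run end to end on a toy problem. This is a minimal sketch under stated assumptions: the three 3x3 templates and the noisy sample are invented, and only `GaussianNB`, `fit` and `predict` come from scikit-learn.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Three made-up 3x3 templates (0 = ink, 1 = background), already flattened.
templates = {
    "I": [1, 0, 1,
          1, 0, 1,
          1, 0, 1],
    "L": [0, 1, 1,
          0, 1, 1,
          0, 0, 0],
    "T": [0, 0, 0,
          1, 0, 1,
          1, 0, 1],
}

X = np.array(list(templates.values()))
y = list(templates.keys())

clf = GaussianNB()
clf.fit(X, y)

# A noisy "L": one background pixel has been flipped to ink.
noisy_L = np.array([0, 1, 1,
                    0, 1, 0,
                    0, 0, 0])

toy_prediction = clf.predict([noisy_L])
print(toy_prediction)
```

With a single training sample per class, the per-class variance is zero and only the classifier's small smoothing term remains, so in effect it picks the template that differs from the input in the fewest pixels. Training on several slightly different samples per character would likely make the model more tolerant of noise.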

For the captcha in this article, the predicted symbols should be "VSM2A". Thank you for reading to the end :)

Technologies

Python
sklearn
GaussianNB
numpy