Building a Keras-powered OCR webapp

I was tasked with building a webapp/webservice around a neural network built with Keras, so I decided to write down what I did and how I did it.
This article is divided into three sections:

  • Building the neural network.
  • Building the frontend for the webservice.
  • Building the backend for the webservice.

Building the Neural Network

Keras makes it very easy to architect and train a neural network, hence I decided to use it for this task. I built a convolutional neural network (CNN) for recognizing digits, trained on the MNIST handwritten digit dataset. Since CNNs perform better than traditional multi-layer perceptrons (MLPs) on image data, using a CNN was the obvious choice. The CNN is built based on the following diagram.

(figure: CNN architecture)

The MNIST dataset consists of 70,000 handwritten digit images in total. Keras ships with functions for preprocessing this dataset, splitting it into 60,000 images for training the model and the other 10,000 for evaluating it. The dataset can be loaded with:

from keras.datasets import mnist

(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

This statement returns two tuples, each containing the image vectors and image labels. As shown in the diagram, our CNN expects input of size 28x28x1, so the arrays are reshaped and normalized for better performance.

X_train = (X_train.reshape(60000, 28, 28, 1).astype('float32')) / 255
X_test = (X_test.reshape(10000, 28, 28, 1).astype('float32')) / 255

Note that I'll be using one-hot vectors for the output labels. Keras makes this easy with
keras.utils.to_categorical():

import keras

Y_train = keras.utils.to_categorical(Y_train, 10)
Y_test = keras.utils.to_categorical(Y_test, 10)

After this, layers can be stacked as given by the diagram using a sequential model.

from keras.models import Sequential
from keras.layers import Convolution2D, Activation, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Convolution2D(32, (3, 3), input_shape=(28, 28, 1)))
model.add(Activation('relu'))
model.add(Convolution2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(200))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))

The loss function can be optimized with various algorithms (RMSprop, Adam, etc.); in this case I used Adadelta. Adadelta is an extension of Adagrad that seeks to reduce its aggressive, monotonically decreasing learning rate.
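The compile-and-train step isn't shown above; a minimal sketch might look like the following (the batch size and epoch count are illustrative choices of mine, not values from the original):

```python
from keras.models import Sequential
from keras.layers import Convolution2D, Activation, MaxPooling2D, Dropout, Flatten, Dense

# Rebuild the CNN from the listing above.
model = Sequential()
model.add(Convolution2D(32, (3, 3), input_shape=(28, 28, 1)))
model.add(Activation('relu'))
model.add(Convolution2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(200))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))

# Categorical cross-entropy pairs with the softmax output;
# 'adadelta' selects the Adadelta optimizer with Keras defaults.
model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])

def train(model, X_train, Y_train, X_test, Y_test):
    # Batch size and epoch count here are illustrative, not tuned.
    model.fit(X_train, Y_train,
              batch_size=128,
              epochs=10,
              validation_data=(X_test, Y_test))
```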

Full Code

After training is over, the model can be exported to an HDF5 file on disk with:

model.save('mnist_cnn.h5')

This file stores the architecture and all the weights of the neural network, allowing anyone to use the trained network as a black box on any other machine. If an exception is encountered, make sure the h5py package is installed on your system.
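To sanity-check that round trip, a small stand-in model can be saved and restored the same way (the single Dense layer here is just a placeholder for the real CNN, and the temp-file path is mine):

```python
import os
import tempfile
import numpy as np
from keras.models import Sequential, load_model
from keras.layers import Input, Dense

# A tiny stand-in model; the real network would be the CNN above.
model = Sequential([Input(shape=(784,)), Dense(10, activation='softmax')])
model.compile(loss='categorical_crossentropy', optimizer='adadelta')

# Save to HDF5 and load it back, as another machine would (requires h5py).
path = os.path.join(tempfile.mkdtemp(), 'mnist_cnn.h5')
model.save(path)
restored = load_model(path)

# Both models should produce identical predictions.
x = np.random.rand(1, 784).astype('float32')
same = np.allclose(model.predict(x), restored.predict(x))
```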

Building the Frontend

To build the frontend I used jQuery + HTML5 canvas. The canvas lets the user draw handwritten digits, which are then passed to the backend for processing. The HTML file looks like:

<div id="canvascontainer">
<canvas id="ocrCanvas"></canvas>
</div>
<div class="buttons">
<button id="clear" onclick="clearCanvas()">Clear</button>
<button id="predict" onclick="predict()" >Predict</button>
</div>
<p>Prediction : <strong class="prediction"></strong></p>
<script
src="https://code.jquery.com/jquery-3.2.1.min.js"></script>
<script src="/static/app.js"></script>
</div>

jQuery is mainly used for handling various DOM events and sending AJAX requests to the backend, although pure JavaScript can do this as well (using addEventListener() and the fetch() API).
In this canvas app, two arrays, clickX and clickY, are used for storing the coordinates. When the mouse button is first pressed on the canvas, the mousedown event fires and starts logging the X and Y coordinates, which are then used to draw on the canvas.

canvas.mousedown(function (e) {
  paint = true;
  addClick(e.offsetX, e.offsetY);
  redraw();
});

The paint variable tracks whether the user is still drawing on the canvas. When the mouse is dragged after clicking somewhere on the canvas, the coordinates are pushed into the arrays and drawn at the same time.

canvas.mousemove(function (e) {
  if (paint) {
    addClick(e.offsetX, e.offsetY);
    redraw();
  }
});

When the user releases the mouse, or the mouse leaves the canvas, the coordinate arrays are cleared.

canvas.mouseup(cleanArray);
canvas.mouseleave(cleanArray);

where cleanArray looks like:

function cleanArray() {
  paint = false;
  clickX = [];
  clickY = [];
}

When the user clicks the Predict button, predict() is fired and the image is sent to the backend through AJAX in JSON form, using the canvas.toDataURL() function. This function converts the canvas image to a base64 data URL, which can be processed by the backend.

Data URLs are composed of four parts: a prefix (data:), a MIME type indicating the type of data, an optional base64 token if non-textual, and the data itself:

data:[<mediatype>][;base64],<data>
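To make the format concrete, here is how such a data URL can be taken apart with nothing but the Python standard library (the sample payload is made up for illustration):

```python
import base64

# A made-up data URL; real canvas.toDataURL() output carries PNG bytes.
data_url = "data:image/png;base64," + base64.b64encode(b"fake-png-bytes").decode("ascii")

# Split at the first comma: everything before is the header, after is the data.
header, _, payload = data_url.partition(",")
mediatype = header[len("data:"):].split(";")[0]   # the MIME type, "image/png"
decoded = base64.b64decode(payload)               # the original bytes
```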

Full code

Building the backend

Building the backend is easier with Flask than with any other Python web framework. I tried to use Node.js, but that approach was inefficient since it involved spawning a new process for every request when using the child_process module. I also tried various other Node/Python IPC approaches, but none felt smooth enough; besides, Flask is easy to get up and running.

The data sent by the webapp is in data URL form. Using request.get_json(), I first extracted the base64 text and then decoded it to binary form:

imagebytes = base64.b64decode(imageb64[imageb64.index(',') + 1:])

The Pillow library provides methods to convert these bytes into an image, thus

Image.open(io.BytesIO(imagebytes))

converts the raw bytes into a PIL Image object, which NumPy can parse.
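Putting those two lines together, a quick round trip shows the array NumPy ends up with (the tiny generated PNG here stands in for a real canvas upload):

```python
import io
import numpy as np
from PIL import Image

# Generate a small PNG in memory to stand in for the decoded upload.
buf = io.BytesIO()
Image.new("L", (28, 28), color=0).save(buf, format="PNG")
imagebytes = buf.getvalue()

# Decode the raw bytes the same way the backend does, then hand off to NumPy.
img = Image.open(io.BytesIO(imagebytes))
arr = np.array(img)   # a (28, 28) uint8 array
```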

I made a separate module to preprocess the image using the scikit-image library before handing the prediction task to the network. It involves these three steps:

(figure: preprocessing steps)

The network was trained on the MNIST dataset, which contains black-and-white 28x28 images in normalized float32 form, so preprocessing is necessary for correct output. This is also why I made the canvas background black.
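The post does this with scikit-image; an equivalent sketch of the three steps using Pillow and NumPy (the function name and resampling filter are my choices, not the project's) would be:

```python
import numpy as np
from PIL import Image

def preprocess(img):
    """Greyscale -> resize to 28x28 -> normalize, mirroring the MNIST format."""
    img = img.convert("L")                            # black-and-white, like MNIST
    img = img.resize((28, 28), Image.LANCZOS)         # match the network's input size
    arr = np.asarray(img).astype("float32") / 255.0   # normalize to [0, 1]
    return arr.reshape(1, 28, 28, 1)                  # batch of one for predict()
```

A 280x280 canvas capture, for example, comes out as a (1, 28, 28, 1) float32 array ready to hand to the model.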

The saved model can be loaded with keras.models.load_model(), which returns a model exposing a predict() function. In this case predict() returns a probability vector, because softmax is used in the final activation layer. Thus, we can simply apply np.argmax() to the result of predict():

np.argmax(model.predict(image).reshape(10))

This result can be sent back to the webapp in JSON form with:

return jsonify(
    result=str(result),
    success=True
)
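Wired together, the whole endpoint can be sketched as a single Flask route (the URL, the 'image' JSON key, and the placeholder where the model call goes are my assumptions, not the project's actual code):

```python
import base64
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    # Extract the base64 payload from the data URL sent by the webapp.
    imageb64 = request.get_json()['image']
    imagebytes = base64.b64decode(imageb64[imageb64.index(',') + 1:])

    # In the real app: build a PIL image from imagebytes, preprocess it, and
    # compute np.argmax(model.predict(image).reshape(10)); a fixed value
    # stands in here so the route runs without the trained model.
    result = 0

    return jsonify(result=str(result), success=True)
```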

And the final result looks like:

(screenshot: the webapp recognizing a drawn digit)

Link to Project

References

  • An overview of gradient descent optimization algorithms
  • Keras Documentation
  • Flask Documentation
  • Data URLs
  • CS231n Convolutional Neural Networks for Visual Recognition
  • HTML5 Canvas APIs