There’s an amazing Android app called CamScanner which lets you use your mobile phone’s camera to scan any text document. I’ve been using the app for a few months, and the feature I like best is its perspective transformation, i.e., transforming an angled image (one not captured top-down) and displaying it as if it had been captured top-down at 90 degrees. What is worth praising is that the transformed image is quite clear and sharp. Another good feature is its smart cropping: it automatically detects the document boundary and even allows the user to crop it as required.
Being a Computer Vision enthusiast, I thought of building a pretty unsophisticated and rustic implementation of a document scanner using OpenCV and Python.
For all the impatient folks, TL;DR here is the link to the code : https://github.com/vipul-sharma20/document-scanner
My sincere thanks to the author of the article here: http://www.pyimagesearch.com/2014/09/01/build-kick-ass-mobile-document-scanner-just-5-minutes/ ; the site has a really good set of articles on OpenCV and is far more informative.
Implementation of Scanner
In layman’s terms:
- Capture image
- Detect edges
- Extract desired object / define contours
- Apply perspective transformation on extracted object
- Thresholding text content (If required)
I could’ve used my webcam, but it cannot capture images that are readable enough. Therefore, for illustration, I’ve captured a test image of a document with my phone’s camera.
Original Image (Document to scan)
The original image is scaled down, as OpenCV’s methods may not perform accurately on very large images. (The image above is the scaled-down/resized version.)
The original image is converted to grayscale and then blurred using Gaussian Blur technique.
Original Image (Grayscaled)
Original Image(Gaussian Blurred)
(notice that this image is smoother than the one above)
By blurring, we create smooth transitions from one color to another, reducing noise and edge content. But we have to be careful with the extent of the blur, as we DO want our script to detect the edges of the document.
Edge detection finds the boundaries of objects in an image by analyzing variations in brightness. Here, it is used to segment the image. More precisely, we’ll use the Canny edge detection technique.
Edged Image (Canny Edge Detection)
After performing Edge Detection, we’ll try to extract the document to be scanned from the image. Therefore, we’ll find the document boundary by drawing contours around the edges detected and choose the appropriate contour.
Drawing all contours
Looks beautiful right 🙂 ?
Here, we can see that a boundary is traced along the edges of our document, but there are some other, irrelevant contours too. It is also clearly visible that the area within the document’s contour is larger than the area enclosed by any other contour, and we can use this fact to pick the right boundary and extract our document.
Let’s get rid of the extraneous contours by selecting the contour with the largest area. To get a boundary with only 4 vertices, I have approximated the contour, i.e., replaced it with a similar shape that has fewer vertices.
Boundary around the document (Contour Approximated)
The original image was deliberately captured at an angle rather than perfectly top-down. Even if we cropped the image around the contour, the cropped content would not look like a scanned document: a scanned document always looks as if it was captured exactly from vertically above.
Therefore, we’ll apply a perspective transformation. A perspective transformation maps one arbitrary quadrilateral to another; hence, a skewed image (quadrilateral) can be transformed into a square/rectangle by defining a new mapping for each pixel.
Some nice discussion regarding the equations involved and what takes place behind the scenes: http://stackoverflow.com/questions/3190483/transform-quadrilateral-into-a-rectangle
This looks better. If someone wants to give it a B/W look and feel, one can always try thresholding!
If we threshold the above image using the adaptive Gaussian thresholding method, we can get a B/W document.
Adaptive Gaussian Thresholding
As mentioned earlier, the original image was scaled down before processing. Therefore, the two images above are not as sharp and clear as they could have been, which is one of the issues I am looking to fix. I need to find a better way to get an optimally scaled image.
TODO (I would love to hear your suggestions):
- Resolve issue regarding the use of scaled down image
- Maybe use an image to pdf converter to convert the scanned image to pdf
- Refactor the code (like an API ?) before it wreaks havoc
- Test with more images, angles, colors, sizes, backgrounds and optimize .. optimize .. optimize
- Add issues here : https://github.com/vipul-sharma20/document-scanner/issues
GitHub Repository: https://github.com/vipul-sharma20/document-scanner
Learn Coldplay origami star: http://cldp.ly/ASFOSpdf 🙂