{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Using OCR in the cloud\n", "\n", "This notebook shows how to send an image to the Google Vision service for optical character recognition (OCR).\n", "\n", "The mechanism that lets a program (in this case, Google Vision) communicate directly with other programs is called an application programming interface (API).\n", "\n", "First, we install the modules needed to communicate with Google Vision and to display images. This notebook uses my credentials (in the file `key.json`), which will not be available after the lesson. However, you can generate your own (more information here: https://pypi.org/project/google-cloud-vision/).\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip3 install --user google-cloud-vision" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip3 install --user pillow" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The image\n", "The following image comes from the David Rumsey Map Collection: https://www.davidrumsey.com/luna/servlet/detail/RUMSEY~8~1~247417~5515422:Text-Page--V--2--Preface--Guthrie,-#" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "path = \"2647012.jpg\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.display import Image\n", "Image(filename=path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the Google Vision API\n", "Replace the file `key.json` with your own after this lesson." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "# Tell the Google client library where to find the credentials file\n", "os.environ[\"GOOGLE_APPLICATION_CREDENTIALS\"] = \"key.json\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "from google.cloud import vision\n", "\n", "client = vision.ImageAnnotatorClient()\n", "\n", "# Read the image file as raw bytes\n", "with open(path, 'rb') as image_file:\n", "    content = image_file.read()\n", "\n", "# google-cloud-vision 2.0 and later expose Image directly (formerly vision.types.Image)\n", "image = vision.Image(content=content)\n", "\n", "response = client.text_detection(image=image)\n", "\n", "# Stop here if the API reported an error\n", "if response.error.message:\n", "    raise RuntimeError(response.error.message)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Resulting text\n", "Observe the result. Are there OCR errors? What is their origin?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "texts = response.text_annotations" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The first annotation contains the full detected text\n", "print(texts[0].description)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The remaining annotations are the individual detected words\n", "for text in texts[1:]:\n", "    print('\\n\"{}\"'.format(text.description))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualization\n", "In this step, we go through all extracted characters or groups of characters and draw a rectangle around each one. What do the rectangles mean? How does the OCR \"understand\" the text?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import matplotlib.patches as patches\n", "from PIL import Image\n", "import numpy as np\n", "\n", "im = np.array(Image.open(path), dtype=np.uint8)\n", "\n", "# Create figure and axes\n", "fig, ax = plt.subplots(1, figsize=(18, 16))\n", "\n", "# Display the image\n", "ax.imshow(im)\n", "\n", "for text in texts[1:]:\n", "    # For upright text, vertices[0] is the top-left corner and vertices[2] the bottom-right\n", "    from_vertex = text.bounding_poly.vertices[0]\n", "    width = text.bounding_poly.vertices[2].x - from_vertex.x\n", "    height = text.bounding_poly.vertices[2].y - from_vertex.y\n", "\n", "    # Create a Rectangle patch\n", "    rect = patches.Rectangle((from_vertex.x, from_vertex.y), width, height,\n", "                             linewidth=1, edgecolor='r', facecolor='none')\n", "\n", "    # Add the patch to the Axes\n", "    ax.add_patch(rect)\n", "\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 2 }