Not able to use Tesseract OCR with FastAPI on a Hugging Face Space

Hey guys,

I am trying to run a FastAPI app on a Hugging Face Space using Docker. It has two endpoints, one for OCR and one for translation. Whatever I do, the OCR endpoint keeps throwing the error "tesseract is not installed or it's not in your PATH. See README file for more information." The code for the OCR endpoint is below. When I run the same code in a Gradio app (without FastAPI), it works as expected and I can extract text from the image. It only fails with FastAPI, and I don't know why. Please suggest a fix.

import os
os.system("sudo apt-get install xclip")
import nltk
from fastapi import FastAPI, File, Request, UploadFile, Body, Depends, HTTPException
from fastapi.security.api_key import APIKeyHeader
from typing import Optional, Annotated
from fastapi.encoders import jsonable_encoder
from PIL import Image
from io import BytesIO
import pytesseract
from nltk.tokenize import sent_tokenize
from transformers import MarianMTModel, MarianTokenizer

API_KEY = os.environ.get("API_KEY")

app = FastAPI()
api_key_header = APIKeyHeader(name="api_key", auto_error=False)

def get_api_key(api_key: Optional[str] = Depends(api_key_header)):
    if api_key is None or api_key != API_KEY:
        raise HTTPException(status_code=401, detail="Unauthorized access")
    return api_key

@app.post("/api/ocr", response_model=dict)
async def ocr(
    api_key: str = Depends(get_api_key),
    image: UploadFile = File(...),
    # languages: list = Body(["eng"])
):
    
    try:
        content = await image.read()
        image = Image.open(BytesIO(content))
        print("[image]",image)
        if hasattr(pytesseract, "image_to_string"):
            print("Image to string function is available")
            print(pytesseract.image_to_string(image, lang = 'eng')) # this line is not working
            text = ocr_tesseract(image, ['eng'])
        else:
            print("Image to string function is not available")
        # text = pytesseract.image_to_string(image, lang="+".join(languages))
    except Exception as e:
        return {"error": str(e)}, 500

    return {"ImageText": "text"}

Below is the full error log:

===== Application Startup at 2023-12-10 09:01:32 =====

sh: 1: sudo: not found
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:7860 (Press CTRL+C to quit)
[image] <PIL.PngImagePlugin.PngImageFile image mode=RGBA size=245x88 at 0x7FB3E5A84D60>
Image to string function is available
INFO:     10.16.34.18:2801 - "POST /api/ocr HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/fastapi/applications.py", line 1106, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 292, in app
    content = await serialize_response(
  File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 155, in serialize_response
    raise ResponseValidationError(
fastapi.exceptions.ResponseValidationError: 1 validation errors:
  {'type': 'dict_type', 'loc': ('response',), 'msg': 'Input should be a valid dictionary', 'input': ({'error': "tesseract is not installed or it's not in your PATH. See README file for more information."}, 500), 'url': 'https://errors.pydantic.dev/2.5/v/dict_type'}

Are you installing Tesseract in your Docker image?
If not, edit your Dockerfile to install it.
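To confirm whether the binary is actually present inside the container, a quick check along these lines may help. pytesseract is only a wrapper that calls the `tesseract` executable, so that executable has to be on PATH. This is a minimal sketch using only the standard library; the helper names `find_tesseract` and `tesseract_version` are made up for illustration.

```python
import shutil
import subprocess
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


def tesseract_version(path: str) -> str:
    """Return the first line printed by `tesseract --version` (empty string if nothing is printed)."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some builds print the version to stderr instead of stdout.
    output = result.stdout or result.stderr
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract is not on PATH; install it in the Docker image")
    else:
        print("found", path, "-", tesseract_version(path))
```

If this reports that the binary is missing, the fix belongs in the Dockerfile rather than in the Python script. On Debian-based images the package is usually called `tesseract-ocr` and is installed with apt-get during the image build.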

Also, the line os.system("sudo apt-get install xclip") is failing (the log shows "sh: 1: sudo: not found"). Consider adding that to your Docker image instead of installing it from the Python script.

My guess is that the Gradio image likely comes with Tesseract installed.


As suggested, I added a command to install tesseract-ocr in my Dockerfile, and it works now. Thanks for your suggestion :innocent: :+1:. I also removed the os.system("sudo apt-get install xclip") line, as it is not required.
