Help Needed: Extracting Blood Pressure & Glucose Readings Using ML

Hi everyone,

I’m working on a project where I need to extract readings from Blood Pressure and Glucose Machines using Machine Learning. These devices typically display values using 7-segment digits, which makes OCR challenging.

What I’ve Tried So Far:

  1. Open-source OCR tools (e.g., models from Hugging Face, Tesseract, EasyOCR) – but they struggle with 7-segment digits.
  2. Google Cloud Vision API – This gives much better accuracy, but the problem is:
  • Different devices show varying amounts of information (e.g., time, date, previous readings, current readings, etc.).
  • The API returns a long string, making it difficult to extract the specific readings I need (see the rough parsing sketch below).
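
To illustrate the second point, the only approach I can think of so far is keyword/regex heuristics over the returned text, roughly like the sketch below (the keywords, units, and value patterns are just my assumptions about typical devices, not something that works reliably):

import re

def parse_vision_text(text: str) -> dict:
    """Very rough heuristic parser for Cloud Vision's full-text output."""
    readings = {}
    # Blood pressure sometimes appears as "120/80"; pulse and glucose are
    # usually single numbers near a label such as PUL/BPM or mg/dL.
    bp = re.search(r"\b(\d{2,3})\s*/\s*(\d{2,3})\b", text)
    if bp:
        readings["systolic"] = int(bp.group(1))
        readings["diastolic"] = int(bp.group(2))
    pulse = re.search(r"(?:PUL|PULSE|BPM)\D{0,5}(\d{2,3})", text, re.IGNORECASE)
    if pulse:
        readings["bpm"] = int(pulse.group(1))
    glucose = re.search(r"(\d{2,3})\s*mg\s*/?\s*dL", text, re.IGNORECASE)
    if glucose:
        readings["glucose_mg_dl"] = int(glucose.group(1))
    return readings

print(parse_vision_text("120/80  PUL 72  07:45 12-03"))
# {'systolic': 120, 'diastolic': 80, 'bpm': 72}

This breaks as soon as a device omits the "/" or the labels, which is exactly why I'm looking for a more robust approach.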

Additional Challenge:

I also attempted to fine-tune an open-source AI model that accepts image data, but I couldn’t train it on Google Colab’s T4 GPU due to memory limitations.

Need Help With:

  1. How can I accurately extract the correct values (e.g., systolic, diastolic, BPM, glucose level) from the text output of Cloud Vision API?
  2. Are there any efficient open-source models or techniques that handle 7-segment OCR better?
  3. Any recommendations on training an AI model on a lower-memory environment?

I’d really appreciate any guidance or suggestions to overcome these issues. Thanks in advance!

1 Like

There also seem to be some lightweight methods that extract the digits using classical image processing with OpenCV etc., without any ML, but how about trying out a VLM (vision-language model), like the ones provided by Google, Microsoft, etc.?
These models are relatively small, so training them doesn’t take as many resources as larger models.
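
For reference, the classical route usually thresholds the display and then checks which of the seven segments of each digit are lit. A minimal sketch of that idea (the segment boxes, size filters, and file name are rough placeholders that would need tuning per device):

import cv2

# Lit-segment pattern -> digit, with segments ordered as:
# (top, top-left, top-right, middle, bottom-left, bottom-right, bottom)
SEGMENT_DIGITS = {
    (1, 1, 1, 0, 1, 1, 1): 0, (0, 0, 1, 0, 0, 1, 0): 1,
    (1, 0, 1, 1, 1, 0, 1): 2, (1, 0, 1, 1, 0, 1, 1): 3,
    (0, 1, 1, 1, 0, 1, 0): 4, (1, 1, 0, 1, 0, 1, 1): 5,
    (1, 1, 0, 1, 1, 1, 1): 6, (1, 0, 1, 0, 0, 1, 0): 7,
    (1, 1, 1, 1, 1, 1, 1): 8, (1, 1, 1, 1, 0, 1, 1): 9,
}

def read_digit(roi, on_ratio=0.4):
    """Classify one cropped, binarized digit by checking which segments are lit."""
    h, w = roi.shape
    # Rough segment regions as fractions of the digit's bounding box.
    boxes = [
        ((0.0, 0.0), (1.0, 0.15)), ((0.0, 0.0), (0.25, 0.5)),
        ((0.75, 0.0), (1.0, 0.5)), ((0.0, 0.45), (1.0, 0.6)),
        ((0.0, 0.5), (0.25, 1.0)), ((0.75, 0.5), (1.0, 1.0)),
        ((0.0, 0.85), (1.0, 1.0)),
    ]
    on = []
    for (x0, y0), (x1, y1) in boxes:
        seg = roi[int(y0 * h):int(y1 * h), int(x0 * w):int(x1 * w)]
        on.append(1 if seg.mean() / 255.0 > on_ratio else 0)
    return SEGMENT_DIGITS.get(tuple(on))

img = cv2.imread("bp_display.jpg", cv2.IMREAD_GRAYSCALE)
_, thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

digits = []
for c in sorted(contours, key=lambda c: cv2.boundingRect(c)[0]):  # left to right
    x, y, w, h = cv2.boundingRect(c)
    if h > 30 and 0.2 < w / h < 0.9:  # crude "digit-shaped" filter
        digits.append(read_digit(thresh[y:y + h, x:x + w]))
print(digits)

It tends to be brittle with glare and odd viewing angles, though, which is why a small VLM is worth a try.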

1 Like

Hello,
thanks for your question!
+1 to @John6666 response.

For a super quick prototype, I searched for well-known vision-language models available via serverless inference: Models - Hugging Face.

I gave it a try with a few images like these: readings from Blood Pressure and Glucose Machines - Google Search

Qwen 2 VL got every value right. You can try with Qwen 2.5 VL too once available, or self-host it.

No training needed
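
If it helps to get started, here’s a minimal serverless sketch with huggingface_hub’s InferenceClient (the exact model id and whether it is deployed serverless at any given time are assumptions, so check the Hub first):

from huggingface_hub import InferenceClient

client = InferenceClient(token="hf_...")  # your Hugging Face token

# Publicly reachable photo of the device display (placeholder URL)
image_url = "https://example.com/bp-monitor.jpg"

messages = [{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": image_url}},
        {"type": "text", "text": "Read the systolic, diastolic and pulse values "
                                 "from this blood pressure monitor. Answer as JSON."},
    ],
}]

response = client.chat_completion(
    model="Qwen/Qwen2-VL-7B-Instruct",  # swap in Qwen 2.5 VL once available
    messages=messages,
    max_tokens=200,
)
print(response.choices[0].message.content)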

3 Likes

Hi, thanks for trying to help me. But when I want to run Qwen2-VL-2B / 3B / 7B or others, there is a common problem I face:

OutOfMemoryError: CUDA out of memory. Tried to allocate 230.66 GiB. GPU 0 has a total capacity of 39.56 GiB of which 3.03 GiB is free. Process 24867 has 36.52 GiB memory in use. Of the allocated memory 35.26 GiB is allocated by PyTorch, and 774.31 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

This is while using Colab Pro with a 40 GB GPU. I have no idea how I can fix this. I tried some optimizations to save GPU memory, but nothing positive happened.

Can you tell me how I can fix this issue or run this model on Colab?

1 Like

Can you share the code for the model loading part?

According to the error message, it seems the program is trying to allocate about 230 GB of VRAM, which is strange no matter how you look at it…
Or are you loading the model itself multiple times in a loop?

1 Like

Here is the model loading part.

# Fix PyTorch & torchvision CUDA mismatch
!pip uninstall -y torch torchvision torchaudio
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install required libraries
!pip install transformers accelerate peft safetensors
!pip install openai qwen-vl

import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

# Model name
model_name = "Qwen/Qwen2-VL-7B"

# Load processor (for handling both text and images)
processor = AutoProcessor.from_pretrained(model_name)

# Load model (correct model type for VL tasks)
model = AutoModelForVision2Seq.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

# Move to GPU
model.to("cuda")

This model loading part runs on my GPU with around 15GB or less. However, when I provide an image for processing, I encounter a CUDA out-of-memory error.

def generate_text(prompt, image, max_new_tokens=1000):
    inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.batch_decode(output, skip_special_tokens=True)[0]


from google.colab import files
from PIL import Image

# Upload image
uploaded = files.upload()
image_path = list(uploaded.keys())[0]

# Open & resize image
image = Image.open(image_path)#.resize((512, 512))  # Reduce resolution
prompt = "describe and give me full reading from this picture!"
output_text = generate_text(prompt, image)

Is any optimization needed to fix this issue?

1 Like

It seems that the error was probably just the result of forgetting to apply the Chat Template. The pipeline will handle all of that for you, but in many cases it is more memory efficient to do it manually.

import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

# Model name
#model_name = "Qwen/Qwen2-VL-7B"
model_name = "Qwen/Qwen2-VL-2B-Instruct"
# Load processor (for handling both text and images)
processor = AutoProcessor.from_pretrained(model_name)
# Load model (correct model type for VL tasks)
model = AutoModelForVision2Seq.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
# Move to GPU
model#.to("cuda") # If you do this, there is no point in having device_map="auto", so delete one of them.

def generate_text(prompt, image, max_new_tokens=1000):
    import gc
    inputs = processor(images=[image], text=[prompt], return_tensors="pt").to("cuda")
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Clear GPU cache
    inputs.to("cpu")
    del inputs
    gc.collect()
    torch.cuda.empty_cache()
    return processor.batch_decode(output, skip_special_tokens=True)[0]

#from google.colab import files
from PIL import Image

# Upload image
#uploaded = files.upload()
#image_path = list(uploaded.keys())[0]

# Open & resize image
#image = Image.open(image_path)#.resize((512, 512))  # Reduce resolution

prompt = "describe and give me full reading from this picture!"

import requests
from io import BytesIO
url = "https://huggingface.co/qresearch/llama-3-vision-alpha-hf/resolve/main/assets/demo-2.jpg"
response = requests.get(url)
image = Image.open(BytesIO(response.content)).convert("RGB")
messages = [{"role": "user", "content": [{"type": "image", "image": url}, {"type": "text", "text": prompt}]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

output_text = generate_text(text, image)
print(output_text)
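
If the uploaded photos are large, it may also help to cap the image resolution before it reaches the model, either by resizing with PIL (as in the commented-out line in your code) or, for Qwen2-VL, by bounding the processor’s pixel budget. Roughly like this (the pixel values are illustrative, not tuned):

from transformers import AutoProcessor

model_name = "Qwen/Qwen2-VL-2B-Instruct"

# Cap how many visual tokens the processor produces per image.
# 28*28 is Qwen2-VL's patch size; the exact budget is something to tune.
min_pixels = 256 * 28 * 28
max_pixels = 640 * 28 * 28
processor = AutoProcessor.from_pretrained(
    model_name, min_pixels=min_pixels, max_pixels=max_pixels
)

# Alternatively, shrink the photo in place before building the inputs:
# image.thumbnail((1024, 1024))  # keeps aspect ratio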

2 Likes

Thanks. This code resolves the issue, but uploading my own image still gives the old error.

1 Like

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.