Real-time OCR and Translation for All Web Images in Chrome

BetterTouchTool (BTT) already offers quick OCR of screenshots to extract text. Big thanks to Andrew for his brilliant work on this!

But we can take it a step further. What we truly need is real-time OCR and translation for all images on a webpage.

For reference, those familiar with Safari might already know how this works:

  1. When you pause a YouTube video or view an image in a post, you can directly copy the text on the image without needing to take a screenshot first.
  2. When reading foreign-language comics or browsing the web in Safari, text on multiple images can not only be directly copied but also translated. (This leverages macOS's Vision framework for real-time OCR and translation.)

This approach eliminates the tedious process of taking screenshots, copying text, and pasting it into a translation tool.

However, due to Safari's limitations, we need to implement this functionality on Chrome.

Does anyone have ideas on how to achieve this with BetterTouchTool?

For now, I’ve created a prototype project: GitHub - louishino/BetterChrome: This is a memo for improving Chrome.

I think the only way to achieve this in BTT is by using macOS's Vision framework. At that point, I think a Chrome extension would be better suited than trying to integrate it into BTT. I see you've already started working on one: GitHub - louishino/ChromeLivetext: A Chrome extension for real-time image OCR and translation like macOS Safari's live text feature.

Cool idea!
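
In case it helps with the extension route, here is a rough sketch of one piece a content script would probably need: mapping OCR bounding boxes (in the image's natural pixel coordinates) onto the image as it is actually displayed, so selectable or translated text can be overlaid in the right place. Everything here is an assumption for illustration. The names `OcrBox`, `OverlayRect`, `isOcrCandidate`, and `toOverlayRects` are hypothetical and do not come from BTT, the Vision framework, or the ChromeLivetext repo, and the OCR step itself is left out. It also assumes the image fills its element box (the default `object-fit: fill`).

```typescript
// Hypothetical shape of one OCR result. The box is in the image's natural
// (intrinsic) pixel coordinates, with the origin at the top-left corner.
interface OcrBox {
  text: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Position of an overlay in CSS pixels, relative to the displayed image.
interface OverlayRect {
  text: string;
  left: number;
  top: number;
  width: number;
  height: number;
}

// Skip tiny images (icons, tracking pixels) that are unlikely to hold
// readable text. The 64 px default is an arbitrary assumption.
function isOcrCandidate(
  naturalWidth: number,
  naturalHeight: number,
  minSide = 64
): boolean {
  return naturalWidth >= minSide && naturalHeight >= minSide;
}

// Scale OCR boxes from natural image pixels to displayed CSS pixels.
function toOverlayRects(
  boxes: OcrBox[],
  naturalWidth: number,
  naturalHeight: number,
  displayedWidth: number,
  displayedHeight: number
): OverlayRect[] {
  // An image that has not loaded reports 0 for its natural size.
  if (naturalWidth <= 0 || naturalHeight <= 0) {
    return [];
  }
  const scaleX = displayedWidth / naturalWidth;
  const scaleY = displayedHeight / naturalHeight;
  return boxes.map((box) => ({
    text: box.text,
    left: box.x * scaleX,
    top: box.y * scaleY,
    width: box.width * scaleX,
    height: box.height * scaleY,
  }));
}
```

In a content script, the inputs could come from each element in `document.images`, using `img.naturalWidth` / `img.naturalHeight` for the intrinsic size and `img.clientWidth` / `img.clientHeight` for the displayed size. Where the OCR results come from (a native helper using Vision, a cloud API, or an in-browser library) is a separate design decision.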