BetterTouchTool (BTT) already has a feature for quick OCR of screenshots to extract text. Big thanks to Andrew for his brilliant work on this!
But we can take it a step further. What we truly need is real-time OCR and translation for all images on a webpage.
For reference, those familiar with Safari might already know how this works:
- When you pause a YouTube video or view an image in a post, you can directly copy the text on the image without needing to take a screenshot first.
- When reading foreign-language comics or browsing the web in Safari, text on multiple images can not only be copied directly but also translated. (As far as I understand, this uses macOS's Live Text, which relies on the Vision framework for OCR, together with the system translation feature.)
This approach eliminates the tedious process of taking screenshots, copying text, and pasting it into a translation tool.
However, due to Safari's limitations, we need to implement this functionality on Chrome.
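To make the idea a bit more concrete, here is a rough sketch of one small piece: deciding which images on a page are worth sending to OCR at all. It is only an illustration under my own assumptions. The `ImageInfo` shape, the `selectOcrCandidates` name, and the 100 px threshold are all hypothetical, and the actual OCR and translation step (however it ends up being handed to BTT or macOS) is deliberately left out.

```typescript
// Hypothetical shape: the subset of HTMLImageElement fields this sketch reads.
interface ImageInfo {
  src: string;
  naturalWidth: number;
  naturalHeight: number;
}

// Assumed threshold: anything smaller is probably an icon or avatar, not text.
const MIN_SIDE_PX = 100;

// Returns the unique source URLs of images large enough to plausibly contain text.
function selectOcrCandidates(
  images: ImageInfo[],
  minSide: number = MIN_SIDE_PX
): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const img of images) {
    // Skip images without a source or below the size threshold.
    if (!img.src || img.naturalWidth < minSide || img.naturalHeight < minSide) {
      continue;
    }
    // Skip duplicates so the same image is not processed twice.
    if (seen.has(img.src)) {
      continue;
    }
    seen.add(img.src);
    result.push(img.src);
  }
  return result;
}
```

In a Chrome content script this could presumably be called as `selectOcrCandidates(Array.from(document.images))`, since `HTMLImageElement` already exposes `src`, `naturalWidth`, and `naturalHeight`.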
Does anyone have ideas on how to achieve this with BetterTouchTool?
For now, I've created a prototype project on GitHub: louishino/BetterChrome ("This is a memo for improving Chrome.")