Boom box!

I write in cursive, and I haven’t found a tool I am comfortable with that produces good OCR for my handwriting. I want to finetune a model specifically for my needs, so I need a bunch of data for that.

I developed a tiny new tool with the help of Claude 3.5 called “Boom Box”, or bbox for short. It’s a single file, and it is used to create bounding boxes and generate captions.

I am going to train the model on my two notebooks of notes, hoping that is enough data to finetune something like the new Florence-2 or TrOCR.

The app is really simple: you can draw bounding boxes and type text for them, but there is one cool trick. I have added a TrOCR model to generate captions automatically, and it all happens in the browser, with no calls to a nonexistent backend! The model is generally good on handwritten text but still fails consistently on my handwriting.
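To make the "box plus caption" idea concrete, here is a minimal sketch of how one labeled region could be represented and exported as training pairs. It is not taken from bbox itself: the field names, the `box_from_drag` and `to_jsonl` helpers, and the JSON Lines layout are all assumptions for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Annotation:
    """One labeled region: a pixel-space box plus its transcription (hypothetical schema)."""
    x: int
    y: int
    width: int
    height: int
    caption: str


def box_from_drag(x0, y0, x1, y1, caption=""):
    """Build an annotation from two drag corners, whatever the drag direction."""
    left, top = min(x0, x1), min(y0, y1)
    return Annotation(left, top, abs(x1 - x0), abs(y1 - y0), caption)


def to_jsonl(image_name, annotations):
    """Serialize annotations as one JSON object per line, skipping empty captions."""
    lines = []
    for ann in annotations:
        if not ann.caption.strip():
            continue  # a box without text is not a usable training pair
        lines.append(json.dumps({"image": image_name, **asdict(ann)}))
    return "\n".join(lines)


boxes = [
    box_from_drag(120, 80, 40, 30, "Boom box!"),  # dragged bottom-right to top-left
    box_from_drag(40, 100, 300, 140),             # drawn but not yet captioned
]
print(to_jsonl("page_001.jpg", boxes))
```

Normalizing the two drag corners keeps width and height positive no matter how the box was drawn, and dropping uncaptioned boxes means every exported line is a complete image-region and text pair.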

Eventually this project should lead to a way to just take a picture of my notes and get perfect OCR for them, but that’s still a bit far away. For now, I am testing the UI on this blog post (it was written by hand and will later be used as training data), and I used the app to see how uncomfortable it got and to feel its pain points. I am keeping it simple and releasing it to everyone at bbox.snats.xyz.

It downloads the entire OCR model, so you have to wait a little bit before the autocaption feature is usable. This was a fun tiny release and if someone else finds it useful, please let me know!

The entire code can be read in my monorepo m.
