The Problem With Most Online OCR
If you've ever needed to extract text from an image, you've probably noticed something annoying about the existing options. Free online OCR sites work by asking you to upload your image to their server, where their software runs the recognition and sends the text back to you. That's fine for a recipe screenshot. It's much less fine for a medical letter, a legal contract, a confidential work document, or anything personal. Your image now lives on someone else's machine, potentially in logs, possibly used to train future models.
The paid options (Adobe Acrobat, ABBYY FineReader, OCR plug-ins for office software) are excellent, but most charge ongoing subscriptions and require you to install desktop software. Then there's the awkward middle ground: free apps that pretend to be local but quietly send your data to their servers anyway.
What's been missing is a tool that does OCR in your browser, with no upload step at all. Your image stays on your computer. The recognition happens locally. The text appears in front of you a few seconds later. That's exactly what Type Shifter now does.
How Type Shifter's OCR Works
Under the bonnet, Type Shifter uses a library called Tesseract.js, which is the JavaScript and WebAssembly port of Tesseract, the open-source OCR engine originally developed at Hewlett-Packard, with development sponsored by Google from 2006. It's the same engine that powers a huge amount of professional document processing, just running inside a browser tab instead of on a server.
When you drop an image into the upload zone, Type Shifter does the following, all locally on your machine:
- Loads the OCR engine. The first time you use it, the library and the English language data download (roughly 50 MB combined, then cached by your browser). Subsequent uses skip this step entirely.
- Preprocesses your image. Doubles the resolution, converts to greyscale, and boosts contrast. This helps the recogniser pick up small text and faint characters that it would otherwise miss.
- Runs the recogniser. Tesseract scans the image, identifies characters and words, and produces the recognised text. Takes anywhere from a few seconds for a small image to half a minute for a high-resolution full page.
- Cleans up the output. Trims garbled UI chrome from screenshot edges, rejoins hyphenated words split across lines, and converts hard line breaks within paragraphs into proper flowing text. The result is text that reads naturally instead of looking like a column of fragments.
- Drops the text into your input area. Ready to format with any of the 60 templates, listen to with the Read Aloud feature, save as MP3, or export to PDF, DOCX, EPUB, or HTML.
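To make the cleanup step above more concrete, here is a minimal sketch of what such a pass could look like. It is not Type Shifter's actual code: the function name `cleanOcrText` and the heuristics (what counts as edge noise, when a hyphen is treated as a line-break split) are assumptions made for illustration.

```typescript
// Hypothetical cleanup pass; name and heuristics are illustrative assumptions.
function cleanOcrText(raw: string): string {
  const lines = raw.replace(/\r\n?/g, "\n").split("\n").map((l) => l.trim());

  // Treat a line as real text only if it contains a run of two letters or digits.
  // Leading and trailing lines that fail this test are dropped as edge noise.
  const looksLikeText = (l: string): boolean => /[A-Za-z0-9]{2,}/.test(l);
  let start = 0;
  let end = lines.length;
  while (start < end && !looksLikeText(lines[start])) start++;
  while (end > start && !looksLikeText(lines[end - 1])) end--;

  const paragraphs: string[] = [];
  let current = "";
  for (const line of lines.slice(start, end)) {
    if (line === "") {
      // A blank line marks a paragraph boundary.
      if (current !== "") paragraphs.push(current);
      current = "";
    } else if (/[A-Za-z]-$/.test(current)) {
      // The previous line ended with a hyphen after a letter: rejoin the word.
      current = current.slice(0, -1) + line;
    } else {
      // Otherwise a hard line break inside a paragraph becomes a space.
      current = current === "" ? line : current + " " + line;
    }
  }
  if (current !== "") paragraphs.push(current);
  return paragraphs.join("\n\n");
}

const cleaned = cleanOcrText("@ # ~\nThe recog-\nniser reads\nthis line.\n\nSecond para-\ngraph here.\n|| >\n");
```

One trade-off to be aware of: a simple rule like this also joins genuine compounds that happen to break at the hyphen ("well-" followed by "known"), which is part of why a quick proof-read is still worthwhile.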
Genuinely local. Genuinely private.
The image you scan never leaves your device. There is no upload step. There is no server in the loop. If you disconnect from the internet after the first model download, the OCR feature still works perfectly. That matters for anyone scanning sensitive documents, draft writing, medical records, or anything personal.
How to Use OCR in Type Shifter (Step by Step)
Type Shifter integrates OCR straight into the existing file upload flow. There's no separate menu or hidden button. You just drop an image where you'd drop a document, and the app figures out what to do.
Step 1: Open Type Shifter
Head to typeshifter.com/app in any modern browser (Chrome, Edge, Firefox, Safari, anything from the last few years).
Step 2: Drop or pick your image
You have two options:
- Drag and drop the image from your file manager onto the "Drop a file or click to upload" zone in the input panel.
- Click the upload zone to open a standard file picker, then choose your image.
Supported image formats: JPG, JPEG, PNG, WebP, BMP, GIF, TIFF. The format badges in the upload zone confirm what works.
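If you're curious how an app might decide whether a dropped file should go down the OCR route, a check on the file extension is the simplest approach. The sketch below uses exactly the formats listed above; the helper name `isOcrImage` is hypothetical, and a real app may well inspect the file's MIME type instead.

```typescript
// Extensions taken from the supported-formats list; the helper is a hypothetical sketch.
const OCR_EXTENSIONS: readonly string[] = ["jpg", "jpeg", "png", "webp", "bmp", "gif", "tiff"];

function isOcrImage(fileName: string): boolean {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return false; // no extension at all
  const ext = fileName.slice(dot + 1).toLowerCase();
  return OCR_EXTENSIONS.indexOf(ext) !== -1;
}

const routeToOcr = isOcrImage("letter-scan.PNG");
```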
Step 3: Wait for the OCR to run
A cyan progress strip appears below the upload area showing what's happening: "Loading OCR engine", then "Downloading English data (one-time)", then "Preparing image", then "Recognising text". On your first use this can take anywhere from under a minute to a couple of minutes, depending on your connection speed. On every subsequent use it's much faster because the engine and language data are already cached.
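For the technically curious, a progress strip like this can be driven by the status messages the OCR engine reports as it works. The sketch below is an assumption-heavy illustration: the message shape and the status strings are how Tesseract.js is generally understood to report progress, but they may differ between versions, and the mapping function `toStripState` is hypothetical. The "Preparing image" stage is not covered because it would come from the app's own preprocessing rather than from the engine.

```typescript
// Progress message shape as Tesseract.js is understood to report it.
// The exact status strings are an assumption and may vary between versions.
interface OcrProgress {
  status: string;   // e.g. "loading language traineddata", "recognizing text"
  progress: number; // 0 to 1
}

interface StripState {
  label: string;
  percent: number;
}

// Hypothetical mapping from engine status to the labels quoted above.
function toStripState(msg: OcrProgress): StripState {
  const percent = Math.round(msg.progress * 100);
  const status = msg.status.toLowerCase();
  if (status.indexOf("language") !== -1) {
    return { label: "Downloading English data (one-time)", percent };
  }
  if (status.indexOf("recogni") !== -1) {
    return { label: "Recognising text", percent };
  }
  // Anything else is treated as engine start-up.
  return { label: "Loading OCR engine", percent };
}

const strip = toStripState({ status: "recognizing text", progress: 0.5 });
```

Matching on substrings rather than exact strings is a deliberate choice here, so that small wording changes in the engine's messages are less likely to break the labels.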
While it runs, you can leave the tab open and switch to something else. The recognition won't pause if you're not watching.
Step 4: Review the recognised text
The text appears in the input area as soon as recognition completes. A green toast at the bottom of the screen confirms the character count: "Recognised 1,247 characters and cleaned the layout."
Have a quick look at the text and edit anything Tesseract got wrong (we'll cover what kinds of errors are common further down). The text is in a normal editable textarea, so you can type, paste, or delete just like any other document.
Step 5: Use the text however you like
From here, you have all the usual Type Shifter options:
- Click "SHIFT MY TEXT" to format the recognised text with whatever template you've picked.
- Click the 🔊 Listen button to have the text read aloud using one of the 28 neural voices.
- Click "Save Full Doc" or "Save Recording" to download an MP3 of the audio.
- Export to PDF, DOCX, EPUB, or HTML once you've formatted.
The fact that the source was an image rather than a document changes nothing about what you can do with it from here. OCR is just another way to get text into Type Shifter.
What Kinds of Images Work Best
Tesseract is genuinely excellent on cleanly printed text. Here are the categories that produce the best results:
- Photos of book pages. Hold the book flat, take the photo in even light, and you'll often get near-perfect recognition. Particularly good for novels, textbooks, and printed essays.
- Scanned PDFs. If a PDF is actually an image-of-text (which a lot of older or scanned-from-paper PDFs are), Tesseract handles it just fine. Take a screenshot of a page, or convert the PDF to PNG with a free tool, then drop the image in.
- Screenshots of articles, emails, and webpages. Phones and computers both let you screenshot the visible content. As long as the text is clear and reasonably large in the image, OCR handles it well.
- Printed letters and documents. A photo of a letter, contract, or printed page works well, especially if the paper is white and the text is black.
- Receipts and invoices. Good for receipts with clear printed text. Struggles with thermal receipts that have faded or come out grainy.
And here are the categories that don't work as well:
- Handwriting. Tesseract is built for printed text. It handles tidy block-letter handwriting passably, cursive very poorly. Use a different tool (such as Google Keep's handwriting recognition) for handwritten notes.
- Skewed or rotated photos. Try to keep the page roughly aligned with the photo frame. Tesseract has automatic deskew, but it's not magic, especially on heavily rotated images.
- Very low resolution images. If the text is less than about 30 pixels tall in your image, recognition gets unreliable. Take a closer photo or use a higher-resolution scan.
- Complex multi-column layouts. Newspaper or magazine pages with several text columns can confuse the recogniser into reading across columns. Crop each column separately if you need perfect order.
- Images with heavy decoration. Logos, watermarks, decorative overlays, and "stylised" text (like a fancy font heading) often confuse OCR. Plain typesetting works best.
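Small or faint text is where the preprocessing step described earlier (doubling the resolution, converting to greyscale, boosting contrast) does its work. As a rough illustration only, here is how those three operations could be applied to a raw RGBA pixel buffer. The `Bitmap` type, the `preprocess` function, the nearest-neighbour scaling, and the contrast factor of 1.5 are all assumptions for this sketch, not Type Shifter's actual implementation.

```typescript
// Hypothetical preprocessing on a raw RGBA pixel buffer (4 bytes per pixel, row-major).
interface Bitmap {
  width: number;
  height: number;
  data: Uint8ClampedArray;
}

function preprocess(src: Bitmap, contrast: number = 1.5): Bitmap {
  const width = src.width * 2;
  const height = src.height * 2;
  const data = new Uint8ClampedArray(width * height * 4);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      // Nearest-neighbour upscale: each source pixel becomes a 2x2 block.
      const s = ((y >> 1) * src.width + (x >> 1)) * 4;
      // Greyscale using the common Rec. 601 luma weights.
      const grey = 0.299 * src.data[s] + 0.587 * src.data[s + 1] + 0.114 * src.data[s + 2];
      // Stretch contrast around mid-grey; Uint8ClampedArray clamps the result to 0..255.
      const value = (grey - 128) * contrast + 128;
      const d = (y * width + x) * 4;
      data[d] = value;
      data[d + 1] = value;
      data[d + 2] = value;
      data[d + 3] = 255; // fully opaque
    }
  }
  return { width, height, data };
}

const onePixel: Bitmap = { width: 1, height: 1, data: new Uint8ClampedArray([200, 200, 200, 255]) };
const enlarged = preprocess(onePixel);
```

In a browser this kind of work would more likely be done by drawing onto a canvas, which also gives smoother scaling than nearest-neighbour. The per-pixel version is shown because it makes each step visible.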
The First-Use Download (and Why It's Worth It)
The very first time you use OCR, your browser downloads two things: the Tesseract.js library itself (roughly 2 MB) and the English language data file (roughly 50 MB). Both get cached by your browser and aren't downloaded again unless you clear your browser data.
We use the higher-accuracy English data file rather than the smaller default. It's about five times bigger but produces noticeably better results, especially on tricky fonts and slightly skewed text. Given that this is a one-time download you'll never see again, the trade-off is well worth it.
Roughly how long the first use takes
On a typical 50 Mbps home broadband connection, the library and language data combined take about 20 to 30 seconds to download and load. On slower connections it can take a couple of minutes. After that first time, OCR starts almost immediately because everything is cached locally. Even disconnecting from the internet doesn't break it.
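If you'd like to estimate this for your own connection, the arithmetic is simple: megabytes times eight gives megabits, divided by your speed in megabits per second. The sketch below uses the file sizes quoted above and treats the result as a best-case figure. Real downloads are slower because of protocol overhead, server speed, and the time the engine needs to start up, so expect the practical number to be noticeably higher than the calculated one.

```typescript
// Idealised transfer time: size in megabytes, speed in megabits per second.
function downloadSeconds(sizeMB: number, speedMbps: number): number {
  return (sizeMB * 8) / speedMbps;
}

const totalMB = 2 + 50; // library + English data, using the rough sizes quoted above
const atFifty = downloadSeconds(totalMB, 50); // about 8.3 seconds at best
const atTen = downloadSeconds(totalMB, 10);   // about 41.6 seconds at best
```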
Common Errors and How to Spot Them
Tesseract is good but not perfect. The most common categories of error to look out for in the recognised text:
- Number/letter confusion. "0" mistaken for "O", "1" for "l" or "I", "5" for "S". Particularly common in monospace fonts.
- Missing punctuation. Full stops and commas sometimes drop out, particularly on lower-resolution images.
- Weird symbols at edges. Bits of UI chrome (status bars, navigation icons, menu buttons) sometimes get recognised as garbled characters. Type Shifter's cleanup automatically trims most of these from the start and end of the output, but occasionally an extra symbol sneaks through.
- Joined or split words. Spaces between words sometimes disappear ("twowords" instead of "two words"), or get inserted where they shouldn't ("p arsley" instead of "parsley").
- Capitalisation drift. Headings in unusual fonts sometimes come out with mixed case ("ACkNowLedgmEnts" instead of "Acknowledgments").
None of these are unfixable. Skim the recognised text in the input area, fix anything obviously wrong (it's a normal editable textarea), and then format or listen as usual.
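If you proof-read a lot of scanned text, a small helper that points out likely trouble spots can save time. The sketch below is a hypothetical aid, not a Type Shifter feature: it flags words that mix digits with the letters most often confused with them, and words with a lower-case letter followed by a capital mid-word.

```typescript
// Hypothetical proof-reading aid: returns words worth a second look.
function suspiciousTokens(text: string): string[] {
  const flagged: string[] = [];
  for (const token of text.split(/\s+/)) {
    // Strip surrounding punctuation so "page." is checked as "page".
    const word = token.replace(/^[^A-Za-z0-9]+|[^A-Za-z0-9]+$/g, "");
    if (word === "") continue;
    // A digit alongside O, I, l or S suggests a number/letter mix-up.
    const digitLetterMix = /[0-9]/.test(word) && /[OoIlSs]/.test(word);
    // Lower-case followed by upper-case inside a word, e.g. "ACkNowLedgmEnts".
    const caseDrift = /[a-z][A-Z]/.test(word);
    if (digitLetterMix || caseDrift) flagged.push(word);
  }
  return flagged;
}

const toCheck = suspiciousTokens("Invoice 2O24 total 15.S0, see ACkNowLedgmEnts page.");
```

Expect false positives: legitimate words such as "1st" or "JavaScript" trip the same rules. The helper only suggests where to look, and the decision stays with you.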
Using OCR With the Listen Feature
One of the most useful combinations is OCR + the Listen feature. Take a photo of a book chapter, scan it with OCR, then have Type Shifter read it aloud. Suddenly you have an audiobook of any printed material you own, free, in minutes, for your own personal listening.
A few practical tips for this workflow:
- Take photos in batches, not all at once. Tesseract handles one image at a time, so doing a whole chapter means scanning page by page. Get into a rhythm: photo, OCR, paste, photo, OCR, paste. Or scan one page and listen while you scan the next.
- Edit before listening for longer documents. A two-minute audio clip with one OCR error is fine. A thirty-minute clip with the same error rate is annoying. Quick proof-read pays off.
- Pick a voice that fits the content. For non-fiction or formal text, try George (British male, BBC-newsreader feel) or Michael (American male, professional). For novels and warmer reading, try Emma (British female, calm) or Bella (American female, very natural).
- Save the audio as MP3. Click "Save Full Doc" once OCR is done and Type Shifter generates the whole audio silently while you do something else. The MP3 lands in your downloads folder, ready to listen to on a phone, in a car, or on a walk.
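The page-by-page rhythm from the first tip amounts to processing images strictly one after another and joining the results. Here is a minimal sketch of that idea. The `recognise` parameter stands in for whatever OCR call is used; it is passed in as a function so the example doesn't assume any particular library, and the name `ocrPagesInOrder` is hypothetical.

```typescript
// `recognise` is a stand-in for the real OCR call (an assumption for this sketch).
async function ocrPagesInOrder<T>(
  pages: T[],
  recognise: (page: T) => Promise<string>
): Promise<string> {
  const texts: string[] = [];
  for (const page of pages) {
    // Wait for each page to finish before starting the next: one image at a time.
    const text = await recognise(page);
    texts.push(text.trim());
  }
  // Separate pages with a blank line so they read as distinct paragraphs.
  return texts.join("\n\n");
}

// Usage with a stub recogniser that pretends each page is already text.
const chapter = ocrPagesInOrder(["page one", "page two"], async (p) => p.toUpperCase());
```

Processing in order keeps the pages in reading order, which matters if the result is going straight to Read Aloud or an MP3.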
Using OCR With the Bionic Reading Feature
Another useful combination: OCR a difficult-to-read photo (small text, faded print, awkward font) and then turn on Bionic Reading on the result. The text becomes much easier to skim because Bionic Reading's bolded word-beginnings act as visual anchors for your eyes.
This is particularly helpful for readers with dyslexia, ADHD, or eye fatigue who frequently encounter printed material they'd struggle to read in its original form. OCR converts it, and Bionic Reading makes it easier to process.
Using OCR With Templates
Once you've got the recognised text, applying a template gives it the visual identity you want. A few common pairings:
- OCR a recipe from a printed cookbook, then apply the Kitchen Recipe template to make it look like a proper cookbook page rather than raw text.
- OCR an academic paper from a journal photo, then apply Academic Formal for a clean, scholarly layout.
- OCR a vintage letter or handwritten note (if the handwriting is block enough), then apply Vintage Typewriter or Ancient Parchment for an aesthetic match.
- OCR a news article screenshot, then apply Editorial Press or Magazine Feature for a newsprint look.
The OCR output is always plain text, so you have complete freedom over how it gets rendered. Combine with custom fonts, sizes, colours and spacing if you want a specific look.
What OCR Costs (Spoiler: Nothing)
Both Tesseract.js and Tesseract itself are open source under the Apache 2.0 licence. Type Shifter pays nothing to use them, and you pay nothing too. There's no API fee, no per-character pricing, no monthly subscription. The 50 MB language data downloads once and stays cached locally.
Compared to paid alternatives (Adobe Acrobat Pro at roughly £15 per month, ABBYY FineReader at roughly £120 one-off, the various "free with paid pro tier" web tools), Type Shifter offers OCR at zero cost as part of the existing 14-day free trial. After that, the £49.99 lifetime licence covers OCR along with everything else.
Other Things You Can Do With the Recognised Text
Once OCR has done its job and the text is in front of you, the rest of Type Shifter takes over. Here are the things people most commonly do next:
- Format and export as a clean document. Apply a template, then export as PDF, DOCX, EPUB, or HTML to share or archive.
- Save as an audiobook. Use the Listen feature plus the Save Full Doc option to generate an MP3 of the recognised text read by a neural voice.
- Translate (with another tool). Copy the recognised text and paste it into Google Translate, DeepL, or whichever translator you prefer. OCR is the hard part. Translation is straightforward once you have plain text.
- Search and edit. The text becomes a normal document you can search through, edit, annotate, and rewrite. Particularly useful for old printed material you want to update.
- Quote or cite. No more retyping passages from books or articles. Scan the page, OCR it, copy the bit you need, and quote it accurately.
A Quick Sanity Check on Privacy
Because this is important, here's exactly what happens to your image:
- The image is read into your browser's memory from your local disk (or paste buffer).
- Tesseract.js (loaded earlier from a CDN) processes the image entirely within the browser. No network requests are made with the image data.
- The recognised text appears in the input area on the page.
- Nothing is sent to Type Shifter's servers. Nothing is logged. Nothing is stored anywhere except in your own browser's memory until you close the tab.
If you're paranoid, you can verify this yourself by opening the browser's Network tab in developer tools while you scan an image. You'll see the initial library and language data downloads (from a public CDN, on first use only). After that, no further network requests are made during recognition.
Try It With Your Own Document
The fastest way to get a feel for the OCR feature is to try it on something you actually want to convert. Take a photo of a page from a book you're reading, or grab a screenshot of an article you've been meaning to get through. Drop it into Type Shifter. Wait a moment. Then experiment with formatting it, listening to it, or saving an audiobook of it.
You'll probably notice two things. First: the recognised text is usually a lot better than you'd expect from a free in-browser tool, especially on clean printed material. Second: the rest of Type Shifter (templates, neural voices, Bionic Reading, export formats) makes OCR much more useful than it would be as a standalone feature. Together they turn printed pages into searchable, editable, listenable, exportable text in seconds.
That's the whole pitch. No upload. No subscription. No sign-up. Just photos in, text out, ready to do whatever you want with.
Try OCR free for 14 days
Drop a photo, screenshot or scanned page. Get editable text in seconds. Nothing leaves your device.