Getting Started
Type Shifter transforms plain text into beautifully formatted documents with optional Bionic Reading technology to help you read faster. Here's how to get started in minutes.
Quick Start Guide
Choose a Style Template
Select one of the 60 professional templates from the dropdown menu at the top of the app. Each template is designed for different types of content - from business documents to creative writing, sci-fi to storybook.
Not sure which to choose? Start with "Clean Business" for professional documents or "Literary" for books and articles.
Add Your Text
You have two options:
- Upload a file: Drag and drop a file onto the upload area, or click to browse. Supports TXT, RTF, DOCX, PDF, MD, EPUB, and HTML files up to 10MB.
- Paste text: Simply paste your text directly into the text area. This works great for content copied from websites, emails, or other sources.
Click "SHIFT MY TEXT"
Press the green "SHIFT MY TEXT" button to transform your document. Your text will be instantly reformatted using your chosen template.
Customise Your Output (Optional)
Use the enhancement options to further customise your document:
- Bionic Reading: Toggle on and adjust intensity (5-45%)
- Custom Fonts: Choose from 1,900+ Google Fonts for headings and body text independently
- Light/Dark Mode: Toggle the app theme between light and dark with the switch at the top
- Dark Mode Output: Switch exported documents to dark background for easier reading
- OpenDyslexic: Use the dyslexia-friendly font
- Compare Mode: View original and transformed text side-by-side
- Listen Mode: Hear your text read aloud with 28 neural voices and save the audio as MP3
- Image OCR: Drop a photo or screenshot and Type Shifter extracts the text automatically
Export Your Document
Click the export dropdown and choose your preferred format:
- HTML: Best for web viewing and sharing online
- DOCX: Opens in Microsoft Word and Google Docs
- EPUB: Perfect for e-readers
- PDF: Ideal for printing and universal sharing
Pro Tip
Your last 5 processed files are saved in the "Recent Files" section for quick access. File names are stored locally - your content is never saved anywhere.
Bionic Reading
Bionic Reading is a revolutionary reading method that can help you read up to 25% faster while maintaining or improving comprehension.
How It Works
Bionic Reading highlights the initial letters of each word (typically the first 2-4 letters) in bold. These bolded portions are called "artificial fixation points."
When you read text with Bionic Reading:
- Your eyes are guided by the bold portions of each word
- Your brain recognises these anchor points
- Your mind automatically fills in the rest of each word
- This reduces cognitive load and speeds up reading
Example
Standard text: "The quick brown fox jumps over the lazy dog."
Bionic Reading: "The quick brown fox jumps over the lazy dog." (the same sentence, with the first letters of each word shown in bold)
Adjusting Intensity
The intensity slider (5% to 45%) controls what percentage of each word is highlighted:
- Low intensity (5-15%): Subtle highlighting, just the first 1-2 letters. Good for experienced readers who want minimal visual guidance.
- Medium intensity (20-30%): Recommended starting point. Provides clear guidance without being overwhelming. Most users find their sweet spot here.
- High intensity (35-45%): More letters highlighted per word. Helpful for very long words, unfamiliar content, or readers who need stronger visual anchors.
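As a rough illustration of how an intensity percentage could map to highlighted letters, here is a minimal sketch in Python. It is not Type Shifter's actual code: the `bionic` function, the rounding-up rule, and the `**` markers standing in for bold are all assumptions made for the example.

```python
import math
import re


def bionic(text: str, intensity: float, mark: str = "**") -> str:
    """Wrap the leading letters of each word in `mark` (a stand-in for bold)."""

    def emphasise(match: re.Match) -> str:
        word = match.group(0)
        # Assumed rule: round up, highlight at least one letter,
        # and never more than the whole word.
        n = min(len(word), max(1, math.ceil(len(word) * intensity)))
        return f"{mark}{word[:n]}{mark}{word[n:]}"

    # Only runs of letters count as words; spaces and punctuation pass through.
    return re.sub(r"[A-Za-z]+", emphasise, text)


print(bionic("The quick brown fox", 0.25))
# **T**he **qu**ick **br**own **f**ox
```

With this rule, a low setting such as 5% highlights a single letter in almost every word, while 45% highlights close to half of each word.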
Finding Your Perfect Setting
Start at 25% intensity and read a paragraph. If it feels too subtle, increase to 30-35%. If it feels too bold, decrease to 15-20%. Everyone's optimal setting is different - experiment to find yours!
Who Benefits Most
- People with ADHD: The fixation points help maintain focus and reduce mind-wandering
- Readers with dyslexia: Combined with the OpenDyslexic font, can provide significant reading improvement
- Students: Process textbooks and study materials faster
- Professionals: Get through reports, emails, and documents more efficiently
- Non-native speakers: Visual anchors help with unfamiliar vocabulary
- Anyone who reads extensively: Reduce eye strain and reading fatigue
Note
Results vary by individual. Some people see dramatic improvement (30%+ faster), while others prefer standard text. The 14-day free trial gives you plenty of time to discover if Bionic Reading works for you.
Style Templates
Type Shifter includes 60 professionally designed templates, each optimised for different types of content and reading contexts.
Compare Mode
Enable Compare Mode to view your original text alongside the transformed version. This is useful for:
- Seeing exactly what changes Type Shifter made
- Comparing readability before and after
- Quality checking before export
- Demonstrating Bionic Reading to others
Template Tip
You can change templates after transforming your text - the content will instantly reformat. Try several templates to find the one that works best for your specific document.
Listen & Save Audio
Type Shifter reads your documents aloud with 28 high-quality neural voices and lets you save the audio as MP3 files. The whole feature runs inside your browser using a small neural model called Kokoro-82M, so your text never leaves your device and there is no monthly fee, no API key, and no usage limit. Listen to a research paper while you commute, save a long article as an audiobook for the gym, or have a draft of your own writing read back to you for proofreading.
What Powers the Listen Feature
Behind the scenes, Type Shifter uses three components that all run locally:
- Kokoro-82M: An open-source neural text-to-speech model. 82 million parameters, distilled from larger models for fast on-device synthesis. Quality is roughly comparable to commercial offerings like Amazon Polly Neural and Microsoft Azure Neural.
- WebGPU / WebAssembly runtime: Runs the neural model in your browser. WebGPU uses your graphics card for acceleration; WebAssembly uses your CPU as a fallback. The system picks the faster option automatically.
- lamejs MP3 encoder: A pure-JavaScript MP3 encoder. Compresses Kokoro's raw audio into MP3 files that play anywhere.
The Listen Panel
Look directly above the document preview canvas. You'll find a cyan-bordered panel with several controls. The controls reveal themselves as you use the feature: pick a voice and click Listen, and the save buttons appear.
- Voice dropdown: 28 neural voices grouped by accent and gender.
- Speed slider: Adjust playback from 0.5x (slow) to 2.0x (fast). Default is 1.0x.
- Listen (cyan): Start reading the formatted text aloud. Also pauses and resumes mid-playback.
- Stop (red): Stop playback immediately and cancel any audio still being generated.
- Save Recording (green): Save what's been played so far as an MP3.
- Save Full Doc (purple): Generate the entire document as an MP3 in the background.
- Save Selection (orange): Save only the highlighted portion as an MP3.
The 28 Voices in Full
All 28 voices are English, but spread across two accents and both genders so you can pick something that matches the document's tone or your personal preference. Switching between voices is instant because they all share the same model file.
British Female (8 voices)
- Emma: Calm, conversational, warm. The default British voice. Excellent for general reading and audiobooks.
- Isabella: Slightly brighter and more youthful than Emma. Good for blog posts and articles.
- Alice: Soft, gentle, slightly slower delivery. Excellent for poetry and reflective writing.
- Lily: Bright and cheerful. Best for short, upbeat content like marketing copy or social posts.
- Charlotte: Formal and precise. Reads journals, legal text, and academic papers clearly.
- Sophia: Mature, authoritative tone. Excellent for business writing and news.
- Olivia: Neutral, professional. A safe choice for technical documents.
- Mia: Energetic, conversational. Best for storytelling and informal writing.
British Male (6 voices)
- Daniel: Calm, measured, slightly deep. The default British male voice. Excellent for non-fiction.
- Henry: Authoritative and clear. Best for news, research papers, and documentaries.
- James: Warmer and friendlier than Henry. Best for fiction and personal essays.
- Oliver: Slightly faster pace, brighter tone. Best for tutorials and how-to content.
- William: Mature, gravelly. Best for historical content and serious analysis.
- George: Mid-range, neutral. A reliable all-purpose male voice.
American Female (8 voices)
- Bella: Particularly natural-sounding, expressive intonation. A standout voice for fiction and casual content.
- Nicole: Bright, clear, energetic. Best for marketing and educational content.
- Sarah: Warm, friendly, slightly conversational. Best for blog posts and personal writing.
- Sky: Soft and gentle. Best for meditation, poetry, and slow reads.
- Adam: Versatile, neutral tone (despite the name, listed as female in the Kokoro voice set). General-purpose.
- Heart: Emotive, expressive. Best for narrative fiction and emotional writing.
- Bonbon: Cheerful, upbeat. Best for short, light content.
- Aurora: Calm and slightly ethereal. Best for reflective and philosophical writing.
American Male (6 voices)
- Michael: The default American male voice. Clear, neutral, professional. Excellent for almost any document type.
- Eric: Casual, conversational. Best for blog posts and personal essays.
- Liam: Bright, slightly youthful. Best for tutorials and how-to content.
- Onyx: Deeper, more resonant. Best for serious non-fiction, documentaries, and audiobooks.
- Echo: Mid-range, neutral. A reliable all-purpose American male voice.
- Fenrir: Authoritative and gravelly. Best for historical and dramatic content.
Trying Voices Out
Before committing to a long generation, try a voice on a short paragraph. Click anywhere in the preview to set the cursor, hit Listen, and stop after a few sentences. Compare three or four voices before picking one for an audiobook. Some voices that look generic on paper are surprisingly characterful in practice (Bella, Onyx, Heart in particular).
The Speed Slider
The speed slider sits next to the voice dropdown and adjusts playback rate from 0.5x (half speed) to 2.0x (double speed). The default is 1.0x (natural pace). Speed changes apply mid-playback without restarting, so you can drag the slider while you're listening and the rate adjusts smoothly.
Practical rates:
- 0.5x to 0.75x: Useful for language learning, poetry, or technical content where you want to absorb each word.
- 1.0x: Natural speaking pace. Good for fiction and content you're hearing for the first time.
- 1.25x to 1.5x: The audiobook sweet spot. Most listeners find this comfortable for non-fiction. Saves significant time on long content.
- 1.75x to 2.0x: Fast skim mode. The voice stays intelligible but starts to sound rushed. Best for material you already know.
The speed setting is applied at playback time, not generation time. When you use Save Full Doc, the speed control has no effect on the saved file: the MP3 is always generated at natural pace, and your playback app (Apple Books, Pocket Casts, etc.) can change the speed of the file itself.
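To see what a given speed means for your listening time, the arithmetic is a simple division. A small sketch in Python, where `listening_minutes` is a name made up for this example:

```python
def listening_minutes(audio_minutes: float, speed: float) -> float:
    """Time needed to hear `audio_minutes` of natural-pace audio at `speed`."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError("the slider covers 0.5x to 2.0x")
    return audio_minutes / speed


for speed in (0.75, 1.0, 1.5, 2.0):
    print(f"{speed}x: {listening_minutes(60, speed):.0f} min")
# 0.75x: 80 min
# 1.0x: 60 min
# 1.5x: 40 min
# 2.0x: 30 min
```

So a one-hour recording takes 40 minutes at 1.5x, which is where most of the time saving on long non-fiction comes from.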
WebGPU vs WebAssembly
Generating neural-quality voice in real time is computationally expensive. Type Shifter uses whichever acceleration path your browser supports best, automatically.
WebGPU (faster, modern browsers with a graphics card)
WebGPU is the new browser standard for running computations on the graphics card. When available, Type Shifter uses it to generate audio roughly 5 to 10 times faster than the CPU path. A one-hour audiobook generates in about 5 to 8 minutes.
WebGPU is supported in:
- Chrome 113+ on Windows, macOS, ChromeOS
- Edge 113+ on Windows, macOS
- Firefox 141+ on Windows (newer or still experimental on macOS and Linux)
- Safari 26+ on macOS and iOS (earlier Safari versions keep WebGPU behind a feature flag)
WebAssembly (universal fallback)
If your browser doesn't support WebGPU, or your GPU isn't compatible, Type Shifter falls back to WebAssembly. This runs the same neural model on the CPU instead. Quality is identical; the only difference is speed. A one-hour audiobook generates in about 30 to 50 minutes.
You can check which mode is active by opening the browser's developer console (F12 in most browsers, then click the Console tab) while pressing Listen. If you see "WebGPU initialised", you're on the fast path. If you see "WebGPU failed, falling back to WebAssembly", your browser or hardware doesn't support GPU acceleration, but the feature still works.
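The fallback behaviour described above follows a common "try the fast path, then fall back" pattern. Here is a minimal sketch of that pattern in Python. It is illustrative only: `pick_backend` and `init_webgpu` are hypothetical names, and in a real browser the check would go through the WebGPU API (`navigator.gpu`) rather than a Python callable. Only the two console messages are taken from the text above.

```python
from typing import Callable


def pick_backend(init_webgpu: Callable[[], None],
                 log: Callable[[str], None] = print) -> str:
    """Try the GPU path first; fall back to the CPU path if it fails."""
    try:
        init_webgpu()  # hypothetical initialiser; raises if WebGPU is unusable
    except Exception:
        log("WebGPU failed, falling back to WebAssembly")
        return "wasm"
    log("WebGPU initialised")
    return "webgpu"


def no_gpu() -> None:
    """Stand-in for a browser or GPU without WebGPU support."""
    raise RuntimeError("no compatible adapter")


print(pick_backend(no_gpu))        # logs the fallback message, then prints: wasm
print(pick_backend(lambda: None))  # logs "WebGPU initialised", then prints: webgpu
```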
Step-by-Step: Listen to Your Document
1. Format Your Text First
Type Shifter reads the text in your shifted output, not the input area. Paste, upload, or scan some text, then click SHIFT MY TEXT so the formatted version appears in the preview canvas. If you forget this step, the Listen button will be disabled.
2. Pick a Voice
Open the Voice dropdown and choose one. The list is sorted by accent and gender. Switching voices is instant.
3. Press Listen
First time only: The neural voice model downloads in the background (around 80 MB, one-time, cached forever afterwards). You'll see "Downloading voice model... 47%" while it works. On a decent broadband connection, this takes 30 seconds to 2 minutes. Every subsequent use skips this download entirely because the model is cached.
Every other time: Generation typically takes under a second on WebGPU, or 3 to 5 seconds on WebAssembly, before audio starts playing. The text is split into ~250-character chunks; the first chunk plays while the second is being generated in parallel, so playback is gapless.
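To make the chunking idea concrete, here is a minimal sketch in Python of splitting text into pieces of at most ~250 characters. The break-point preference (sentence endings, then commas, then spaces) and the name `split_into_chunks` are assumptions for illustration; the source only says the text is split into ~250-character chunks.

```python
def split_into_chunks(text: str, limit: int = 250) -> list[str]:
    """Split text into pieces of at most `limit` characters, preferring to
    break after sentence punctuation, then after commas, then at spaces."""
    chunks = []
    rest = text.strip()
    while len(rest) > limit:
        window = rest[:limit]
        cut = -1
        for marks in (".!?", ",;:", " "):
            cut = max(window.rfind(m) for m in marks)
            if cut > 0:
                break
        # No usable break point at all: hard-cut at the limit.
        end = cut + 1 if cut > 0 else limit
        chunks.append(rest[:end].strip())
        rest = rest[end:].strip()
    if rest:
        chunks.append(rest)
    return chunks


sample = "One two three. Four five six, seven eight. Nine ten."
print(split_into_chunks(sample, limit=20))
# ['One two three.', 'Four five six,', 'seven eight.', 'Nine ten.']
```

Because each chunk is independent, a player can start on the first chunk while the next one is still being generated, which is what makes gapless playback possible.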
4. Adjust Speed if Needed
Drag the slider during playback. The voice rate updates smoothly. Find your sweet spot and you'll likely use the same speed across many documents.
5. Start from a Specific Point (Optional)
If you only want to hear part of a long document, click anywhere inside the formatted preview before pressing Listen. The cursor position determines where reading starts. Brilliant for resuming where you left off, skipping the front matter of a book chapter, or hearing just the conclusion of a research paper.
Pause and Resume vs Stop
The Listen button doubles as pause and resume. The Stop button is separate and behaves differently. Knowing the difference saves frustration on long generations.
- Pause (Listen button while playing): Click Listen during playback to pause. The current chunk finishes playing, then audio stops. The next chunk stays loaded in memory.
- Resume (Listen button while paused): Click Listen again to continue from where you paused. No re-generation needed.
- Stop (Stop button): Click Stop to cancel playback entirely. Any chunks still being generated are discarded. Pressing Listen again starts over from the current cursor position.
Rule of thumb: pause for short breaks (answering the door, taking a sip of tea), stop when you're done or want to start over with different settings.
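The two buttons behave like a small state machine. Here is a sketch in Python; the state names (`idle`, `playing`, `paused`) and the `press` function are made up for this example and are not taken from the app.

```python
# (current state, button pressed) -> next state
TRANSITIONS = {
    ("idle", "listen"): "playing",
    ("playing", "listen"): "paused",   # Listen doubles as pause
    ("paused", "listen"): "playing",   # ...and as resume
    ("playing", "stop"): "idle",       # Stop always cancels
    ("paused", "stop"): "idle",
}


def press(state: str, button: str) -> str:
    """Return the next state; unknown combinations leave the state unchanged."""
    return TRANSITIONS.get((state, button), state)


state = "idle"
for button in ("listen", "listen", "listen", "stop"):
    state = press(state, button)
    print(button, "->", state)
# listen -> playing
# listen -> paused
# listen -> playing
# stop -> idle
```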
Saving Audio as MP3
Three save buttons sit alongside Listen and Stop. They serve different use cases. They appear once there is text in the preview and a voice selected.
Save Recording (green)
Saves whatever has been read aloud during the current Listen session. The save button captures exactly what you heard, including any partial chunks at the end if you stopped mid-sentence.
- When to use it: You started listening, decided you wanted a saved copy of what you'd heard so far, and clicked save before finishing.
- What you get: An MP3 of every chunk that was played, in order. If playback finished, you get the whole document.
- When it's enabled: As soon as at least one chunk has finished playing. Until then, the button stays greyed out.
- Filename: The MP3 is named after the first few words of the document, with the date appended. For example: type-shifter-the-quick-brown-fox-2026-05-15.mp3
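A naming scheme like the one in the example filename can be sketched in a few lines of Python. The four-word cut-off, the `untitled` fallback, and the function name `mp3_filename` are assumptions inferred from the example, not documented behaviour.

```python
import re
from datetime import date


def mp3_filename(text: str, when: date, words: int = 4) -> str:
    # Lower-case the text and keep only runs of letters and digits as words.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    slug = "-".join(tokens[:words]) or "untitled"
    return f"type-shifter-{slug}-{when.isoformat()}.mp3"


print(mp3_filename("The quick brown fox jumps over the lazy dog.", date(2026, 5, 15)))
# type-shifter-the-quick-brown-fox-2026-05-15.mp3
```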
Save Full Doc (purple)
The most powerful save option. Generates the entire document silently in the background without playing audio, then downloads the resulting MP3 automatically when finished. You don't need to sit through the playback or keep the audio playing.
- When to use it: You want a complete audiobook of a long document to listen to later, offline, on a different device. Or you want all of it but don't want to sit through 47 minutes of TTS.
- Live progress counter: The button text updates to show progress: 7 of 47, then 8 of 47, and so on. Once two or three chunks have been generated, an estimated time remaining appears above the button: "About 4 minutes left".
- What happens during generation: Your browser uses your GPU (or CPU on WebAssembly). You can still use the rest of the app for things that don't need the GPU. You can change tabs, but keep the Type Shifter tab open so the generation doesn't get throttled. If you close the tab, generation aborts.
- How long it takes: Generation time depends on which runtime is active. At the rates given under WebGPU vs WebAssembly (a one-hour audiobook in about 5 to 8 minutes on WebGPU, or 30 to 50 minutes on WebAssembly), a 47-minute audiobook needs roughly 4 to 6 minutes on WebGPU or 25 to 40 minutes on WebAssembly. Plan accordingly.
- What you get: A single MP3 file containing the entire document, gapless between chunks.
- Mobile note: Save Full Doc works on phone browsers but is much slower because most phones don't expose WebGPU. For long documents, generate on a laptop and transfer the file.
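An estimate like "About 4 minutes left" can be derived from the chunks finished so far. Here is a minimal sketch in Python, assuming a plain average of time per chunk; the function names and the two-chunk threshold are illustrative choices, not the app's actual logic.

```python
from typing import Optional


def minutes_left(done: int, total: int, elapsed_seconds: float) -> Optional[float]:
    """Estimate remaining minutes from the average time per finished chunk."""
    if done < 2:  # too little data for a stable average
        return None
    seconds_per_chunk = elapsed_seconds / done
    return (total - done) * seconds_per_chunk / 60


def progress_label(done: int, total: int, elapsed_seconds: float) -> str:
    left = minutes_left(done, total, elapsed_seconds)
    if left is None:
        return f"{done} of {total}"
    whole = max(1, round(left))
    unit = "minute" if whole == 1 else "minutes"
    return f"{done} of {total} (about {whole} {unit} left)"


print(progress_label(1, 47, 6.0))   # 1 of 47
print(progress_label(7, 47, 42.0))  # 7 of 47 (about 4 minutes left)
```

The estimate gets steadier as more chunks finish, because one unusually slow or fast chunk matters less in the average.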
Save Selection (orange)
Saves only the text you've highlighted in the preview canvas. Useful for pulling key passages out of a long document without saving the whole thing.
- When to use it: You want just the abstract of a research paper, the conclusions section of a report, a specific quote, or a particular chapter of a longer document.
- How to highlight: Click and drag across the text in the preview canvas (same as selecting text anywhere on the web). Touch and drag on mobile. The selection can span multiple paragraphs.
- What happens with partial sentences: If your selection cuts off mid-sentence, the audio reads exactly the selected portion. The voice doesn't try to be clever and round to the nearest sentence; what you highlight is what you get.
- Why this exists: Faster than Save Full Doc for short clips, and gives you finer control than waiting for Listen to reach the bit you want.
Step-by-Step: Make an Audiobook of a Long Document
1. Get the Text In
Paste, upload, or scan a document. Long articles, book chapters, research papers, even entire short books work, as long as the file fits within the 10 MB cap.
2. Apply a Template (Optional)
The audio reads the underlying text, not the visual styling, so the template you pick doesn't change the voice. Formatting the text first still helps if you want to read along while you listen, because the on-screen version then looks right.
3. Try a Voice for a Sentence or Two
Click somewhere near the start, hit Listen, hear how it sounds, click Stop. Switch voice if you want, try again. Repeat until you're happy.
4. Click Save Full Doc
Type Shifter generates the whole document silently. The button counter updates: "1 of 47", "2 of 47", and so on. The progress strip above the panel shows estimated time remaining once two or three chunks have been generated.
Make a cup of tea. You don't need to watch the screen, but keep the tab open. When generation finishes, the MP3 downloads automatically.
5. Transfer to Your Listening Device
The MP3 is a standard audio file. Drop it into Apple Books, sync via iCloud or Google Drive, copy to a USB stick, email it to yourself, or upload to a podcast app that supports private feeds. Plays anywhere MP3 plays.
Audio Quality and File Specs
Technical Details
- Format: MP3 (MPEG-1 Audio Layer III)
- Bitrate: 128 kbps constant bitrate (CBR)
- Sample rate: 24 kHz (Kokoro's native output)
- Channels: 1 (mono; the model only outputs mono, which is fine for speech)
- Approximate file size: 1 MB per minute of audio
- 10-minute reading: ~10 MB
- 30-minute audiobook: ~30 MB
- 1-hour audiobook: ~60 MB
- 3-hour audiobook: ~180 MB
128 kbps is high quality for spoken word. Music typically wants 192-320 kbps for transparency, but speech only uses a narrow frequency range and 128 kbps captures it cleanly without bloating file size.
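The "1 MB per minute" figure follows directly from the bitrate. A quick check in Python, assuming 1 MB means 1,000,000 bytes and ignoring the small overhead of file headers and tags:

```python
def mp3_size_mb(minutes: float, bitrate_kbps: int = 128) -> float:
    """Size of a constant-bitrate MP3 in megabytes (1 MB = 1,000,000 bytes)."""
    bits = bitrate_kbps * 1000 * minutes * 60
    return bits / 8 / 1_000_000


for minutes in (10, 30, 60, 180):
    print(f"{minutes} min -> {mp3_size_mb(minutes):.1f} MB")
# 10 min -> 9.6 MB
# 30 min -> 28.8 MB
# 60 min -> 57.6 MB
# 180 min -> 172.8 MB
```

At 128 kbps a minute of audio is 0.96 MB, so the rounded figures in the list above run slightly high, by about 4%.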
Where MP3 Files Are Saved
The MP3 saves to your browser's default download folder, exactly like any other file you'd download from a website. Specifically:
- Windows: C:\Users\YourName\Downloads\
- macOS: ~/Downloads/
- iOS Safari: Files app → "Downloads" folder (or wherever your default download location is set)
- Android Chrome: Internal Storage → Download folder, also visible in the Files app
Your browser may prompt to confirm the download. If you've disabled the "Ask where to save each file" setting, the MP3 saves silently to your default Downloads folder.
Combining Listen with Other Features
The most useful Type Shifter workflows combine Listen with other features:
- Listen + Bionic Reading: Listen to a document while also seeing the Bionic-formatted version on screen. Both the voice and the visual cue reinforce each other, useful for active comprehension.
- Listen + Dark Mode: Save your eyes at night while still consuming content. Great for evening reading sessions.
- Listen + Compare Mode: Hear the original text while comparing it visually against your edited version (or vice versa).
- Listen + OCR: Photograph a printed book or article, OCR it, then save as an audiobook. Effectively turns any printed material into a personal audiobook.
- Listen + Speed: Use 1.5x for non-fiction to save time, drop to 0.75x when you encounter technical sections you want to absorb carefully.
Limitations and Known Issues
- English only currently. All 28 voices are English, with British or American accents. Other languages may come in a future update but are not supported today.
- No voice cloning. You cannot upload a sample of your own voice to clone it. The 28 voices are the only options.
- Numbers and abbreviations. Most numbers, dates, and common abbreviations are pronounced correctly, but obscure abbreviations (industry-specific jargon, drug names, place names) may be read letter-by-letter. Edit the input to spell them out phonetically if needed.
- Punctuation matters. Kokoro uses punctuation to determine pacing and intonation. Documents with poor punctuation (missing full stops, run-on sentences) will sound rushed or unnatural. Add punctuation before generating if needed.
- Very long sentences. Sentences over ~250 characters get split into multiple chunks at the nearest punctuation. This is invisible in playback but means very long sentences without commas may sound slightly choppy at the split points.
- Mobile WebGPU. Most phone browsers don't yet expose WebGPU, so generation is slow on phones. Save Full Doc on a laptop and transfer the file for the best experience.
Privacy Guarantee
The text-to-speech model, the WebGPU/WebAssembly runtime, and the MP3 encoder all run entirely inside your browser. The text you generate audio for never leaves your device. No API calls, no server in the loop, no logs of what you generated. Suitable for sensitive documents, medical letters, draft writing, or anything personal. Verify this yourself by opening the browser's Network tab (F12 → Network) while generating audio; you'll see exactly one initial model download (cached forever after) and no traffic during actual generation.
Image OCR (Scan Photos and Screenshots)
Type Shifter can extract editable text from any image you drop into the upload zone. Photos of book pages, screenshots of articles, scanned PDFs, printed letters, recipes from a cookbook, even photographs of street signs. The recognition runs entirely inside your browser using Tesseract.js with the high-accuracy English data file, so your images never leave your device, there's no usage limit, and it works offline after the first run.
OCR (Optical Character Recognition) is the technology that converts pictures of text into actual machine-readable text. Done well, it bridges the gap between paper and digital, turning a 200-page printed book into something you can search, edit, listen to, or convert into other formats.
What Powers the OCR Feature
Three components work together to make in-browser OCR fast and accurate:
- Tesseract.js: A JavaScript and WebAssembly port of the open-source Tesseract OCR engine, whose development Google sponsored for many years. Version 5 with LSTM neural network recognition. Apache 2.0 licensed.
- tessdata_best (English): The highest-accuracy English language data file available. About 50 MB compared to ~10 MB for the default file. Roughly five times larger, but significantly more accurate on difficult images.
- Image preprocessor: A Type Shifter pre-processing step that prepares each image before sending it to Tesseract. 2x upscale, greyscale conversion, and contrast boost to make text edges sharper.
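To give a feel for what these three preprocessing steps do to pixel values, here is a small pure-Python sketch on a tiny image. It is an illustration under stated assumptions, not the app's code: the luma weights (ITU-R BT.601) and the min-max contrast stretch are choices made for this example, and greyscale is applied before upscaling only to keep the sketch short.

```python
Grey = list[list[int]]


def to_greyscale(rgb: list[list[tuple[int, int, int]]]) -> Grey:
    # Assumed luma weights (ITU-R BT.601); the guide only says "greyscale".
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb]


def upscale_2x(grey: Grey) -> Grey:
    """Double both dimensions using bilinear interpolation."""
    h, w = len(grey), len(grey[0])
    out = []
    for y in range(2 * h):
        # Map the output pixel centre back into source coordinates.
        sy = min(max((y + 0.5) / 2 - 0.5, 0.0), h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for x in range(2 * w):
            sx = min(max((x + 0.5) / 2 - 0.5, 0.0), w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = grey[y0][x0] * (1 - fx) + grey[y0][x1] * fx
            bottom = grey[y1][x0] * (1 - fx) + grey[y1][x1] * fx
            row.append(round(top * (1 - fy) + bottom * fy))
        out.append(row)
    return out


def stretch_contrast(grey: Grey) -> Grey:
    # Assumed contrast boost: stretch the darkest pixel to 0, the lightest to 255.
    lo = min(min(row) for row in grey)
    hi = max(max(row) for row in grey)
    if hi == lo:
        return grey
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in grey]


def preprocess(rgb: list[list[tuple[int, int, int]]]) -> Grey:
    return stretch_contrast(upscale_2x(to_greyscale(rgb)))


print(preprocess([[(100, 100, 100), (200, 200, 200)]]))
# [[0, 64, 191, 255], [0, 64, 191, 255]]
```

In the example, a washed-out pair of mid-grey pixels becomes a full black-to-white ramp twice the size, which is the kind of sharper edge an OCR engine benefits from.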
How to Use OCR
1. Get Your Image Ready
Best results come from clean, well-lit images of printed text. The clearer the source, the better the OCR result. A few guidelines:
- Photos: Take from directly above the page (not at an angle). Use bright, even lighting; avoid shadows from your hand or phone. Hold the camera steady or lean on a surface. Most modern phone cameras produce excellent OCR source material when used correctly.
- Screenshots: These are inherently pixel-clean and produce excellent results. Crop out status bars and app navigation chrome before uploading for the very best output.
- Scans: 300 DPI is the OCR sweet spot. Lower resolutions still work but with reduced accuracy.
- Supported formats: JPG, JPEG, PNG, WebP, BMP, GIF, TIFF.
- Maximum file size: 10 MB.
2. Drop or Pick the Image
Drag the image onto the upload zone in the input panel (the area that says "Drop a file or click to upload"). On mobile, tap the upload zone to open a file picker, then choose either an image from your photo library or take a fresh photo with the camera. Either route works the same way.
3. Wait for OCR to Run
A cyan progress strip appears below the upload zone showing what's happening. You'll see four phases on the first run and two on every subsequent run:
- Phase 1: Loading OCR engine (first use only, ~2 MB). The Tesseract.js JavaScript engine downloads from a public CDN and is cached in your browser.
- Phase 2: Downloading English data (first use only, ~50 MB). The high-accuracy English language data file downloads and is cached. This is the longest phase on first use.
- Phase 3: Preparing image (every run). The image is upscaled 2x, converted to greyscale, and contrast-boosted for better text edge detection.
- Phase 4: Recognising text (every run). Tesseract actually does the OCR. Progress is shown as a percentage from 0 to 100.
On your first OCR run, the whole process takes 30 seconds to 2 minutes depending on internet speed (most of it is downloading the 50 MB data file). On every subsequent run it's just phases 3 and 4, typically 5-15 seconds for an average book page.
4. Review the Recognised Text
The text appears in the input area as soon as OCR finishes. A green toast at the bottom of the screen shows the character count ("Recognised 1,847 characters"). The text is in a normal editable textarea, so skim it for any obvious errors and fix them with the keyboard before proceeding.
5. Use the Text However You Want
From here, the OCR'd text behaves like any other text in Type Shifter:
- Click SHIFT MY TEXT to format it with any of the 60 templates.
- Click Listen to have it read aloud (great for book chapters you've photographed).
- Click Save Full Doc to turn it into an MP3 audiobook.
- Click any export button to save as PDF, DOCX, EPUB, or HTML.
- Combine with Bionic Reading, OpenDyslexic, dark mode, and anything else.
Which Images Work Best
OCR quality depends heavily on the source image. From the recogniser's point of view, what matters is contrast (dark text on light background, or vice versa), resolution (text at least 30 pixels tall), sharpness (no motion blur), and geometry (text aligned, not at an angle).
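Scan resolution and text height are linked by simple arithmetic: one typographic point is 1/72 of an inch. The Python sketch below works out the nominal height of 10-point type at a few resolutions. Treat it as a rough guide, since the point size is the nominal size of the type and individual letters are somewhat smaller.

```python
def text_height_px(point_size: float, dpi: float) -> float:
    """Nominal height in pixels of type set at `point_size`, scanned at `dpi`."""
    return point_size / 72 * dpi  # 72 points per inch


for dpi in (150, 200, 300):
    print(dpi, "DPI ->", round(text_height_px(10, dpi), 1), "px")
# 150 DPI -> 20.8 px
# 200 DPI -> 27.8 px
# 300 DPI -> 41.7 px
```

By this estimate, typical 10-point body text only clears the 30-pixel mark comfortably at around 300 DPI.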
Here's the rough order from best to worst:
- Excellent (95%+ accuracy): Screenshots of articles or webpages, scanned PDFs of typeset documents at 300+ DPI, photos of book pages taken straight-on with good lighting on a clean white background.
- Good (85-95% accuracy): Printed letters, magazine articles, journal papers, photos of receipts with clear printing, well-lit signs.
- Variable (60-85% accuracy): Newspaper photos (multi-column layouts can confuse the recogniser), photos taken at a slight angle, lower-resolution scans, slightly faded prints, photos with mixed lighting.
- Poor (under 60% accuracy): Cursive handwriting, faded or thermal-printed receipts (the kind that fade after weeks), very small text (under 30 pixels tall in the image), text overlaid on busy graphics, severely angled photos, motion-blurred images.
- Unsupported: Pure handwriting (Tesseract is trained on printed type), languages other than English (English-only currently), mathematical equations and scientific notation, and image-only logos where the "text" is really a stylised graphic.
How to Photograph Books and Pages Well
If you're photographing printed material, a few minutes spent on technique dramatically improves OCR accuracy:
- Lighting: Use natural daylight or two soft lamps to eliminate shadows. A single light source casts a hand shadow that confuses the recogniser. Avoid yellow indoor light if you can; daylight or daylight-balanced LED produces cleaner results.
- Angle: Hold the phone directly above the page, parallel to the page surface. If you're at an angle, the text on the far side of the page becomes smaller in the image and OCR accuracy on that side drops sharply.
- Distance: Fill the frame with the page (not your whole desk and a corner of the page). Closer means more pixels per character, which means more for Tesseract to work with.
- Focus: Tap on the text to set focus before shooting. Most phone cameras default to centre-focus, which is usually fine, but for safety tap to confirm.
- Stability: Use a tripod, lean your elbows on the desk, or rest the phone against a stack of books. Motion blur is the most common quality killer.
- Page flatness: Press down the book pages flat if you can, or photograph one page at a time rather than a spread. Curved page edges produce distorted text near the spine.
What Type Shifter Cleans Up Automatically
Raw OCR output is rarely clean. Every visual line in the image becomes a separate paragraph, hyphenated words get split across lines, and screenshot edges often contain garbled icon characters. Type Shifter applies several cleanup passes automatically before the text reaches the input area:
- Chrome-line trimming: Garbled status bar text and navigation icons at the very start and very end of the recognised text are detected and removed. The detector looks for lines that are mostly symbols (battery icons, time stamps, wifi bars) rather than letters. The body of the document is left untouched.
- Hyphenation rejoin: When a word is split across lines with a hyphen (like "pri-" on one line and "mary" on the next), the hyphen is removed and the word rejoined. This handles the typesetter's hyphenation convention found in books and printed articles.
- Soft-wrap collapse: Single line breaks within a paragraph (the visual line wraps from the image) become spaces, so sentences flow naturally instead of fragmenting. Without this step, a single sentence might span ten lines in the textarea.
- Paragraph preservation: Double line breaks (genuine paragraph boundaries in the source image) are kept intact. So you still get distinct paragraphs in the output, just with each paragraph flowing as one piece.
All four steps run automatically. There's no setting to disable them, but they're conservative. They only act on patterns that are clearly typesetting artefacts rather than intentional formatting.
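The four passes can be approximated in a few lines of Python. This is a simplified sketch, not Type Shifter's implementation: the "fewer than half the characters are letters" test for chrome lines and the function names are assumptions made for the example.

```python
import re


def looks_like_chrome(line: str) -> bool:
    """Assumed rule: chrome if under half the non-space characters are letters."""
    chars = [c for c in line if not c.isspace()]
    return bool(chars) and sum(c.isalpha() for c in chars) / len(chars) < 0.5


def clean_ocr(raw: str) -> str:
    lines = raw.split("\n")
    # Chrome-line trimming: only the very start and very end are examined.
    while lines and looks_like_chrome(lines[0]):
        lines.pop(0)
    while lines and looks_like_chrome(lines[-1]):
        lines.pop()
    text = "\n".join(lines).strip()
    # Hyphenation rejoin: "pri-" + newline + "mary" becomes "primary".
    text = re.sub(r"(?<=[a-z])-\n(?=[a-z])", "", text)
    # Paragraph preservation: split on blank lines, then
    # soft-wrap collapse: single line breaks inside a paragraph become spaces.
    paragraphs = re.split(r"\n\s*\n", text)
    return "\n\n".join(" ".join(p.split()) for p in paragraphs if p.strip())


raw = "09:41 ... 87%\nThe pri-\nmary goal is\nsimple.\n\nSecond para-\ngraph here.\n< | >"
print(clean_ocr(raw))
# The primary goal is simple.
#
# Second paragraph here.
```

A rule this simple has blind spots: a genuinely hyphenated compound that happens to break across lines (such as "well-" and "known") would be joined into one word, so a quick skim of the result is still worthwhile.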
Pro Tip for Screenshots
For best results on phone or tablet screenshots, crop out the status bar (with battery, signal, time) and any navigation icons at the bottom before uploading. The chrome-line trimmer catches most of this automatically, but a clean crop produces noticeably better OCR. The Photos app on iOS and Google Photos on Android both have a built-in crop tool.
Common OCR Errors to Look For
Tesseract is excellent but not perfect, especially on lower-quality source images. After scanning, skim the input area for these common mistakes:
- Number and letter swaps: "0" for "O", "1" for "l" or "I", "5" for "S", "8" for "B", "6" for "G". Most common in monospace fonts (computer printouts) or sans-serif fonts where letter shapes are simpler.
- Missing punctuation: Full stops, commas, and apostrophes sometimes drop out on lower-resolution images. The recogniser is conservative about punctuation it isn't sure of.
- Joined or split words: "twowords" instead of "two words" (happens when the space is narrow), or "p arsley" instead of "parsley" (happens when a character has a small internal gap).
- Capitalisation drift: Headings in decorative fonts sometimes come out with mixed case. "CHAPTER 1" might become "ChAPTER 1" or "ChApTer 1".
- Mis-recognised similar characters: "rn" read as "m", "cl" read as "d", "vv" read as "w". These hide inside otherwise normal words, for example "modem" where the page says "modern".
- Apostrophes flipped: Curly typographer's quotes (‘ ’) sometimes come out as straight quotes ('), and contractions like "don't" become "don t" if the apostrophe is unclear.
- Bullet points and list markers: Some bullet glyphs come through as letters or symbols rather than •. Easy to fix manually.
Fix these in the input area before clicking SHIFT MY TEXT or Listen. The textarea is fully editable, and in most browsers Ctrl+F (or Cmd+F on Mac) will help you find suspect words so you can correct them by hand.
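If you copy the OCR text out for a longer proofreading session, a small script can point you at the likeliest number-and-letter swaps. The sketch below is a hypothetical helper, not part of Type Shifter: it flags any word that mixes letters with one of the look-alike digits from the list above (0, 1, 5, 6, 8).

```python
import re

# Digits commonly confused with letters, taken from the list above.
LOOKALIKE_DIGITS = set("01568")

def suspect_words(text: str) -> list[str]:
    """Return words that mix letters with look-alike digits, e.g. 'B00K' or 'he1lo'."""
    found = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        has_letter = any(c.isalpha() for c in word)
        has_lookalike = any(c in LOOKALIKE_DIGITS for c in word)
        if has_letter and has_lookalike:
            found.append(word)
    return found
```

Expect some false positives: legitimate tokens such as "5th" are flagged too, so treat the result as a list of places to look rather than a list of errors.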
The Image Preprocessing Pipeline
Before sending your image to Tesseract, Type Shifter runs three preprocessing steps in a small canvas. These dramatically improve recognition accuracy on real-world photos and screenshots:
- 2x upscale. The image is scaled up to twice its original dimensions using bilinear interpolation. Tesseract is happiest with characters that are at least 30 pixels tall; upscaling small text first gives the recogniser more pixels to work with.
- Greyscale conversion. Colour is discarded. OCR doesn't use colour information, so removing it speeds up processing and prevents background colours from interfering with character recognition.
- Contrast boost (1.4x). A contrast multiplier of 1.4 is applied, deepening blacks and brightening whites. This sharpens text edges and reduces grey "halo" around characters in low-contrast images.
These run automatically. There's no setting to adjust them, but they're tuned for the common case of "photo of printed page" or "screenshot of webpage" and rarely hurt.
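To make the three steps concrete, here is a minimal pure-Python sketch that runs them in the same order on a tiny image. It is an illustration under stated assumptions rather than the app's canvas code: the image is a list of rows of (r, g, b) tuples, the greyscale weights are the common Rec. 601 luma weights, and the contrast multiplier pivots around mid-grey (128). None of those details are confirmed by the description above.

```python
def upscale_2x(img):
    """Bilinear 2x upscale of a grid of (r, g, b) tuples; edges are clamped."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h * 2):
        # Map the output pixel centre back into source coordinates.
        sy = min(max((y + 0.5) / 2 - 0.5, 0), h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for x in range(w * 2):
            sx = min(max((x + 0.5) / 2 - 0.5, 0), w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            # Blend the four neighbouring source pixels, channel by channel.
            pixel = tuple(
                (img[y0][x0][c] * (1 - fx) + img[y0][x1][c] * fx) * (1 - fy)
                + (img[y1][x0][c] * (1 - fx) + img[y1][x1][c] * fx) * fy
                for c in range(3)
            )
            row.append(pixel)
        out.append(row)
    return out

def to_grey(img):
    """Discard colour using Rec. 601 luma weights (an assumed choice)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in img]

def boost_contrast(grey, factor=1.4):
    """Scale each value's distance from mid-grey (assumed pivot 128), clamped to 0..255."""
    return [[max(0, min(255, round((v - 128) * factor + 128))) for v in row] for row in grey]

def preprocess(img):
    return boost_contrast(to_grey(upscale_2x(img)))
```

Feeding in a one-row image with a black pixel next to a white pixel yields a 2x4 result whose in-between greys have been pushed towards black and white, which is the "deepening blacks and brightening whites" effect described above.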
Page Segmentation Mode
Tesseract has different modes for how to interpret the layout of an image. We use mode 6 (PSM 6), which assumes the image contains a single block of uniform text. This is the right choice for most book pages, articles, and screenshots.
If you're scanning a complex layout (multi-column newspaper, a heavily-formatted brochure, a webpage with sidebar and main content), the recogniser may try to read the columns in unexpected order. For best results on complex layouts, crop the image to a single column before OCR, then paste each column separately.
Combining OCR With Other Features
The most useful Type Shifter workflows combine OCR with other features:
- OCR + Listen + Save Full Doc: Photograph a book chapter, OCR it, save as MP3, listen on your commute. Effectively turns any printed material you own into a personal audiobook. Particularly useful for textbooks, academic papers, and out-of-print books that aren't available as audiobooks.
- OCR + Bionic Reading: Scan a difficult-to-read photograph or faded document, then turn on Bionic Reading on the result. Particularly helpful for dyslexic readers and ADHD users who struggle with dense printed material. The OCR makes the text editable, and Bionic Reading makes it visually easier to track.
- OCR + Templates: Scan a recipe from a cookbook, apply the Kitchen Recipe template, export as PDF. Suddenly your printed-only recipes become digital, searchable, shareable, and printable in your preferred format.
- OCR + Export to DOCX: Convert scanned PDFs (image-only) into proper editable DOCX files. Useful for older documents, archived PDFs that were saved as images rather than text, or printed letters you want to digitise for editing.
- OCR + Dark Mode: Read photographed material on screen at night without the original page's bright white background straining your eyes. The OCR'd text uses Type Shifter's dark theme regardless of the source image colours.
- OCR + Save Selection: Photograph an entire book page, OCR it, highlight just the paragraph you want, save as MP3. Perfect for pulling key quotes out of longer printed material.
Why English Only (For Now)
Type Shifter currently ships with only the English language data because each additional language adds another ~50 MB download. Bundling all of Tesseract's supported languages would balloon the first-use download to several hundred megabytes, which is impractical for a web app.
Tesseract itself supports over 100 languages including French, Spanish, German, Italian, Dutch, Portuguese, Russian, Chinese (Simplified and Traditional), Japanese, Korean, Arabic, Hindi, and many more. A future update may add an optional language picker that downloads additional language data on demand. For now, English-only.
First-Use Performance Notes
The first time you use OCR, your browser downloads two assets:
- The Tesseract.js engine itself (~2 MB JavaScript bundle).
- The high-accuracy English language data, also known as tessdata_best for English (~50 MB).
Both get cached permanently in your browser's IndexedDB storage. Subsequent OCR runs skip these downloads entirely, so all you wait for is image preprocessing (1-2 seconds) and actual recognition (5-15 seconds for an average page).
We deliberately chose the higher-accuracy data file (about five times larger than the default tessdata_fast) because the quality difference is significant on real-world images. The default file works fine for very clean scans, but stumbles on photos, faded prints, and unusual fonts where the better file shines.
OCR Limitations and Known Issues
- English only currently. See above.
- Handwriting is not supported. Tesseract is trained on printed type. Cursive and most handwriting will produce gibberish.
- Mathematical equations. Symbols, fractions, integrals, and Greek letters often come out wrong. OCR for math is a specialised field; consider tools like Mathpix for equation-heavy content.
- Multi-column layouts. Tesseract reads top-to-bottom, left-to-right. Multi-column newspapers may produce text where columns interleave unexpectedly. Crop to one column at a time for best results.
- Tables. Table structure is not preserved. Tables come out as plain text with the cells flowed together.
- Images within images. OCR ignores graphics and reads only the printed text, but illustrated content (textbooks, comics) may produce text that's missing context.
- Rotated text. Text rotated 90 degrees (like book spines) is not auto-detected. Rotate the image before uploading.
- Very small text. Text under ~30 pixels tall in the image is unreliable. Re-photograph closer if the result is poor.
Privacy Guarantee
Both the OCR engine (Tesseract.js) and the English language data are downloaded once from a public CDN (unpkg) and cached in your browser's storage. From that point on, no network requests are made during recognition. Your images stay on your device entirely. The image is loaded into a JavaScript canvas, preprocessed in memory, passed to the OCR worker (which runs in the same browser process), and the resulting text is placed in the input area. No upload, no server, no logs. Verify this yourself by opening the browser's Network tab (F12 → Network) while running OCR; you'll see only the initial library downloads (cached forever after) and nothing during actual recognition.
Import & Export
Supported Import Formats
Type Shifter can read the following file formats:
- TXT - Plain text files. The simplest format, works universally.
- RTF - Rich Text Format. Preserves basic formatting from word processors.
- DOCX - Microsoft Word documents (2007 and later). Extracts text content while preserving paragraph structure.
- PDF - Portable Document Format. Text is extracted from the PDF (note: scanned/image PDFs may not extract properly).
- MD - Markdown files. Popular with writers and developers. Markdown formatting is converted to proper headings and lists.
- EPUB - E-book format. Extract text from e-books for reformatting.
- HTML - Web pages. Useful for reformatting online articles.
File Size Limit
Maximum file size is 10MB. For larger documents, consider splitting them into smaller sections or using the desktop app which handles larger files more efficiently.
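If you are preparing a batch of files, you can check them against these rules before uploading. The sketch below is a hypothetical pre-flight check, not the app's own validation: the function name is invented, and it assumes "10MB" means 10 x 1024 x 1024 bytes and that files are identified by extension alone.

```python
import os

# Extensions taken from the Supported Import Formats list above.
SUPPORTED_EXTENSIONS = {".txt", ".rtf", ".docx", ".pdf", ".md", ".epub", ".html"}
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" is treated as 10 MiB

def check_upload(filename: str, size_bytes: int) -> str:
    """Return 'ok' or a short reason the file would be rejected."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        return f"unsupported format: {ext or '(none)'}"
    if size_bytes > MAX_BYTES:
        return "file too large"
    return "ok"
```

A call such as check_upload("old.doc", 1000) reports the unsupported format, which matches the note elsewhere in this guide that legacy DOC files are not accepted.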
Export Formats
Export your transformed documents in these formats:
- HTML - Best for web viewing. Opens in any browser. Preserves all formatting including dark mode. Ideal for sharing online or viewing on any device. Smallest file size.
- DOCX - Best for editing. Opens in Microsoft Word, Google Docs, LibreOffice. Allows further editing. Good for documents you need to modify or collaborate on.
- EPUB - Best for e-readers. Works with Kobo, Apple Books, and most e-readers. Can be converted to Kindle format. Perfect for reading long documents on dedicated e-ink devices.
- PDF - Best for printing and sharing. Universal format that looks the same everywhere. Ideal for printing, archiving, and sharing when you don't want the content edited.
💡 Export Tips
Dark mode is preserved in HTML and EPUB exports. For PDF exports, dark mode creates a dark-background document - make sure that's what you want before exporting for print.
♿ Accessibility Features
OpenDyslexic Font
OpenDyslexic is a free, open-source typeface specifically designed to increase readability for people with dyslexia. It's included in Type Shifter and can be enabled with one click.
Key features of OpenDyslexic:
- Weighted bottoms: Each letter has a heavier bottom, which helps prevent visual "flipping" or rotation - a common issue for dyslexic readers.
- Unique letter shapes: Similar letters like b, d, p, and q have distinct shapes to prevent confusion.
- Increased spacing: More space between letters reduces visual crowding.
- Consistent baseline: Helps eyes track along lines more smoothly.
Who benefits:
- People with dyslexia (affects approximately 10% of the population)
- Readers who experience visual stress or fatigue
- Anyone who finds standard fonts difficult to read
- Students and professionals who read for extended periods
🤝 Combining Features
OpenDyslexic works excellently with Bionic Reading. The "Dyslexia Friendly" template automatically enables both features together for maximum readability improvement.
Custom Fonts
Type Shifter includes over 1,900 Google Fonts you can apply to your documents, with separate controls for heading and body fonts:
- Sans-Serif: Including Inter, Roboto, Open Sans, Lato, Montserrat, and hundreds more
- Serif: Including Merriweather, Playfair Display, Lora, Georgia, and many more
- Display & Handwriting: Including Abril Fatface, Bebas Neue, Caveat, Cinzel, and many more
- Monospace: Including Fira Code, JetBrains Mono, Source Code Pro, and more
Custom fonts are found in the "Customize Sizes & Colours" section. Note: When OpenDyslexic mode is active, it takes priority over custom font selections.
Light & Dark App Theme
Use the toggle switch at the top of the app to switch between light and dark themes. Your preference is saved automatically and persists between sessions.
Dark Mode Output
Dark mode output inverts the colour scheme of your exported documents to display light text on a dark background. Benefits include:
- Reduced eye strain: Especially beneficial when reading at night or in low-light environments
- Less blue light: Dark backgrounds emit less blue light, which may help with sleep patterns
- Battery saving: On OLED screens, dark mode can significantly reduce battery consumption
- Reduced glare: White backgrounds can cause uncomfortable glare in dark environments
Dark mode is preserved when you export to HTML and EPUB formats.
🔧 Troubleshooting
Having issues? Find solutions to common problems below. If your issue isn't listed, contact us at [email protected].
My file won't upload
Common causes and solutions:
- File too large: Maximum size is 10MB. Try splitting the document into smaller sections.
- Unsupported format: Ensure your file is TXT, RTF, DOCX, PDF, MD, EPUB, or HTML. Other formats like DOC (old Word) are not supported.
- Corrupted file: Try opening the file in another application first to verify it works correctly.
- Browser issue: Try refreshing the page, clearing cache, or using a different browser (Chrome, Firefox, and Edge work best).
- File permissions: Ensure the file isn't open in another program or protected by permissions.
Text formatting looks wrong
To get the best formatting results:
- Use clear paragraph breaks: Type Shifter uses blank lines to identify separate paragraphs. Double-press Enter between paragraphs in your source text.
- Place headings on their own line: Headings should be on a separate line, not inline with paragraph text.
- Use standard list markers: Start list items with bullets (•), dashes (-), asterisks (*), or numbers (1. 2. 3.).
- Check your source: If the original document has formatting issues, they may carry over. Try cleaning up the source first.
If a PDF is producing poor results, the PDF may be scanned (image-based) rather than text-based. Try a different source file if available.
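The source-text conventions listed above (blank lines between paragraphs, standard list markers) can be illustrated with a short sketch. This is an assumed simplification for explanation only, not Type Shifter's parser: the function name and the output shape are invented, and heading detection is left out because this guide does not say how headings are recognised.

```python
import re

# Bullets, dashes, asterisks, or "1." style numbers, followed by a space.
LIST_MARKER = re.compile(r"^\s*(?:[•\-*]|\d+\.)\s+")

def split_blocks(text: str) -> list[dict]:
    """Split text on blank lines and label each block as 'list' or 'paragraph'."""
    blocks = []
    for chunk in re.split(r"\n\s*\n", text.strip()):
        lines = [ln for ln in chunk.split("\n") if ln.strip()]
        if not lines:
            continue
        if all(LIST_MARKER.match(ln) for ln in lines):
            # Every line starts with a marker: treat the block as a list.
            items = [LIST_MARKER.sub("", ln).strip() for ln in lines]
            blocks.append({"type": "list", "items": items})
        else:
            # Otherwise join the lines into one flowing paragraph.
            blocks.append({"type": "paragraph", "text": " ".join(ln.strip() for ln in lines)})
    return blocks
```

The sketch shows why a missing blank line matters: without it, two paragraphs arrive as one chunk and are joined into a single block.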
Export isn't downloading
If your exported file isn't downloading:
- Check browser permissions: Some browsers block downloads. Look for a blocked download notification in the address bar.
- Check your Downloads folder: The file may have downloaded to a different location than expected.
- Try a different format: If PDF isn't working, try HTML or DOCX instead.
- Disable ad blockers: Some ad blockers interfere with file downloads. Try temporarily disabling them.
- Clear browser cache: Old cached data can sometimes cause issues.
- Try another browser: If issues persist, try Chrome, Firefox, or Edge.
The app isn't loading properly
If the web app doesn't load or displays incorrectly:
- Refresh the page: Press Ctrl+R (Windows) or Cmd+R (Mac)
- Clear browser cache: Go to browser settings → Privacy → Clear browsing data
- Update your browser: Ensure you're using the latest version
- Disable extensions: Browser extensions can interfere with web apps. Try incognito/private mode.
- Check internet connection: The web app requires internet to load initially
- Try another browser: Chrome, Firefox, Edge, and Safari all work well
If you consistently have issues with the web app, consider downloading the desktop app which doesn't require internet after installation.
Bionic Reading doesn't seem to be working
If Bionic Reading appears inactive:
- Check the toggle: Ensure the Bionic Reading toggle is switched ON (green)
- Adjust intensity: At very low intensities (5-10%), the effect may be subtle. Try increasing to 25-30%.
- Transform again: After toggling Bionic Reading, click "SHIFT MY TEXT" again to reprocess
- Check the output: Look closely at the transformed text - bolded letters may be subtle depending on the template font weight
Questions about my trial period
Common trial questions:
- When does my trial start? The 14-day trial begins when you first use the app, not when you download it.
- Can I extend my trial? The trial cannot be extended, but it includes full access to all features so you can thoroughly test the app.
- What happens when the trial ends? You'll see a message prompting you to purchase. Your settings and preferences are retained if you decide to buy.
- How do I purchase? Click the "Buy Premium" button or visit our website to purchase a lifetime license for £29.
- Is payment secure? Yes, all payments are processed securely through Stripe, a trusted payment platform.
Desktop app installation issues
Windows installation troubleshooting:
- Windows Defender warning: You may see a "Windows protected your PC" message. Click "More info" then "Run anyway" - this happens because we're a new app without an expensive code signing certificate.
- Installation fails: Right-click the installer and select "Run as administrator"
- App won't launch: Try restarting your computer after installation
- Missing icon: If the desktop shortcut shows the wrong icon, right-click → Properties → Change Icon → Browse to the installed .exe file
The desktop app requires Windows 10 or later and approximately 100MB of disk space.
Still Need Help?
Can't find the answer you're looking for? Our support team is here to help.
Contact Support: [email protected]
Response time: 24-48 hours for general enquiries, priority response for purchase issues.