Text Extraction Process

Text Extraction from Non-Editable File Types: Content Extraction and Recreation

Content for translation is not always available in a ready-to-edit format. Sensitive or highly specialized content, such as legal documentation, medical files or technical manuals, is often available only as hard copy or scanned pages. We combine tools such as optical character recognition (OCR) with expert human editors to quickly convert these source materials into editable files that your translation teams can then work on.

This also means we can supply our customers with editable versions of both the input and output files, which is helpful for any company seeking to digitize its archives. From scanned pages of typewritten text to handwriting and even hand-drawn diagrams and tables, our expert team will extract the necessary content and create an accurate, editable replica of the original item.

File formats that often require text extraction include InDesign (.indd), QuarkXPress (.qxp), PDF documents (.pdf), Adobe Photoshop (.psd) and PowerPoint (.ppt). Our expert engineers go through your ‘non-editable’ content and extract it into an editable format (e.g. Microsoft Word, the LanguageWire Editor or another CAT tool). Once the text has been translated, proofread or processed by another language service, it is placed back into your document so that the result looks as close as possible to the source content.
