How to change PDF text encoding? (ANSI to Unicode)

I have a PDF whose text I need to insert into an HTML page. The problem is that when I copy the text, some of the letters (the ones with diacritics, such as Ț or Ș) are left out, so the words containing them are no longer correct.

I found out that this is because the PDF uses ANSI font encoding while the browser uses Unicode. How can I convert the ANSI encoding in the PDF to Unicode?

Answer

If the problem is indeed what you describe, Notepad++, which is free, should do what you want. Create a new document in Notepad++, make sure ‘Encode in ANSI’ is selected in the Encoding menu, and paste the text there. Then choose ‘Convert to UTF-8 without BOM’ in the Encoding menu.
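If you would rather script the conversion, the same idea can be sketched in Python: decode the bytes using the ANSI code page, then encode the result as UTF-8. This is only a sketch under assumptions. The function name `ansi_to_utf8` is made up for illustration, and Windows-1250 (`cp1250`) is assumed as the ANSI code page because it is the usual one for Romanian text; your file may use a different one.

```python
def ansi_to_utf8(raw: bytes, codepage: str = "cp1250") -> bytes:
    """Re-encode bytes from a legacy "ANSI" code page to UTF-8.

    The default code page (Windows-1250) is an assumption; pass another
    name such as "cp1252" if the text came from a different locale.
    """
    # Step 1: interpret the raw bytes using the legacy code page.
    text = raw.decode(codepage)
    # Step 2: write the same characters out as UTF-8 (no BOM is added).
    return text.encode("utf-8")


if __name__ == "__main__":
    # 0xAA, 0xDE, 0xBA, 0xFE are the Windows-1250 bytes for S and T
    # with cedilla, upper and lower case.
    sample = b"\xaa\xde\xba\xfe"
    print(ansi_to_utf8(sample).decode("utf-8"))
```

Note that Windows-1250 only contains the cedilla forms of these letters (Ş, Ţ), not the comma-below forms (Ș, Ț), so the converted text may need a further character substitution if the comma-below forms are required.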

You can also try using Decoder, a free online tool for fixing encoding problems. It’s in Russian, but usage is pretty straightforward: paste the mangled text into the text box and click the button labeled “Расшифровать” (“Decode”).
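For text that has already been pasted and shows the wrong characters, a similar repair can be sketched in Python. This is not how Decoder itself works (its internals are not described here); it is a minimal illustration that assumes the text was originally Windows-1250 but was read as Windows-1252. The function name `fix_mojibake` and both code page defaults are assumptions.

```python
def fix_mojibake(mangled: str,
                 wrong: str = "cp1252",
                 right: str = "cp1250") -> str:
    """Undo a wrong-code-page decode.

    Assumes the text was stored as `right` but displayed as `wrong`.
    Both defaults are guesses suited to Romanian text on Windows.
    """
    # Recover the original bytes by reversing the wrong decode ...
    raw = mangled.encode(wrong)
    # ... then decode them with the code page they were written in.
    return raw.decode(right)


if __name__ == "__main__":
    # "\u00bai" is the Romanian word for "and" with its first letter
    # shown as the wrong symbol (masculine ordinal indicator).
    print(fix_mojibake("\u00bai"))
```

If the mangled text contains characters that do not exist in the assumed wrong code page, `encode` raises `UnicodeEncodeError`, which is a sign that a different pair of code pages is involved.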

Attribution
Source: Link, Question Author: Flavius Frantz, Answer Author: kotekzot
