[kwlug-disc] weird characters in files

Khalid Baheyeldin kb at 2bits.com
Thu Dec 10 15:25:34 EST 2009


You can use the hd command to see the exact hex value of the offending byte.
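
As a rough sketch of what that inspection looks like: hd is not installed everywhere, so this example uses od, which is part of POSIX, as a stand-in. The sample file and its contents are made up for illustration; the stray byte is 0x97 (octal \227), on the assumption that the <97> vi shows in the original question is a hex value.

```shell
# Hypothetical sample file with one stray 0x97 byte (octal \227) in it.
sample=$(mktemp)
printf 'dash\227here\n' > "$sample"

# -An drops the offset column, -tx1 prints each byte as two hex digits.
# The trailing tr just squeezes the dump onto one line.
hex=$(od -An -tx1 "$sample" | tr -d ' \n')
echo "$hex"    # 6461736897686572650a

# Where hd is available, "hd $sample" shows the same bytes with an
# ASCII column beside them.
```

The 97 in the middle of the dump is the byte to get rid of; everything else is ordinary ASCII.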

To remove all non-printable characters from a file, you can do:

tr -cd '\11\12\15\40-\176' < inputfile.txt > outputfile.txt


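This uses octal, where \11 is tab, \12 is linefeed, and \15 is carriage
return. \40 to \176 are the printable ASCII characters, from space to
tilde. The -c flag complements the set and -d deletes, so every byte
outside the listed characters is removed.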
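
Here is a small, self-contained run of that command on made-up input, so you can see what survives. The file names are hypothetical, and LC_ALL=C is my own addition: it makes tr treat the input as plain bytes, which some tr implementations need when the file is not valid in the current locale.

```shell
# Hypothetical input: a tab, a stray 0x97 byte (octal \227), and a CRLF ending.
in_file=$(mktemp)
out_file=$(mktemp)
printf 'a\tb\227c\r\n' > "$in_file"

# Keep tab, linefeed, carriage return and printable ASCII; delete the rest.
LC_ALL=C tr -cd '\11\12\15\40-\176' < "$in_file" > "$out_file"

# Dump the result as hex to confirm that only the 0x97 byte is gone.
cleaned=$(od -An -tx1 "$out_file" | tr -d ' \n')
echo "$cleaned"    # 610962630d0a
```

The tab (09), carriage return (0d) and linefeed (0a) are all still there; only the 97 has been dropped.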

You can adjust the above for more (or fewer) characters by looking up the
values in the output of the 'ascii' command.
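
One possible adjustment, as a sketch: leaving \15 out of the list also deletes carriage returns, which turns DOS-style line endings into plain linefeeds in the same pass. The input and the file names are invented for the example, and LC_ALL=C is again my own precaution rather than part of the original command.

```shell
# Hypothetical input: one CRLF line and one line with a stray 0x97 byte.
crlf_file=$(mktemp)
lf_file=$(mktemp)
printf 'one\r\ntwo\227\n' > "$crlf_file"

# Same idea as before, but \15 (carriage return) is no longer in the keep list.
LC_ALL=C tr -cd '\11\12\40-\176' < "$crlf_file" > "$lf_file"

stripped=$(od -An -tx1 "$lf_file" | tr -d ' \n')
echo "$stripped"    # 6f6e650a74776f0a
```

Going the other way works the same: add the octal value of any extra character you want to keep to the quoted list.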

On Thu, Dec 10, 2009 at 2:47 PM, Insurance Squared Inc. <
gcooke at insurancesquared.com> wrote:

> I've got some 'text' files created by an OCR program.  Some of the text
> files have the occasional weird character in them that is causing issues
> when I import.  How can I get rid of them from the command prompt?
>
> When I 'nano' one file, it shows a question mark with a white background.
>  When I view the file with vi, not that I use vi :) , I see <97> where the
> character is - probably the decimal representation.
>
> I tried "perl - p -i -e 's/?//g' *" and "perl -p -i -e 's/\<97\>/g' *" as a
> search and replace but neither removed the character from the file.   Grep
> doesn't find the characters either.
> g



-- 
Khalid M. Baheyeldin
2bits.com, Inc.
http://2bits.com
Drupal optimization, development, customization and consulting.
Simplicity is prerequisite for reliability. --  Edsger W.Dijkstra
Simplicity is the ultimate sophistication. --   Leonardo da Vinci