[kwlug-disc] weird characters in files

Brad Bierman bbierman42 at gmail.com
Thu Dec 10 15:17:54 EST 2009


It sounds like the files contain non-printable characters, and nano and
vi are each just trying to display them in their own way.

I would read the file in binary mode and remove every byte that is not
whitespace, punctuation, or alphanumeric.
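
A minimal Python sketch of that idea, assuming the files are meant to be
plain ASCII text. The names strip_unwanted, ALLOWED and DELETE, and the
".clean" output suffix, are my own choices for illustration and not
something from the original files:

```python
import string
import sys

# Bytes to keep: ASCII letters, digits, punctuation, and whitespace.
ALLOWED = (string.ascii_letters + string.digits
           + string.punctuation + string.whitespace).encode("ascii")

# Every other byte value (0-255) gets deleted.
DELETE = bytes(b for b in range(256) if b not in ALLOWED)


def strip_unwanted(data: bytes) -> bytes:
    """Return data with every byte outside ALLOWED removed."""
    return data.translate(None, DELETE)


if __name__ == "__main__":
    # Hypothetical usage: python strip_unwanted.py file1.txt file2.txt
    # Writes a cleaned copy next to each input rather than overwriting it.
    for path in sys.argv[1:]:
        with open(path, "rb") as src:
            cleaned = strip_unwanted(src.read())
        with open(path + ".clean", "wb") as dst:
            dst.write(cleaned)
```

Note that this drops the offending bytes outright, so if the OCR program
put one between two words they will run together. It also removes any
legitimate accented characters, which may or may not matter for your data.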

To see what the characters are, open the file in vi(m).

Once the file is open, type :%!xxd

This filters the buffer through xxd and turns it into a hex dump.  The
offset into the file is on the left, the hex bytes are in the middle,
and the ASCII is on the right.
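
If you would rather not read through a whole dump, something like the
following Python sketch can list just the suspicious bytes with their
offsets.  The function name find_suspect_bytes and its definition of
"suspect" (anything outside printable ASCII plus tab, newline and
carriage return) are my assumptions, so adjust to taste:

```python
import sys


def find_suspect_bytes(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) pairs for bytes outside printable ASCII."""
    keep_controls = b"\t\n\r"  # control characters normal in text files
    return [(offset, value) for offset, value in enumerate(data)
            if value >= 0x7F or (value < 0x20 and value not in keep_controls)]


if __name__ == "__main__":
    # Hypothetical usage: python find_suspect.py file1.txt file2.txt
    for path in sys.argv[1:]:
        with open(path, "rb") as src:
            for offset, value in find_suspect_bytes(src.read()):
                print(f"{path}: offset 0x{offset:08x} byte 0x{value:02x}")
```

The offsets are printed in hex so they can be matched against the left
column of the xxd output.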

Once you find the offending character, you will see its hex value in the
dump.  You can filter it out by putting the hex value for a space, 0x20,
in its place.  If you edit the hex in vim, run :%!xxd -r to convert the
dump back to the original format before saving.
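
The same substitution can be sketched in Python.  This assumes the <97>
that vi shows is the hex byte value 0x97, which is a guess on my part
until the hex dump confirms it.  The name replace_byte and the ".fixed"
output suffix are made up for the example:

```python
import sys


def replace_byte(data: bytes, bad: int = 0x97, replacement: int = 0x20) -> bytes:
    """Return data with every occurrence of byte `bad` swapped for `replacement`."""
    return data.replace(bytes([bad]), bytes([replacement]))


if __name__ == "__main__":
    # Hypothetical usage: python replace_byte.py file1.txt file2.txt
    # 0x97 is an assumption; use whatever value the hex dump shows.
    for path in sys.argv[1:]:
        with open(path, "rb") as src:
            fixed = replace_byte(src.read())
        with open(path + ".fixed", "wb") as dst:
            dst.write(fixed)
```

Replacing with a space instead of deleting keeps neighbouring words from
being joined together, and the file length stays the same.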


Hope this cryptic response helps.
Brad

On Thu, Dec 10, 2009 at 2:47 PM, Insurance Squared Inc.
<gcooke at insurancesquared.com> wrote:
> I've got some 'text' files created by an OCR program.  Some of the text
> files have the occasional weird character in them that is causing issues
> when I import.  How can I get rid of them from the command prompt?
>
> When I 'nano' one file, it shows a question mark with a white background.
>  When I view the file with vi, not that I use vi :) , I see <97> where the
> character is - probably the decimal representation.
>
> I tried "perl - p -i -e 's/?//g' *" and "perl -p -i -e 's/\<97\>/g' *" as a
> search and replace but neither removed the character from the file.   Grep
> doesn't find the characters either.
> g
>



-- 
http://www.google.com/profiles/bbierman42



