Using tr to strip strange characters from a text file…

I had a text file that had several weird characters in it and I was struggling to remove them using substitution in vi.

^X This is my test file ^Y ^S It^Xs full of problems ^\ like this ^]

First, find the octal values of the problem characters. Run od -c on a small sample of the problem text (seriously, not the whole file: the output is very difficult to read).

me@home2 # od -c mel.txt
0000000 030       T   h   i   s       i   s       m   y       t   e   s
0000020   t       f   i   l   e     031     023       I   t 030   s
0000040   f   u   l   l       o   f       p   r   o   b   l   e   m   s
0000060     034       l   i   k   e       t   h   i   s     035      \n
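If you would rather not cut a sample out by hand, od can limit itself to the first few bytes with -N. This is only a sketch: sample.txt is a hypothetical stand-in file that I build with printf so the example runs on its own, and 32 bytes is an arbitrary choice.

```shell
# Build a stand-in file containing the same control characters as the post.
# (sample.txt is a made-up name for illustration, not the original mel.txt.)
printf '\030 This is my test file \031 \023 It\030s full of problems \034 like this \035 \n' > sample.txt

# -c prints each byte as a character (or a 3-digit octal value if unprintable),
# -N 32 stops after the first 32 bytes so the dump stays readable.
dump=$(od -c -N 32 sample.txt)
echo "$dump"
```

The octal values that show up in place of readable characters are the ones to feed to tr.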

You can see from this that my problem characters are octal 030, 031, 023, 034 and 035.

From context I can guess what these characters should be – 030 and 031 are single quotes, 034 and 035 are double quotes and 023 is a -.

I’m going to use tr '<string1>' '<string2>'. In this form, when tr encounters a character in <string1> it substitutes the character in the same position in <string2>. The only complication in my case is that I need to substitute in ' characters, and since the strings are themselves wrapped in single quotes I’m going to specify the ' by its octal value (047).
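A tiny illustration of the positional mapping, using made-up input rather than my real file (the variable names mapped and quoted are just for the example):

```shell
# Each character in the first set is replaced by the character
# at the same position in the second set: a->x, b->y, c->z.
mapped=$(printf 'abc cab\n' | tr 'abc' 'xyz')
echo "$mapped"    # xyz zxy

# Octal escapes work in either set: \030 is the stray control
# character and \047 is an ordinary single quote.
quoted=$(printf 'It\030s\n' | tr '\030' '\047')
echo "$quoted"    # It's
```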

cat mel.txt | tr '\023\034\035\031\030' '-""\047\047'
' This is my test file ' - It's full of problems " like this "

I’ve been left with some padding spaces around my replaced characters, but these are pretty simple to remove in vi.
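If you would rather drop the characters entirely than guess at replacements, tr can also delete with -d, and -s will squeeze the runs of spaces left behind. This is a rough sketch on a made-up line, not my real file, and squeezing every run of spaces may be more aggressive than you want on real text.

```shell
# Hypothetical stand-in line containing the same control characters.
line=$(printf '\030 test \031 \023 It\030s \034 quoted \035')

# -d deletes every character in the set instead of translating it.
deleted=$(printf '%s\n' "$line" | tr -d '\023\030\031\034\035')

# -s collapses each run of repeated spaces into a single space.
squeezed=$(printf '%s\n' "$deleted" | tr -s ' ')
echo "$squeezed"
```

Note that deleting loses the apostrophe in "It's", which is why I went with substitution above.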



