Monday, September 7, 2009

Truncate string using bash script


Input file:

$ cat spears.txt
Baby-One-More-Time.mp3
Autumn-Goodbye.mp3
Baby-One-More-Time.mp3
Cant-Make-You-Love-Me.wmv
Crazy.mp3
Crazy---Stop-Remix.mp3
Dont-Go-Knocking-on-My-Door.mp3
Dont-Let-Me-Be-The-Last-to-Know.flv
From-The-Bottom-of-My-Broken-Heart.mp3
Im-Not-a-Girl-Not-Yet-a-Woman.mp3


Required: Truncate the filename part (not the extension) of each line in the above file to 15 characters.
If a line is truncated, also insert the string "..." between the filename part and the extension.

The bash script:

#!/bin/sh
# Script to truncate the filename part of each line to 15 characters
#

while IFS= read -r filename
do
    name=${filename%%.*}    # text before the first dot
    extn=${filename##*.}    # text after the last dot
    if [ "${#name}" -gt 15 ]
    then
        nfile=$(printf '%s\n' "$name" | cut -c1-15)
        fullname="${nfile}...${extn}"
        echo "$fullname"
    else
        echo "$filename"
    fi
done < spears.txt > spears.txt.truncated

The output file produced after execution of the above bash script:

$ cat spears.txt.truncated
Baby-One-More-T...mp3
Autumn-Goodbye.mp3
Baby-One-More-T...mp3
Cant-Make-You-L...wmv
Crazy.mp3
Crazy---Stop-Re...mp3
Dont-Go-Knockin...mp3
Dont-Let-Me-Be-...flv
From-The-Bottom...mp3
Im-Not-a-Girl-N...mp3

9 comments:

Mahesh Kharvi said...

Using awk ...

awk -F. '{str="";if ( length($1) > 15 ) str="...";print substr($1,1,15)str"."$2}' spears.txt
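
A small check of how awk's substr counts positions may help here. String positions in awk start at 1, and some implementations (gawk, for example) return one character fewer when the start index is 0. This is only a sketch; the variable names `name` and `first15` are made up for the illustration.

```shell
#!/bin/sh
# Sample value taken from the input file (without the extension).
name="Dont-Go-Knocking-on-My-Door"

# substr(s, 1, 15): start at the first character, take 15 characters.
first15=$(printf '%s\n' "$name" | awk '{ print substr($0, 1, 15) }')

echo "$first15"
```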

Unknown said...

@Mahesh, thanks. It's useful.

Unknown said...

could you please elaborate (or post a hint) on the usage of the %% and ## in your code?

Unknown said...

I just got it working for my purpose, but mostly by trial and error; some help to really understand it would be appreciated...

Unknown said...

@tobias.opialla,

%%
---
Meaning:

${string%%substring}
It deletes the "longest" match of $substring from 'back' of $string

In this case: suppose

prompt> filename="my.txt.py"
prompt> echo ${filename%%.*}
my

string=filename
substring=.* (i.e., a dot followed by anything, so the longest match is ".txt.py", and that is what gets deleted)

See this:

prompt> echo ${filename%%.py}
my.txt
prompt> echo ${filename%%py}
my.txt.
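
For comparison, a single % removes the "shortest" match from the back instead of the longest. A minimal sketch using the same sample value (the variable names `shortest` and `longest` are only for illustration):

```shell
#!/bin/sh
filename="my.txt.py"

shortest=${filename%.*}     # shortest match of ".*" from the back is ".py"
longest=${filename%%.*}     # longest match of ".*" from the back is ".txt.py"

echo "$shortest"
echo "$longest"
```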


##
--
Meaning:

${string##substring}
It deletes the "longest" match of $substring from 'front' of $string.

In this case, suppose again:

prompt> filename="my.txt.py"

See this:

prompt> echo ${filename##*.}
py

prompt> echo $filename
my.txt.py

prompt> echo ${filename##my}
.txt.py

prompt> echo ${filename##my.}
txt.py
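
Likewise, a single # removes the "shortest" match from the front. A minimal sketch with the same sample value (variable names `shortest` and `longest` are only for illustration):

```shell
#!/bin/sh
filename="my.txt.py"

shortest=${filename#*.}     # shortest match of "*." from the front is "my."
longest=${filename##*.}     # longest match of "*." from the front is "my.txt."

echo "$shortest"
echo "$longest"
```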


Please let me know if you have any doubt on this.

Unknown said...

You can do a bit shorter like this :

#!/bin/bash
#Bash Script to truncate string
#

while read -r filename
do
name=${filename%%.*}
extn=${filename##*.}
[ ${#name} -gt 15 ] && name=${name:0:15}...
echo "$name.$extn"
done < spears.txt #> spears.txt.truncated

Note the shebang: I changed it to bash. Substring expansion (${parameter:offset:length}) is not part of POSIX sh, so a strictly POSIX /bin/sh such as dash rejects it. With bash (GNU bash, version 3.2.39) it works fine.
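
If the script has to stay on plain sh, one possible alternative is the precision field of printf, which limits how much of the string is printed. This is only a sketch and assumes ASCII names, since the precision may be counted in bytes rather than characters; the variable name `short` is made up for the illustration.

```shell
#!/bin/sh
name="From-The-Bottom-of-My-Broken-Heart"

# '%.15s' prints at most the first 15 characters (bytes) of the argument.
short=$(printf '%.15s' "$name")

echo "$short"
```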

Unknown said...

Furthermore, it would be more accurate to extract the name from the filename like this:

name=${filename%.*}

with just one '%'.

If you have filenames containing several dots (which might not be recommended but can exist anyway), it gives a better result.

Let's look at a file named "Baby.One.More.Time.mp3" : the 2-% version results in "Baby" whereas the 1-% version gives "Baby.One.More.Time".
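
Put together, the truncation with the one-% variant could look like the sketch below. The function name `truncate_name` and its optional second argument are hypothetical, not part of the original script, and the sketch assumes every name has an extension.

```shell
#!/bin/bash
# truncate_name FILENAME [MAX]  (hypothetical helper, for illustration only)
truncate_name() {
    local filename=$1 max=${2:-15}
    local name=${filename%.*}      # everything before the last dot
    local extn=${filename##*.}     # everything after the last dot
    if [ "${#name}" -gt "$max" ]; then
        # Keep the first $max characters and put "..." in place of the dot.
        printf '%s...%s\n' "${name:0:max}" "$extn"
    else
        printf '%s\n' "$filename"
    fi
}

truncate_name "Baby.One.More.Time.mp3"
truncate_name "Crazy.mp3"
```

With the two-% version the first name would be cut down to "Baby" before the length check, so it would never be recognised as too long.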

Unknown said...

Verry camarade ! o_O

Unknown said...

A robust way to determine filenames and their extensions is the following:

filename=${1%.*}         # strip the shortest trailing ".*"; what remains is the filename
ext=${1#"$filename"}     # strip that filename from the front, leaving ".ext" (or nothing)
ext=${ext#.}             # then strip the leading dot as well
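
The benefit shows up with names that have no dot at all: ${1##*.} would return the whole name as the "extension", while the three steps above leave it empty. A minimal sketch, wrapping the steps in a hypothetical function `split_name` that sets the variables `base` and `ext`:

```shell
#!/bin/sh
# split_name NAME: sets the global variables base and ext (illustrative helper).
split_name() {
    base=${1%.*}         # drop the last ".something", if there is one
    ext=${1#"$base"}     # what is left after removing base from the front
    ext=${ext#.}         # drop the leading dot
}

split_name "archive.tar.gz"
echo "base=$base ext=$ext"

split_name "README"
echo "base=$base ext=$ext"
```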

© Jadu Saikia www.UNIXCL.com