EX.NO:4a
TEXT PROCESSING WITH PERL
24.07.12
AIM:
To learn how to perform some common text processing tasks using Perl.
INTRODUCTION:
Perl is a high-level, general-purpose, interpreted, dynamic
programming language. It was originally developed by Larry Wall in
1987 as a general-purpose Unix scripting language to make report
processing easier. Perl borrows features from other programming
languages including C, shell scripting (sh), AWK, and sed. The
language provides powerful text processing facilities without the
arbitrary data length limits of many contemporary Unix tools,
facilitating easy manipulation of text files. Though originally
developed for text manipulation, Perl is used for a wide range of
tasks including system administration, web development, network
programming, games, bioinformatics, and GUI development.

The language is intended to be practical (easy to use, efficient,
complete) rather than beautiful (tiny, elegant, minimal). Its major
features include support for multiple programming paradigms
(procedural, object-oriented, and functional styles),
reference-counting memory management, built-in support for text
processing, and a large collection of third-party modules.
Program:1
[root@localhost Desktop]# vi hello.pl
#!/usr/bin/env perl
#The line above tells the system to run this file with the perl interpreter.
print "Hello World!\n";
#print the text Hello World! followed by a newline.
Output:
[root@localhost Desktop]# perl hello.pl
Hello World!
Program:2
[root@localhost Desktop]# vi name.pl
#!/usr/bin/env perl
# name.pl
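print "Enter your name and press return:";
$name=<STDIN>; #read one line from standard input
chomp($name); #remove the trailing newline
print "\nEnter your birth year and press return:";
$byear=<STDIN>;
chomp($byear);
#localtime returns a list of 9 distinct values. Collect them.
my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $dst) = localtime time;
#The listing breaks off here; the lines below are inferred from the sample output.
$year += 1900; #localtime counts years from 1900
$age = $year - $byear; #age reached during the current year
print "Hello, $name!\n";
print "You are $age years old.\n";
output:
[root@localhost Desktop]# perl name.pl
Enter your name and press return:priya
Enter your birth year and press return:1992
Hello, priya!
You are 20 years old.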
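The two conventions of localtime that matter for the age calculation (the year is counted from 1900 and the month from 0) can be seen in isolation in the short sketch below. The helper age_in_year is a hypothetical name introduced only for this illustration; it is not part of name.pl.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# age_in_year is a hypothetical helper, not part of name.pl.
# It returns the age a person reaches during the given calendar year.
sub age_in_year {
    my ($birth_year, $current_year) = @_;
    return $current_year - $birth_year;
}

# In list context localtime returns nine fields.
# Index 5 is the number of years since 1900; index 4 is the month, with January as 0.
my @t     = localtime(time);
my $year  = $t[5] + 1900;
my $month = $t[4] + 1;

printf "Current year and month: %04d-%02d\n", $year, $month;
print "A person born in 1992 turns ", age_in_year(1992, $year), " this year.\n";
```

Because only the year is compared, the result is the age reached at some point in the current year, not necessarily the age on the day the script is run.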
Program:3
[root@localhost Desktop]# vi words.pl
#!/usr/bin/env perl
#
#words.pl FILE
#
#if no data filename is given, this program waits for text on standard input
print "Enter the word you want to search for and press return:";
$sword=<STDIN>;
chomp($sword);
$scount = 0; #search counter
$bcount = 0; #blank line counter
while(<>) #continue reading as long as there is input
{
    chomp; #remove newline from each line
    foreach $w (split) #split each line into words
    {
        if ($w eq $sword)
        {
            $scount++; #search hit counter
        }
        $words++;
        $char += length($w); #whitespace is not counted
    }
    #if the length of the current line is 0, we have a blank line
    if (length($_) == 0)
    {
        $bcount++;
    }
}
$avgw = $words/$.; #average words per line including blank lines
$avgc = $char/$words; #average characters per word
print "There are $. lines in this file including $bcount blank lines.\n";
print "There are $words words in this file.\n";
print "There are $char characters in this file.\n";
print "The average number of words per line is $avgw.\n";
print "The average number of characters per word is $avgc.\n";
print "the word $sword occurs in the text $scount times.\n";
output:
[root@localhost Desktop]# vi word.txt
hai i am priya
have a nice day for all.....
[root@localhost Desktop]# perl words.pl word.txt
Enter the word you want to search for and press return:priya
There are 3 lines in this file including 1 blank lines.
There are 10 words in this file.
There are 34 characters in this file.
The average number of words per line is 3.33333333333333.
The average number of characters per word is 3.4.
the word priya occurs in the text 1 times.
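The counting logic of words.pl can also be tried without a data file. The sketch below applies the same rules (whitespace splitting, exact string comparison, a blank line being a line of length 0) to a list of lines held in memory. The helper name text_stats and its hash keys are assumptions made for this illustration only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# text_stats is a hypothetical helper that mirrors the loop in words.pl,
# but takes the search word and the lines as arguments instead of reading a file.
sub text_stats {
    my ($search, @lines) = @_;
    my %s = (lines => 0, blank => 0, words => 0, chars => 0, hits => 0);
    for my $line (@lines) {
        $s{lines}++;
        $s{blank}++ if length($line) == 0;      # blank line: nothing left after chomp
        for my $w (split ' ', $line) {          # same rule as a bare split
            $s{hits}++ if $w eq $search;        # exact, case-sensitive match
            $s{words}++;
            $s{chars} += length $w;             # whitespace is not counted
        }
    }
    return %s;
}

my %s = text_stats('priya', 'hai i am priya', 'have a nice day for all.....', '');
printf "%d lines, %d blank, %d words, %d characters, %d hits\n",
    @s{qw(lines blank words chars hits)};

# Guard the average so that empty input does not cause a division by zero.
printf "%.2f words per line\n", $s{lines} ? $s{words} / $s{lines} : 0;
```

With the three sample lines used here the counts agree with the run of words.pl on word.txt shown above: 3 lines, 1 blank, 10 words, 34 characters and 1 hit.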
Program:4
[root@localhost Desktop]# vi wordcount.pl
#!/usr/bin/env perl
#
#wordcount.pl FILE
#
#if no filename is given, print help and exit
if (@ARGV < 1)
{
    print "Usage is : wordcount.pl filename\n";
    exit;
}
my $file = $ARGV[0]; #filename given on the command line
open(FILE, '<', $file) or die "Cannot open $file: $!\n"; #open the file for reading
while(<FILE>) #continue reading until the file ends
{
    chomp;
    tr/A-Z/a-z/; #convert all upper case letters to lower case
    tr/.,:;!?"(){}//d; #remove some common punctuation symbols
    #We are creating a hash with the word as the key.
    #Each time a word is encountered, its count is incremented by 1.
    #If the count for a word is 1, it is a new distinct word.
    #We keep track of the number of words parsed so far.
    #We also keep track of the number of words of a particular length.
    foreach $wd (split)
    {
        $count{$wd}++;
        if ($count{$wd} == 1)
        {
            $dcount++;
        }
        $wcount++;
        $lcount{length($wd)}++;
    }
}
close(FILE);
#To print the distinct words and their frequency,
#we iterate over the hash containing the words and their count.
print "\nThe words and their frequency in the text is:\n";
foreach $w (sort keys %count)
{
    print "$w : $count{$w}\n";
}
#For the word length and frequency we use the word length hash.
#The lengths are sorted numerically so that 10 comes after 9.
print "The word length and frequency in the given text is:\n";
foreach $w (sort { $a <=> $b } keys %lcount)
{
    print "$w : $lcount{$w}\n";
}
print "There are $wcount words in the file.\n";
print "There are $dcount distinct words in the file.\n";
$ttratio = ($dcount/$wcount)*100; #type-token ratio as a percentage
print "The type-token ratio of the file is $ttratio.\n";
output:
[root@localhost Desktop]# perl wordcount.pl word.txt
The words and their frequency in the text is:
a : 1
all : 1
am : 1
day : 1
for : 1
hai : 1
have : 1
i : 1
nice : 1
priya : 1
The word length and frequency in the given text is:
1 : 2
2 : 1
3 : 4
4 : 2
5 : 1
There are 10 words in the file.
There are 10 distinct words in the file.
The type-token ratio of the file is 100.
[root@localhost Desktop]#
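Since every word in word.txt occurs only once, the run above does not show what repeated words do to the counts. The sketch below uses a short in-memory text with repetitions, lists the words by descending frequency instead of alphabetically, and computes the type-token ratio. The helper name word_freq and the sample sentence are assumptions made for this illustration; the punctuation set is the one used in wordcount.pl.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# word_freq is a hypothetical helper: it lower-cases the text, deletes the same
# punctuation characters as wordcount.pl, and returns a hash reference of counts.
sub word_freq {
    my ($text) = @_;
    my %count;
    for my $line (split /\n/, $text) {
        $line = lc $line;
        $line =~ tr/.,:;!?"(){}//d;
        $count{$_}++ for split ' ', $line;
    }
    return \%count;
}

my $freq = word_freq("The cat saw the dog.\nThe dog ran!");

# Sort by descending count and break ties alphabetically.
for my $w (sort { $freq->{$b} <=> $freq->{$a} || $a cmp $b } keys %$freq) {
    print "$w : $freq->{$w}\n";
}

# Tokens are all word occurrences; types are the distinct words.
my $tokens = 0;
$tokens += $_ for values %$freq;
my $types = keys %$freq;
printf "Type-token ratio: %.1f\n", $tokens ? 100 * $types / $tokens : 0;
```

For this sample the word "the" is counted three times and "dog" twice, giving 5 distinct words out of 8 and a ratio of 62.5, compared with 100 for word.txt.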
RESULT:
Thus, simple text processing programs were written in Perl and their
output was verified.