Tuesday, July 31, 2012

TEXT PROCESSING WITH PERL :: OPEN SOURCE LAB



EX.NO:4a
TEXT PROCESSING WITH PERL
24.07.12

AIM:

To learn how to do some common text processing tasks using Perl.

INTRODUCTION:

Perl is a high-level, general-purpose, interpreted, dynamic programming language.
Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier. Perl borrows features from other programming languages including C, shell scripting (sh), AWK, and sed. The language provides powerful text processing facilities without the arbitrary data length limits of many contemporary Unix tools, which makes text files easy to manipulate. Though originally developed for text manipulation, Perl is used for a wide range of tasks including system administration, web development, network programming, games, bioinformatics, and GUI development.
The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal). Its major features include support for multiple programming paradigms (procedural, object-oriented, and functional styles), reference-counting memory management, built-in support for text processing, and a large collection of third-party modules.
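
As a small illustration of the AWK-like and sed-like facilities mentioned above, the following sketch splits a line into fields and performs a substitution. The helper names fields_of and replace_first are hypothetical and chosen only for this example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helpers for illustration; the names are not from the lab text.

# AWK-like: split a line into whitespace-separated fields.
sub fields_of {
    my ($line) = @_;
    return split ' ', $line;
}

# sed-like: return a copy of the line with the first occurrence of $from
# (treated as a literal string) replaced by $to.
sub replace_first {
    my ($line, $from, $to) = @_;
    $line =~ s/\Q$from\E/$to/;
    return $line;
}

my $line   = "hai i am priya";
my @fields = fields_of($line);
print "Field 4 is $fields[3]\n";                    # Field 4 is priya
print replace_first($line, "hai", "hello"), "\n";   # hello i am priya
```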

Program 1:
[root@localhost Desktop]# vi hello.pl

#!/usr/bin/env perl
#The above statement tells the system that this is a perl program.
print "Hello World!\n"; #print the text Hello World and a newline.

Output:

[root@localhost Desktop]# perl hello.pl

Hello World!



Program 2:

[root@localhost Desktop]# vi name.pl

#!/usr/bin/env perl
# name.pl
use strict;
use warnings;
print "Enter your name and press return:";
my $name = <STDIN>; #read the data
chomp($name); #remove the newline
print "\nEnter your birth year and press return:";
my $byear = <STDIN>;
chomp($byear);
#In list context localtime returns 9 distinct values. Collect them.
my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $dst) =
    localtime time;
$year += 1900; #localtime counts years from 1900
my $age = $year - $byear; #age reached in the current year
print "\nHello, $name!\n";
print "\nYou are $age years old.\n";

Output:

[root@localhost Desktop]# perl name.pl
Enter your name and press return:priya
Enter your birth year and press return:1992

Hello, priya!

You are 20 years old.
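
The year handling in this program can be sketched in isolation. The helper age_in_year below is hypothetical (it is not part of name.pl). It takes the timestamp as an argument, so the result can be checked against a fixed date instead of the current clock.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# age_in_year is a hypothetical helper, not part of name.pl.
# It returns the age a person born in $birth_year reaches during the
# calendar year that contains the epoch timestamp $epoch.
sub age_in_year {
    my ($epoch, $birth_year) = @_;
    # In list context localtime returns nine values; element 5 is the
    # year counted from 1900, so add 1900 to get the calendar year.
    my $year = (localtime($epoch))[5] + 1900;
    return $year - $birth_year;
}

# 1343131200 is 24 July 2012, 12:00:00 UTC (the date of this exercise).
print age_in_year(1343131200, 1992), "\n";   # prints 20
```

Like name.pl, this ignores whether the birthday has already passed in the current year, so the result can be one year too high.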

Program 3:

[root@localhost Desktop]# vi words.pl
#!/usr/bin/env perl
#
#words.pl FILE
#
#if no data filename is given, this program waits for text on standard input
use strict;
use warnings;
print "Enter the word you want to search for and press return:";
my $sword = <STDIN>;
chomp($sword);
my $scount = 0; #search hit counter
my $bcount = 0; #blank line counter
my $lines = 0; #line counter
my $words = 0; #word counter
my $char = 0; #characters in words (whitespace is not counted)
while (<>) #continue reading as long as there is input
{
    chomp; #remove newline from each line
    $lines++;
    foreach my $w (split) #split each line into words
    {
        if ($w eq $sword)
        {
            $scount++;
        }
        $words++;
        $char += length($w);
    }
    #if the length of the current line is 0, we have a blank line
    if (length($_) == 0)
    {
        $bcount++;
    }
}
#guard the averages so that empty input does not divide by zero
my $avgw = $lines ? $words/$lines : 0; #average words per line including blank lines
my $avgc = $words ? $char/$words : 0; #average characters per word
print "There are $lines lines in this file including $bcount blank lines.\n";
print "There are $words words in this file.\n";
print "There are $char characters in this file.\n";
print "The average number of words per line is $avgw.\n";
print "The average number of characters per word is $avgc.\n";
print "the word $sword occurs in the text $scount times.\n";

Output:

[root@localhost Desktop]# vi word.txt
hai i am priya
have a nice day for all.....

[root@localhost Desktop]# perl words.pl word.txt
Enter the word you want to search for and press return:priya

There are 3 lines in this file including 1 blank lines.
There are 10 words in this file.
There are 34 characters in this file.
The average number of words per line is 3.33333333333333.
The average number of characters per word is 3.4.
the word priya occurs in the text 1 times.
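
The counting logic can also be tried without a data file. The sketch below wraps it in a hypothetical helper, text_stats, which reads the text from an in-memory string. The helper name and the hash keys are assumptions made for this example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# text_stats is a hypothetical helper that mirrors the counting done in
# words.pl, but takes the text and the search word as arguments.
sub text_stats {
    my ($text, $sword) = @_;
    my %s = (lines => 0, blank => 0, words => 0, chars => 0, hits => 0);
    # Open a read-only filehandle on the in-memory string.
    open(my $fh, '<', \$text) or die "Cannot read in-memory text: $!";
    while (my $line = <$fh>) {
        chomp $line;
        $s{lines}++;
        $s{blank}++ if length($line) == 0;
        foreach my $w (split ' ', $line) {
            $s{hits}++ if $w eq $sword;
            $s{words}++;
            $s{chars} += length $w;
        }
    }
    close $fh;
    # Guard the averages so that empty input does not divide by zero.
    $s{avg_words} = $s{lines} ? $s{words} / $s{lines} : 0;
    $s{avg_chars} = $s{words} ? $s{chars} / $s{words} : 0;
    return \%s;
}

# Same content as word.txt, including the trailing blank line.
my $stats = text_stats("hai i am priya\nhave a nice day for all.....\n\n", "priya");
printf "%d lines, %d blank, %d words, %d characters, %d hits\n",
    @{$stats}{qw(lines blank words chars hits)};
# prints: 3 lines, 1 blank, 10 words, 34 characters, 1 hits
```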


Program 4:

[root@localhost Desktop]# vi wordcount.pl

#!/usr/bin/env perl
#
#wordcount.pl FILE
#
use strict;
use warnings;
#if no filename is given, print help and exit
if (@ARGV < 1)
{
    print "Usage is : wordcount.pl filename\n";
    exit;
}
my $file = $ARGV[0]; #filename given on the command line
open(my $fh, '<', $file) or die "Cannot open $file: $!\n"; #open the named file
my (%count, %lcount); #word => frequency, word length => frequency
my ($dcount, $wcount) = (0, 0); #distinct words, total words
while (<$fh>) #continue reading until the file ends
{
    chomp;
    tr/A-Z/a-z/; #convert all upper case letters to lower case
    tr/.,:;!?"(){}//d; #remove some common punctuation symbols
    #We are creating a hash with the word as the key.
    #Each time a word is encountered, its hash entry is incremented by 1.
    #If the count for a word is 1, it is a new distinct word.
    #We keep track of the number of words parsed so far.
    #We also keep track of the no. of words of a particular length.
    foreach my $wd (split)
    {
        $count{$wd}++;
        if ($count{$wd} == 1)
        {
            $dcount++;
        }
        $wcount++;
        $lcount{length($wd)}++;
    }
}
close($fh);
#To print the distinct words and their frequency,
#we iterate over the hash containing the words and their count.
print "\nThe words and their frequency in the text is:\n";
foreach my $w (sort keys %count)
{
    print "$w : $count{$w}\n";
}
#For the word length and frequency we use the word length hash,
#sorted numerically so that length 10 comes after length 9.
print "The word length and frequency in the given text is:\n";
foreach my $len (sort { $a <=> $b } keys %lcount)
{
    print "$len : $lcount{$len}\n";
}
print "There are $wcount words in the file.\n";
print "There are $dcount distinct words in the file.\n";
#Calculating the type-token ratio; an empty file gives 0.
my $ttratio = $wcount ? ($dcount/$wcount)*100 : 0;
print "The type-token ratio of the file is $ttratio.\n";





Output:
[root@localhost Desktop]# perl wordcount.pl word.txt

The words and their frequency in the text is:
a : 1
all : 1
am : 1
day : 1
for : 1
hai : 1
have : 1
i : 1
nice : 1
priya : 1
The word length and frequency in the given text is:
1 : 2
2 : 1
3 : 4
4 : 2
5 : 1
There are 10 words in the file.
There are 10 distinct words in the file.
The type-token ratio of the file is 100.
[root@localhost Desktop]#
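
Because every word in word.txt is distinct, the run above always reports a type-token ratio of 100. The sketch below applies the same normalization and counting to a sentence with repeated words and lists the words by decreasing frequency. The helper names word_freq and type_token_ratio and the sample sentence are assumptions made for this example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# word_freq is a hypothetical helper: lower-case the text, delete the same
# punctuation set as wordcount.pl, and count each word.
sub word_freq {
    my ($text) = @_;
    $text = lc $text;
    $text =~ tr/.,:;!?"(){}//d;
    my %count;
    $count{$_}++ for split ' ', $text;
    return \%count;
}

# Type-token ratio as a percentage: distinct words divided by total words.
# Returns 0 for empty input instead of dividing by zero.
sub type_token_ratio {
    my ($count) = @_;
    my $tokens = 0;
    $tokens += $_ for values %$count;
    return $tokens ? 100 * scalar(keys %$count) / $tokens : 0;
}

# Invented sample sentence with repeated words.
my $count = word_freq("The cat saw the dog. The dog ran!");

# Most frequent words first; ties are broken alphabetically.
for my $w (sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count) {
    print "$w : $count->{$w}\n";
}
printf "Type-token ratio: %.1f\n", type_token_ratio($count);   # 62.5
```

Here 5 distinct words out of 8 give a ratio of 62.5, and the list starts with "the : 3" followed by "dog : 2".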




RESULT:
Thus, simple text processing programs were written in Perl and their output was verified.
