TEXT PROCESSING WITH PERL :: OPEN SOURCE LAB

EX.NO:4a

TEXT PROCESSING WITH PERL

24.07.12

AIM:

To learn how to do some common text processing tasks using Perl.

INTRODUCTION:

Perl is a high-level, general-purpose, interpreted, dynamic programming language.

Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier. Perl borrows features from other programming languages including C, shell scripting (sh), AWK, and sed. The language provides powerful text processing facilities without the arbitrary data length limits of many contemporary Unix tools, which makes text files easy to manipulate. Although originally developed for text manipulation, Perl is used for a wide range of tasks including system administration, web development, network programming, games, bioinformatics, and GUI development.

The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal). Its major features include support for multiple programming paradigms (procedural, object-oriented, and functional styles), reference-counting memory management, built-in support for text processing, and a large collection of third-party modules.
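
The sketch below shows three of the built-in text processing and functional features listed above: a regular-expression match, the s/// substitution operator, and grep/map over the words of a line. The sample sentence and the variable names are illustrative assumptions and are not part of the lab exercise.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $line = "Perl was developed by Larry Wall in 1987.";

# Regular expression match in list context: capture the first four-digit number.
my ($year) = $line =~ /(\d{4})/;

# Substitution operator s///: change a word in a copy, leaving $line untouched.
(my $copy = $line) =~ s/Perl/PERL/;

# Functional style: keep words starting with a capital letter, then upper-case them.
my @names = map { uc($_) } grep { /^[A-Z]/ } split ' ', $line;

print "year:  $year\n";    # year:  1987
print "copy:  $copy\n";    # copy:  PERL was developed by Larry Wall in 1987.
print "names: @names\n";   # names: PERL LARRY WALL
```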

Program 1:

[root@localhost Desktop]# vi hello.pl

#!/usr/bin/env perl
# The line above (the "shebang" line) tells the system to run this file with perl.
use strict;
use warnings;
print "Hello World!\n";   # print the text Hello World and a newline

Output:

[root@localhost Desktop]# perl hello.pl
Hello World!
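
The \n in hello.pl becomes a newline only because the string is in double quotes. In Perl, escape sequences such as \n and variables such as $who are expanded inside double quotes but not inside single quotes. The small sketch below demonstrates the difference; the variable names are chosen for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $who = "World";

# Double quotes: $who is interpolated and \n becomes a newline.
my $double = "Hello $who!\n";

# Single quotes: the text is kept exactly as written, including $who and \n.
my $single = 'Hello $who!\n';

print $double;           # prints: Hello World!  (followed by a newline)
print $single, "\n";     # prints: Hello $who!\n
```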

Program 2:

[root@localhost Desktop]# vi name.pl

#!/usr/bin/env perl
# name.pl
use strict;
use warnings;

print "Enter your name and press return:";
my $name = <STDIN>;   # read the data
chomp($name);         # remove the newline

print "\nEnter your birth year and press return:";
my $byear = <STDIN>;
chomp($byear);

# localtime returns a list of 9 distinct values. Collect them.
my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $dst) =
    localtime time;

# $year counts years since 1900, so add 1900 to get the current year.
# The age is approximate: it ignores whether the birthday has passed yet.
my $age = ($year + 1900) - $byear;

print "\nHello, $name!\n";
print "You are $age years old.\n";

Output:

[root@localhost Desktop]# perl name.pl
Enter your name and press return:priya

Enter your birth year and press return:1992

Hello, priya!
You are 20 years old.
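
localtime returns nine values, but the age calculation needs only the year, and that element counts years since 1900. The minimal sketch below takes just that element with a list slice. The helper names current_year and age_in_year are hypothetical. age_in_year applies the same rule as name.pl (current year minus birth year), so it also ignores whether the birthday has already passed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: element 5 of localtime's list is "years since 1900".
sub current_year {
    my $years_since_1900 = (localtime)[5];
    return $years_since_1900 + 1900;
}

# Hypothetical helper: the same rule as name.pl (current year minus birth year).
sub age_in_year {
    my ($birth_year, $year) = @_;
    return $year - $birth_year;
}

my $year = current_year();
print "Current year: $year\n";
print "Someone born in 1992 turns ", age_in_year(1992, $year), " this year.\n";
```

For the date of this record (2012), age_in_year(1992, 2012) gives 20, which matches the output above.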

Program 3:

[root@localhost Desktop]# vi words.pl

#!/usr/bin/env perl
# words.pl FILE
# The search word is typed at the keyboard; the text is read from FILE.
# If no data filename is given, <> reads standard input instead, so the
# program will appear to hang while it waits for more input.
use strict;
use warnings;

print "Enter the word you want to search for and press return:";
my $sword = <STDIN>;
chomp($sword);

my $scount = 0;   # search hit counter
my $bcount = 0;   # blank line counter
my $words  = 0;   # word counter
my $char   = 0;   # characters inside words (whitespace is not counted)

while (<>)   # continue reading as long as there is input
{
    chomp;   # remove newline from each line
    foreach my $w (split)   # split each line into words on whitespace
    {
        if ($w eq $sword)
        {
            $scount++;   # search hit counter
        }
        $words++;
        $char += length($w);
    }
    # if the length of the current line is 0, we have a blank line
    if (length($_) == 0)
    {
        $bcount++;
    }
}

# After the loop, $. still holds the number of the last line read.
my $lines = $. || 0;
my $avgw = $lines ? $words / $lines : 0;   # average words per line including blank lines
my $avgc = $words ? $char / $words : 0;    # average characters per word

print "There are $lines lines in this file including $bcount blank lines.\n";
print "There are $words words in this file.\n";
print "There are $char characters in this file.\n";
print "The average number of words per line is $avgw.\n";
print "The average number of characters per word is $avgc.\n";
print "The word $sword occurs in the text $scount times.\n";

Output (word.txt has three lines, the second of which is blank):

[root@localhost Desktop]# vi word.txt

hai i am priya

have a nice day for all.....

[root@localhost Desktop]# perl words.pl word.txt
Enter the word you want to search for and press return:priya
There are 3 lines in this file including 1 blank lines.
There are 10 words in this file.
There are 34 characters in this file.
The average number of words per line is 3.33333333333333.
The average number of characters per word is 3.4.
The word priya occurs in the text 1 times.
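
The counting logic of words.pl can be checked without a file or keyboard input. In the sketch below, the lines are passed in as a list instead of being read through <>. text_stats is a hypothetical helper name, and the sample lines are the contents of word.txt from the transcript above. split ' ' behaves like the bare split used in words.pl: it splits on runs of whitespace and ignores leading whitespace.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: the same statistics as words.pl, computed from a list
# of lines instead of <>, so the logic can be tested without a file.
sub text_stats {
    my ($sword, @lines) = @_;
    my %s = (lines => 0, blank => 0, words => 0, chars => 0, hits => 0);
    for my $line (@lines) {
        chomp $line;
        $s{lines}++;
        $s{blank}++ if length($line) == 0;
        for my $w (split ' ', $line) {   # ' ' splits on runs of whitespace
            $s{words}++;
            $s{chars} += length $w;
            $s{hits}++ if $w eq $sword;
        }
    }
    return \%s;
}

my @text = ("hai i am priya\n", "\n", "have a nice day for all.....\n");
my $s = text_stats('priya', @text);
printf "%d lines (%d blank), %d words, %d characters, %d hit(s)\n",
    @{$s}{qw(lines blank words chars hits)};
# prints: 3 lines (1 blank), 10 words, 34 characters, 1 hit(s)
```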

Program 4:

[root@localhost Desktop]# vi wordcount.pl

#!/usr/bin/env perl
# wordcount.pl FILE
use strict;
use warnings;

# If no filename is given, print help and exit.
if (@ARGV < 1)
{
    print "Usage: wordcount.pl filename\n";
    exit;
}

my $file = $ARGV[0];   # filename given on the command line
open(my $fh, '<', $file) or die "Cannot open $file: $!\n";   # open the named file

my (%count, %lcount);
my ($dcount, $wcount) = (0, 0);

while (<$fh>)   # continue reading until the file ends
{
    chomp;
    tr/A-Z/a-z/;           # convert all upper case letters to lower case
    tr/.,:;!?"(){}//d;     # remove some common punctuation symbols

    # We are creating a hash with the word as the key.
    # Each time a word is encountered, its count in the hash is incremented by 1.
    # If the count for a word is 1, it is a new distinct word.
    # We keep track of the number of words parsed so far.
    # We also keep track of the number of words of each length.
    foreach my $wd (split)
    {
        $count{$wd}++;
        if ($count{$wd} == 1)
        {
            $dcount++;
        }
        $wcount++;
        $lcount{length($wd)}++;
    }
}
close($fh);

# To print the distinct words and their frequency,
# we iterate over the hash containing the words and their count.
print "\nThe words and their frequencies in the text are:\n";
foreach my $w (sort keys %count)
{
    print "$w : $count{$w}\n";
}

# For the word length and frequency we use the word length hash.
# The lengths are sorted numerically so that 10 comes after 9, not after 1.
print "The word lengths and their frequencies in the given text are:\n";
foreach my $w (sort { $a <=> $b } keys %lcount)
{
    print "$w : $lcount{$w}\n";
}

print "There are $wcount words in the file.\n";
print "There are $dcount distinct words in the file.\n";

# Type-token ratio: distinct words (types) as a percentage of all words (tokens).
my $ttratio = $wcount ? ($dcount / $wcount) * 100 : 0;
print "The type-token ratio of the file is $ttratio.\n";

Output:

[root@localhost Desktop]# perl wordcount.pl word.txt

The words and their frequencies in the text are:
a : 1
all : 1
am : 1
day : 1
for : 1
hai : 1
have : 1
i : 1
nice : 1
priya : 1
The word lengths and their frequencies in the given text are:
1 : 2
2 : 1
3 : 4
4 : 2
5 : 1
There are 10 words in the file.
There are 10 distinct words in the file.
The type-token ratio of the file is 100.
[root@localhost Desktop]#
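
The %count hash built by wordcount.pl can also be sorted by value rather than by key, so the most frequent words are listed first. The minimal sketch below assumes the text is already in a string. word_freq and type_token_ratio are hypothetical helper names, and lc stands in for tr/A-Z/a-z/. Ties in the frequency sort are broken alphabetically.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: count lower-cased words after stripping the same
# punctuation that wordcount.pl removes.
sub word_freq {
    my ($text) = @_;
    my %count;
    for my $line (split /\n/, $text) {
        $line = lc $line;                # same effect as tr/A-Z/a-z/ for ASCII
        $line =~ tr/.,:;!?"(){}//d;      # remove some common punctuation symbols
        $count{$_}++ for split ' ', $line;
    }
    return \%count;
}

# Hypothetical helper: distinct words / total words * 100.
sub type_token_ratio {
    my ($count) = @_;
    my $tokens = 0;
    $tokens += $_ for values %$count;
    my $types = keys %$count;            # number of distinct words
    return $tokens ? ($types / $tokens) * 100 : 0;
}

my $freq = word_freq("The cat saw the dog.\nThe dog ran!");

# Most frequent words first; ties broken alphabetically.
for my $w (sort { $freq->{$b} <=> $freq->{$a} || $a cmp $b } keys %$freq) {
    print "$w : $freq->{$w}\n";
}
printf "type-token ratio: %.1f\n", type_token_ratio($freq);   # 62.5
```

The sample sentence has 8 words but only 5 distinct ones ("the" occurs three times and "dog" twice), so the ratio is 5 / 8 * 100 = 62.5. The word.txt example above reaches 100 because no word repeats.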

RESULT:

Thus, simple text processing programs in Perl were written and executed, and the output was verified.

Engineer Portal - Prem Sasi Kumar Arivukalanjiam
