Tuesday, July 31, 2012

TEXT PROCESSING WITH PERL :: OPEN SOURCE LAB



EX.NO:4a
TEXT PROCESSING WITH PERL
24.07.12

AIM:

To learn how to do some common text processing tasks using Perl.

INTRODUCTION:

Perl is a high-level, general-purpose, interpreted, dynamic programming language.
Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier. Perl borrows features from other programming languages, including C, shell scripting (sh), AWK, and sed. The language provides powerful text processing facilities without the arbitrary data length limits of many contemporary Unix tools, which makes text files easy to manipulate. Though originally developed for text manipulation, Perl is used for a wide range of tasks including system administration, web development, network programming, games, bioinformatics, and GUI development.
The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal). Its major features include support for multiple programming paradigms (procedural, object-oriented, and functional styles), memory management by reference counting, built-in support for text processing, and a large collection of third-party modules.

Program 1:
[root@localhost Desktop]# vi hello.pl

#!/usr/bin/env perl
#The above statement tells the system that this is a perl program.
print "Hello World!\n"; #print the text Hello World and a newline.

Output:

[root@localhost Desktop]# perl hello.pl

Hello World!



Program 2:

[root@localhost Desktop]# vi name.pl

#!/usr/bin/env perl
# name.pl
print "Enter your name and press return:";
$name = <STDIN>; #read the data
chomp($name); #remove the newline
print "\nEnter your birth year and press return:";
$byear = <STDIN>;
chomp($byear);
#localtime returns a list of 9 distinct values. Collect them.
my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $dst) =
    localtime time;
#compute the age and print the greeting shown in the output below
$year += 1900; #localtime counts years from 1900
$age = $year - $byear; #age reached in the current year
print "\nHello, $name!\n";
print "\nYou are $age years old.\n";

Output:

[root@localhost Desktop]# perl name.pl
Enter your name and press return:priya
Enter your birth year and press return:1992

Hello, priya!

You are 20 years old.
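
The age in the output above depends on how localtime numbers its fields: in list context it returns nine values, and the sixth one (index 5) is the number of years since 1900, so 1900 has to be added back before subtracting the birth year. A minimal sketch of that calculation follows. The helper name age_from_birth_year and its optional timestamp argument are assumptions made for illustration and are not part of name.pl.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# age_from_birth_year is a hypothetical helper, not part of name.pl.
# $now is an epoch timestamp; it defaults to the current time.
sub age_from_birth_year {
    my ($byear, $now) = @_;
    $now = time unless defined $now;
    # Index 5 of localtime's list is the year minus 1900.
    my $year = (localtime($now))[5] + 1900;
    # This is the age reached during the current calendar year;
    # it ignores whether the birthday has already passed.
    return $year - $byear;
}

print "Age this year: ", age_from_birth_year(1992), "\n";
```

Passing the timestamp as a parameter keeps the calculation repeatable, since the result no longer depends on the day the script is run.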

Program 3:

[root@localhost Desktop]# vi words.pl
#!/usr/bin/env perl
#
#words.pl FILE
#
#the search word is read from the keyboard; if no data filename is given,
#the program appears to hang because it waits for the text on standard input
print "Enter the word you want to search for and press return:";
$sword = <STDIN>;
chomp($sword);
$scount = 0; #search counter
$bcount = 0; #blank line counter
$words = 0; #word counter
$char = 0; #character counter (whitespace is not counted)
while (<>) #continue reading as long as there is input
{
    chomp; #remove newline from each line
    foreach $w (split) #split each line into words
    {
        if ($w eq $sword)
        {
            $scount++; #search hit counter
        }
        $words++;
        $char += length($w);
    }
    #if the length of the current line is 0, we have a blank line
    if (length($_) == 0)
    {
        $bcount++;
    }
}
$avgw = $words/$.; #average words per line including blank lines
$avgc = $char/$words; #average characters per word
print "There are $. lines in this file including $bcount blank lines.\n";
print "There are $words words in this file.\n";
print "There are $char characters in this file.\n";
print "The average number of words per line is $avgw.\n";
print "The average number of characters per word is $avgc.\n";
print "the word $sword occurs in the text $scount times.\n";

Output:

[root@localhost Desktop]# vi word.txt
hai i am priya
have a nice day for all.....

[root@localhost Desktop]# perl words.pl word.txt
Enter the word you want to search for and press return:priya

There are 3 lines in this file including 1 blank lines.
There are 10 words in this file.
There are 34 characters in this file.
The average number of words per line is 3.33333333333333.
The average number of characters per word is 3.4.
the word priya occurs in the text 1 times.
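
words.pl compares each whitespace-separated token with eq, so a token such as "priya," or "Priya" would not be counted as a hit. One possible variation, sketched below, counts matches with a regular expression instead. The helper name count_word is an assumption for illustration and is not part of words.pl; the regex features it uses (\b, \Q...\E, and the /g and /i modifiers) are standard Perl.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# count_word is a hypothetical helper, not part of words.pl.
sub count_word {
    my ($text, $word) = @_;
    my $count = 0;
    # \b matches at word boundaries, \Q...\E quotes any regex
    # metacharacters in the search word, /i ignores case, and /g
    # in scalar context steps through the matches one at a time.
    $count++ while $text =~ /\b\Q$word\E\b/gi;
    return $count;
}

my $sample = "hai i am priya\n\nhave a nice day for all.....\n";
print "priya occurs ", count_word($sample, 'priya'), " time(s).\n";
```

The word-boundary anchors keep a longer word that merely starts with the search word from being counted.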


Program 4:

[root@localhost Desktop]# vi wordcount.pl

#!/usr/bin/env perl
#
#wordcount.pl FILE
#
#if no filename is given, print help and exit
if (@ARGV < 1)
{
    print "Usage is : wordcount.pl FILE\n";
    exit;
}
my $file = $ARGV[0]; #filename given in commandline
#open the mentioned filename for reading, or stop and report the reason
open(FILE, '<', $file) or die "Cannot open $file: $!\n";
while (<FILE>) #continue reading until the file ends
{
    chomp;
    tr/A-Z/a-z/; #convert all upper case letters to lower case
    tr/.,:;!?"(){}//d; #remove some common punctuation symbols
    #We are creating a hash with the word as the key.
    #Each time a word is encountered, its hash value is incremented by 1.
    #If the count for a word is 1, it is a new distinct word.
    #We keep track of the number of words parsed so far.
    #We also keep track of the no. of words of a particular length.
    foreach $wd (split)
    {
        $count{$wd}++;
        if ($count{$wd} == 1)
        {
            $dcount++;
        }
        $wcount++;
        $lcount{length($wd)}++;
    }
}
close(FILE);
#To print the distinct words and their frequency,
#we iterate over the hash containing the words and their count.
print "\nThe words and their frequency in the text is:\n";
foreach $w (sort keys %count)
{
    print "$w : $count{$w}\n";
}
#For the word length and frequency we use the word length hash.
#The lengths are sorted numerically so that 10 comes after 9.
print "The word length and frequency in the given text is:\n";
foreach $w (sort { $a <=> $b } keys %lcount)
{
    print "$w : $lcount{$w}\n";
}
print "There are $wcount words in the file.\n";
print "There are $dcount distinct words in the file.\n";
$ttratio = ($dcount/$wcount)*100; #Calculating the type-token ratio.
print "The type-token ratio of the file is $ttratio.\n";





Output:
[root@localhost Desktop]# perl wordcount.pl word.txt

The words and their frequency in the text is:
a : 1
all : 1
am : 1
day : 1
for : 1
hai : 1
have : 1
i : 1
nice : 1
priya : 1
The word length and frequency in the given text is:
1 : 2
2 : 1
3 : 4
4 : 2
5 : 1
There are 10 words in the file.
There are 10 distinct words in the file.
The type-token ratio of the file is 100.
[root@localhost Desktop]#
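
The listing above is ordered alphabetically by word. When the most common words matter more, the same hash can be sorted by its counts instead. The sketch below assumes two helper names, word_frequencies and words_by_frequency, which are hypothetical and not part of wordcount.pl; the tr/// normalisation is the same as in the program.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# word_frequencies and words_by_frequency are hypothetical helpers,
# not part of wordcount.pl.

# Build a hash reference mapping each normalised word to its count.
sub word_frequencies {
    my ($text) = @_;
    my %count;
    for my $line (split /\n/, $text) {
        $line =~ tr/A-Z/a-z/;          # lower-case the line
        $line =~ tr/.,:;!?"(){}//d;    # delete common punctuation
        $count{$_}++ for split ' ', $line;
    }
    return \%count;
}

# Return the words ordered by descending count; ties are alphabetical.
# Call this in list context.
sub words_by_frequency {
    my ($count) = @_;
    return sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count;
}

my $freq = word_frequencies("A nice day. Have a nice day, all!\n");
printf "%-5s : %d\n", $_, $freq->{$_} for words_by_frequency($freq);
```

The comparison block uses <=> for the numeric counts and falls back to cmp for the words, so entries with equal counts still appear in a stable, readable order.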




RESULT:
Thus, simple text processing programs were written in Perl and their output was verified.
