Saturday, August 18, 2012

Data Warehousing and Data Mining Unit Wise Questions

Anna University
Data Warehousing and Data Mining
2012 Edition

Sub Code : CS2032
Sub Name: Data Warehousing and Data Mining


1. Define the term ‘Data Warehouse’.
2. Write down the applications of data warehousing.
3. When is data mart appropriate?
4. List out the functionality of metadata.
5. What are the nine decisions in the design of a data warehouse?
6. List out the two different types of reporting tools.
7. Why is data mining used in all organizations?
8. What are the technical issues to be considered when designing and implementing a data warehouse environment?
9. List out some of the examples of access tools.
10. What are the advantages of data warehousing?
11. Give the difference between the Horizontal and Vertical Parallelism.
12. Draw a neat diagram for the Distributed memory shared disk architecture.
13. Define star schema.
14. What are the reasons to achieve very good performance by SYBASE IQ technology?
15. What are the steps to be followed to store the external source into the data warehouse?
16. Define Legacy data.
17. Draw the standard framework for metadata interchange.
18. List out the five main groups of access tools.
19. Define Data Visualization.
20. What are the various forms of data preprocessing?
21. How is data warehouse different from database? How are they similar?
22. What is data transformation? Give example.
23. With an example, explain what metadata is.
24. What is data mart?
1. Enumerate the building blocks of data warehouse. Explain the importance of metadata in a data warehouse environment. [16]
2. Explain various methods of data cleaning in detail [8]
3. Diagrammatically illustrate and discuss the data warehousing architecture, and briefly explain the components of a data warehouse. [16]
4. (i) Distinguish between data warehousing and data mining. [8]
(ii) Describe in detail about data extraction, cleanup [8]
5. Write short notes on
(i) Transformation [8] (ii) Metadata [8]
6. List and discuss the steps involved in mapping the data warehouse to a multiprocessor architecture. [16]
7. Discuss in detail about Bitmapped Indexing [16]
8. Explain in detail about different Vendor Solutions. [16]


1. Difference between OLAP and OLTP.
2. Classify OLAP tools.
3. What is meant by OLAP?
4. Difference between OLAP & OLTP
5. Define Concept Hierarchy.
6. List out the five categories of decision support tools.
7. Define Cognos Impromptu
8. List out any 5 OLAP guidelines.
9. Distinguish between multidimensional and multi-relational OLAP.
10. Define ROLAP.
11. Draw a neat diagram for the web processing model.
12. Define MQE.
13. Draw a neat sketch for three-tiered client/server architecture.
14. List out the applications that organizations use to build a query and reporting environment for the data warehouse.
15. Distinguish between window painter and data windows painter.
16. Define ADF, SGF and DEF.
17. What is the function of the PowerPlay Administrator?
1. Discuss the typical OLAP operations with an example. [6]
2. List and discuss the basic features that are provided by reporting and query tools used for business analysis. [16]
3. Describe in detail about Cognos Impromptu [16]
4. Explain about OLAP in detail. [16]
5. With relevant examples discuss multidimensional online analytical processing and multi-relational online analytical processing. [16]
6. Discuss about the OLAP tools and the Internet [16]
7. (i) Explain Multidimensional Data model. [10]
(ii) Discuss how computations can be performed efficiently on data cubes. [6]


1. Define data.
2. State why data preprocessing is an important issue for data warehousing and data mining.
3. What is the need for discretization in data mining?
4. What are the various forms of data preprocessing?
5. What is concept Hierarchy? Give an example.
6. What are the various forms of data preprocessing?
7. Mention the various tasks to be accomplished as part of data pre-processing.
8. Define Data Mining.
9. List out any four data mining tools.
10. What do data mining functionalities include?
11. Define patterns.
1. (i) Explain the various primitives for specifying a data mining task. [10]
(ii) Describe the various descriptive statistical measures for data mining. [6]
2. Discuss about different types of data and functionalities. [16]
3. (i) Describe in detail about Interestingness of patterns.
(ii) Explain in detail about data mining task primitives.
4. (i) Discuss about different issues of data mining.
(ii) Explain in detail about data preprocessing.
5. How are data mining systems classified? Discuss each classification with an example.
[10] [16]
6. How can a data mining system be integrated with a data warehouse? Discuss with an example. [16]

Part A

1. What is meant by market Basket analysis?
2. What is the use of multilevel association rules?
3. What is meant by pruning in a decision tree induction?
4. Write the two measures of Association Rule.
5. With an example explain correlation analysis.
6. Define conditional pattern base.
7. List out the major strength of decision tree method.
8. In classification trees, what are the surrogate splits, and how are they used?
9. The Naïve Bayes’ classifier makes what assumptions that motivate its name?
10. What is the frequent item set property?
11. List out the major strength of the decision tree Induction.
12. Write the two measures of association rule.
13. How are association rules mined from large databases?
14. What is tree pruning in decision tree induction?
15. What is the use of multi level association rules?
16. What are the Apriori properties used in the Apriori algorithms?
17. How is prediction different from classification?
18. What is a support vector machine?
19. What are the means to improve the performance of association rule mining algorithms?
20. State the advantages of the decision tree approach over other approaches for performing classification.
1. Decision tree induction is a popular classification method. Taking one typical decision tree induction algorithm, briefly outline the method of decision tree classification. [16]
2. Consider the following training dataset and the original decision tree induction algorithm (ID3). Risk is the class label attribute. The Height values have already been discretized into disjoint ranges. Calculate the information gain if Gender is chosen as the test attribute. Calculate the information gain if Height is chosen as the test attribute. Draw the final decision tree (without any pruning) for the training dataset. Generate all the “IF-THEN” rules from the decision tree.
Gender  Height      Risk
F       (1.5, 1.6)  Low
M       (1.9, 2.0)  High
F       (1.8, 1.9)  Medium
F       (1.8, 1.9)  Medium
F       (1.6, 1.7)  Low
M       (1.8, 1.9)  Medium
F       (1.5, 1.6)  Low
M       (1.6, 1.7)  Low
M       (2.0, ∞)    High
M       (2.0, ∞)    High
F       (1.7, 1.8)  Medium
M       (1.9, 2.0)  Medium
F       (1.8, 1.9)  Medium
F       (1.7, 1.8)  Medium
F       (1.7, 1.8)  Medium
[16]
(a) Given the following transactional database
1 C, B, H
2 B, F, S
3 A, F, G
4 C, B, H
5 B, F, G
6 B, E, O
(i) We want to mine all the frequent itemsets in the data using the Apriori algorithm. Assume the minimum support level is 30%. (You need to give the set of frequent item sets in L1, L2,… candidate item sets in C1, C2,…) [9]
(ii) Find all the association rules that involve only B, C, H (in either left or right hand side of the rule). The minimum confidence is 70%. [7]
3. Describe the multi-dimensional association rule, giving a suitable example. [16]
4. (a)Explain the algorithm for constructing a decision tree from training samples [12]
(b)Explain Bayes theorem. [4]
6. Develop an algorithm for classification using Bayesian classification. Illustrate the algorithm with a relevant example. [16]
7. Discuss the approaches for mining multi level association rules from the transactional databases. Give relevant example. [16]
8. Write and explain the algorithm for mining frequent item sets without candidate generation. Give relevant example. [16]
9. How is attribute oriented induction implemented? Explain in detail. [16]
10. Discuss in detail about Bayesian classification [8]
11. A database has four transactions. Let min sup=60% and min conf=80%.
Find all frequent itemsets using Apriori and FP growth, respectively. Compare the efficiency of the two mining processes. [16]
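
As a study aid for the ID3 question above (the Gender / Height / Risk training table), the information gain arithmetic can be sketched as follows. This is a minimal sketch, not an official answer key: the Height ranges are treated as opaque category labels (the shorthand such as "1.5-1.6" and "2.0+" is an assumption made for readability), and the names ROWS, entropy and information_gain are illustrative.

```python
from collections import Counter, defaultdict
from math import log2

# Rows copied from the training table: (Gender, Height range, Risk).
# Height labels are shorthand for the ranges in the question (assumption).
ROWS = [
    ("F", "1.5-1.6", "Low"),    ("M", "1.9-2.0", "High"),
    ("F", "1.8-1.9", "Medium"), ("F", "1.8-1.9", "Medium"),
    ("F", "1.6-1.7", "Low"),    ("M", "1.8-1.9", "Medium"),
    ("F", "1.5-1.6", "Low"),    ("M", "1.6-1.7", "Low"),
    ("M", "2.0+", "High"),      ("M", "2.0+", "High"),
    ("F", "1.7-1.8", "Medium"), ("M", "1.9-2.0", "Medium"),
    ("F", "1.8-1.9", "Medium"), ("F", "1.7-1.8", "Medium"),
    ("F", "1.7-1.8", "Medium"),
]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index, class_index=2):
    """Class entropy minus the weighted entropy after splitting on one attribute."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[attr_index]].append(row[class_index])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy([row[class_index] for row in rows]) - remainder


gain_gender = information_gain(ROWS, 0)
gain_height = information_gain(ROWS, 1)
print(f"Gain(Gender) = {gain_gender:.4f}")
print(f"Gain(Height) = {gain_height:.4f}")
```

Under these assumptions the gain for Height is larger than the gain for Gender, so ID3 would test Height at the root. Only one Height range mixes two Risk classes, which is why splitting on Height removes most of the class entropy.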

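For the Apriori question above (the six-transaction database with 30% minimum support and 70% minimum confidence), the level-wise search can be sketched as below. This is an illustrative sketch under stated assumptions, not a reference solution: the function names apriori, support and rules are hypothetical, and support is computed by rescanning the transaction list, which is fine for six transactions but not how a tuned implementation would do it.

```python
from itertools import combinations

# Transactions copied from the question.
TRANSACTIONS = [
    {"C", "B", "H"}, {"B", "F", "S"}, {"A", "F", "G"},
    {"C", "B", "H"}, {"B", "F", "G"}, {"B", "E", "O"},
]
MIN_SUPPORT = 0.30
MIN_CONFIDENCE = 0.70


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori(transactions, min_support):
    """Return {frozenset: support} for every frequent itemset."""
    items = sorted(set().union(*transactions))
    candidates = [frozenset([i]) for i in items]          # C1
    frequent = {}
    k = 1
    while candidates:
        counted = {c: support(c, transactions) for c in candidates}
        level = {c: s for c, s in counted.items() if s >= min_support}  # Lk
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all (k-1)-subsets are frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))]
    return frequent


def rules(frequent, allowed_items, min_confidence):
    """Rules lhs -> rhs built only from `allowed_items`, with their confidence."""
    found = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2 or not itemset <= allowed_items:
            continue
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), size)):
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((sorted(lhs), sorted(itemset - lhs), confidence))
    return found


frequent = apriori(TRANSACTIONS, MIN_SUPPORT)
found_rules = rules(frequent, frozenset("BCH"), MIN_CONFIDENCE)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(sup, 2))
for lhs, rhs, confidence in found_rules:
    print(lhs, "->", rhs, round(confidence, 2))
```

With six transactions, a 30% threshold means an itemset must appear in at least two of them. Rules whose left-hand side is B alone fall below the confidence threshold in this data because B occurs in five transactions but co-occurs with C and H in only two.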

1. What are the requirements of clustering?
2. What are the applications of spatial data bases?
3. What is text mining?
4. Distinguish between classification and clustering.
5. Define a Spatial database.
6. List out any two commercial data mining tools.
7. What is the objective function of K-means algorithm?
8. Mention the advantages of Hierarchical clustering.
9. Distinguish between classification and clustering.
10. List the requirements of clustering in data mining.
11. What is web usage mining?
12. What are the requirements of clustering?
13. What are the applications of spatial databases?
14. What is text mining?
15. What is cluster analysis?
16. What are the two data structures in cluster analysis?
17. What is an outlier? Give example.
18. What is audio data mining?
19. List two applications of data mining.
1. BIRCH and CLARANS are two interesting clustering algorithms that perform effective clustering in large data sets.
(i) Outline how BIRCH performs clustering in large data sets. [10]
(ii) Compare and outline the major differences of the two scalable clustering algorithms BIRCH and CLARANS. [6]
2. Write a short note on web mining taxonomy. Explain the different activities of text mining.
3. Discuss and elaborate the current trends in data mining. [6+5+5]
4. Discuss spatial data bases and Text databases [16]
5. What is a multimedia database? Explain the methods of mining multimedia database? [16]
6. Explain the following clustering methods in detail.
(a) BIRCH (b) CURE [16]
7. Discuss in detail about any four data mining applications. [16]
8. Write short notes on
(i) Partitioning methods [8] (ii) Outlier analysis [8]
9. Describe K means clustering with an example. [16]
10. Describe in detail about Hierarchical methods.
11. With relevant example discuss constraint based cluster analysis. [16]
