Wednesday, August 22, 2012

Data Mining & Data Warehousing: Unit 2, 2-mark questions with answers and 16-mark questions

Unit II

  1. Why preprocess the data?

Data to be analyzed by data mining techniques are often incomplete, noisy, and inconsistent. These are commonplace properties of large real-world databases and data warehouses. The data must be preprocessed to remove these errors.

  1. What are the data preprocessing techniques?

Data preprocessing techniques are

• Data cleaning: removes noise and corrects inconsistencies in the data.
• Data integration: merges data from multiple sources into a coherent data store such as a data warehouse or a data cube.
• Data transformation: operations such as normalization can improve the accuracy and efficiency of mining algorithms that involve distance measurements.
• Data reduction: reduces the data size by aggregating, eliminating redundant features, or clustering.

  1. Give the strategies involved in data reduction.

Data reduction obtains a reduced representation of the data set that is much smaller in volume but produces the same (or almost the same) analytical results. The strategies involved are:

• Data aggregation, e.g., building a data cube.
• Dimension reduction, e.g., removing irrelevant attributes through correlation analysis.
• Data compression, e.g., using encoding schemes such as minimum length encoding or wavelets.

  1. What is noise?

Noise is a random error or variance in a measured variable.

  1. Give the various data smoothing techniques.

The data smoothing techniques are binning, clustering, combined computer and human inspection, and regression.
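As an illustration of binning, here is a minimal sketch of smoothing by bin means, assuming equal-frequency bins. The function name `smooth_by_bin_means` and the sample `prices` list are made up for this example.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into equal-frequency bins of
    `bin_size` items, and replace every value by the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


# Illustrative data only (not taken from any real data set).
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# prints [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Each value is pulled toward its neighbours in the same bin, which is what removes the small random fluctuations.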

  1. Define data integration.

Data integration combines data from multiple sources into a coherent data store. These sources may include multiple databases, data cubes or flat files.

  1. Give the issues to be considered during data integration.

The issues are schema integration, redundancy, and the detection and resolution of data value conflicts.
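A rough sketch of two of these issues, schema integration and data value conflicts, could look like the following. The two sources, the attribute names (`customer_id`, `cust_number`) and the `integrate` helper are all hypothetical and exist only for this example.

```python
# Two hypothetical sources that describe the same customers
# under different attribute names.
source_a = [
    {"customer_id": 1, "name": "Asha", "city": "Chennai"},
    {"customer_id": 2, "name": "Ravi", "city": "Madurai"},
]
source_b = [
    {"cust_number": 1, "city": "Chennai", "annual_revenue": 1200},
    {"cust_number": 2, "city": "Salem", "annual_revenue": 800},
]

# Schema integration: map source B's attribute names onto the shared schema.
RENAME_B = {"cust_number": "customer_id"}


def integrate(a_rows, b_rows, rename_b, key):
    """Merge both sources on `key`; report attributes whose values disagree."""
    merged = {row[key]: dict(row) for row in a_rows}
    conflicts = []
    for row in b_rows:
        row = {rename_b.get(k, k): v for k, v in row.items()}
        target = merged.setdefault(row[key], {})
        for attr, value in row.items():
            if attr in target and target[attr] != value:
                # Data value conflict: keep source A's value and record it.
                conflicts.append((row[key], attr, target[attr], value))
            else:
                target[attr] = value
    return merged, conflicts


store, conflicts = integrate(source_a, source_b, RENAME_B, "customer_id")
print(conflicts)
```

Here the conflict is only reported, not resolved. Which source to trust is a decision the sketch deliberately leaves open.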

  1. Define redundancy.

An attribute is said to be redundant if it can be derived from another attribute or table. Inconsistencies in attribute or dimension naming can also cause redundancies in the resulting data set.
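Some redundancies between numeric attributes can be detected by correlation analysis. The sketch below computes the Pearson correlation coefficient by hand; the attribute values are invented, and what counts as "too correlated" is a judgment call that the code does not make.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists.
    Assumes neither list is constant (otherwise the denominator is zero)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative attributes: annual revenue is just 12 times monthly revenue,
# so one of the two is redundant.
monthly_revenue = [10, 20, 30, 40]
annual_revenue = [120, 240, 360, 480]
r = pearson_correlation(monthly_revenue, annual_revenue)
print(r)
```

A coefficient close to +1 or -1 suggests that one attribute can be derived from the other and may be dropped.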

  1. Explain data transformation.

In data transformation, data are transformed or consolidated into forms appropriate for mining. Data transformation can involve the following: smoothing, aggregation, generalization, normalization, and attribute construction.

  1. Define normalization and give the methods used for normalization.

In normalization attribute data are scaled so as to fall within a small specified range such as -1.0 to 1.0 or 0.0 to 1.0. The methods used for normalization are min-max normalization, z-score normalization and normalization by decimal scaling.
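The three methods can be sketched as below. The function names are chosen for this example, the z-score version uses the population standard deviation (the sample standard deviation is an equally valid choice), and the inputs are assumed to be non-constant so that no division by zero occurs.

```python
import statistics


def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: map [min, max] linearly onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def decimal_scaling(values):
    """Divide by 10**j, where j is the smallest integer with max(|v|) / 10**j < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


incomes = [200, 300, 400, 600, 1000]  # illustrative values
print(min_max(incomes))
print(z_score(incomes))
print(decimal_scaling([-986, 917]))
```

Min-max keeps the relative spacing of the values, z-score is useful when the true minimum and maximum are unknown, and decimal scaling only moves the decimal point.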

  1. Define data reduction.

It is used to obtain a reduced representation of the data set that is much smaller in volume yet closely maintains the integrity of the original data. That is, mining on the reduced set should be more efficient yet produce the same (or almost the same) analytical results.

  1.  Give the strategies used for data reduction.

Data cube aggregation, dimension reduction, data compression, numerosity reduction, and discretization and concept hierarchy generation.

  1. Explain data cube aggregation.

A data cube stores multidimensional aggregated information. Each cell holds an aggregate data value corresponding to a data point in multidimensional space.
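A small sketch of the idea, assuming sales facts stored as tuples: each aggregation keeps some dimensions and sums the measure over the rest. The `sales` rows and the `aggregate` helper are invented for illustration; a real data cube would precompute and index these cells.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, branch, amount)
sales = [
    (2010, "Q1", "A", 100),
    (2010, "Q2", "A", 150),
    (2010, "Q1", "B", 80),
    (2011, "Q1", "A", 120),
    (2011, "Q2", "B", 90),
]


def aggregate(rows, dims):
    """Sum the last column, grouped by the dimension columns listed in `dims`."""
    cells = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in dims)
        cells[key] += row[-1]
    return dict(cells)


by_year_branch = aggregate(sales, (0, 2))  # one cell per (year, branch)
by_year = aggregate(sales, (0,))           # one cell per year
print(by_year_branch)
print(by_year)
```

The quarterly rows are reduced to a few cells, and an analysis that only needs annual totals can work on the smaller result.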

  1. Explain dimensionality reduction.

Data sets for analysis may contain hundreds of attributes, many of which may be irrelevant to the mining task or redundant. Leaving out relevant attributes or keeping irrelevant attributes may result in poor quality of discovered patterns. Dimensionality reduction reduces the data set size by removing such attributes.

  1. How is dimensionality reduction achieved?

Dimensionality reduction is achieved using attribute subset selection. The goal is to find a minimum set of attributes such that the resulting probability distribution of the data classes is as close as possible to the original distribution obtained using all attributes.
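One simple heuristic for attribute subset selection is to rank attributes by information gain with respect to the class and keep the best ones. The sketch below does only that; it is not a full stepwise selection procedure, and the toy rows and attribute names are made up.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Reduction in class entropy obtained by splitting the rows on `attr`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def select_attributes(rows, attrs, target, k):
    """Keep the k attributes with the highest information gain."""
    ranked = sorted(attrs, key=lambda a: information_gain(rows, a, target), reverse=True)
    return ranked[:k]


# Toy data: "student" decides the class, "region" tells nothing about it.
rows = [
    {"student": "yes", "region": "north", "buys": "yes"},
    {"student": "yes", "region": "south", "buys": "yes"},
    {"student": "no", "region": "north", "buys": "no"},
    {"student": "no", "region": "south", "buys": "no"},
]
print(select_attributes(rows, ["region", "student"], "buys", 1))
```

Ranking attributes one at a time ignores interactions between attributes, which is why greedy forward or backward selection is often used instead.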

  1.  Define data compression.

In data compression, data encodings or transformations are applied so as to obtain a reduced or "compressed" representation of the original data. If the original data can be reconstructed from the compressed data without any loss of information, the compression technique is called lossless. If only an approximation of the original data can be reconstructed, it is called lossy.
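The difference can be seen in a few lines. The sketch uses Python's standard `zlib` module for the lossless case and plain rounding as a stand-in for a lossy scheme; the sample data are invented.

```python
import zlib

# Lossless: the original bytes come back exactly.
original = b"AAAAABBBBBCCCCC" * 20
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print(len(original), len(compressed), restored == original)

# Lossy (illustrative stand-in): rounding keeps only an approximation,
# and the exact readings cannot be recovered from it.
readings = [20.13, 20.18, 19.96, 20.02]
approximated = [round(r) for r in readings]
print(approximated)
```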

  1. Give examples for lossy data compression.

Wavelet transforms and principal component analysis.

  1. Define Discrete Wavelet transforms.

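The discrete wavelet transform (DWT) is a linear signal processing technique that, when applied to a data vector D, transforms it into a numerically different vector D' of wavelet coefficients. The two vectors are of the same length. Compression comes from storing only the strongest wavelet coefficients; compared with the discrete Fourier transform, the DWT generally achieves better lossy compression.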
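A minimal sketch of the idea, using the Haar wavelet in its simple unnormalized form (pairwise averages and half-differences) and assuming the input length is a power of two. The function names and the threshold of 1 are choices made for this example only.

```python
def haar_dwt(data):
    """Full Haar decomposition: repeatedly replace pairs by their average
    and keep the half-differences as detail coefficients."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    details_all = []
    current = list(data)
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        details = [(current[i] - current[i + 1]) / 2 for i in pairs]
        details_all = details + details_all  # coarser levels go in front
        current = averages
    return current + details_all


def haar_idwt(coeffs):
    """Inverse of haar_dwt: rebuild each level from averages and details."""
    current = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        current = [v for a, d in zip(current, details) for v in (a + d, a - d)]
        pos += len(details)
    return current


D = [2, 2, 0, 2, 3, 5, 4, 4]
D_prime = haar_dwt(D)
print(D_prime)  # same length as D

# Lossy compression: zero out the weak coefficients, then reconstruct.
truncated = [c if abs(c) >= 1 else 0.0 for c in D_prime]
print(haar_idwt(truncated))
```

With all coefficients kept, `haar_idwt` returns the original vector. After truncation only an approximation comes back, but fewer nonzero coefficients need to be stored.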

  1. Define numerosity reduction.

Numerosity reduction is a technique used to reduce the data volume by choosing alternative smaller forms of data representation. These techniques may be parametric or nonparametric.

  1. Give examples for parametric and nonparametric methods.

Parametric: log-linear models

Nonparametric: histograms, clustering, sampling
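Two of the nonparametric methods, histograms and sampling, can be sketched as follows. The sketch assumes equal-width buckets and simple random sampling without replacement; the helper names, the fixed seed, and the data are illustrative.

```python
import random
from collections import Counter


def equal_width_histogram(values, num_buckets):
    """Return (low, high, count) for each of `num_buckets` equal-width buckets.
    Assumes the values are not all identical."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets
    counts = Counter()
    for v in values:
        # The maximum value falls on the upper edge; put it in the last bucket.
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(num_buckets)]


def simple_random_sample(values, n, seed=0):
    """Simple random sample of n items without replacement."""
    return random.Random(seed).sample(values, n)


values = list(range(100))  # illustrative data
histogram = equal_width_histogram(values, 4)
sample = simple_random_sample(values, 10)
print(histogram)
print(sample)
```

In both cases the 100 original values are replaced by a much smaller representation: four bucket counts, or ten sampled values.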

Part B

1) Explain the data cleaning and data integration processes in detail.
2) Explain data reduction in detail.
3) Define data mining query language. Explain in detail.
4) Explain attribute-oriented induction in detail.
5) Explain mining descriptive statistical measures in large databases.
6) List the system managers and explain their responsibilities.
7) Explain the responsibilities of the process manager.
8) Explain, with an example, the use of different algorithms to construct a decision tree automatically.
9) When is data marting appropriate?
10) Explain the uses of metadata in the data warehousing process.


--
Hackerx Sasi

with regards
prem sasi kumar arivukalanjiam
