Data Mining & Data Warehousing
Unit 1: 2-mark questions with answers, and 16-mark questions
- What is Data mining?
Data mining refers to extracting or "mining" knowledge from large amounts of data. It is often treated as a synonym for another popularly used term, Knowledge Discovery in Databases (KDD).
- Give the steps involved in KDD.
KDD consists of an iterative sequence of the following steps:
• Data cleaning
• Data integration
• Data selection
• Data transformation
• Data mining
• Pattern evaluation
• Knowledge presentation
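The steps above can be sketched as a chain of small Python functions, one per KDD step. This is a minimal illustration under assumed toy data: the two "store" sources, the `item` attribute, the lower-casing transformation, and the support threshold are all hypothetical, and real KDD steps are far richer.

```python
# Toy KDD pipeline: each function stands for one KDD step.
# All data, field names, and thresholds are hypothetical.
from collections import Counter

def clean(records):
    # Data cleaning: drop records that have missing values.
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    # Data integration: combine records from several sources into one list.
    return [r for src in sources for r in src]

def select(records, attrs):
    # Data selection: keep only the attributes relevant to the analysis.
    return [{a: r[a] for a in attrs} for r in records]

def transform(records):
    # Data transformation: normalize string values to lower case.
    return [{k: v.lower() if isinstance(v, str) else v for k, v in r.items()}
            for r in records]

def mine(records, attr):
    # Data mining: count how often each value of `attr` occurs.
    return Counter(r[attr] for r in records)

def evaluate(patterns, min_count):
    # Pattern evaluation: keep only patterns that meet a support threshold.
    return {p: c for p, c in patterns.items() if c >= min_count}

def present(patterns):
    # Knowledge presentation: render surviving patterns as text lines.
    return [f"{p}: {c}" for p, c in sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))]

store_a = [{"item": "Milk", "qty": 2}, {"item": "Bread", "qty": None}]
store_b = [{"item": "milk", "qty": 1}, {"item": "Eggs", "qty": 12}]

data = select(integrate(clean(store_a), clean(store_b)), ["item"])
result = present(evaluate(mine(transform(data), "item"), min_count=2))
print(result)  # ['milk: 2']
```

Without the transformation step, "Milk" and "milk" would be counted as different items and neither would reach the threshold, which is one reason the steps are applied in sequence.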
- Give the architecture of a typical data mining system.
The architecture of a typical data mining system consists of the following components:
• Database, data warehouse, or other information repository
• Database or data warehouse server
• Knowledge base
• Data mining engine
• Pattern evaluation module
• Graphical user interface
- Define database management system.
A database system, also called a database management system (DBMS), consists of a collection of interrelated data, known as a database, and a set of software programs to manage and access the data.
- Define relational database.
A relational database is a collection of tables, each of which is assigned a unique name. Each table consists of a set of attributes (columns or fields) and usually stores a large set of tuples (records or rows).
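A small sketch using Python's built-in `sqlite3` module shows these terms concretely: a uniquely named table, its attributes, and the tuples stored in it. The `customer` table and its rows are hypothetical.

```python
import sqlite3

# In-memory database; the table name and attributes are hypothetical.
conn = sqlite3.connect(":memory:")
# A table has a unique name and a fixed set of attributes (columns).
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
# Each inserted row is one tuple (record).
conn.executemany(
    "INSERT INTO customer (cust_id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Chennai"), (2, "Ravi", "Madurai"), (3, "Meena", "Chennai")],
)
# A query retrieves the tuples that satisfy a condition on an attribute.
rows = conn.execute(
    "SELECT name FROM customer WHERE city = ? ORDER BY cust_id", ("Chennai",)
).fetchall()
print(rows)  # [('Asha',), ('Meena',)]
conn.close()
```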
- Define data warehouse.
A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site. It is constructed via a process of data cleaning, data integration, data transformation, data loading, and periodic data refreshing.
- Define data mart and compare it with data warehouse.
A data mart is a departmental subset of a data warehouse. It focuses on selected subjects, and thus its scope is department-wide. A data warehouse, on the other hand, collects information about subjects that span the entire organization, and thus its scope is enterprise-wide.
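The difference in scope can be illustrated with `sqlite3`, modelling the data mart as a view restricted to one department. This is a simplifying assumption: in practice a data mart is often a separately stored and loaded subset, and the `sales_fact` table, departments, and amounts here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical enterprise-wide warehouse table covering several departments.
conn.execute("CREATE TABLE sales_fact (dept TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [("Marketing", "Ads", 500.0), ("Sales", "Laptop", 1200.0),
     ("Sales", "Phone", 800.0), ("HR", "Training", 300.0)],
)
# Data mart modelled as a view over one department's subject area only.
conn.execute(
    "CREATE VIEW sales_mart AS SELECT product, amount FROM sales_fact WHERE dept = 'Sales'"
)

warehouse_rows = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
mart_total = conn.execute("SELECT SUM(amount) FROM sales_mart").fetchone()[0]
print(warehouse_rows, mart_total)  # 4 2000.0
conn.close()
```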
- Define transaction databases.
A transaction database consists of a file where each record represents a transaction. A transaction typically includes a unique transaction identity number and a list of the items making up the transaction.
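A minimal sketch of this record layout in Python, assuming hypothetical transaction IDs and items: each record pairs a unique transaction ID with its list of items, and a simple query finds the transactions that contain a given item.

```python
# Hypothetical transaction database: each record is (transaction ID, list of items).
transactions = [
    ("T100", ["bread", "milk"]),
    ("T200", ["bread", "butter", "milk"]),
    ("T300", ["eggs"]),
]

def containing(db, item):
    """Return the IDs of transactions whose item list includes `item`."""
    return [tid for tid, items in db if item in items]

print(containing(transactions, "milk"))  # ['T100', 'T200']
```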
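- Explain object-oriented databases.
Object-oriented databases are based on the object-oriented programming paradigm, where each entity is considered an object. Associated with each object are the following:
• A set of variables that describe the object
• A set of messages that the object can use to communicate with other objects or with the rest of the database system
• A set of methods, where each method holds the code that implements a message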
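The three components map naturally onto a Python class, shown here as an analogy rather than as the interface of any particular object-oriented DBMS. The `Customer` class, its variables, and its methods are hypothetical.

```python
class Customer:
    """Hypothetical object: variables hold state; methods implement messages."""

    def __init__(self, name, address, balance):
        # Set of variables describing the object.
        self.name = name
        self.address = address
        self.balance = balance

    # Each method holds the code that runs when the matching message is received.
    def get_balance(self):
        return self.balance

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balance += amount
        return self.balance

c = Customer("Asha", "Chennai", 100)
# Sending the "deposit" message corresponds to calling the deposit method.
c.deposit(50)
print(c.get_balance())  # 150
```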
- Explain spatial databases.
Spatial databases contain space-related information. Examples include geographic (map) databases, VLSI chip design databases, and medical and satellite image databases. Spatial data may be represented in raster format, consisting of n-dimensional bit maps or pixel maps. Maps can be represented in vector format, where roads, bridges, and similar features are represented as unions of basic geometric constructs such as points, lines, and polygons.
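The two representations can be contrasted with a short sketch: a raster is a grid of pixel values, while a vector road is a sequence of points joined by line segments. The pixel map, the meaning of its values, and the road coordinates are all hypothetical.

```python
import math

# Raster format: a hypothetical 2-D pixel map (1 = water, 0 = land).
raster = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
# Area-style queries on rasters work by counting pixels.
water_pixels = sum(sum(row) for row in raster)

# Vector format: a hypothetical road stored as a polyline of (x, y) points.
road = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
# Length-style queries on vectors sum the lengths of the line segments.
road_length = sum(math.dist(p, q) for p, q in zip(road, road[1:]))

print(water_pixels, road_length)  # 6 10.0
```

`math.dist` requires Python 3.8 or later.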
- Explain temporal and time-series databases.
A temporal database usually stores relational data that include time-related attributes. These attributes may involve several timestamps, each having different semantics. A time-series database stores sequences of values that change with time, such as data collected from the stock exchange.
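A typical operation on a time-series is smoothing its values over a sliding window. The sketch below computes a simple moving average over hypothetical daily closing prices; the dates, prices, and window size are illustrative only.

```python
# Hypothetical daily closing prices: a time-series of (date, value) pairs.
prices = [("2024-01-01", 100.0), ("2024-01-02", 102.0),
          ("2024-01-03", 101.0), ("2024-01-04", 105.0)]

def moving_average(series, window):
    """Simple moving average over consecutive values of a time-series."""
    values = [v for _, v in series]
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

print(moving_average(prices, 2))  # [101.0, 101.5, 103.0]
```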
- Explain text databases and multimedia databases.
Text databases are databases that contain word descriptions of objects. These descriptions are usually long sentences or paragraphs, such as product specifications, error or bug reports, and so on. Multimedia databases store image, audio, and video data. They are used in applications such as picture content-based retrieval, voice-mail systems, and the World Wide Web (WWW).
- Define legacy databases.
A legacy database is a group of heterogeneous databases that combines different kinds of data systems, such as relational or object-oriented databases, hierarchical databases, or file systems.
- Give the classification of data mining tasks.
• Descriptive: characterizes the general properties of the data in the database.
• Predictive: performs inference on the current data in order to make predictions.
- Describe class/concept description.
Data can be associated with classes or concepts. The individual classes can be described in summarized, concise, and yet precise terms. Such descriptions of a class or a concept are called class/concept descriptions. These descriptions can be derived via data characterization or data discrimination.
- Define data characterization.
It is a summarization of the general characteristics or features of a target class of data. The data corresponding to the user-specified class are typically collected by a database query.
- Give the output forms of data characterization.
The output can be presented as pie charts, bar charts, curves, multidimensional data cubes, and multidimensional tables, including crosstabs. The resulting descriptions can also be presented as generalized relations or in rule form, called characteristic rules.
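A minimal sketch of characterization, assuming hypothetical customer records: a query-like filter collects the target class, and the result is summarized as a crosstab of counts over two attributes. The attributes and the "big spender" class are illustrative only.

```python
from collections import Counter

# Hypothetical customer records.
customers = [
    {"age": "20-29", "income": "low", "big_spender": True},
    {"age": "30-39", "income": "high", "big_spender": True},
    {"age": "30-39", "income": "high", "big_spender": True},
    {"age": "20-29", "income": "low", "big_spender": False},
]

# Collect the user-specified target class (stands in for a database query).
target = [c for c in customers if c["big_spender"]]

# Summarize the target class as a crosstab of counts over two attributes.
crosstab = Counter((c["age"], c["income"]) for c in target)
for (age, income), n in sorted(crosstab.items()):
    print(age, income, n)
```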
- Define data discrimination.
It is a comparison of the general features of target-class data objects with the general features of objects from one or a set of contrasting classes. The target and contrasting classes are specified by the user, and the corresponding data objects are retrieved through database queries.
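Discrimination can be sketched by computing the same summary for the target class and for a contrasting class and comparing them side by side. The student records, the `level` and `gpa` attributes, and the choice of percentage distributions as the compared feature are hypothetical.

```python
from collections import Counter

# Hypothetical student records; attribute names are illustrative only.
students = [
    {"level": "graduate", "gpa": "excellent"},
    {"level": "graduate", "gpa": "excellent"},
    {"level": "graduate", "gpa": "good"},
    {"level": "undergraduate", "gpa": "good"},
    {"level": "undergraduate", "gpa": "fair"},
]

def distribution(records, attr):
    """Percentage of records taking each value of `attr`."""
    counts = Counter(r[attr] for r in records)
    return {v: round(100 * n / len(records), 1) for v, n in counts.items()}

target = [s for s in students if s["level"] == "graduate"]          # target class
contrast = [s for s in students if s["level"] == "undergraduate"]   # contrasting class

print(distribution(target, "gpa"))    # {'excellent': 66.7, 'good': 33.3}
print(distribution(contrast, "gpa"))  # {'good': 50.0, 'fair': 50.0}
```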
- What is association analysis?
Association analysis is the discovery of association rules showing attribute-value conditions that occur frequently together in a given set of data. It is widely used for market basket or transaction data analysis.
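An association rule such as computer => software is usually judged by its support and confidence. The sketch below computes both over hypothetical market-basket transactions; the items are illustrative only.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"printer"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)

def confidence(lhs, rhs, db):
    """Confidence of lhs => rhs: support(lhs | rhs) / support(lhs)."""
    return support(lhs | rhs, db) / support(lhs, db)

lhs, rhs = {"computer"}, {"software"}
print(support(lhs | rhs, transactions))                # 0.5
print(round(confidence(lhs, rhs, transactions), 3))    # 0.667
```

Here the rule holds in 2 of the 4 transactions (support 50%), and in 2 of the 3 transactions that contain a computer (confidence about 66.7%).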
- Define classification.
It is the process of finding a set of models that describe and distinguish data classes or concepts, so that the model can be used to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data (data objects whose class labels are known).
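A minimal sketch of the train-then-predict idea, using a swapped-in, deliberately simple technique: a one-attribute majority-vote rule (similar in spirit to the OneR method), not decision trees or the other classifiers a full course covers. The training data, attribute values, and class labels are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (attribute value, known class label).
training = [
    ("youth", "no"), ("youth", "no"), ("youth", "yes"),
    ("middle_aged", "yes"), ("middle_aged", "yes"),
    ("senior", "yes"), ("senior", "no"), ("senior", "no"),
]

def train(data):
    """Derive a model: map each attribute value to its majority class label."""
    by_value = defaultdict(Counter)
    for value, label in data:
        by_value[value][label] += 1
    return {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}

def predict(model, value, default=None):
    """Predict the class of an object whose class label is unknown."""
    # Values never seen in training fall back to `default`.
    return model.get(value, default)

model = train(training)
print(model)                     # {'youth': 'no', 'middle_aged': 'yes', 'senior': 'no'}
print(predict(model, "senior"))  # no
```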
1) Define data mining. Describe the steps involved in data mining when viewed as a process of knowledge discovery. Explain the architecture of a typical data mining system.
2) Describe the kinds of data on which data mining is performed.
3) Briefly explain the kinds of patterns that can be mined.
4) Give the classification of data mining systems. Describe the issues related to data mining.
5) Define data warehouse. Explain its features. Differentiate between operational database systems and data warehouses.
6) Briefly describe star, snowflake, and fact constellation schemas with examples.
7) Explain data warehouse architecture in detail.
8) How should a fact table be designed for the data warehouse process?
9) Explain the steps involved in designing a dimension table.
10) Write briefly about the horizontal partitioning strategy.
11) Explain the vertical partitioning strategy.
12) Explain the hardware partitioning strategy.