Data Mining & Data warehousing
Data Mining & Data Warehousing: Unit 3, 2-mark questions with answers and 16-mark questions
Unit III
Part A
1) Define support in association rule mining
The rule A => B holds in the transaction set D with support s, where s is the percentage of transactions in D that contain A ∪ B, i.e., both A and B. This is taken to be the probability P(A ∪ B).
2) Define confidence.
The rule A => B has confidence c in the transaction set D if c is the percentage of transactions in D containing A that also contain B. This is taken to be the conditional probability P(B|A).
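The two definitions above can be checked on a small example. The following is a minimal Python sketch, not a reference implementation; the transaction set D and the helper names support and confidence are assumptions made up for illustration.

```python
# Hypothetical toy transaction set D (not from the source notes).
D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate P(B|A) as support(A ∪ B) / support(A).

    Assumes the antecedent occurs in at least one transaction.
    """
    a = set(antecedent)
    both = a | set(consequent)
    return support(both, transactions) / support(a, transactions)

s = support({"bread", "milk"}, D)        # 2 of the 4 transactions
c = confidence({"bread"}, {"milk"}, D)   # 2 of the 3 transactions with bread
print(f"support = {s:.2f}, confidence = {c:.2f}")
```

Here the rule bread => milk has support 2/4 and confidence 2/3, because two of the three transactions containing bread also contain milk.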
3) Define occurrence frequency of an item set.
A set of items is referred to as an item set. The occurrence frequency of an item set is the number of transactions that contain the item set.
4) How are association rules mined from large databases?
Association rule mining is a two-step process:
· Find all frequent item sets
· Generate strong association rules from the frequent item sets
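The two steps above can be sketched in Python as follows. This is only an illustration: step 1 uses naive enumeration of all item sets rather than Apriori, and the data set D and the thresholds MIN_SUP and MIN_CONF are assumed values, not taken from the notes. A rule is treated as strong when it meets both thresholds.

```python
from itertools import combinations

# Hypothetical toy data and thresholds, chosen for illustration only.
D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
MIN_SUP = 0.5
MIN_CONF = 0.6

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Step 1: find all frequent item sets (naive enumeration, not Apriori).
items = sorted(set().union(*D))
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = support(frozenset(combo), D)
        if s >= MIN_SUP:
            frequent[frozenset(combo)] = s

# Step 2: generate strong rules A => B from each frequent item set.
rules = []
for itemset, s in frequent.items():
    for r in range(1, len(itemset)):
        for a in combinations(sorted(itemset), r):
            a = frozenset(a)
            # Every subset of a frequent item set is frequent, so the lookup succeeds.
            conf = s / frequent[a]
            if conf >= MIN_CONF:
                rules.append((a, itemset - a, s, conf))

for a, b, s, conf in rules:
    print(f"{sorted(a)} => {sorted(b)}  support={s:.2f}  confidence={conf:.2f}")
```

On this toy data, five item sets are frequent and four rules pass both thresholds.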
5) When does an item set satisfy minimum support?
An item set satisfies minimum support if the occurrence frequency of the item set is greater than or equal to the product of min_sup and the total number of transactions in D.
6) Define minimum support count.
The number of transactions required for an item set to satisfy minimum support is referred to as the minimum support count. An item set that satisfies minimum support is a frequent item set.
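As a rough illustration of occurrence frequency and minimum support count, the sketch below converts a relative threshold min_sup into a count and tests item sets against it. The data set and function names are assumptions for the example; rounding up with math.ceil is one reasonable reading of "greater than or equal to the product of min_sup and the total number of transactions".

```python
import math

# Hypothetical toy transaction set with 5 transactions.
D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"butter"},
]

def support_count(itemset, transactions):
    """Occurrence frequency: number of transactions containing the item set."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def min_support_count(min_sup, num_transactions):
    """Smallest whole number of transactions that meets the threshold min_sup."""
    return math.ceil(min_sup * num_transactions)

min_sup = 0.5                                        # assumed threshold
count_needed = min_support_count(min_sup, len(D))    # ceil(0.5 * 5) = 3

bread_is_frequent = support_count({"bread"}, D) >= count_needed
pair_is_frequent = support_count({"bread", "milk"}, D) >= count_needed
print(count_needed, bread_is_frequent, pair_is_frequent)
```

With five transactions and min_sup = 0.5, an item set must occur in at least three transactions, so {bread} is frequent and {bread, milk} is not.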
7) Give the classification of association rules.
· Based on the types of values handled in the rule
· Based on the dimensions of data involved in the rule
· Based on the levels of abstraction involved in the rule set
· Based on various extensions to association mining
8) Define Frequent Closed Item Set.
A frequent closed item set is an item set that is both frequent and closed. An item set c is closed if there exists no proper superset c' of c such that every transaction containing c also contains c'.
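The closedness test can be sketched in Python. The code below is an illustrative sketch with made-up data and helper names. It relies on the observation that "every transaction containing c also contains c'" is the same as c and c' having equal support counts, and that it is enough to try supersets with one extra item.

```python
# Hypothetical toy transaction set.
D = [
    {"a", "b"},
    {"a", "b", "c"},
    {"a", "b"},
    {"c"},
]

def support_count(itemset, transactions):
    """Number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def is_closed(itemset, transactions):
    """True if no proper superset has the same support count as itemset."""
    itemset = frozenset(itemset)
    count = support_count(itemset, transactions)
    all_items = set().union(*transactions)
    # If some larger superset had the same count, so would a one-item extension.
    for extra in all_items - itemset:
        if support_count(itemset | {extra}, transactions) == count:
            return False
    return True

def is_frequent_closed(itemset, transactions, min_count):
    """True if itemset meets the minimum support count and is closed."""
    frequent = support_count(frozenset(itemset), transactions) >= min_count
    return frequent and is_closed(itemset, transactions)

print(is_frequent_closed({"a"}, D, 2))        # {a} always occurs together with b
print(is_frequent_closed({"a", "b"}, D, 2))
```

In this data {a} is frequent but not closed, because every transaction containing a also contains b, while {a, b} is both frequent and closed.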
9) Define Apriori property.
If an item set I does not satisfy the minimum support threshold min_sup, then I is not frequent, i.e., P(I) < min_sup. If an item A is added to the item set I, the resulting item set I ∪ A cannot occur more frequently than I. Therefore I ∪ A is not frequent either, i.e., P(I ∪ A) < min_sup. Equivalently, every nonempty subset of a frequent item set must also be frequent.
10) Define Anti-Monotone property.
If a set cannot pass a test, all of its supersets will fail the same test as well. The property is called anti-monotone because it is monotonic in the context of failing a test.
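The anti-monotone behaviour of the minimum support test can be checked by brute force on a small example. The following Python sketch uses assumed toy data and an assumed threshold MIN_COUNT; it looks for an infrequent item set with a frequent superset and should find none.

```python
from itertools import combinations

# Hypothetical toy data; item "d" occurs only once, so it is infrequent.
D = [
    {"a", "b"},
    {"a", "c"},
    {"a", "b", "c"},
    {"b", "d"},
]
MIN_COUNT = 2   # assumed minimum support count

def support_count(itemset, transactions):
    """Number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

items = sorted(set().union(*D))
all_itemsets = [
    frozenset(c)
    for k in range(1, len(items) + 1)
    for c in combinations(items, k)
]

infrequent = [i for i in all_itemsets if support_count(i, D) < MIN_COUNT]

# Look for a counterexample: an infrequent set with a frequent proper superset.
violations = [
    (i, j)
    for i in infrequent
    for j in all_itemsets
    if i < j and support_count(j, D) >= MIN_COUNT
]
print(len(infrequent), violations)
```

The list of violations stays empty: once {d} fails the minimum support test, every superset of {d} fails it too.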
11) List the two steps involved in the Apriori algorithm.
· Join Step
· Prune Step
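A minimal Python sketch of the two steps is given below, under the usual textbook reading: the join step merges two frequent (k-1)-item sets that share their first k-2 items, and the prune step discards any candidate that has an infrequent (k-1)-subset. The function name apriori_gen and the sample set L2 are illustrative assumptions.

```python
from itertools import combinations

def apriori_gen(frequent_prev):
    """Build candidate k-item sets from the frequent (k-1)-item sets.

    frequent_prev: a set of sorted tuples, all of the same length k-1.
    """
    prev = sorted(frequent_prev)
    candidates = set()
    for i, p in enumerate(prev):
        for q in prev[i + 1:]:
            # Join step: merge two item sets that agree on all but the last item.
            if p[:-1] == q[:-1]:
                candidate = p + (q[-1],)
                # Prune step: keep the candidate only if every (k-1)-subset is frequent.
                subsets = combinations(candidate, len(p))
                if all(sub in frequent_prev for sub in subsets):
                    candidates.add(candidate)
    return candidates

# Hypothetical frequent 2-item sets.
L2 = {("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")}
C3 = apriori_gen(L2)
print(C3)
```

For this L2 the join step proposes (a, b, c), (a, b, d) and (a, c, d). The prune step removes the last two because (b, d) and (c, d) are not frequent, leaving only (a, b, c) as a candidate whose support still has to be counted.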
12) List the search strategies for mining multilevel associations with reduced support.
· Level-by-level independent
· Level-cross filtering by single item
· Level-cross filtering by k-item set
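The difference between the first two strategies can be sketched in Python. Everything here is an assumption for illustration: the two-level hierarchy PARENT, the transactions, and the per-level thresholds in MIN_SUP (a lower threshold at the deeper level, as in reduced support). Level-by-level independent examines every item at each level, whereas level-cross filtering by single item examines a level-2 item only if its level-1 parent is frequent.

```python
# Hypothetical two-level concept hierarchy: level-2 item -> level-1 category.
PARENT = {
    "laptop": "computer",
    "desktop": "computer",
    "b/w printer": "printer",
    "color printer": "printer",
}
D = [
    {"laptop", "b/w printer"},
    {"laptop"},
    {"desktop", "color printer"},
    {"laptop", "desktop"},
    {"laptop"},
]
MIN_SUP = {1: 0.5, 2: 0.3}   # assumed reduced support thresholds per level

def support(item, transactions, level):
    """Support of one item; a level-1 category matches any of its children."""
    if level == 1:
        hits = sum(1 for t in transactions if any(PARENT[i] == item for i in t))
    else:
        hits = sum(1 for t in transactions if item in t)
    return hits / len(transactions)

categories = sorted(set(PARENT.values()))
frequent_l1 = [c for c in categories if support(c, D, 1) >= MIN_SUP[1]]

# Level-by-level independent: every level-2 item is examined.
examined_independent = sorted(PARENT)

# Level-cross filtering by single item: examine only children of frequent parents.
examined_filtered = [i for i in sorted(PARENT) if PARENT[i] in frequent_l1]
frequent_l2 = [i for i in examined_filtered if support(i, D, 2) >= MIN_SUP[2]]

print(frequent_l1, examined_independent, examined_filtered, frequent_l2)
```

In this example only "computer" is frequent at level 1, so the filtering strategy examines two level-2 items instead of four.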
13) Compare the level-by-level independent and level-cross filtering by k-item set strategies.
The level-by-level independent strategy leads to examining numerous infrequent items at low levels and finding associations between items of little importance.
The level-cross filtering by k-item set strategy allows the mining system to examine only the children of frequent k-item sets. This restriction is very strong in that there usually are not many k-item sets that, when combined, are also frequent. Hence many valuable patterns may be filtered out using this approach.
14) Define single-dimensional association rule.
buys(X, "IBM desktop computer") => buys(X, "Sony b/w printer")
The above rule is said to be a single-dimensional rule since it contains a single distinct predicate (e.g., buys) with multiple occurrences (i.e., the predicate occurs more than once within the rule). It is also known as an intradimension association rule.
15) Define multidimensional association rules.
Association rules that involve two or more dimensions or predicates are referred to as multidimensional association rules.
age(X, "20…29") ^ occupation(X, "Student") => buys(X, "Laptop")
The above rule contains three predicates (age, occupation, buys), each of which occurs only once in the rule. There are no repeated predicates in the above rule. Multidimensional association rules with no repeated predicates are called interdimension association rules.
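The distinction between single-dimensional and multidimensional rules comes down to counting distinct predicates. The sketch below is illustrative only; the representation of a rule as lists of (predicate, value) pairs and the function name classify_rule are assumptions.

```python
def classify_rule(antecedent, consequent):
    """Classify a rule by the predicates it uses.

    antecedent, consequent: lists of (predicate, value) pairs.
    """
    predicates = [p for p, _ in antecedent + consequent]
    distinct = set(predicates)
    if len(distinct) == 1:
        # One predicate, occurring more than once.
        return "single-dimensional (intradimension)"
    if len(distinct) == len(predicates):
        # Several predicates, none repeated.
        return "multidimensional (interdimension)"
    return "multidimensional with repeated predicates"

rule_14 = classify_rule(
    [("buys", "IBM desktop computer")],
    [("buys", "Sony b/w printer")],
)
rule_15 = classify_rule(
    [("age", "20…29"), ("occupation", "Student")],
    [("buys", "Laptop")],
)
print(rule_14)
print(rule_15)
```

The rule from question 14 uses only the predicate buys, so it is classified as single-dimensional, while the rule from question 15 uses three different predicates once each and is classified as interdimension.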
16) Define categorical attribute.
Categorical attributes have a finite number of possible values with no ordering among the values (e.g., occupation, brand). Categorical attributes are also called nominal attributes since their values are "names of things".
17) Define Quantitative Attributes.
Quantitative attributes are numeric and have an implicit ordering among values (e.g., age, income, price).
Part B
1) Explain the Apriori algorithm in detail.
2) Explain the techniques used to improve the efficiency of the Apriori algorithm.
3) Explain mining frequent item sets without candidate generation.
4) Explain mining multilevel association rules from transaction databases.
5) Explain association rule mining in detail with an example.
6) What are the security requirements in a data warehouse?
7) What are the overnight operations in a data warehouse?
8) What is clustering? Briefly describe the following approaches: partitioning methods, hierarchical methods, density-based methods, and model-based methods.
9) Briefly outline how to compute the dissimilarity between objects described by the following types of variables: asymmetric binary variables, nominal variables, ratio-scaled variables, and numerical variables.
10) Describe the OLAP data cube techniques.