MISM5302  MDA5304  DATA MINING AND WAREHOUSING  KCA Past Paper

UNIVERSITY EXAMINATIONS: 2018/2019
EXAMINATION FOR THE DEGREE OF MASTER OF SCIENCE IN
INFORMATION SYSTEMS MANAGEMENT/ DATA ANALYTICS
MISM5302 MDA5304 DATA MINING AND WAREHOUSING
DATE: APRIL 2019 TIME: 2 HOURS
INSTRUCTIONS: Answer Question One & ANY OTHER TWO questions.

QUESTION ONE
(a) Briefly describe the knowledge discovery process. Draw a diagram to illustrate the process.
(5 Marks)

(b) Discuss four common data mining tasks. Use an example to illustrate each task (4 Marks)
(c) Discuss two main categories of data mining techniques and describe two techniques in each
category (4 Marks)
(d) Given the following data set, use the K-Means algorithm to perform cluster mining, where k = 2
(6 Marks)

(e) Discuss one application of clustering in business enterprises (1 Mark)
QUESTION TWO
(a) Describe any two techniques for smoothing noisy data during the pre-processing phase
(2 Marks)
(b) Briefly describe the following terms as used in data mining and warehousing:
(i) Data Normalization (1 Mark)
(ii) Data labeling (1 Mark)
(iii) Data standardization (1 Mark)
(c) Discuss four properties of an interesting pattern in the context of data post-processing
(4 Marks)
(d) Consider the following data set:
sepal length | sepal width | petal length | petal width | class

Use the above data set to answer the following questions:
(i) Identify the independent and dependent attributes (2 Marks)
(ii) Given that the split point = 3, write sample Python code to split the data set into training
and test data sets (2 Marks)
(iii) Write sample Python code to split the data into independent and dependent attributes
(2 Marks)
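Illustrative note for parts (ii) and (iii): the sketch below shows one possible way to do both splits with plain Python lists. The rows in `rows` are hypothetical placeholders, not the data set from this paper, and treating the first `split_point` rows as training data is an assumption about what "split point" means here.

```python
# Hypothetical placeholder rows in the column order
# sepal length, sepal width, petal length, petal width, class.
# These are NOT the exam's data; substitute the real rows.
rows = [
    [5.1, 3.5, 1.4, 0.2, "setosa"],
    [4.9, 3.0, 1.4, 0.2, "setosa"],
    [6.4, 3.2, 4.5, 1.5, "versicolor"],
    [6.9, 3.1, 4.9, 1.5, "versicolor"],
    [6.3, 3.3, 6.0, 2.5, "virginica"],
]

split_point = 3

# Assumption: rows before the split point form the training set,
# the remaining rows form the test set.
train = rows[:split_point]
test = rows[split_point:]

def split_attributes(data):
    """Separate the four measurements (independent) from the class (dependent)."""
    X = [row[:-1] for row in data]
    y = [row[-1] for row in data]
    return X, y

X_train, y_train = split_attributes(train)
X_test, y_test = split_attributes(test)

print(len(train), "training rows;", len(test), "test rows")
print("First training example:", X_train[0], "->", y_train[0])
```

The same slicing works on a NumPy array or a pandas DataFrame, where the last column would be selected as the dependent attribute.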

QUESTION THREE
(a) Given a certain data set, describe two reasons for considering using decision tree learning to
perform data mining (2 Marks)
(b) Describe the meaning of the term “entropy” as used in decision tree learning (1 Mark)
(c) Briefly explain four methods that can be used for choosing an attribute to divide a given
training data set during decision tree learning (4 Marks)
(d) Given the following data set, compute the information gain for selecting the humidity attribute as
the root of the decision tree during decision tree learning using the ID3 algorithm (5 Marks)


(e) Consider the following confusion matrix:
 a  b  c   <-- classified as
 9  6  5 | a = MSc in Information Systems Management
 1  3  4 | b = MSc in Data Analytics
 2  8  7 | c = MSc in Data Communications
Use the above confusion matrix to determine the following: (3 Marks)
(i) Precision for the MSc in Data Analytics class
(ii) Recall for the MSc in Information Systems Management class
(iii) Overall accuracy level
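Illustrative note for part (e): the three measures can be checked with a short Python sketch. It assumes the usual reading of a "classified as" matrix, in which each row is an actual class and each column is a predicted class; the function names are chosen for illustration only.

```python
# Confusion matrix from part (e).
# Assumption: rows are actual classes, columns are predicted classes (a, b, c).
matrix = [
    [9, 6, 5],  # actual a = MSc in Information Systems Management
    [1, 3, 4],  # actual b = MSc in Data Analytics
    [2, 8, 7],  # actual c = MSc in Data Communications
]
labels = ["a", "b", "c"]

def precision(m, k):
    """Correct predictions of class k divided by everything predicted as k (column sum)."""
    column_total = sum(row[k] for row in m)
    return m[k][k] / column_total

def recall(m, k):
    """Correct predictions of class k divided by all actual members of k (row sum)."""
    return m[k][k] / sum(m[k])

def accuracy(m):
    """Sum of the diagonal divided by the total number of instances."""
    correct = sum(m[i][i] for i in range(len(m)))
    total = sum(sum(row) for row in m)
    return correct / total

precision_b = precision(matrix, labels.index("b"))  # 3 / (6 + 3 + 8)
recall_a = recall(matrix, labels.index("a"))        # 9 / (9 + 6 + 5)
overall_accuracy = accuracy(matrix)                 # (9 + 3 + 7) / 45

print(f"Precision (b): {precision_b:.3f}")
print(f"Recall (a): {recall_a:.3f}")
print(f"Accuracy: {overall_accuracy:.3f}")
```

If the matrix were instead read with columns as actual classes, the precision and recall formulas would swap roles, so the orientation should be stated in any answer.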
QUESTION FOUR
(a) Describe four properties of a data warehouse (4 Marks)
(b) State and explain three architectures of a data warehouse (3 Marks)
(c) Describe the meaning of the following terms in the context of warehousing
(i) Dimension (1 Mark)
(ii) Schema (1 Mark)
(iii) Fact (1 Mark)
(d) Describe the meaning of the initials ETL in the context of data warehousing (3 Marks)
(e) Briefly describe two properties of a data mart (2 Marks)
