Course name | Data mining |
Course ID | 04.2-WE-BizElP-DataMining-Er |
Faculty | Faculty of Computer Science, Electrical Engineering and Automatics |
Field of study | E-business |
Education profile | practical |
Level of studies | First-cycle Erasmus programme |
Beginning semester | winter term 2019/2020 |
Semester | 2 |
ECTS credits | 3 |
Course type | obligatory |
Teaching language | English |
Author of syllabus | |
The class form | Hours per semester (full-time) | Hours per week (full-time) | Hours per semester (part-time) | Hours per week (part-time) | Form of assignment |
Lecture | 15 | 1 | - | - | Credit with grade |
Laboratory | 30 | 2 | - | - | Credit with grade |
Presentation of the software used for data mining. Familiarizing students with methods of data cleaning. Presentation of data classification methods. Presentation of methods for discovering association and sequence rules. Presentation of data clustering methods. Developing practical skills in operating selected data mining systems. Developing skills in applying data mining methods in e-business (customer segmentation, credit risk scoring, cross-selling strategies, fraud detection).
Review and characteristics of the software used for data mining. Introduction to data mining software (SAS). Data structures used in data mining. Types and roles of variables in data mining tasks.
Preparation of data for exploration. Data profiling. Data cleansing. Data sampling. Transformation of variables. Variable selection.
Data classification. Classification trees, k-nearest neighbors, naive Bayes classifier, neural networks, logistic regression. Measures of classification accuracy. Practical exercises in data classification.
Discovering association and sequence rules. Measures describing the statistical importance and strength of association and sequence rules. Market basket analysis. Computational complexity of association rule discovery. Discussion of the Apriori and Generalized Sequential Pattern (GSP) algorithms. Practical exercises in association and sequence rule discovery.
Data clustering. Methods of hierarchical clustering. Clustering methods based on iterative optimization. Distance measures used in clustering algorithms. Cluster summarization. Methods for estimating the number of clusters. Practical exercises in data clustering.
Lecture - conventional lecture using a video projector.
Laboratory - practical exercises in the computer laboratory.
Outcome description | Outcome symbols | Methods of verification | The class form |
Lecture - the passing criterion is to obtain positive grades on tests conducted at least once per semester.
Laboratory - the passing criterion is to obtain positive grades for laboratory exercises and tests.
Final grade components = lecture: 50% + laboratory: 50%
1. Hastie T., Tibshirani R., Friedman J.H.: The Elements of Statistical Learning, Springer, 2001.
Modified by dr hab. inż. Marek Kowal, prof. UZ (last modification: 09-12-2019 11:52)