Web mining - course description

General information

Course name	Web mining
Course ID	11.3-WE-BizElP-EkspZasInter-Er
Faculty	Faculty of Computer Science, Electrical Engineering and Automatics
Field of study	E-business
Education profile	practical
Level of studies	First-cycle Erasmus programme
Beginning semester	winter term 2022/2023

Course information

Semester	5
ECTS credits	4
Course type	obligatory
Teaching language	English
Author of syllabus	dr hab. inż. Artur Gramacki, prof. UZ

Class forms

The class form	Hours per semester (full-time)	Hours per week (full-time)	Hours per semester (part-time)	Hours per week (part-time)	Form of assignment
Lecture	15	1	-	-	Credit with grade
Project	30	2	-	-	Credit with grade

Aim of the course

To familiarize students with basic models and techniques for discovering information found on the Internet.
To familiarize students with text mining algorithms.
To develop skills in exploring Internet resources using statistical software.

Prerequisites

Basics of statistics

Scope

Types of information on the Internet.
Introduction to text mining.
Searching textual information.
Preprocessing of text documents: removing unnecessary elements (stop list, punctuation, numbers, etc.) and reducing words to their stems using Porter's algorithm and selected software libraries.
Keyword search.
Organizing documents as a term-document matrix (TDM) and various ways of weighting individual terms (TF: term frequency, IDF: inverse document frequency).
Vector similarity measures and their use in ranking retrieved documents.
Comparing the quality of text document search engines using various measures, e.g. precision and recall, ROC curves.
Selected elements of linear algebra and their application to approximating the TDM (low-rank approximation); benefits of the approximation.
Various techniques for clustering and classifying documents.
Document ranking based on link structure: the PageRank algorithm; hubs and authorities.
Creating document summaries by automatically selecting the most important sentences and the most important words (terms).
Creating word clouds.
Sentiment analysis as a technique to systematically identify, extract, quantify, and study affective states and subjective information (e.g. positive, negative, neutral).
Word embeddings.
Recommendation systems (user-based, item-based).
Overview of selected software tools for text mining tasks.
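
Several of the topics above (preprocessing with a stop list, TF and IDF weighting, and ranking documents by vector similarity) can be illustrated with a minimal Python sketch. The toy corpus, the stop list, the function names, and the weighting variant (raw term frequency times ln(N/df)) are illustrative assumptions, not course material. Stemming with Porter's algorithm is omitted for brevity.

```python
import math
import re
from collections import Counter

# Toy corpus and stop list: illustrative assumptions, not course material.
DOCS = [
    "Web mining discovers information on the Internet.",
    "Text mining algorithms process text documents.",
    "PageRank ranks web pages by their link structure.",
]
STOP_WORDS = {"on", "the", "by", "their"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_tfidf(docs):
    """Return (vocabulary, IDF weights, one TF-IDF vector per document)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    # Document frequency: number of documents that contain each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    # Assumed IDF variant: natural log of (number of documents / df).
    idf = {t: math.log(len(docs) / df[t]) for t in vocab}
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)  # raw term counts in this document
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, idf, vectors


def vectorize(text, vocab, idf):
    """Map a query onto the same vocabulary and weights as the documents."""
    tf = Counter(tokenize(text))
    return [tf[t] * idf[t] for t in vocab]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Return (score, document index) pairs, most similar document first."""
    vocab, idf, vectors = build_tfidf(docs)
    q = vectorize(query, vocab, idf)
    return sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)


if __name__ == "__main__":
    for score, idx in rank("text mining", DOCS):
        print(f"{score:.3f}  {DOCS[idx]}")
```

In this sketch each document is one vector of the term-document matrix, and the query is projected onto the same vocabulary before scoring. Other TF and IDF variants (for example logarithmic TF or smoothed IDF) would change the scores but not the overall procedure.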

Teaching methods

Lecture, individual projects.

Learning outcomes and methods of their verification

Outcome description	Outcome symbols	Methods of verification	The class form

Assignment conditions

Lecture – the passing condition is to obtain a positive grade on the final test.

Project – the passing condition is to obtain a positive grade for the project.

Calculation of the final grade: lecture 50% + project 50%

Notes

Modified by dr hab. inż. Artur Gramacki, prof. UZ (last modification: 21-04-2022 00:07)
