Databases
A.Y. 2024/2025
Learning objectives
The objective of the course is to illustrate the main concepts and foundations of databases and data management systems (DBMSs), languages (relational algebra and SQL) for the management and querying of databases, methodology for the design of databases, the physical data organization, and transaction management. The course will also illustrate the main concepts related to distributed database architectures, to semi-structured data, active databases, and databases for decision support.
Expected learning outcomes
At the end of the course, the student will have acquired the ability to design, manage, and query relational databases and will learn the basic concepts related to the technology of database management systems also in relation to the physical organization, management of transactions, management of semi-structured data, active databases and databases for decision support.
Lesson period: Second four month period
Assessment methods: Esame
Assessment result: voto verbalizzato in trentesimi
Single course
This course can be attended as a single course.
Course syllabus and organization
Single session
Responsible
Lesson period
Second four month period
Course syllabus
Databases: Elements
1. Introduction to databases. Information systems, organizational systems, and informatics systems. Information and data. Introduction to databases and DBMSs, data models, schemas, and instances. Logical and physical data independence, database languages, people involved in the database management.
2. The relational model. Logical data models. The relational data model: relations vs. tables; relations with attributes; notations; incomplete information and null values. Integrity constraints: tuple constraints; keys and null values; referential constraints.
3. Relational algebra. Basic operators (union, intersection, difference, selection, and projection) and derived operators (natural join, theta join, semi-join). Queries in relational algebra and equivalence of algebraic expressions. Query idioms.
4. SQL. Data Definition Language: elementary domains, schema definition, table definition, and user defined domains. Intra-relational and inter-relational constraints. SQL queries: simple queries, aggregate queries, GROUP BY queries, set and nested queries. Data modification in SQL: insertions, deletions, and updates. Definition of integrity constraints, assertions, and views. Access control.
5. Database design. The life cycle of information systems. Requirements collection and analysis. Methodologies for database design. Phases of the design methodology. The Entity-Relationship model: the basic constructs (entity, relationship, attribute, cardinality, identifiers, generalizations); documentation of E-R schemas; rules. Design strategies: top-down, bottom-up, inside-out, and mixed. Quality of a conceptual schema. Logical design: restructuring of E-R schemas (removing generalizations; selection of primary identifiers; partitioning/merging of entities and relationships); translation into the relational model; documentation of logical schemas. Mention of physical design.
Databases: Complements
6. Physical database organization. Access manager. Main memory, secondary memory, and buffer. Buffer manager and its primitives. File organization: sequential structures (entry-sequenced, array, sequentially ordered), hash-based structures, tree structures. B- and B+- trees. Organization of tuples within pages. Physical database design and definition of indexes.
7. Transactions management. Definition of transactions. ACID properties of transactions. Transactions and system modules. Reliable control system. Stable memory. Log: organization, record, and management. Failure management: warm restart and cold restart. Concurrency control. Anomalies of concurrent transactions. Serial and serializable schedules. View-equivalence and conflict-equivalence. Two-phase locking and its variations. Timestamp (mono-version and multi-version). Lock management. Locking and isolation levels in SQL. Deadlock management. Livelock and starvation.
8. Distributed architectures. Distributed data paradigms. Types of architectures. Distributed system properties. Client-server architecture. Distributed databases. Data fragmentation and allocation. Transparency levels. Distributed transactions: classification and ACID properties. Distributed query optimization. Lamport method. Distributed deadlock: definition and detection. Two-phase commit protocol: basic protocol; recovery protocols; protocol optimization; other commit protocols.
9. Semi-structured data. XML. Semi-structured data in XML. XML queries: XQuery and XPath; FLOWR expressions.
10. Active databases. E-C-A paradigm. Triggers. Levels of granularity and evaluation behaviour. Advanced features of active rules. Properties of active rules: termination, confluence, identical observable behaviour. Termination analysis. Applications of active rules.
11. Data analysis. OLTP vs. OLAP. Data warehouse: characteristics and architecture. Multi-dimensional data model. Operations on multi-dimensional data: slice-and-dice, roll-up, drill-down. Development of the data warehouse: ROLAP and MOLAP. ROLAP schemas: star schema and snowflake schema. ROLAP operations. SQL aggregations. Data mining: association and classification rules.
1. Introduction to databases. Information systems, organizational systems, and informatics systems. Information and data. Introduction to databases and DBMSs, data models, schemas, and instances. Logical and physical data independence, database languages, people involved in the database management.
2. The relational model. Logical data models. The relational data model: relations vs. tables; relations with attributes; notations; incomplete information and null values. Integrity constraints: tuple constraints; keys and null values; referential constraints.
3. Relational algebra. Basic operators (union, intersection, difference, selection, and projection) and derived operators (natural join, theta join, semi-join). Queries in relational algebra and equivalence of algebraic expressions. Query idioms.
4. SQL. Data Definition Language: elementary domains, schema definition, table definition, and user defined domains. Intra-relational and inter-relational constraints. SQL queries: simple queries, aggregate queries, GROUP BY queries, set and nested queries. Data modification in SQL: insertions, deletions, and updates. Definition of integrity constraints, assertions, and views. Access control.
5. Database design. The life cycle of information systems. Requirements collection and analysis. Methodologies for database design. Phases of the design methodology. The Entity-Relationship model: the basic constructs (entity, relationship, attribute, cardinality, identifiers, generalizations); documentation of E-R schemas; rules. Design strategies: top-down, bottom-up, inside-out, and mixed. Quality of a conceptual schema. Logical design: restructuring of E-R schemas (removing generalizations; selection of primary identifiers; partitioning/merging of entities and relationships); translation into the relational model; documentation of logical schemas. Mention of physical design.
Databases: Complements
6. Physical database organization. Access manager. Main memory, secondary memory, and buffer. Buffer manager and its primitives. File organization: sequential structures (entry-sequenced, array, sequentially ordered), hash-based structures, tree structures. B- and B+- trees. Organization of tuples within pages. Physical database design and definition of indexes.
7. Transactions management. Definition of transactions. ACID properties of transactions. Transactions and system modules. Reliable control system. Stable memory. Log: organization, record, and management. Failure management: warm restart and cold restart. Concurrency control. Anomalies of concurrent transactions. Serial and serializable schedules. View-equivalence and conflict-equivalence. Two-phase locking and its variations. Timestamp (mono-version and multi-version). Lock management. Locking and isolation levels in SQL. Deadlock management. Livelock and starvation.
8. Distributed architectures. Distributed data paradigms. Types of architectures. Distributed system properties. Client-server architecture. Distributed databases. Data fragmentation and allocation. Transparency levels. Distributed transactions: classification and ACID properties. Distributed query optimization. Lamport method. Distributed deadlock: definition and detection. Two-phase commit protocol: basic protocol; recovery protocols; protocol optimization; other commit protocols.
9. Semi-structured data. XML. Semi-structured data in XML. XML queries: XQuery and XPath; FLOWR expressions.
10. Active databases. E-C-A paradigm. Triggers. Levels of granularity and evaluation behaviour. Advanced features of active rules. Properties of active rules: termination, confluence, identical observable behaviour. Termination analysis. Applications of active rules.
11. Data analysis. OLTP vs. OLAP. Data warehouse: characteristics and architecture. Multi-dimensional data model. Operations on multi-dimensional data: slice-and-dice, roll-up, drill-down. Development of the data warehouse: ROLAP and MOLAP. ROLAP schemas: star schema and snowflake schema. ROLAP operations. SQL aggregations. Data mining: association and classification rules.
Prerequisites for admission
Knowledge of basic concepts of computer science.
Due to requirements established by the Academic Programs Committee, it is compulsory to have first passed the exam of Computer Programming.
Due to requirements established by the Academic Programs Committee, it is compulsory to have first passed the exam of Computer Programming.
Teaching methods
Recorded lectures.
Teaching Resources
P. Atzeni, S. Ceri, P. Fraternali, S. Paraboschi, R. Torlone, Basi di Dati, 5 ed., McGraw-Hill Italia, 2018
Slides and videos available on the online platform.
Slides and videos available on the online platform.
Assessment methods and Criteria
Written exam aimed at verifying the student's knowledge and understanding of the subject. The written exam includes theory questions and exercises. The duration of the exam is 2 hours and half. The mark is expressed in thirtieths and the grading will consider the correctness, completeness, and clarity of the answers to the questions and exercises. The exam is closed book. As per organization on the online learning mode, the exam can be split in two parts each covering part of the course. Each will be evaluated in thirtieths and will be considered passed if the evaluation is at least 15/30. For the exam to be considered passed, the average of the two will have to be at least 18/30.
INF/01 - INFORMATICS - University credits: 12
Lessons: 96 hours
Professors:
Livraga Giovanni, Samarati Pierangela
Shifts:
Professor(s)