Course “Automated Term Extraction and Ontology Learning from Texts”, Summer 2018

The Scope of the Course

This short MSci-level course covers one of the most actively developing areas at the crossroads of Text Mining and Ontology Engineering, known as Ontology Learning from Texts. It gives a beginner-level, yet professional and up-to-date, introduction to the following questions:

  • (i) What are ontologies, and why does one need to be at least knowledgeable about them to succeed in Data Science, in particular in Data Analytics, and related disciplines?
  • (ii) What are the knowledge sources for building ontologies, and why is a representative collection of high-quality professional texts a suitable distillation of these sources?
  • (iii) How can the required bits of knowledge be elicited from texts? Why is Automated Term Extraction (ATE, also called Automated Term Recognition, ATR) a relevant approach? What is the ATE processing stack? How can Linguistic and Statistical Processing be reasonably combined to increase the quality of ATE?
  • (iv) What is Linguistic Processing (LP) in the context of ATE? What would a generic LP technology stack and workflow look like? Which Natural Language Processing techniques are relevant for LP in ATE?
  • (v) What is Statistical Processing (SP) in the context of ATE? What would a generic SP technology stack and workflow look like? Which statistical techniques are relevant for SP in ATE? How can LP and SP be rationally combined in ATE?
  • (vi) How can a statistically representative subset be identified in a document collection for ATE? What is Terminological Saturation (TS)? How can TS be measured? Which factors influence TS? How can a minimal subset of documents, representing the decisive minority sentiment, be identified?
  • (vii) Once the set of terms has been extracted, what are the next steps in learning the ontology based on these terms? What would a generic ontology learning workflow look like? This will be presented using the example of the OntoElect methodology for ontology refinement.
  • (viii) What are the highlights and pitfalls in the field of Ontology Learning from Texts? Why do fully automated approaches fall short?
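To give a flavour of how linguistic and statistical processing can be combined in ATE, here is a minimal sketch in Python. It is not the course's software: the stopword list, the function names `candidate_terms` and `rank_terms`, and the naive score (frequency multiplied by term length in words) are all illustrative assumptions. Real ATE pipelines typically use part-of-speech patterns for the linguistic step and more refined metrics for the statistical step.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real linguistic step
# would use part-of-speech tagging and noun-phrase patterns instead).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "or", "to",
             "is", "are", "from", "with", "by", "as", "that", "this"}


def candidate_terms(text, max_len=3):
    """Simplified linguistic step: yield word n-grams (up to max_len words)
    taken from runs of consecutive non-stopwords within a sentence fragment."""
    for fragment in re.split(r"[.!?;:,]", text.lower()):
        words = re.findall(r"[a-z]+(?:-[a-z]+)*", fragment)
        run = []
        # The trailing None flushes the last run of the fragment.
        for word in words + [None]:
            if word is None or word in STOPWORDS:
                for n in range(1, max_len + 1):
                    for i in range(len(run) - n + 1):
                        yield " ".join(run[i:i + n])
                run = []
            else:
                run.append(word)


def rank_terms(documents, max_len=3):
    """Simplified statistical step: score each candidate by its frequency in
    the collection times its length in words (a naive, hypothetical metric)."""
    counts = Counter()
    for doc in documents:
        counts.update(candidate_terms(doc, max_len))
    scored = {term: freq * len(term.split()) for term, freq in counts.items()}
    # Highest score first; ties are broken alphabetically.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    docs = [
        "Ontology learning from texts is a part of ontology engineering.",
        "Term extraction supports ontology learning.",
    ]
    for term, score in rank_terms(docs)[:5]:
        print(score, term)
```

The stopword-based splitting keeps the sketch dependency-free; swapping it for a proper noun-phrase chunker would change only `candidate_terms`, leaving the scoring step untouched.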

Didactics

The course is given in the form of tutorials with a hands-on practical component taking ~50 percent of the teaching time. After each lecture, except the introductory one, the students are invited to:

  • Use the instrumental software and the document collection(s) / dataset(s) provided by the tutor
  • Refine the software in some advised way, e.g. by introducing a more sophisticated metric or an improvement in an algorithm
  • Perform a cross-evaluation experiment to compare the initial revision of the software and their refined revision

These practical tasks are organized so that the students end up assembling a simple tool suite for performing a basic ATE workflow.

The final slot of the course is organized as a cross-evaluation contest of the students' solutions. The students are invited to apply their tool suites to the same document collection and measure the quality of the ATE results. A ranked list of the solutions is built by comparing these results.
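As an illustration of how such a ranking could be produced, the sketch below scores each solution by precision, recall, and F1 against a gold-standard term list and sorts the solutions by F1. The announcement does not say which quality measure the contest uses, so the choice of these metrics, the gold-standard list, and the names `evaluate` and `rank_solutions` are assumptions made for the example.

```python
def evaluate(extracted, gold):
    """Return (precision, recall, F1) of an extracted term list
    against a gold-standard term list (both treated as sets)."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def rank_solutions(solutions, gold):
    """Rank solutions (name -> extracted terms) by F1, best first."""
    scores = {name: evaluate(terms, gold) for name, terms in solutions.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1][2])


if __name__ == "__main__":
    # Hypothetical gold standard and team outputs, for illustration only.
    gold = ["ontology learning", "term extraction",
            "terminological saturation", "text mining"]
    solutions = {
        "team_a": ["ontology learning", "term extraction", "course"],
        "team_b": ["ontology learning", "text mining", "term extraction",
                   "terminological saturation", "whole days"],
    }
    for name, (p, r, f1) in rank_solutions(solutions, gold):
        print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```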

Course Topics

1. Introduction: Data, Analytics, Ontologies, Text Mining, and ATE

2. Linguistic and Statistical Methods and Metrics for ATE

3. ATE Workflow: Phases and Tools

4. Terminological Saturation in Document Collections

5. Ontology Learning from Texts using OntoElect

6. Students’ ATE Solutions Cross-Evaluation Contest

Please download the detailed course description (PDF).

Course dates

The course dates are May 31 – June 2. Classes run all day, from the morning until 18:00.

Course enrollment

Participants are enrolled in the course through an application process. Please fill in the following application form: goo.gl/forms/vwn90QYDsuFC4xbP2.

The personal motivation statement and previous background in Data Science are taken into account in the participant selection process. The application deadline is May 20. The application results will be announced no later than May 23.

Please note that the organizers may close applications earlier if there are enough applicants to fill all available places in the class. The organizers may also ask applicants for an additional interview to clarify aspects of their application and/or check their prerequisite knowledge.

Course fee

The participation fee is 4,500 UAH.

Approved candidates must pay the course tuition fee within the period defined by the organizers. If no payment is received from a participant, the organizers may cancel that participant’s course registration and free the place for the next candidate. If you have any financial questions, please contact the organizers as soon as possible (the contact information is provided below).

Certificates

Participants who earn at least 60% of the maximum grade may be granted an official certificate of completion. The certificate can be used to transfer credits to the participant’s home university if needed and if the university’s policy allows such transfers.

About the lecturer

Vadim Ermolayev, Ph.D., is a professor at the Department of Information Technologies, Zaporizhzhya National University.

Vadim Ermolayev studied Applied Mathematics and Computer Science at Samara Aerospace Institute, Russia, and Dnepropetrovsk State University, Ukraine, in 1979-1984. He obtained an MSc Diploma in Applied Mathematics (cum laude) at Dnepropetrovsk State University in 1984. In 1994 he was awarded a Ph.D. degree in Mathematical Modelling at Zaporizhzhya State University, Ukraine. In 1997 he received his habilitation as a Docent (Assoc. Prof.) at the Department of Mathematical Modelling and IT of Zaporizhzhya State University.

From August 1984 until December 1986 Mr. Ermolayev worked as a research engineer and senior research engineer at the Zaporizhzhya Research Institute for Radio Communication. In 1987 he was a visiting researcher at the Dnepropetrovsk All-Union Pipe Research Institute. Since April 1987 he has worked at Zaporizhzhya State (later National) University in various positions: researcher, senior researcher, director of the University Computing Centre, and Docent. Since 1995 he has also been in charge of the design and implementation of the University-wide Network and Information infrastructure as university administration adviser, project manager, and principal researcher. In 1997-2001 he was a member of the Technical Committee of the Ukrainian National Research and Education Network (URAN). Since 1991 he has taken part as a researcher, senior researcher, principal researcher, and project manager in many research and RTD projects funded by the European Commission (FP6 and FP7), the European Training Foundation, the Ukrainian Ministry of Education and Science, the German Federal Ministry of Education and Research (BMBF), European industrial companies, the International Renaissance Foundation, and Zaporizhzhya State University.

Since 1994 Dr. Ermolayev has taught undergraduate and graduate courses in Software Algorithms and Data Structures, the Architectures of Operating Systems and Database Systems, Advances in Information Systems, Software Engineering and Programming Technologies, Agent and Semantic Technologies for e-Business, Knowledge Engineering, the Semantic Web and Web Services, Logic Programming, and Artificial Intelligence. He has supervised over 50 successfully completed master's theses and several PhDs. He has also been a member of about 20 Ph.D. committees. From September 2000 until July 2003 he also served as the deputy president of the University responsible for IT, networking, and computing.

Dr. Ermolayev has published over 100 papers as journal articles, book chapters, refereed conference and workshop contributions, and technical reports. He has also co-edited or (co-)authored several proceedings volumes, technical manuals, and textbooks. He serves as a member of the Editorial Advisory Boards and Editorial Review Boards of international journals and book series, and as a program committee member of many international conferences and workshops.

Dr. Ermolayev also has working experience in industrial R&D. He has worked in various branches of industry as a full-time employee, a freelance sub-contractor, and a research consultant. He also serves the EC as an independent expert in the ICT programme.

Dr. Ermolayev is the founder and the head of the Intelligent Systems Research Group (ISRG) at Zaporizhzhya National University.

Personal website: ermolayev.com

Contacts

E-mail: [email protected]
Facebook: www.facebook.com/ucucsds/