MS-AMLV 2019 Schedule
9:30, Opening session
9:45, Keynote talk: Unreasonable Effectiveness of Noise: Training Deep Neural Networks by Optimizing Over Hyperparameters Path
Hyperparameter optimization remains an important problem in the training of deep architectures. Despite many recent advances, most approaches are intrinsically linked to sampling the hyperparameter space or to greedy search. We show that at a negligible additional computational cost, results can be improved by sampling nonlocal paths instead of points in hyperparameter space. To this end, we interpret hyperparameters as controlling the level of correlated noise in training, which can be mapped to an effective temperature. We then perform training in the joint hyperparameter/model-parameter space with an optimal training protocol corresponding to the path in this space. We observe faster training and improved resistance to overfitting and show a systematic decrease in the absolute validation error, improving over benchmark results.
Based on the paper arxiv.org/abs/1909.04013
Speaker – Mykola Maksymenko
Mykola Maksymenko, R&D Director at SoftServe, drives technological development in applied science and AI, human-computing interactions, and sensing. Mykola holds a Ph.D. in Theoretical Condensed Matter Physics, with over ten years of research experience, previously working at the Max Planck Institute for the Physics of Complex Systems and the Weizmann Institute of Science.
10:30, Ensembling and Transfer Learning for Multi-domain Microscopy Image Segmentation
Oleh Misko and Dmytro Fishman
A large amount of data is now generated in medicine, and particularly in microscopy imaging. Researchers and lab technicians currently spend a lot of time analyzing this data because of slow algorithms and the need for manual work. Recent advances in deep learning have produced methods that can efficiently address challenges in the microscopy imaging field. Image segmentation is one of the most common labor-intensive tasks in this area, and it could be automated with deep learning approaches, which have demonstrated remarkable performance across different areas. The difference between the distribution of the training data and the distribution of unseen data is called domain shift. A significant domain shift is known to harm a model’s prediction accuracy. In this work, we look into ways to reduce the effect of domain shift on microscopy image segmentation: using a single model for all domains, separate models for each domain, or an ensemble of models. The experiments are performed on two microscopy image modalities: fluorescent and brightfield. Furthermore, we explore the impact of transfer learning on the performance of trained models and on the reduction of domain shift.
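As an illustration of the ensemble option mentioned above, here is a minimal sketch that averages per-pixel foreground probabilities from several segmentation models and thresholds the result. It is not the authors' method: the function name `ensemble_average`, the 0.5 threshold, and the toy probability maps are all assumptions made for the example.

```python
from typing import List, Sequence

ProbMap = List[List[float]]  # per-pixel foreground probabilities, rows x columns


def ensemble_average(prob_maps: Sequence[ProbMap], threshold: float = 0.5) -> List[List[int]]:
    """Average the probability maps pixel by pixel and threshold into a binary mask."""
    if not prob_maps:
        raise ValueError("need at least one probability map")
    n_models = len(prob_maps)
    height, width = len(prob_maps[0]), len(prob_maps[0][0])
    mask = []
    for r in range(height):
        row = []
        for c in range(width):
            mean_p = sum(m[r][c] for m in prob_maps) / n_models
            row.append(1 if mean_p >= threshold else 0)
        mask.append(row)
    return mask


# Hypothetical outputs of two models (for example, one trained per modality)
# on the same 2x2 image; the numbers are invented for illustration.
fluorescent_model = [[0.9, 0.2], [0.6, 0.1]]
brightfield_model = [[0.7, 0.4], [0.2, 0.3]]
print(ensemble_average([fluorescent_model, brightfield_model]))
```

Simple averaging is only one way to combine models; weighted averaging or majority voting would fit the same interface.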
11:00, 3D Reconstruction of 2D Sign Language Dictionaries
Roman Riazantsev and Maksym Davydov
In this paper, we review different approaches to hand pose estimation and 3D reconstruction from a single RGB camera for the purpose of converting 2D sign language dictionaries into animated 3D models. Unlike many other works aimed at real-time or near real-time translation, we focus on the quality of conversion given a large video dictionary as input. Several approaches to training and validation are considered: pose reconstruction through depth estimation; training and validation with synthetic data; and training and validation with multiple views. In addition, the work reviews various end-to-end algorithms for key-point detection trained on labeled data. Based on the results of the studied models, we propose an outline of a possible solution to the 3D reconstruction task.
12:00, Generation of Memes to Engage Audience in Social Media
In digital marketing, memes have become an attractive tool for engaging an online audience. Memes influence the online behavior of buyers and sellers and the way information spreads. Thus, meme generation technology is a significant tool for social media engagement.
The primary purpose of the project is to develop a new approach to social media content generation, more precisely meme generation, and to compare it with the existing baselines. A meme is an image superimposed with text that carries a humorous or sarcastic sense; it is just another flavor of visual online content.
This project applies state-of-the-art deep learning techniques, such as the Transformer architecture, to the meme generation problem. To achieve the project objectives, we go through the following steps: collecting a dataset; building a model that generates a meme and a title from the input text; determining the optimal time to post; collecting post metrics; and measuring system performance in terms of social network audience engagement.
12:30, Investigation of the Complex Data Distributions for Their Efficient Generation
Currently, the active development of image processing methods requires large amounts of correctly labeled data. The lack of quality data makes it impossible to use various machine learning methods. When the possibilities for collecting real data are limited, methods for synthetic data generation are used. In practice, we can formulate the task of high-quality synthetic image generation as the efficient generation of complex data distributions, which is the object of study of this work. Generating high-quality synthetic data with existing methods is an expensive and complicated process. We can distinguish two main approaches to generating synthetic data: image generation based on rendered 3D scenes and the use of GANs for simple images. These methods have some drawbacks, such as a narrow range of applicability and insufficient distribution complexity of the obtained data. When using GANs to generate complex distributions, in practice we encounter a sharp increase in the complexity of the model architecture and training procedure. A deep understanding of the complex distributions of real data can be used to improve the quality of synthetic generation. Minimizing the differences between the real and synthetic data distributions can not only improve the generation process but also help develop tools for solving the problem of data shortage in the field of image processing.
14:00, Image Recommendation for Wikipedia Articles
Multimodal learning, which is simultaneous learning from different data sources such as audio, text, and images, is a rapidly emerging field of machine learning. It is also considered to be learning on the next level of abstraction, which will allow us to tackle more complicated problems such as creating cartoons from a plot or recognizing speech from lip movement.
In this paper, we propose to research whether state-of-the-art multimodal learning techniques can solve the problem of recommending the most relevant images for a Wikipedia article. In other words, we need to create a shared text-image representation of the abstract notion the article describes, so that, given only a text description, a machine would "understand" which images visualize the same notion accurately.
14:30, Matching Red Links with Wikidata Items
Kateryna Liubonko and Diego Saez-Trumper
This work is dedicated to the Ukrainian and English editions of Wikipedia. The problem to solve is matching red links in the Ukrainian Wikipedia with Wikidata items, that is, making the Wikipedia graph more complete. To that aim, we apply an ensemble methodology that includes graph properties and information retrieval approaches.
9:45, Keynote talk: Fitting Machine Translation Into Clients
We’re making neural machine translation efficient enough to run with high quality on a desktop, preserving privacy compared to online services.
Doing so requires us to compress the model to fit in reasonable memory and run fast on a wide range of CPUs.
Speaker – Kenneth Heafield
Kenneth Heafield is a Lecturer (~Assistant Professor) leading a machine translation group at the University of Edinburgh. He works on efficient neural networks, low-resource translation, mining petabytes for translations, and occasionally grammatical error correction. He runs the Bergamot project, which provides client-side translation as a Firefox extension, and the ParaCrawl project, which mines the web for translations.
10:30, Towards a Theoretical Framework of Terminological Saturation for Ontology Learning from Texts
Victoria Kosa and Vadim Ermolayev
In this position paper, we propose a detailed technical outline of what needs to be done, for example in a Master project, to bridge the research gap for the problem of the existence of terminological saturation. The problem is studied regarding a sequence of incrementally growing sub-collections of documents describing an arbitrary subject domain, using the OntoElect approach. After reviewing the related work, we present the formal basics of the approach and experimental evidence of the existence of terminological saturation. Consequently, we formulate the research hypotheses, and outline the methodology for further research elaborating on this position.
11:00, Building a Feature Taxonomy of the Terms Extracted from a Text Collection
Svitlana Moiseyenko, Alexander Vasileyko, and Vadim Ermolayev
This position paper presents an approach to feature grouping and taxonomic relationship extraction, with the further objective of building a feature taxonomy of a learned ontology. The approach needs to be developed as a part of the OntoElect methodology for domain ontology refinement. The paper contributes a review of the related work on taxonomic relationship extraction from natural language texts. Within this review, the research gaps and remaining challenges are analyzed. The paper proceeds with outlining the envisioned solution. It presents the approach to this solution, starting with the research questions, followed by the initial research hypotheses to be tested. The plan of research is then presented, including the potential research problems, the rationale to use and re-use existing components, and the evaluation plan. Finally, the proposed solution and the project are placed in the broader context of the overall OntoElect workflow.
12:00, Context-based Question Answering Model for the Ukrainian Language
We introduce a context-based question answering model for the Ukrainian language, built on Wikipedia articles using the Bidirectional Encoder Representations from Transformers (BERT) model. The model takes a context (a Wikipedia article) and a question about that context, and returns an answer to the question. The model consists of two parts. The first is a pre-trained multilingual BERT model, which is trained on Wikipedia articles in the 100 most popular languages on Wikipedia. The second part is the fine-tuned model, which is trained on a dataset of questions and answers about Wikipedia articles. The training and validation data are the Stanford Question Answering Dataset (SQuAD) and Cross-lingual Natural Language Inference (XNLI).
There are no question answering datasets for the Ukrainian language. The plan is to build an appropriate dataset with machine translation, use it for the fine-tuning stage, and compare the result with models that were fine-tuned on other languages. The next experiment is to train a model on a dataset of Slavic languages before fine-tuning on Ukrainian and compare the results.
12:30, Towards Language Modelling for the Ukrainian Language
Language modeling is one of the most important subfields of modern Natural Language Processing (NLP). The objective of language modeling is to learn a probability distribution over sequences of linguistic units pertaining to the language. As it produces a probability of the language unit that will follow, the language model is a form of grammar for the language, and it plays a key role in traditional NLP tasks such as speech recognition, machine translation, sentiment analysis, text summarization, grammatical error correction, and natural language generation. Much work has been done for the English language in terms of developing both training and evaluation approaches. However, there has not been as much progress for the Ukrainian language. In this work, we are going to explore, extend, evaluate, and compare different language models for the Ukrainian language. The main objective is to provide a balanced evaluation data set and train a number of baseline models.
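To make the idea of a probability distribution over token sequences concrete, here is a minimal sketch of a bigram language model with add-one smoothing. It is an illustrative baseline only, not one of the models from this work: the class name `BigramLM`, the smoothing choice, and the two-sentence toy corpus are assumptions.

```python
from collections import Counter
from typing import List


class BigramLM:
    """Bigram model: P(sentence) is the product of P(token | previous token)."""

    def __init__(self, sentences: List[List[str]]):
        self.context_counts = Counter()  # how often a token appears as a context
        self.bigram_counts = Counter()   # how often (previous, current) appears
        self.vocab = {"</s>"}            # tokens that can be predicted
        for sent in sentences:
            tokens = ["<s>"] + sent + ["</s>"]
            self.vocab.update(sent)
            for prev, cur in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def prob(self, prev: str, cur: str) -> float:
        # Add-one (Laplace) smoothing keeps unseen bigrams from getting zero probability.
        return (self.bigram_counts[(prev, cur)] + 1) / (
            self.context_counts[prev] + len(self.vocab)
        )

    def sentence_prob(self, sent: List[str]) -> float:
        tokens = ["<s>"] + sent + ["</s>"]
        p = 1.0
        for prev, cur in zip(tokens, tokens[1:]):
            p *= self.prob(prev, cur)
        return p


# Toy corpus invented for illustration ("I love coffee", "I love tea").
corpus = [["я", "люблю", "каву"], ["я", "люблю", "чай"]]
lm = BigramLM(corpus)
seen = lm.sentence_prob(["я", "люблю", "каву"])
shuffled = lm.sentence_prob(["каву", "люблю", "я"])
print(seen > shuffled)
```

The model assigns a higher probability to a word order it has seen than to a shuffled one, which is the sense in which a language model acts as a form of grammar.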
14:00, Detect Emotions and Sentiments for a Specific Domain in the Ukrainian Language
Dmytro Babenko and Vsevolod Dyomkin
The main idea of this project is to develop a model that can recognize emotions in user review text. Many websites contain review text accompanied by marks (stars or another representation) for products or services. Sometimes users write only text without a mark; in this case, such a model can help us estimate the user’s rating. It can be a review of a specific product, a hotel booking, a car rental, or something similar. Another purpose of this project is to detect which characteristics influenced the review. Here, the model should detect sentiment in the text and classify it as positive or negative. Moreover, it should identify the reasons behind these emotions and why the user wrote the text. As this project mostly deals with text processing, it is a classical natural language processing task using deep learning. The result will probably be a neural network selected as the best of several experiments.
14:30, Enriching the Number of Controllable Parameters for Text Generation
There are a lot of models that can generate text conditioned on some context, but those approaches don’t give us the ability to control various aspects of the generated text such as style, tone, language, tense, sentiment, length, and grammaticality. In this work, we explore unsupervised ways to learn disentangled vector representations of sentences with different interpretable components, and we try to generate text in a controllable manner based on the obtained representations.
15:30, Topological Approach to Wikipedia Page Recommendation
Maksym Opirskyi and Petro Sarkanych
Human navigation in information spaces is increasingly important as the data sources we possess keep growing. An efficient navigation strategy would therefore be of great benefit in satisfying human information needs. Often, the search space can be understood as a network, and navigation can be seen as a walk on this network. Previous studies have shown that, despite not knowing the global network structure, people tend to be efficient at finding what they need. This is usually explained by the fact that people possess some background knowledge. In this work, we explore an adapted version of the network consisting of Wikipedia pages and the links between them, as well as human trails on it. The goal of our research is to find a procedure to label articles that are similar to a given one. Among other things, this would lay a foundation for a recommender system for Wikipedia editors, which would suggest links from a given page to related articles. Our work therefore provides a basis for enhancing the Wikipedia navigation process and making it more user-friendly.
16:00, Parameterizing of human speech generation
Nowadays, the synthesis of human images and videos is arguably the most popular topic in the Data Science community. The synthesis of human speech is less trendy but closely tied to that topic. Since the publication of the WaveNet paper in 2016, the state-of-the-art approach has shifted from parametric and concatenative systems to deep learning models. Each significant paper on the topic mentions a way to parameterize the output audio with different voices and sentiments, though parameterization isn’t the main focus of those works. Most of the time-proven solutions require re-training the model to synthesize speech in a voice unknown to the model. During my master’s degree, I aim to implement a competitive text-to-speech solution, enhance parameterization abilities, and improve the performance of current models.