MS-AMLV 2022 Schedule

Due to the Russian invasion of Ukraine, the Symposium was moved to a virtual asynchronous format. Below you will find the list of accepted papers. Please join the symposium and the discussion of the papers on the virtual platform open.ucu.edu.ua (you will need to register on the website beforehand). Each paper is accompanied by a prerecorded video, the full text, and a forum section where you can put questions to the authors. The symposium and discussion are open on March 24-25.

MobileTrack: Efficient Siamese Neural Network for Mobile Object Tracking

Vasyl Borsuk and Orest Kupyn

Visual object tracking is one of the most fundamental research topics in computer vision: it aims to obtain the target object's location in a video sequence given the object's initial state in the first video frame. The recent advance of deep neural networks, specifically Siamese networks, has led to significant progress in visual object tracking. Despite being accurate and achieving high results on academic benchmarks, current state-of-the-art approaches are compute-intensive and have a large memory footprint, and thus cannot satisfy the strict performance requirements of real-world applications. This work focuses on designing a novel lightweight framework for resource-efficient and accurate visual object tracking. Additionally, we introduce a new tracker efficiency benchmark and protocol in which efficiency is defined in terms of both energy consumption and execution speed on edge devices.
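
For readers less familiar with Siamese trackers, the sketch below shows the cross-correlation step at their core: target features from the first frame are slid over search-region features to produce a response map whose peak indicates the target location. It is purely illustrative and is not the authors' MobileTrack architecture; the tensor sizes are made up.

```python
# Minimal sketch of the cross-correlation step at the heart of Siamese trackers
# (illustrative only; not the authors' MobileTrack architecture).
import torch
import torch.nn.functional as F

def siamese_xcorr(template_feat: torch.Tensor, search_feat: torch.Tensor) -> torch.Tensor:
    """Slide the template embedding over the search-region embedding.

    template_feat: (B, C, Ht, Wt) features of the target from the first frame.
    search_feat:   (B, C, Hs, Ws) features of the current search region.
    Returns a (B, 1, Hs-Ht+1, Ws-Wt+1) response map; its peak is the target location.
    """
    b, c, h, w = template_feat.shape
    # Treat each batch element's template as a convolution kernel (grouped conv).
    kernel = template_feat.reshape(b * c, 1, h, w)
    search = search_feat.reshape(1, b * c, *search_feat.shape[2:])
    response = F.conv2d(search, kernel, groups=b * c)
    # Sum over channels to get a single similarity map per batch element.
    return response.reshape(b, c, *response.shape[2:]).sum(dim=1, keepdim=True)

# Example: a 6x6 template slid over a 22x22 search region.
resp = siamese_xcorr(torch.randn(2, 64, 6, 6), torch.randn(2, 64, 22, 22))
print(resp.shape)  # torch.Size([2, 1, 17, 17])
```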

One-shot Facial Expression Reenactment using 3D Morphable Models

Roman Vei and Orest Kupyn

Face reenactment has been an active research topic in recent years. It aims to generate novel poses and emotions for a given human head while preserving its identity. Current approaches have limitations that make them hard to use in real applications: slow iterative optimization, unintuitive manipulation controls (e.g. face boundaries of the target image), or the need for audio or multiple input images. Our method addresses the single-shot face reenactment problem end-to-end, without additional data or iterative optimization. The proposed method uses the parameters of a 3D morphable head model (3DMM) to encode identity, pose, and expression. With our approach, the emotions and pose of a real person in an image can easily be changed by modifying the 3DMM parameters. Our pipeline consists of a face mesh predictor and a GAN-based renderer. The predictor is a simple encoder network that regresses the parameters of a 3D mesh. The renderer is a network similar to HeadGAN that renders images from a single source image and 3DMM parameters. The main building block of the renderer is the SPADE block used in GANs for image generation from semantic masks. As a result, we propose a framework that efficiently generates new poses and emotions from a single image and works in real-life applications.
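
As a point of reference for the renderer described above, here is a minimal SPADE (spatially-adaptive denormalization) block: normalized feature maps are modulated by a scale and bias predicted from a spatial conditioning map (e.g. a rendered 3DMM mesh or a semantic mask). The layer sizes and the use of instance normalization are assumptions for illustration, not the authors' exact configuration.

```python
# A minimal SPADE (spatially-adaptive denormalization) block, the main building
# block the renderer is described as using; layer sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADE(nn.Module):
    def __init__(self, feat_channels: int, cond_channels: int, hidden: int = 128):
        super().__init__()
        self.norm = nn.InstanceNorm2d(feat_channels, affine=False)
        self.shared = nn.Sequential(nn.Conv2d(cond_channels, hidden, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, feat_channels, 3, padding=1)
        self.beta = nn.Conv2d(hidden, feat_channels, 3, padding=1)

    def forward(self, x, cond):
        # cond: spatial conditioning map (e.g. a rendered 3DMM mesh or semantic map).
        cond = F.interpolate(cond, size=x.shape[2:], mode="nearest")
        h = self.shared(cond)
        # Modulate normalized activations with spatially varying scale and bias.
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)

# Example: modulate 64-channel features with a 3-channel conditioning map.
out = SPADE(64, 3)(torch.randn(1, 64, 32, 32), torch.randn(1, 3, 256, 256))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```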

Application of the textural transformer to the task of super-resolution with reference

Teodor Romanus, Roman Riazantsev and Maksym Davydov

We study the partial super-resolution problem, which aims to recover a realistic high-resolution (HR) image from a low-resolution (LR) image with a known high-resolution reference part. In recent years, progress has been made in image super-resolution with a reference image, allowing relevant textures to be transferred to LR images. In this paper, we propose to use the Texture Transformer Network for Image Super-Resolution (TTSR) to solve the Partial-SR task. Known approaches to this problem suffer from the selection of incorrect textures, producing visually unacceptable artifacts. The attention mechanism makes it possible to learn features jointly in the LR and HR parts of an image, so that deep feature correspondences can be discovered by attention; this approach exhibits accurate transfer of texture features. Our experiments compare the Partial-TTSR network with state-of-the-art approaches on image zooming completion and damaged image part reconstruction, as well as on both tasks simultaneously.
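
To make the texture-transfer idea concrete, the following sketch shows a simplified hard-attention step of the kind used in texture-transformer reference SR: each LR feature position is matched to its most similar reference feature, which is copied over together with a confidence score. The real TTSR additionally uses learned patch embeddings and soft attention; this is only an approximation of the mechanism, with made-up tensor sizes.

```python
# Simplified hard-attention texture transfer (illustrative, not the full TTSR).
import torch
import torch.nn.functional as F

def texture_transfer(lr_feat, ref_feat):
    """lr_feat: (B, C, H, W) query features; ref_feat: (B, C, H, W) reference features."""
    b, c, h, w = lr_feat.shape
    q = F.normalize(lr_feat.flatten(2), dim=1)    # (B, C, H*W)
    k = F.normalize(ref_feat.flatten(2), dim=1)   # (B, C, H*W)
    relevance = torch.bmm(q.transpose(1, 2), k)   # (B, H*W, H*W) cosine similarities
    conf, idx = relevance.max(dim=-1)             # hard attention: best reference match
    v = ref_feat.flatten(2)                       # (B, C, H*W)
    gathered = torch.gather(v, 2, idx.unsqueeze(1).expand(-1, c, -1))
    transferred = gathered.reshape(b, c, h, w)
    # The confidence map can weight how much transferred texture is fused into the SR branch.
    return transferred, conf.reshape(b, 1, h, w)

t, conf = texture_transfer(torch.randn(1, 64, 24, 24), torch.randn(1, 64, 24, 24))
print(t.shape, conf.shape)  # torch.Size([1, 64, 24, 24]) torch.Size([1, 1, 24, 24])
```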

Balancing between real and synthetic data for object detection in automotive vehicle domain

Roman Ilechko and Viktor Sdobnikov

In recent years, the deep learning field has grown rapidly. While the quality of the resulting solutions has increased, model complexity has risen dramatically. Although publicly available datasets are an acceptable solution for many machine learning and research projects, there is always a temptation to enlarge the training data in the hope of better results. Moreover, domain-specific cases often require additional data that is usually not publicly available. Data labeling is an obvious solution, but it has weaknesses: manually collecting and labeling data is time-consuming and carries a high risk of errors. Synthetic data could therefore be a suitable alternative, but it has drawbacks of its own. Today, researchers have to balance the ratio of natural and generated data intuitively, while accounting for the possible gap between the two domains. The task is far from obvious, and constraints such as model size and the number of classes add further uncertainty. In this paper, we analyze the impact of synthetic data on the training process, cover possible training strategies, and provide guidance on choosing the amount of artificial data under existing constraints.
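
One simple training strategy of the kind discussed is to mix the real and synthetic datasets at a fixed ratio and sweep that ratio. The helper below is a minimal sketch under assumed dataset objects and an illustrative ratio; it is not the authors' procedure.

```python
# Minimal sketch of mixing real and synthetic samples at a fixed ratio
# (dataset names and the ratio are illustrative placeholders).
import random
from torch.utils.data import ConcatDataset, Subset, DataLoader

def mix_real_and_synthetic(real_ds, synth_ds, synth_fraction=0.3, seed=0):
    """Return a dataset in which roughly `synth_fraction` of samples are synthetic."""
    n_real = len(real_ds)
    n_synth = min(len(synth_ds), int(n_real * synth_fraction / (1.0 - synth_fraction)))
    rng = random.Random(seed)
    synth_idx = rng.sample(range(len(synth_ds)), n_synth)
    return ConcatDataset([real_ds, Subset(synth_ds, synth_idx)])

# Usage: sweep synth_fraction (e.g. 0.0, 0.1, ..., 0.5) and compare detection mAP.
# loader = DataLoader(mix_real_and_synthetic(real_ds, synth_ds, 0.3), batch_size=16, shuffle=True)
```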

Polyp detection and segmentation from endoscopy images

Mariia Kokshaikyna, Maria Dobko and Oles Dobosevych

Endoscopy is a widely used clinical procedure for the detection of various diseases in internal organs such as the stomach and colon. Modern endoscopes allow high-quality video to be captured during the procedure, and computer-assisted methods can support medical specialists in detecting or segmenting anomalous regions in the images. Many datasets are available, and many methods for detecting polyp regions have been proposed. One such task is polyp segmentation in images and videos. Overall, classical approaches as well as fully supervised, semi-supervised, and unsupervised methods are applicable to this task; the best results in semantic segmentation of polyps on the Kvasir-SEG dataset are currently achieved with fully supervised approaches. In this paper, we describe experiments with the (supervised) CaraNet model. We check robustness via cross-validation on several publicly available datasets. We also describe problems that arose when applying these models to a custom dataset, and outline further experiments aimed at improving the performance of SOTA approaches in endoscopy image segmentation and anomaly detection.

Improved nnU-Net architecture for tumour segmentation

Iryna Zakharchenko and Dmytro Fishman

The main focus of this research is tumour and cyst segmentation on 3D computed tomography scans. Deep learning models are significantly affected by the volume of training data, so we are building a pipeline that enriches the existing data using datasets that do not have organ segmentations. Another part of the pipeline is finding an architecture that performs well on tumour segmentation across different organs. We will extensively compare nnU-Net improvements, such as adding residual blocks and attention mechanisms.
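
As an illustration of the kinds of modifications being compared, the sketch below shows a 3D residual block with an optional squeeze-and-excitation (channel attention) module. This is one plausible variant with assumed layer choices, not the exact blocks used in this work.

```python
# Illustrative 3D residual block with optional channel attention (not the authors' exact blocks).
import torch
import torch.nn as nn

class SE3D(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = x.mean(dim=(2, 3, 4))  # global average pool over D, H, W
        return x * self.fc(w)[:, :, None, None, None]

class ResidualBlock3D(nn.Module):
    def __init__(self, channels, use_attention=True):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(channels, channels, 3, padding=1), nn.InstanceNorm3d(channels), nn.LeakyReLU(),
            nn.Conv3d(channels, channels, 3, padding=1), nn.InstanceNorm3d(channels))
        self.attn = SE3D(channels) if use_attention else nn.Identity()
        self.act = nn.LeakyReLU()

    def forward(self, x):
        return self.act(x + self.attn(self.body(x)))

out = ResidualBlock3D(32)(torch.randn(1, 32, 16, 64, 64))
print(out.shape)  # torch.Size([1, 32, 16, 64, 64])
```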

Multi-temporal Satellite Imagery Panoptic Segmentation of Agricultural Land in Ukraine

Marian Petruk and Taras Firman

Remote sensing of the Earth using satellites helps to analyze the Earth's resources, monitor local land surface changes, and study global climate change. In particular, farmland information helps farmers with decision-making and planning, and increases productivity towards better agro-ecological conditions. In this work, we primarily focus on panoptic segmentation of agricultural land, a combination of two parts: 1) delineation of parcels (instance segmentation), 2) classification of parcel crop type (semantic segmentation). Second, we explore how multi-temporal satellite imagery compares to a single-image query in segmentation performance. Third, we review the recent advances in Deep Learning and Computer Vision that improve the performance of such systems. Finally, we explore the state of the art in the analysis of agricultural land in Ukraine, where the farmland market has just opened.

Semantic segmentation for multi-channel microscopy images using the Voronoi diagrams

Anton Borkivskyi and Andrii Babii

Semantic image segmentation has been developing rapidly over the past years. The problem is crucial for biological image segmentation because it helps to identify cells and nuclei faster, which accelerates medical research and treatment development. In this work, we combine existing approaches to semantic segmentation, such as convolutional neural networks (e.g. U-Net), with Voronoi regions. Voronoi regions could provide the model with additional information about cell borders during training and thereby increase the accuracy of the solution. After segmenting the cells themselves, we can also try to segment cell components; this information can then be used as initial feature preprocessing for further analysis. From a segmented cell we can calculate its most important features (size, roundness, etc.) and generate text descriptions with specific information.
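
To illustrate how Voronoi regions can supply border information, the sketch below assigns every pixel to its nearest detected nucleus centroid, so that label changes between neighbouring pixels approximate cell borders that can be fed to the model as an auxiliary signal. The centroid coordinates and image size here are made-up placeholders.

```python
# Minimal sketch: per-pixel Voronoi regions from nuclei centroids as a weak border label.
import numpy as np
from scipy.spatial import cKDTree

def voronoi_label_map(centroids, height, width):
    """centroids: (N, 2) array of (row, col) nucleus centers.
    Returns an (H, W) map where each pixel holds the index of its nearest nucleus."""
    tree = cKDTree(centroids)
    rows, cols = np.mgrid[0:height, 0:width]
    pixels = np.stack([rows.ravel(), cols.ravel()], axis=1)
    _, nearest = tree.query(pixels)
    return nearest.reshape(height, width)

labels = voronoi_label_map(np.array([[10, 10], [40, 60], [70, 20]]), 96, 96)
# Pixels where the label changes between neighbours approximate cell borders.
borders = (np.diff(labels, axis=0, prepend=labels[:1]) != 0) | \
          (np.diff(labels, axis=1, prepend=labels[:, :1]) != 0)
print(labels.shape, borders.sum())
```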

Weakly Supervised Anomaly Detection in Industrial Processes through Video Recognition

Vladyslav Kutsuruk and Viktor Sakharchuk

Unsupervised learning of visual representations, and anomaly detection in videos in particular, remains a relevant but challenging problem. The main goal is to capture the difference between normal and anomalous videos while avoiding annotating the anomalous segments or clips in the training videos. Additionally, real-time decision-making is a notable factor in this domain and often involves a trade-off between accuracy and speed due to time and resource constraints. Most current systems for detecting anomalies in industrial processes are built manually in a supervised manner and are domain-specific. At the same time, SOTA deep learning methods that perform well at a larger scale are usually very demanding in terms of resources and time. In this research, we introduce a weakly supervised anomaly detection model that can run on embedded systems. The proposed method uses a model pre-trained on normal videos and can detect anomalies in most of the corresponding industrial processes (e.g. operation of a lathe, perimeter guarding, gas tank refilling). We use autoencoders with third-party feature detectors to make the solution light and scalable, and thus usable on CPUs and embedded GPUs.
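
The sketch below illustrates the autoencoder-based scoring idea mentioned above: an autoencoder trained only on features of normal clips reconstructs test features, and frames with high reconstruction error are flagged as anomalous. The feature dimensionality, network sizes, and threshold are assumptions, and the third-party feature extractor is left abstract.

```python
# Sketch of reconstruction-error anomaly scoring on per-frame features
# (feature extractor, sizes, and threshold are placeholders).
import torch
import torch.nn as nn

class FrameAutoencoder(nn.Module):
    def __init__(self, feat_dim=512, bottleneck=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, bottleneck))
        self.dec = nn.Sequential(nn.Linear(bottleneck, 128), nn.ReLU(), nn.Linear(128, feat_dim))

    def forward(self, x):
        return self.dec(self.enc(x))

def anomaly_scores(model, features):
    """features: (T, feat_dim) per-frame features from a third-party extractor."""
    with torch.no_grad():
        recon = model(features)
    return ((features - recon) ** 2).mean(dim=1)  # per-frame reconstruction error

model = FrameAutoencoder()
scores = anomaly_scores(model, torch.randn(100, 512))
threshold = 0.5  # would be calibrated on a held-out set of normal videos
print((scores > threshold).nonzero().flatten())   # indices of suspected anomalous frames
```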

Training Open-Domain Conversational Model for the Ukrainian Language

Liza Bykhanova and Oleksiy Syvokon

Despite the availability of large pretrained transformer-based models trained on multilingual corpora, training open-domain conversational models for the Ukrainian language is still a challenging task, as collecting enough high-quality data is time-consuming and costly. In this work, we investigate how available data, such as English dialogue datasets and Ukrainian movie subtitles, can be used to extend the training dataset, and we evaluate the effect of this extension on the performance of the conversational model.

Natural language processing while maintaining user privacy

Oleksandr Yermilov, Artem Chernodub and Vipul Raheja

Data anonymization techniques have developed rapidly in the last few years due to the rising demand for privacy. However, natural language processing (NLP) approaches to text anonymization are limited and error-prone, negatively affecting the quality of downstream models trained on anonymized texts. Conventionally, named entity recognition and rule-based approaches have been used for de-identification of data, but these methods have been held back by a lack of annotated data and of standardized annotation schemes. More recently, several approaches have sought to transform latent representations of texts to protect confidential attributes, using adversarial learning or reinforcement learning. However, those methods operate at the level of latent vector representations and do not modify the texts themselves. Further, the problem of replacing identifiers with surrogate values has rarely been addressed in NLP. This work investigates the impact of different text anonymization techniques on the data and models used for various downstream natural language processing, understanding, and generation tasks. Our work will provide crucial insights into the gaps between pre- and post-anonymization data and model quality, and foster future research into higher-quality anonymization techniques that better balance the trade-offs between data protection and utility preservation.
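
As a concrete example of the conventional NER-based de-identification baseline referred to above, the sketch below replaces detected entities with surrogate placeholders. It assumes spaCy with the en_core_web_sm model installed, and the chosen label set and example sentence are illustrative only.

```python
# Minimal NER-based de-identification sketch: detected entities become placeholders.
import spacy

nlp = spacy.load("en_core_web_sm")
SENSITIVE = {"PERSON", "GPE", "ORG", "DATE"}  # illustrative label set

def anonymize(text: str) -> str:
    doc = nlp(text)
    out, last = [], 0
    for ent in doc.ents:
        if ent.label_ in SENSITIVE:
            out.append(text[last:ent.start_char])
            out.append(f"[{ent.label_}]")  # surrogate placeholder
            last = ent.end_char
    out.append(text[last:])
    return "".join(out)

print(anonymize("John Smith visited Kyiv on March 24 and met with Grammarly."))
# e.g. "[PERSON] visited [GPE] on [DATE] and met with [ORG]."
```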

Personalizing Large Language Models

Khrystyna Skopyk, Artem Chernodub and Vipul Raheja

Large language models (LLMs), such as GPT-3 or T5, have gained popularity and pervasiveness in recent years, and applications like dialogue and review generation systems benefit from their capabilities. At the same time, personalization has received much attention in the natural language processing community for applications such as machine translation, response generation, and slot tagging. However, there has been little exploration of the personalization capabilities of the language generated by LLMs, which rely on large quantities of data to train. Generating text that mirrors a user's preferences and style is a complicated task. Most research in this direction conditions generation on additional persona-related metadata; such data is usually scarce and hard to acquire, whereas in many cases a model can instead leverage a user's previously written texts for more personalized generation. In this work, we will explore ways to adapt large language models to generate personalized text for multiple users. Methods like fine-tuning, few-shot learning, and meta-learning are promising candidates for this research. Additionally, we will investigate data-vs-model quality trade-offs to tackle the problems of data sparsity and a large number of users.
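
One of the candidate methods, few-shot learning, can be illustrated with a simple prompt-construction sketch: a frozen LLM is shown a handful of the user's previous texts and asked to produce new text in the same style. The prompt format, the example history, and the final llm.generate call are hypothetical placeholders.

```python
# Sketch of few-shot style personalization via prompt construction (placeholders throughout).

def build_personalized_prompt(user_history, new_request, k=3):
    """user_history: list of the user's previous texts; new_request: task description."""
    examples = "\n\n".join(f"Example {i + 1}:\n{text}"
                           for i, text in enumerate(user_history[-k:]))
    return (
        "Below are examples of texts written by the user.\n\n"
        f"{examples}\n\n"
        f"Write the following in the same style:\n{new_request}\n"
    )

prompt = build_personalized_prompt(
    ["Thanks a lot for the quick turnaround, really appreciate it!",
     "Could we maybe push the sync to Thursday? Wednesday is packed for me."],
    "A short reply declining a meeting invitation.")
print(prompt)
# response = llm.generate(prompt)  # hypothetical call to whichever LLM is chosen
```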

Extreme products and services classification for procurement recommender systems

Ihor Hrysha and Sam Grondahl

We consider the problem of recommending relevant suppliers given a detailed request context in a procurement setting. The fundamental recommendation problem can be viewed as an extreme multilabel learning problem, where a single query may have hundreds of relevant suppliers associated with it. A complicating factor is that, for most suppliers, we do not have a complete listing of product and service offerings, in contrast with most of the literature on product search. An additional difficulty is that queries are generated by users operating within large procurement organizations, each building queries in idiosyncratic but internally consistent ways, and each organizing activities according to unique internal ontologies. The central research question we aim to address is: can we utilize this vast but inconsistently structured set of product and service data to construct a unified ontology that allows us to derive semantic meaning across users and contexts? We propose several fully and semi-supervised approaches and benchmark them using a proprietary dataset that includes large-scale procurement data as well as supplier-provided catalogs. Finally, and uniquely, we experimentally validate the performance of our preferred model in a live production setting.

EEG based BrainAGE estimation with Deep Learning

Ostap Hembara and Vasily Vakorin

With the aging of the population, the risk of neurodegenerative diseases increases, creating a growing interest in identifying and analyzing these diseases and deviations from healthy aging. Even though individual aging varies due to interactions between genetic, environmental, and behavioral factors, numerous studies demonstrate the existence of healthy brain maturation curves that can be extracted from routine clinical electroencephalographic (EEG) scans. Conventional review of EEG records relies heavily on neurologists visually inspecting complex, noisy, high-dimensional data, which makes the interpretation of scans slow, not entirely reliable, and suboptimal given that human health and life are at stake. Given these challenges, EEG data is highly suitable for deep/machine learning approaches, and end-to-end models can infer useful properties directly from raw data. Establishing biomarkers of neuroanatomical aging with novel deep/machine learning frameworks could provide individual risk assessments for age-associated neurodegenerative diseases. It is also appealing to build the predictive pipeline on convolutional neural networks (CNNs) or attention-based architectures, which have proven to work well with time-series and sequential data. This paper explores approaches to predicting BrainAGE from raw EEG data.
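
As an illustration of such an end-to-end pipeline, the sketch below shows a small 1D CNN that regresses age directly from a window of raw multi-channel EEG. The channel count, window length, and layer sizes are assumptions and do not reflect the authors' actual model.

```python
# Illustrative 1D CNN age regressor on raw EEG windows (sizes are assumptions).
import torch
import torch.nn as nn

class EEGAgeRegressor(nn.Module):
    def __init__(self, n_channels=19):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, stride=2), nn.BatchNorm1d(32), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=7, stride=2), nn.BatchNorm1d(64), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))
        self.head = nn.Linear(64, 1)

    def forward(self, x):  # x: (batch, channels, time)
        return self.head(self.features(x).squeeze(-1)).squeeze(-1)

model = EEGAgeRegressor()
pred_age = model(torch.randn(4, 19, 2560))  # e.g. 10-second windows at 256 Hz
print(pred_age.shape)                        # torch.Size([4])
```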

Brain age prediction based on EEG records

Mykola Klymenko and Vasily Vakorin

A growing population and rising life expectancy are leading to an increase in neurodegenerative diseases that were not as widespread in the last century. Aging does not affect people uniformly: environment and genetics cause different aging speeds. Brain age is a strong predictor of neurodegenerative diseases, physical disabilities, and even mortality. The concept of brain age is based on the idea that a 50-year-old can “have the heart of a 20-year-old” or that the lungs of a 30-year-old smoker “work like they’re 80”. Predicting brain age with machine learning techniques, including deep learning, from electroencephalography data has great potential for personalized healthcare. Currently, the field lacks good benchmark datasets, since EEG recording protocols vary. Besides the data issue, researchers have no established approach for the other parts of the pipeline, and most existing work is based on MRI data. Thus, there is no clear SOTA approach for predicting brain age from EEG. The paper explores the major pitfalls in estimating the brain age delta: (1) raw data preprocessing, (2) defining ground truth, (3) measuring/comparing model performance.
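
For clarity, the quantity usually called the brain age delta is the gap between predicted and chronological age, often followed by a simple linear bias correction because regressors tend to pull predictions toward the mean age. A minimal sketch with made-up numbers:

```python
# Sketch of the brain age delta with a simple linear bias correction (illustrative values).
import numpy as np

def brain_age_delta(pred_age, true_age):
    pred_age, true_age = np.asarray(pred_age, float), np.asarray(true_age, float)
    raw_delta = pred_age - true_age
    # Fit delta ~ a * true_age + b on the evaluation set and remove that trend.
    a, b = np.polyfit(true_age, raw_delta, deg=1)
    return raw_delta - (a * true_age + b)

delta = brain_age_delta([34.0, 52.0, 71.0], [30.0, 55.0, 78.0])
print(delta.round(2))  # positive values suggest "older-looking" brain activity
```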

Extensive Data Analysis of Metabolomic and Proteomic Data to Diagnose Endometrium Carcinoma

Anastasia Holovenko and Dmytro Fishman

Recent advances in high-throughput technologies have enabled researchers to collect vast amounts of heterogeneous data about a patient: genomic, proteomic, metabolomic, and transcriptomic. Hence, integrative analysis, which takes several multi-omics data sources into account, has become a popular tool for disease analysis. Moreover, bioinformaticians are starting to combine well-known statistical methods with powerful machine learning algorithms to boost research in the field and make new discoveries possible. Endometrial carcinoma (EC) is one of the exemplar diseases known to be hard to diagnose, yet it is the most common gynecologic malignancy in developed countries and the fourth most common cancer in women [1, 2]. The aim of the presented work is to analyze semi-targeted proteomic, targeted, and non-targeted metabolomics data for both patients with EC and control samples, and to identify potential biomarkers that could be used to diagnose patients with EC via a simple blood test.

Exploration of Reinforcement Learning agent in procedurally-generated environments with sparse rewards

Oleksii Nahirnyi and Pablo Maldonado

Solving sparse-reward environments is among the most significant challenges for state-of-the-art (SOTA) Reinforcement Learning (RL). A set of methods has recently been developed to handle it by enhancing the agent's directed exploration. The concept of procedurally-generated environments (PGEs) was also proposed to measure an agent's generalization capabilities more adequately through randomization. The recent introduction of sparse rewards into some PGE benchmarks makes the sparse-reward task even more challenging. Despite some progress of newly created algorithms on MiniGrid PGEs, the task remains open for research in terms of improving sample complexity, and not all exploration-based approaches have been tested in high-dimensional sparse-reward game environments. In this paper, we analyze and compare current SOTA exploration approaches in sparse-reward settings, including PGEs, and outline further research directions for improving benchmark results.
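
As a minimal example of the directed-exploration idea behind many of the compared methods, the sketch below adds a count-based intrinsic bonus that decays with how often a state (or a hash of the observation) has already been visited. It is purely illustrative and not tied to any specific algorithm in the paper.

```python
# Count-based intrinsic exploration bonus (illustrative sketch).
from collections import defaultdict
import math

class CountBasedBonus:
    def __init__(self, scale=0.1):
        self.counts = defaultdict(int)
        self.scale = scale

    def __call__(self, obs_hash) -> float:
        """obs_hash: any hashable summary of the observation (e.g. bytes of a grid)."""
        self.counts[obs_hash] += 1
        return self.scale / math.sqrt(self.counts[obs_hash])

bonus = CountBasedBonus()
# Inside the RL loop: total_reward = extrinsic_reward + bonus(hash(obs.tobytes()))
print(bonus("state-A"), bonus("state-A"), bonus("state-B"))  # 0.1 0.0707... 0.1
```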