Schedule of diploma defenses
29th January
Jan. 29, 14:00, Iryney Baran, Safe Augmentation: Learning Task-Specific Transformations from Data
Abstract. Data augmentation is widely used as part of the training process for deep learning models, especially in the computer vision domain. Currently, common data augmentation techniques are designed manually and therefore require expert knowledge and time. Moreover, optimal augmentations found for one dataset often do not transfer to other datasets as effectively. We propose a simple, novel method that automatically learns task-specific data augmentation techniques, called safe augmentations, that do not break the data distribution and can be used to improve model performance. Moreover, we provide a new training pipeline for using safe augmentations in different computer vision tasks. Our method works with both image classification and image segmentation and achieves significantly better accuracy on the CIFAR-10, CIFAR-100, SVHN, Tiny ImageNet and Cityscapes datasets compared to other augmentation techniques.
Jan. 29, 14:40, Oleskii Moskalenko, Convolutional Graph Embeddings for Article Recommendation in Wikipedia
Abstract. In this master's thesis, we build a recommendation system that suggests articles to edit to Wikipedia contributors. Our system is built on top of article embeddings constructed by applying a Graph Convolutional Network to the graph of Wikipedia articles. In an offline evaluation conducted on the history of users’ previous edits, we outperformed embeddings generated from the text (via the Doc2Vec model) by 47% in Recall and 32% in Mean Reciprocal Rank (MRR) for English Wikipedia, and by 62% in Recall and 41% in MRR for Ukrainian Wikipedia. With an additional ranking model, we achieved a total improvement of 68% in Recall and 41% in MRR on the English edition of Wikipedia. Graph Neural Networks (GNNs) are deep learning based methods that aim to solve typical machine learning tasks, such as classification, clustering or link prediction, for structured data (graphs) via a message passing architecture. Following the explosive success of Convolutional Neural Networks (CNNs) in constructing highly expressive representations, similar ideas were recently carried over to GNNs. Graph Convolutional Networks are GNNs that, like CNNs, allow sharing the weights of convolutional filters across nodes in the graph. They have demonstrated especially good performance in representation learning via semi-supervised tasks such as the classification and link prediction mentioned above.
Jan. 29, 15:40, Tetyana Martynyuk, Multi-task Learning for Image Restoration
Abstract. We present an efficient end-to-end pipeline for general image restoration. The setting has a generic encoder and separate decoders so that our model can benefit from the low-level feature representations shared between the tasks. We also introduce a new architecture for the generator, inspired by feature pyramid networks, for dealing with multi-scale degradations. We train the models to solve three particular image restoration problems: deblurring, dehazing, and raindrop removal.
Jan. 29, 16:20, Andrii Kusyi, Color and Style Transfer using Generative Adversarial Networks
Abstract. In this work, we present an end-to-end solution for image-to-image color and style transfer using Conditional Generative Adversarial Networks. The photo editing industry is growing rapidly, and one of its crucial issues is recoloring and restyling individual objects or areas in images. With the fast advancement of deep segmentation models, getting a precise segmentation mask for an area of a picture is no longer a problem, although unsupervised restyling and recoloring of objects with complex patterns is still a challenge. The proposed model is state-of-the-art in terms of visual appearance and provides high structural similarity.
30th January
Jan. 30, 14:00, Hanna Pylyeva, Detection of Difficult for Understanding Medical Words using Deep Learning
Abstract. In the medical domain, non-specialized users often require a better understanding of medical information provided by doctors. In this work, we address this need. We introduce novel embeddings obtained from an RNN, FrnnMUTE (French RNN Medical Understandability Text Embeddings), and show how they help to improve the identification of the readability and understandability of medical words when applied as features in the classification task, reaching a maximum F1 score of 87.0. We also found that adding pre-trained FastText word embeddings to the feature set substantially improves the performance of the classification model. To study the generalizability of different models, we introduce a methodology comprising three cross-validation scenarios that allow testing classifiers in real-world conditions: when the understanding of medical words by new users is unknown, or when no information about the understandability of new words is provided to the model.
Jan. 30, 14:40, Anton Ponomarchuk, Semi-Supervised Feature Sharing for Efficient Video Segmentation
Abstract. In the robot sensing and automotive driving domains, producing precise semantic segmentation masks for images greatly helps in understanding the environment and, as a result, in interacting with it better. Usually, these tasks require processing images with more than two classes. Moreover, this must be done in a short time. Almost all architectures that try to solve this task use a heavyweight end-to-end deep neural network or external blocks such as GRU, LSTM, or optical flow. In this work, we provide a deep neural network architecture that learns to extract global high-level features and propagate them among the images that depict the same video scene, in order to speed up image processing. We provide a propagation strategy without any external blocks. We also provide a loss function for training such a network on a dataset in which a large number of images have no segmentation mask.
Jan. 30, 15:20, Kateryna Zorina, Building Segment Based Revenue Prediction for Customer Lifetime Value Model
Abstract. This work presents part of a customer lifetime value calculation project. Two tasks are described: dividing customers into segments and forecasting future revenue. For both components, there are metrics to compare performance between different experiments.
Jan. 30, 16:20, Markiyan Kostiv, Customer Lifetime Value for Credit Limit Optimization
Abstract. Customer lifetime value (CLV) is an important metric for banks to optimize credit limits, improve retention and set competitive pricing. The specifics of the credit card market pose challenges such as undetermined usage time and a positive correlation between risk and revenue. To address these challenges, we present a customer lifetime value framework in conjunction with risk-adjusted return for revolving products, and a credit limit increase and decrease strategy that takes CLV metrics into account.
Jan. 30, 17:00, Oleksandr Zaytsev, Aspects of Software Naturalness Through the Generation of Identifier Names
Abstract. Modern-day programming can be viewed as a form of communication between the person who is writing code and the one reading it. Nevertheless, developers very often neglect the readability of software, and even well-written code becomes less comprehensible through the course of software evolution. In this work, we study how the naturalness of source code written in Pharo allows us to train machine learning models that extract semantic information from a method’s body and map it to a short descriptive name. We collect a dataset of methods from the 10 biggest projects written in Pharo and build an attention-based sequence-to-sequence network that generates method names by translating source code into a couple of English words. We evaluate our model on an independent test set and report a precision of over 50%. To our knowledge, this is the first application of machine learning and natural language processing to the source code of Pharo.
31st January
Jan. 31, 14:00, Olha Chernytska, 3D Hand Pose Estimation from Single RGB Camera
Abstract. With the growing popularity of VR/AR applications, the 3D hand pose estimation task has become very popular. 3D hand pose estimation from a single RGB camera has great potential, because RGB cameras are cheap and already available on most mobile devices. In this thesis, we work on improving the pipeline for 3D hand pose estimation from an RGB camera. We dealt with two challenges: a sophisticated algorithmic task and the absence of good datasets. We trained several convolutional neural networks and showed that the direct heatmap method is the best approach for 2D pose estimation, and the vector representation for 3D pose estimation. We demonstrated that adding data augmentations, even for a synthetic dataset, increases performance on real data. For 2D hand pose estimation, we showed that it is possible to train a neural network on a large-scale synthetic dataset and fine-tune it on a small, partly labeled real dataset to obtain adequate results, even when only a small part of the keypoint labels is available. With no real 3D labels available, a model trained on synthetic data could still correctly predict 3D keypoint locations for simple poses. All code and pre-trained models will be publicly available.
Jan. 31, 14:40, Oleg Shyshkin, Music Generation Powered by Artificial Intelligence
Abstract. Music is an essential part of human life today. Despite the long history of the phenomenon, people still explore it and expand its horizons. Over the last ten years, the quality of computer-generated music has improved significantly. State-of-the-art machine learning models like PerformanceRNN can perform music close to a human level. However, these systems struggle to generate long-term music. In this work, we apply a TCN model to the music generation task and evaluate the quality of the music. We show that the model performs significantly better than a baseline model on the long-term music generation task. However, it has its own weak points in musicality and time generation. We also discuss possible options to resolve these issues.
Jan. 31, 15:20, Yuriy Kaminskyi, Semantic Segmentation for Visual Indoor Localization
Abstract. The problem of visual localization and navigation in a 3D environment is key to solving a vast variety of practical tasks. One example is robotics, where a machine is required to locate itself on a 3D map and steer to a specific location. Another example is a personal assistant in the form of a mobile phone or smart glasses that uses augmented reality techniques to navigate the user seamlessly through large indoor spaces such as airports, hospitals, shopping malls or office buildings. The purpose of this work was to improve the performance of the InLoc localization pipeline, which gives state-of-the-art results for the indoor visual localization problem. This was done by developing relevant semantic features. Namely, we introduce a variety of features derived from two different segmentation models: Mask R-CNN and CSAIL. We evaluate the quality of the generated features and add the features of the better-performing model to the InLoc localization pipeline. With the introduced features, we improved the performance of the InLoc localization pipeline and outlined approaches for further research.
Jan. 31, 16:20, Oleh Pidhirnyak, Automatic Plant Counting using Deep Neural Networks
Abstract. Crop counting is a challenging task for today’s agriculture. Increasing demand for food supplies creates a necessity to perform farming activities more efficiently and precisely. Using remote sensing images can help to better control the population of the plants grown and to forecast future yields, profits and disasters. In this study, we offer a series of approaches for plant counting using foreground extraction algorithms and deep neural networks. The study introduces an approach that is new to the field, counting densely distributed plants using density map regression, with an accuracy of 98.9% on a palm oil tree dataset.
Jan. 31, 17:00, Ivan Ilnytskyi, Stable and Efficient Video Segmentation via GAN Predicting Adjacent Frame
Abstract. Analyzing video streams is a huge problem not only in terms of accuracy and speed, but also in terms of the consistency of analysis between adjacent frames, since videos are consistent by their real-world nature. The jittering effect of predictions is easily noticed by human vision in video semantic segmentation tasks. However, it is usually not taken into account in algorithm design, because algorithms are tailored to single-image recognition and classical filters offer no easy solution. This jittering leads to a rather negative human assessment of algorithms even when their accuracy is good. In addition, it may lead to unstable or conflicting behavior of control systems that use computer vision. We propose methods of efficient video semantic segmentation that take video consistency into account and can be implemented without an annotated video dataset. Some methods require only an annotated photo dataset; others additionally use a generative adversarial network trained on a relevant video dataset without supervision. The solution is relevant for cases when the domain does not contain large annotated video datasets, but annotated photo datasets and large amounts of unlabeled video are available. We show that using the semantic segmentation mask of the previous frame as a feature for the segmentation of the current frame improves accuracy and consistency. We achieve the best results using the network trained with features obtained from the GAN and the baseline segmentation network.