On this page, you will find lists of the skills required for various Data Science positions in Ukraine (Big Data Software Engineer / Data Engineer, Data Scientist, Machine Learning Engineer, Data Analyst, NLP Engineer / NLP Data Scientist, CV Engineer, Deep Learning Engineer / Deep Learning Research Engineer). The positions were analyzed as part of a study of the Ukrainian job market in the first half of 2020.
Big Data Software Engineer / Data Engineer
- Linear algebra. Calculus. Statistics and Probability Theory.
- Machine Learning Algorithms: regression, simulation, scenario analysis, modeling, clustering, decision trees, etc.
- Python 3, Pandas, scikit-learn, Keras, TensorFlow, NumPy, PyTorch.
- Data visualization.
- Software engineering methodologies, functional programming or object-oriented programming.
- DevOps: containerization and orchestration.
- Classic DBs (relational or object): MySQL, PostgreSQL, RDS.
- NoSQL (document, wide-column, key-value): MongoDB, Cassandra, HBase, Elasticsearch, Redis, DynamoDB.
- NewSQL (hybrid/in-memory): MemSQL, VoltDB.
- Query engines: Impala, Presto.
- Cloud platforms (GCP, AWS). Cloud computation (Dataflow, Dataproc). Streaming (Pub/Sub, Kafka). Data storage (BigQuery, Cloud SQL, Cloud Spanner, Firestore, Bigtable).
- ETL Concepts / Processes.
- Data Warehouse technologies, Data Lake architecture.
- Data modeling: Bachman diagrams, Chen’s Notation, Object-relational mapping, etc.
- Processing frameworks: Apache Spark (PySpark/SparkR/sparklyr), Flink, Beam, Kafka Streams.
- Data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
Data Scientist
- Python (PyCharm, Pandas, NumPy, bs4, sklearn, scipy). R.
- Linear algebra. Calculus. Statistics.
- Machine Learning techniques (Decision Trees, Random Forest, SVM, Bayesian methods, XGBoost, K-Nearest Neighbors) and concepts: regression and classification, clustering, feature selection, feature engineering, the curse of dimensionality, bias-variance tradeoff.
- Data visualization.
- Data Mining (Clustering, Frequent Pattern Mining, Outlier Detection).
- Neural Networks and ML Packages (sklearn/XGBoost/TensorFlow/Keras, H2O).
- Cloud platforms (GCP, AWS). Cloud computation (Dataflow, Dataproc). Streaming (Pub/Sub, Kafka). Data storage (BigQuery, Cloud SQL, Cloud Spanner, Firestore, Bigtable).
- Databases: SQL and NoSQL, AWS cloud storage, GDPR data privacy.
- Processing frameworks: Hadoop, Spark.
- Business Intelligence Software (Power BI, Tableau, Qlik, Cognos Analytics).
Machine Learning Engineer
- Computer science fundamentals, algorithms, mathematics, linear algebra, probability, and statistics.
- Python (Pandas, NumPy, scikit-learn, TensorFlow, Keras).
- Python visualization tools: matplotlib/seaborn, Plotly.
- Machine Learning techniques (Decision Trees, Random Forest, SVM, Bayesian methods, XGBoost, K-Nearest Neighbors) and concepts: regression and classification, clustering, feature selection, feature engineering, the curse of dimensionality, bias-variance tradeoff.
- Deep Learning: Recurrent Neural Network (LSTM/GRU units), Convolutional Neural Network.
- Machine learning frameworks (TensorFlow, Caffe2, PyTorch, Spark ML, scikit-learn) and ML techniques: GAN, ASR, RL.
- Databases: SQL and NoSQL. Hadoop ecosystem.
- Processing frameworks: Apache Spark (PySpark/SparkR/sparklyr).
- Cloud platforms (GCP, AWS).
Data Analyst
- Math, Statistics (regression, properties of distributions, statistical tests and their proper usage, etc.) and Probability Theory.
- Statistical programming software (R, Python, SAS, Matlab).
- Predictive analytics (regression models, time-series analysis and forecasting, survival or duration analysis).
- BI tools: Google Data Studio / Microsoft Power BI / Tableau.
- Classic DBs: MySQL.
- MS Excel.
- A/B testing.
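
As an illustration of the A/B testing item above, here is a minimal sketch of a two-proportion z-test using only the Python standard library. The function name and the sample counts in the usage note are hypothetical, and a pooled z-test is only one of several ways to evaluate such an experiment.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants convert at the same rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    # Standard error of the difference between the two proportions
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, `two_proportion_z_test(100, 1000, 130, 1000)` compares a hypothetical control with 100 conversions out of 1000 visitors against a variant with 130 out of 1000; a small p-value suggests the difference is unlikely to be due to chance alone.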
NLP Engineer / NLP Data Scientist
- Python (sklearn, nltk, gensim, spaCy, TensorFlow, PyTorch, Keras) and Python Data Science toolkit: Jupyter Notebook, Pandas, NumPy, Matplotlib/Seaborn, SciPy.
- Databases: SQL and NoSQL (MySQL, MongoDB, PostgreSQL).
- NLP libraries: NLTK, spaCy, Stanford CoreNLP, etc.
- NLP techniques for text representation (TF-IDF, Word2Vec), semantic extraction, data structures and modeling.
- Methods of Information Extraction (NER, terminology extraction, keyword extraction, etc.).
- Machine Learning techniques and concepts (regression, trees, SVM, ensembles) for NLP tasks.
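
To make the TF-IDF item above concrete, here is a minimal pure-Python sketch. It assumes whitespace tokenization, length-normalized term frequency, and idf = log(N / df), which is one common variant; libraries such as scikit-learn use smoothed formulas, so their numbers will differ. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document in `docs`."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Term frequency times inverse document frequency
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

A term that appears in every document gets a weight of zero, while a term unique to one document gets a positive weight, which is why TF-IDF highlights the words that distinguish a text from the rest of the collection.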
CV Engineer
- Linear Algebra. Geometry. Calculus. Statistics and Probability theory.
- Python 3, NumPy, Pandas, Seaborn, SciPy.
- Computer vision / image processing libraries such as: OpenCV, Pillow.
- Neural network architectures: Convolutional Neural Networks (Inception, residual networks), LSTM, GAN.
- Neural network frameworks: TensorFlow, PyTorch.
- Computer vision algorithms and architectures: object detection, segmentation, face recognition, image processing, video processing.
- Real-time CV systems based on Deep Learning.
- Cloud model training (GCP, AWS), Cloud integration, Cloud Platforms.
- Performance metrics in object detection and classification, such as mAP and related.
- Big Data (Hadoop, Spark, Hive).
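
Object detection metrics such as mAP, mentioned above, rely on intersection over union (IoU) to decide whether a predicted box matches a ground-truth box. The sketch below computes IoU for two axis-aligned boxes; the `(x_min, y_min, x_max, y_max)` box format and the function name are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if any
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height are clamped to zero when the boxes do not overlap
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A detection is typically counted as a true positive when its IoU with a ground-truth box exceeds a chosen threshold (0.5 is a common choice), and precision and recall computed from those matches feed into average precision.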
Deep Learning Engineer / Deep Learning Research Engineer
- Python 3: NumPy, scikit-learn, Pandas, SciPy.
- Statistics (regression, properties of distributions, statistical tests and their proper usage, etc.) and probability theory.
- Deep learning frameworks: TensorFlow, PyTorch, MXNet, Caffe, Keras.
- Deep learning architectures: VGG, ResNet, Inception, MobileNet.
- Deep neural networks: hyperparameter optimization, visualization, interpretation.
- Machine learning models.