Data Science

Data structure

A specialized format for organizing and storing data. Serves as the basis for abstract data types. General data structure types include the array, the file, the record, the table, the tree, and so on.

Data visualization

Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.

Data Warehousing

A system that pulls together data from many different sources within an organization for reporting and analysis. It is used for online analytical processing (OLAP), which runs complex analytical queries rather than processing transactions.

Decision tree

A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements.
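As a rough illustration of the definition above, the sketch below evaluates a small decision tree by averaging over chance nodes and picking the best option at decision nodes. The node layout (`kind`, `payoff`, `branches`, `options`) and the payoffs and probabilities are hypothetical, chosen only for this example.

```python
def expected_value(node):
    """Roll back a decision tree to its expected payoff.

    Leaves hold a payoff, chance nodes take a probability-weighted
    average of their branches, and decision nodes take the best option.
    """
    kind = node["kind"]
    if kind == "leaf":
        return node["payoff"]
    if kind == "chance":
        return sum(p * expected_value(child) for p, child in node["branches"])
    if kind == "decision":
        return max(expected_value(child) for _, child in node["options"])
    raise ValueError(f"unknown node kind: {kind}")


# Hypothetical tree: launch a product (uncertain outcome) or do nothing.
tree = {
    "kind": "decision",
    "options": [
        ("launch", {
            "kind": "chance",
            "branches": [
                (0.5, {"kind": "leaf", "payoff": 100.0}),   # success
                (0.5, {"kind": "leaf", "payoff": -40.0}),   # failure
            ],
        }),
        ("do nothing", {"kind": "leaf", "payoff": 0.0}),
    ],
}

best = expected_value(tree)  # launch: 0.5 * 100 + 0.5 * (-40) = 30
```

The whole evaluation uses only conditionals and recursion, which matches the description of a decision tree as an algorithm built from conditional control statements.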


Finance

The management of money and other assets.


Hadoop

An open-source Apache software framework for distributed storage and processing of big data sets across clusters of computers using simple programming models.


HBase

A distributed, versioned, non-relational database modeled after Google's Bigtable. It is built on top of HDFS and supports real-time read/write operations on large datasets stored as key/value data. HBase is written in Java. Today HBase is a project of the Apache Software Foundation and an integral part of the Hadoop ecosystem.


HDFS

Hadoop Distributed File System, HDFS for short, is a Java-based distributed file system that reliably stores large data sets (files in the range of terabytes to petabytes). HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It is the primary storage used by Hadoop applications.


HDP

Hortonworks Data Platform is a secure, enterprise-ready, open-source Hadoop distribution based on a centralized architecture (YARN). HDP enables enterprises to deploy, integrate, and work with unprecedented volumes of structured and unstructured data.

Image processing

Digital image processing is the application of a set of techniques and algorithms to a digital image to analyze, enhance, or optimize image characteristics such as sharpness and contrast.
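One simple contrast enhancement is linear contrast stretching. The sketch below is a minimal illustration on a grayscale image stored as a list of rows; the function name `stretch_contrast` and the sample pixel values are assumptions made for this example, not part of any particular library.

```python
def stretch_contrast(image):
    """Linearly rescale pixels so the darkest maps to 0 and the brightest to 255."""
    flat = [pixel for row in image for pixel in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A flat image has no contrast to stretch; return a copy unchanged.
        return [row[:] for row in image]
    scale = 255 / (hi - lo)
    return [[round((pixel - lo) * scale) for pixel in row] for row in image]


# Hypothetical low-contrast 2x2 grayscale image (values between 100 and 150).
image = [[100, 110], [120, 150]]
stretched = stretch_contrast(image)
# stretched == [[0, 51], [102, 255]]
```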


Impala

A modern, massively distributed SQL query engine for Apache Hadoop. It allows you to analyze, transform, and combine data from a variety of data sources. With Impala, you can query data in real time, whether it is stored in HDFS or HBase.


Jaql

A functional language designed for processing data and JSON queries on big data. It is suitable for any volume of data, both structured and unstructured. Jaql also works with other data formats, such as XML and CSV, and with structured SQL data.


Kinesis

An Amazon Web Services offering for processing big data in real time. It lets you obtain timely information and react to it quickly, and it simplifies writing apps that rely on data that must be processed in real time.


KSQL

A streaming SQL engine for Apache Kafka. It provides an interactive SQL interface for stream processing on Kafka and supports a wide range of streaming operations, including data filtering, transformations, aggregations, joins, windowing, and sessionization.


Kylin

A distributed analytics engine that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets.

Logistic regression

Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary).  Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
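To make the binary-outcome idea concrete, here is a minimal sketch that fits a one-variable logistic regression with plain gradient descent. The data (hours studied versus pass/fail), the learning rate, and the epoch count are all hypothetical choices for illustration; a real analysis would normally use a statistics or machine learning library.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: hours studied and whether the exam was passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(hours, passed)
p_low = sigmoid(w * 0.5 + b)   # predicted pass probability after 0.5 hours
p_high = sigmoid(w * 4.5 + b)  # predicted pass probability after 4.5 hours
```

The fitted slope `w` describes how the log-odds of the binary outcome change per unit of the independent variable.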


Mathematics

Mathematics is the science of numbers and their operations, interrelations, combinations, generalizations, and abstractions, and of space configurations and their structure, measurement, transformations, and generalizations.


MapReduce

The heart of Apache Hadoop. A software framework for easily writing applications that process vast amounts of data (multi-terabyte datasets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.
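The programming model can be sketched in a single process with the classic word-count example. This is only an illustration of the map, shuffle, and reduce steps; it does not use the Hadoop API, and the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big clusters", "data on clusters"])
# counts == {"big": 2, "data": 2, "clusters": 2, "on": 1}
```

In a real cluster the map and reduce calls run on many nodes at once, and the framework handles the shuffle and recovery from failures.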

Microsoft Excel

Microsoft Excel is a spreadsheet developed by Microsoft for Windows, macOS, Android and iOS. It features calculation, graphing tools, pivot tables, and a macro programming language called Visual Basic for Applications. 


MLlib

A machine learning library of high-quality algorithms for Apache Spark. It supports the R, Python, Java, and Scala programming languages. It can run on Mesos, Hadoop, and Kubernetes, and can read data from a number of sources, such as Hive, Cassandra, HDFS, and HBase.


MXNet

MXNet is a deep learning library for GPU and cloud computing developers. It is an acceleration library that helps save time when building and deploying large-scale deep neural networks (DNNs). It also offers predefined layers and tools for coding your own, for specifying data structure placement, and for automating calculations.

Neural networks

Neural networks (NN) are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling or clustering raw input.
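As a small illustration, the sketch below wires three artificial neurons into a two-layer network that computes XOR. The weights and biases are set by hand for this example rather than learned from data, and the function names are hypothetical.

```python
def step(z):
    """Threshold activation: the neuron fires (1) only for positive input."""
    return 1 if z > 0 else 0


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias, then activation."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_network(x1, x2):
    """Two hidden neurons (OR and AND) feeding one output neuron."""
    h_or = neuron([x1, x2], [1.0, 1.0], -0.5)    # fires if either input is 1
    h_and = neuron([x1, x2], [1.0, 1.0], -1.5)   # fires only if both inputs are 1
    return neuron([h_or, h_and], [1.0, -1.0], -0.5)  # OR and not AND


outputs = [xor_network(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
# outputs == [0, 1, 1, 0]
```

In practice the weights are found by a training procedure, and networks use many more neurons and smoother activation functions.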


Optimization

Optimization is the process of modifying a system so that some aspect of it works more efficiently or uses fewer resources. For instance, a computer program may be optimized to run faster, to use less memory or other resources, or to consume less energy.
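One common way to make a program run faster is to cache results it would otherwise recompute. The sketch below shows this with a Fibonacci function; the function names are chosen for this example, and the trade-off is extra memory for the cache.

```python
from functools import lru_cache


def fib_slow(n):
    """Naive recursion: recomputes the same subproblems many times."""
    return n if n < 2 else fib_slow(n - 1) + fib_slow(n - 2)


@lru_cache(maxsize=None)
def fib_fast(n):
    """Same recursion, but each result is computed once and then cached."""
    return n if n < 2 else fib_fast(n - 1) + fib_fast(n - 2)


# Both versions return the same value; the cached one does far less work.
same_answer = fib_slow(20) == fib_fast(20)
value = fib_fast(30)  # 832040
```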

Pattern Recognition

Pattern recognition is the process of recognizing patterns by using machine learning algorithms. It can be defined as the classification of data based on knowledge already gained or on statistical information extracted from patterns and/or their representation.

Predictive modeling

Predictive modeling is a process that uses data mining and probability to forecast outcomes. 
