Data Science

HBase

A distributed, versioned, non-relational database modeled after Google's Bigtable. It is built on top of HDFS and allows to perform read/write operations on large datasets in real time using Key/Value data. The programming language of HBase is Java. Today HBase is an integral part of the Apache Software Foundation and the Hadoop ecosystem.

HDFS

Hadoop Distributed File System, HDFS for short, is a Java-based distributed file system that allows to store large data sets (files which are in the range of terabytes and petabytes) reliably. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It is the primary storage used by Hadoop applications. 

HDP

Hortonworks Data Platform is a secure, enterprise-ready open source Hadoop distribution based on a centralized architecture (YARN). HDP enables enterprises to deploy, integrate and work with unprecedented volumes of structured and unstructured data.

Image Processing

The application of a set of techniques and algorithms to a digital image to analyze, enhance, or optimize image characteristics such as sharpness and contrast.

Impala

A modern, massively distributed SQL query engine for Apache Hadoop. It allows you to analyze, transform and combine data from a variety of data sources. With Impala, you can query data, whether stored in HDFS or HBase, in real time. 

TensorFlow

Software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them.

Jaql

A functional language designed for processing data and JSON queries on big data. It is suitable for any volume of data, both structured and unstructured. Jaql also works on other data formats, such as XML and CSV, and it is compatible with SQL structured data.

Kinesis

Amazon Web Service for processing big data in real time. Enables to get timely information and react quickly to it. Simplifies the process of writing apps that rely on data that must be processed in real time.

KSQL

A streaming SQL engine for Apache Kafka. Provides interactive SQL interface for stream processing on Kafka. Supports a wide range of streaming operations, including data filtering, transformations, aggregations, joins, windowing, sessionization, and much more.

Kylin

A distributed analytics engine that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop supporting extremely large datasets. 

Logistic regression

Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary).  Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.

Luigi

Luigi is a python package to build complex pipelines and it was developed at Spotify. In Luigi, as in Airflow, you can specify workflows as tasks and dependencies between them. The two building blocks of Luigi are Tasks and Targets.

Mathematics

Mathematics is the science of numbers and their operations, interrelations, combinations, generalizations, and abstractions and of space configurations and their structure, measurement, transformations, and generalizations.

MapReduce

The heart of Apache Hadoop. A software framework for easily writing applications which process vast amounts of data (multi-terabyte datasets) in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. 

Microsoft Excel

Microsoft Excel is a spreadsheet developed by Microsoft for Windows, macOS, Android and iOS. It features calculation, graphing tools, pivot tables, and a macro programming language called Visual Basic for Applications. 

MLlib

A machine learning library of high-quality algorithms for Apache Spark. It supports R, Python, Java and Scala programming languages. It can run on Mesos, Hadoop and Kubernetes, and can extract data from a number of databases, such as Hive, Cassandra, HDFS, and HBase.

MXNet

MXNet is a deep learning library for GPU and cloud computing developers. It is an acceleration library that helps save time on building and deploying large-scale DNNs. It also offers predefined layers and tools for coding your own, for specifying data structure placement and automating calculations.

Neural Networks

Neural networks are a set of algorithms, modeled loosely after the human brain that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling or clustering raw input.

Optimization

Optimization is the process of modifying a system to make some features of it work more efficiently or use fewer resources. For instance, a computer program may be optimized so that it runs faster, or to run with less memory requirements or other resources, or to consume less energy. 

Pattern Recognition

Pattern recognition is the process of recognizing patterns by using machine learning algorithm. Pattern recognition can be defined as the classification of data based on knowledge already gained or on statistical information extracted from patterns and/or their representation. 

Predictive modeling

Predictive modeling is a process that uses data mining and probability to forecast outcomes. 

Oozie

A workflow scheduler system designed to manage Hadoop jobs. Oozie allows to automates commonly performed tasks. By using it, you can describe workflows to be performed on a Hadoop cluster, schedule those workflows to execute under a specified condition, and even combine multiple workflows and schedules together into a package to manage their full lifecycle.

QlikView

An in-memory, business discovery tool. Provides self-service BI for all business users in organizations. Enables users to conduct direct and indirect searches across all data anywhere in the application.

Quantitative analytics

Quantitative analytics (QA) is a technique that seeks to understand behavior by using mathematical and statistical modeling, measurement, and research. Quantitative analysts aim to represent a given reality in terms of a numerical value.

Quantitative finance

Quantitative finance is the use of mathematical models and extremely large datasets to analyze financial markets and securities. Common examples include (1) the pricing of derivative securities such as options, and (2) risk management, especially as it relates to portfolio management applications.

Development by Synergize.digital

Sign up for updates
straight to your inbox