Data Science

Random Forest

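A supervised learning algorithm that builds many decision trees, each trained on a random sample of the data and a random subset of the features, and merges their predictions into one “forest”: a majority vote for classification or an average for regression.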
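A minimal sketch of the idea in plain Python, under one simplifying assumption: each "tree" is a depth-1 decision stump rather than a fully grown tree. This keeps the code short while still showing the two sources of randomness (bootstrap samples of rows and random subsets of features) and the merge step (majority vote). The function names and the toy dataset are hypothetical. Production implementations such as scikit-learn's `RandomForestClassifier` grow deeper trees and draw a random feature subset at every split.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, features):
    """Fit a depth-1 decision tree: the (feature, threshold) split with the fewest training errors."""
    fallback = majority(y)
    best = None
    for feat in features:
        for thr in sorted({row[feat] for row in X}):
            left = [label for row, label in zip(X, y) if row[feat] <= thr]
            right = [label for row, label in zip(X, y) if row[feat] > thr]
            left_label = majority(left) if left else fallback
            right_label = majority(right) if right else fallback
            errors = sum(lab != left_label for lab in left) + sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, feat, thr, left_label, right_label)
    _, feat, thr, left_label, right_label = best
    return lambda row: left_label if row[feat] <= thr else right_label

def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Train each tree on a bootstrap sample, considering only a random subset of features."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample rows with replacement
        features = rng.sample(range(len(X[0])), n_features)   # random feature subset
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return trees

def predict(trees, row):
    """Merge the trees' outputs by majority vote."""
    return majority([tree(row) for tree in trees])

X = [[1, 2], [2, 1], [2, 2], [1, 1], [8, 9], [9, 8], [8, 8], [9, 9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(predict(forest, [0, 0]), predict(forest, [10, 10]))  # prints: 0 1
```

Individual stumps may see skewed bootstrap samples, but averaging their votes makes the forest's prediction more stable than any single tree's.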

Samza

A distributed stream processing framework developed in conjunction with Apache Kafka. Samza allows developers to build stateful applications that process data in real time from multiple sources. It computes results continuously as data arrives, which makes sub-second response times possible.
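Samza jobs are written against its own Java API; the sketch below is plain Python and only mimics the core idea of a stateful task that updates its result on every incoming message instead of waiting for a batch to complete. `PageViewCounter`, `process` and `run` are hypothetical names, not Samza's API, and the in-memory dict stands in for the fault-tolerant local state a real job would use.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple

class PageViewCounter:
    """Hypothetical stateful task: keeps a running view count per page."""

    def __init__(self) -> None:
        # State that survives across messages (a plain dict in this sketch).
        self.counts = defaultdict(int)

    def process(self, message: Tuple[str, str]) -> Tuple[str, int]:
        _source, page = message  # messages may come from several sources
        self.counts[page] += 1
        # Emit the updated result immediately for this one message.
        return page, self.counts[page]

def run(task: PageViewCounter, stream: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    """Feed messages to the task one at a time, yielding a result per message."""
    for message in stream:
        yield task.process(message)

events = [("web", "/home"), ("mobile", "/home"), ("web", "/pricing"), ("web", "/home")]
print(list(run(PageViewCounter(), events)))
# [('/home', 1), ('/home', 2), ('/pricing', 1), ('/home', 3)]
```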

SnapLogic eXtreme

A big data transformation tool that can process large volumes of information without requiring specialized skills. The tool helps reduce cost and risk and supports data analytics. It can be integrated with other Big Data tools and frameworks, including Amazon Elastic MapReduce and Azure HDInsight.

spaCy

A software library for advanced Natural Language Processing, written in Python and Cython. It helps developers build applications that process large volumes of text.

Spark MLlib

Spark MLlib is Apache Spark’s machine learning library. One of Spark’s major attractions is its ability to scale computation across a cluster, which is exactly what machine learning algorithms need when training on large datasets.

Spark Streaming

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams.

Sqoop

A Java-based tool used for transferring bulk data between Apache Hadoop and structured datastores such as relational databases.

Statistical modeling

Statistical modeling is a simplified, mathematically formalized way to approximate reality (i.e. the process that generates your data) and, optionally, to make predictions from this approximation. The statistical model is the set of mathematical equations used to describe that approximation.
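As an illustration, one of the simplest statistical models is a straight line, y = b0 + b1·x + error, fitted by ordinary least squares: b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄. The sketch below assumes hypothetical (x, y) data; once fitted, the equation serves both as the approximation of the data-generating process and as a prediction tool.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1 * x + error."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical observations.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
print(f"fitted model: y = {b0:.2f} + {b1:.2f}x")  # prints: fitted model: y = 0.05 + 1.99x
print("prediction at x = 6:", round(b0 + b1 * 6, 2))
```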

Statistics

Statistics is the mathematical science of collecting, analyzing, interpreting and presenting data. It can be used to derive meaningful insights from data by performing mathematical computations on it.
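A small example of the kind of computation meant here, using Python's standard `statistics` module on a hypothetical week of daily order counts. Comparing the summaries already yields an insight: the mean sits above the median because one unusually busy day pulls it upward.

```python
import statistics

# Hypothetical sample: daily order counts for one week, with one unusually busy day.
orders = [12, 15, 11, 18, 15, 30, 14]

summary = {
    "mean": statistics.mean(orders),
    "median": statistics.median(orders),
    "mode": statistics.mode(orders),
    "stdev": statistics.stdev(orders),  # sample standard deviation (n - 1 in the denominator)
}
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```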

Tableau

A data visualization tool used to create interactive visual analytics in the form of dashboards and generate compelling business insights. These dashboards make it easier for non-technical analysts and end users to convert data into understandable, interactive graphics.

Time Series Analysis

A statistical technique for analyzing time series data, i.e. observations recorded in time order, as in trend analysis. Time series analysis has two main goals: (a) identifying the nature of the phenomenon represented by the sequence of observations, and (b) forecasting, i.e. predicting future values of the time series variable.
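Both goals can be sketched with one of the simplest tools, the moving average, applied here to hypothetical monthly sales figures: smoothing exposes the underlying trend (goal a), and averaging the most recent window gives a naive forecast (goal b).

```python
def moving_average(series, window):
    """Smooth a series: each output is the mean of `window` consecutive observations."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

def forecast_next(series, window):
    """Simple moving-average forecast: the next value is the mean of the last `window` values."""
    return sum(series[-window:]) / window

# Hypothetical monthly sales figures.
sales = [100, 104, 108, 115, 117, 121, 128, 131]

smoothed = moving_average(sales, window=3)
print([round(v, 1) for v in smoothed])  # steadily rising values reveal an upward trend
print(round(forecast_next(sales, window=3), 1))
```

Because it only averages past values, this forecast lags behind a series that is still climbing; methods that model the trend explicitly, such as Holt's linear exponential smoothing or ARIMA, are the usual next step.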

Torch AI

A deep learning library of algorithms based on the LuaJIT scripting language. Torch has a neural network library and comes with machine learning, computer vision, audio, image and video capabilities. It can be viewed as a scientific computing framework.

Vertica

A big data analytics platform with a distributed architecture and columnar compression for reliability and speed. Vertica is designed for use in big data workloads where speed, simplicity and scalability are essential. It offers SQL and geospatial analysis functions, machine learning models and Hadoop integration.

WEKA

Stands for Waikato Environment for Knowledge Analysis. WEKA is machine learning software that provides algorithms and tools for data analysis, visualization and predictive modeling, along with a user-friendly interface. It also supports data pre-processing, clustering, regression, association rule mining and classification.

YARN

Stands for Yet Another Resource Negotiator. The resource management layer and architectural center of Hadoop, which allows multiple data processing engines to handle data stored in a single platform.

Machine Learning

A field of study that explores the ability of computers to learn, akin to a human brain. It also studies the construction of algorithms that can learn from and make predictions on data. Basically, its intention is to teach a computer to learn from data on its own, rather than follow rules it has been explicitly programmed with.
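To make "learning from data instead of being programmed" concrete, here is a sketch of one of the simplest learning algorithms, the perceptron. Nobody writes the rule for logical AND; the algorithm adjusts its weights each time it misclassifies a labeled example until its predictions match. The function names and the choice of AND as training data are illustrative assumptions.

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Learn weights w and bias b so that sign(w·x + b) matches the labels (+1 / -1)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if label * activation <= 0:  # wrong side of (or on) the boundary: nudge toward the label
                w = [wi + lr * label * xi for wi, xi in zip(w, x)]
                b += lr * label
    return w, b

def classify(w, b, x):
    """Predict +1 or -1 with the learned weights."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Logical AND, learned from examples rather than written as an if-statement.
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [-1, -1, -1, 1]
w, b = train_perceptron(samples, labels)
print([classify(w, b, x) for x in samples])  # [-1, -1, -1, 1]
```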

SAS

Stands for Statistical Analysis System. A data analysis tool for data management, data mining, statistical analysis, data warehousing and more. It also provides features for writing reports, developing business models and applications.

Data Science

An interdisciplinary study of information sources, what the information represents and ways of turning it into a valuable resource when creating business and IT strategies. It uses methods and techniques of statistics and data analysis to analyze and understand a phenomenon.

Deep Learning

A branch of machine learning based on a specific set of algorithms called artificial neural networks, which were designed to mimic the structure and function of the human brain. Because these networks stack many layers, they can learn different levels of representation (abstraction) through classification and pattern analysis, among other methods.
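The sketch below shows what a "level of representation" means in a tiny network: a hidden ReLU layer re-encodes the inputs so that the output layer can compute XOR, which no single-layer linear model can represent. The weights are hand-picked for illustration (a well-known textbook solution); in practice they would be learned, typically by backpropagation, and the helper names are hypothetical.

```python
def relu(z):
    return max(0.0, z)

def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    outputs = [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]
    return [activation(o) for o in outputs] if activation else outputs

# Hand-picked weights (normally learned) that solve XOR.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[1.0, -2.0]]
OUTPUT_B = [0.0]

def xor_net(x):
    hidden = dense(x, HIDDEN_W, HIDDEN_B, relu)   # intermediate representation of the input
    return dense(hidden, OUTPUT_W, OUTPUT_B)[0]   # linear read-out of that representation

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, xor_net(x))
```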

NLP

Stands for Natural Language Processing. A field of Computer Science and Artificial Intelligence that studies how computers can be programmed to analyze and understand human language and derive meaning from it. The goal of NLP is to facilitate computer-human interactions.
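A first step in many NLP pipelines is turning raw text into units a program can count. The sketch below uses a deliberately naive regular-expression tokenizer (an assumption for brevity; libraries such as spaCy handle punctuation, contractions and languages far more carefully) and builds a bag-of-words representation that records word frequencies while ignoring word order.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def bag_of_words(text):
    """Represent a document by how often each token occurs, ignoring word order."""
    return Counter(tokenize(text))

doc = "The cat sat on the mat. The mat was warm."
print(tokenize(doc))
print(bag_of_words(doc).most_common(2))  # [('the', 3), ('mat', 2)]
```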

RNN

Stands for Recurrent Neural Network. It is a class of Artificial Neural Networks commonly used with sequential data. RNNs contain loops that feed the network’s internal state (its hidden state) from one step back in as input to the next step, which lets them retain information from previous inputs. RNNs can be used in conjunction with NLP methods and in machine translation.
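The loop can be sketched with a single-unit, Elman-style recurrent cell, h_t = tanh(w_x·x_t + w_h·h_(t−1) + b). The weights below are hypothetical (real networks learn them, e.g. by backpropagation through time, and use vectors rather than scalars). Feeding one nonzero input followed by zeros shows the hidden state still carrying a fading trace of that first input.

```python
import math

def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a one-unit recurrent cell over a sequence, returning the hidden state after each step.

    h_t = tanh(w_x * x_t + w_h * h_{t-1} + b): the w_h term is the loop that feeds the
    previous hidden state back in, so each h_t depends on all earlier inputs.
    """
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

states = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
print([round(h, 3) for h in states])  # nonzero after step 1 even though later inputs are 0
```

Setting `w_h=0.0` removes the loop, and the hidden state then drops to zero as soon as the input does.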
