Data Science

Optimization

Optimization is the process of modifying a system to make some features of it work more efficiently or use fewer resources. For instance, a computer program may be optimized so that it runs faster, uses less memory or other resources, or consumes less energy.
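
As a small illustration of runtime optimization, memoization caches the results of repeated sub-computations, trading a little memory for a large speedup (a sketch in Python using only the standard library):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Caching already-computed results turns an exponential-time
    # recursion into a linear-time one.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(40))  # → 102334155, computed almost instantly
```

Without the cache, `fib(40)` recomputes the same subproblems billions of times; with it, each value is computed once.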

Pattern Recognition

Pattern recognition is the process of recognizing patterns by using machine learning algorithms. It can be defined as the classification of data based on knowledge already gained or on statistical information extracted from patterns and/or their representations.
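
The idea of classifying new data against statistical information extracted from known examples can be sketched with a minimal nearest-centroid classifier (the training points and labels below are hypothetical toy data):

```python
import math

# Toy training data: points already labeled by class.
training = {
    "small": [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9)],
    "large": [(5.0, 5.5), (4.8, 5.1), (5.2, 4.9)],
}

# Extract statistical information: the mean point (centroid) of each class.
centroids = {
    label: tuple(sum(c) / len(points) for c in zip(*points))
    for label, points in training.items()
}

def classify(point):
    # Assign the label of the nearest centroid (minimum Euclidean distance).
    return min(centroids, key=lambda lbl: math.dist(point, centroids[lbl]))

print(classify((1.0, 1.1)))  # → small
print(classify((5.1, 5.0)))  # → large
```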

Predictive modeling

Predictive modeling is a process that uses data mining and probability to forecast outcomes. 

Project Management

Project Management is the process of managing IT or IT-related projects. Technical project managers are critical to the conception, development and execution of the projects. 

Oozie

A workflow scheduler system designed to manage Hadoop jobs. Oozie automates commonly performed tasks: you can describe workflows to be performed on a Hadoop cluster, schedule those workflows to execute under specified conditions, and even combine multiple workflows and schedules into a package to manage their full lifecycle.

QlikView

An in-memory business discovery tool. Provides self-service BI for all business users in organizations and enables users to conduct direct and indirect searches across all data anywhere in the application.

Quantitative analytics

Quantitative analytics (QA) is a technique that seeks to understand behavior by using mathematical and statistical modeling, measurement, and research. Quantitative analysts aim to represent a given reality in terms of a numerical value.

Quantitative finance

Quantitative finance is the use of mathematical models and extremely large datasets to analyze financial markets and securities. Common examples include (1) the pricing of derivative securities such as options, and (2) risk management, especially as it relates to portfolio management applications.
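
The first example, derivative pricing, can be sketched with the classic Black-Scholes formula for a European call option (the input values below are hypothetical):

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    # Standard normal cumulative distribution function via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(S, K, T, r, sigma):
    # Black-Scholes price of a European call option.
    # S: spot price, K: strike, T: years to expiry,
    # r: risk-free rate, sigma: annualized volatility.
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

price = black_scholes_call(S=100, K=100, T=1.0, r=0.05, sigma=0.2)
print(round(price, 2))  # → 10.45
```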

Random forest

The random forest is a supervised learning algorithm that randomly creates and merges multiple decision trees into one “forest.”
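
A minimal sketch using scikit-learn (assuming it is installed) on its built-in iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each tree is trained on a random bootstrap sample, considering a random
# subset of features at each split; the forest aggregates their votes.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))  # typically well above 0.9 on iris
```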

Samza

A distributed stream processing framework developed in conjunction with Apache Kafka. It allows you to build stateful applications that process data in real time from multiple sources, continuously computing results as data arrives, which makes sub-second response times possible.

SnapLogic eXtreme

A big data transformation tool that can process large volumes of information without requiring a specialized skill set. The tool assists in cost and risk reduction and data analytics. It can be integrated with other big data tools and frameworks, including Amazon Elastic MapReduce and Azure HDInsight.

spaCy

A software library for advanced Natural Language Processing. Written in Python and Cython. Helps to build applications that process large volumes of text.
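
A minimal sketch, assuming spaCy is installed; a blank English pipeline provides tokenization without downloading a trained model (named entities and part-of-speech tags would require a model such as `en_core_web_sm`):

```python
import spacy

# A blank pipeline for English: rule-based tokenization only.
nlp = spacy.blank("en")
doc = nlp("spaCy helps build applications that process large volumes of text.")
print([token.text for token in doc])
```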

Spark MLlib

Spark MLlib is Apache Spark’s Machine Learning component. One of the major attractions of Spark is the ability to scale computation massively, and that is exactly what you need for machine learning algorithms. 

Sqoop

A Java-based tool used for transferring bulk data between Apache Hadoop and structured datastores such as relational databases.

Statistical modeling

Statistical modeling is a simplified, mathematically formalized way to approximate reality (i.e. what generates your data) and optionally to make predictions from this approximation. The statistical model is the mathematical equation that is used.
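
A minimal sketch of fitting one such equation, y ≈ a + b·x, by ordinary least squares (the observations below are hypothetical):

```python
# Hypothetical observations: y grows roughly linearly with x, plus noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares: the fitted equation y ≈ a + b*x is the
# simplified approximation of whatever process generated the data.
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x

print(f"y ≈ {a:.2f} + {b:.2f}x")  # → y ≈ 1.15 + 1.95x
```

The fitted equation can then be used for prediction, e.g. plugging in a new x value.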

Statistics

Statistics is a mathematical science pertaining to data collection, analysis, interpretation, and presentation. Statistics can be used to derive meaningful insights from data by performing mathematical computations on it.
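
A few of these computations are available directly in Python's standard library (the sample values below are hypothetical):

```python
import statistics

# Hypothetical sample: daily page-load times in milliseconds.
samples = [120, 135, 128, 142, 131, 127, 139, 125]

print("mean:  ", statistics.mean(samples))    # → 130.875
print("median:", statistics.median(samples))  # → 129.5
print("stdev: ", round(statistics.stdev(samples), 2))
```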

Tableau

A data visualization tool used to create interactive visual analytics in the form of dashboards and generate compelling business insights. These dashboards make it easier for non-technical analysts and end users to convert data into understandable, interactive graphics.

Testing

Testing is defined as an activity to check whether the actual results match the expected results.

Time series analysis

Time series analysis is a statistical technique that deals with time series data, or trend analysis. There are two main goals of time series analysis: (a) identifying the nature of the phenomenon represented by the sequence of observations, and (b) forecasting (predicting future values of the time series variable). 
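
Both goals can be illustrated with a simple moving average, which smooths a series to expose its underlying trend and whose last value serves as a naive one-step forecast (the series values below are hypothetical):

```python
# Hypothetical monthly sales figures.
series = [10, 12, 13, 12, 15, 16, 18, 17, 19, 21]

def moving_average(data, window):
    # Averaging each consecutive window smooths out short-term noise,
    # revealing the trend (goal a).
    return [
        sum(data[i:i + window]) / window
        for i in range(len(data) - window + 1)
    ]

smoothed = moving_average(series, window=3)
print(smoothed)
# The last smoothed value is a naive forecast of the next point (goal b).
print("naive next-value forecast:", smoothed[-1])  # → 19.0
```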

Torch AI

A deep learning library of algorithms based on the LuaJIT scripting language. Torch has a neural network library and comes with machine learning, computer vision, audio, image, and video capabilities. It can be viewed as a scientific computing framework.

User Research

User Research is used to understand users' needs, behaviours, experiences, and motivations through various qualitative and quantitative methods, informing the process of solving users' problems.

Vertica

A big data analytics platform with a distributed architecture and columnar compression for reliability and speed. Vertica is designed for use in big data workloads where speed, simplicity and scalability are essential. It offers SQL and geospatial analysis functions, machine learning models and Hadoop integration.

WEKA

Stands for Waikato Environment for Knowledge Analysis. WEKA is machine learning software which provides algorithms and tools for data analysis, visualization, predictive modeling and a user-friendly interface. It also allows the implementation of pre-processing, clustering, regression, association and classification rules.

YARN

The architectural center of Hadoop that allows multiple data processing engines to handle data stored in a single platform. 
