Data Science

KSQL

A streaming SQL engine for Apache Kafka. Provides an interactive SQL interface for stream processing on Kafka. Supports a wide range of streaming operations, including data filtering, transformations, aggregations, joins, windowing, and sessionization.
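
To make "sessionization" concrete, here is a minimal Python sketch of the idea: events for the same key are grouped into one session until a gap of inactivity separates them. This is not KSQL syntax and not how KSQL is implemented. The function name `sessionize`, the `(user, timestamp)` event shape, and the choice to treat a gap exactly equal to the threshold as the same session are all assumptions made for illustration.

```python
def sessionize(events, gap_seconds):
    """Group (user, timestamp_seconds) events into per-user sessions.

    A new session starts whenever the time since the user's previous
    event exceeds gap_seconds (hypothetical rule, for illustration only).
    """
    sessions = {}
    # Sort by user, then by time, so each user's events are seen in order.
    for user, ts in sorted(events, key=lambda e: (e[0], e[1])):
        user_sessions = sessions.setdefault(user, [])
        if user_sessions and ts - user_sessions[-1][-1] <= gap_seconds:
            user_sessions[-1].append(ts)   # still inside the current session
        else:
            user_sessions.append([ts])     # inactivity gap: open a new session
    return sessions


clicks = [("a", 0), ("a", 20), ("a", 100), ("b", 5)]
print(sessionize(clicks, gap_seconds=30))
```

With a 30-second gap, user "a" ends up with two sessions, because the third click arrives 80 seconds after the second.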

Kylin

A distributed analytics engine that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop supporting extremely large datasets. 

MapReduce

The heart of Apache Hadoop. A software framework for writing applications that process vast amounts of data (multi-terabyte datasets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.
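
The programming model behind the framework can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle and reduce steps using the classic word-count example, not Hadoop's API. The function names are hypothetical, and a real cluster would run the map and reduce steps on many nodes at once.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as happens between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document stands in for an input split handled by one mapper.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data", "Big clusters"]))
```

Because each call to `map_phase` and `reduce_phase` depends only on its own input, the work can be spread across machines, which is what makes the model scale.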

MLlib

A machine learning library of high-quality algorithms for Apache Spark. It supports the R, Python, Java and Scala programming languages. It can run on Mesos, Hadoop and Kubernetes, and can read data from a number of sources, such as Hive, Cassandra, HDFS, and HBase.

MXNet

A deep learning library for GPU and cloud computing developers. It is an acceleration library that helps save time on building and deploying large-scale deep neural networks (DNNs). It also offers predefined layers, tools for coding your own layers, and ways to specify data structure placement and automate calculations.

Oozie

A workflow scheduler system designed to manage Hadoop jobs. Oozie automates commonly performed tasks. With it, you can describe workflows to be performed on a Hadoop cluster, schedule those workflows to execute under a specified condition, and even combine multiple workflows and schedules into a package to manage their full lifecycle.

QlikView

An in-memory, business discovery tool. Provides self-service BI for all business users in organizations. Enables users to conduct direct and indirect searches across all data anywhere in the application.

Samza

A distributed stream processing framework developed in conjunction with Apache Kafka. Lets you build stateful applications that process data in real time from multiple sources. Continuously computes results as data arrives, which makes sub-second response times possible.

SnapLogic eXtreme

A big data transformation tool that can process large volumes of information and doesn't require a special set of skills. The tool assists in cost and risk reduction and data analytics. It can be integrated with other big data tools and frameworks, including Amazon Elastic MapReduce and Azure HDInsight.

spaCy

A software library for advanced Natural Language Processing. Written in Python and Cython. Helps to build applications that process large volumes of text.

Sqoop

A Java-based tool used for transferring bulk data between Apache Hadoop and structured datastores such as relational databases.

Tableau

A data visualization tool used to create interactive visual analytics in the form of dashboards and generate compelling business insights. These dashboards make it easier for non-technical analysts and end users to convert data into understandable, interactive graphics.

Torch AI

A deep learning library of algorithms based on the LuaJIT scripting language. Torch has a neural network library and comes with machine learning, computer vision, audio, image and video capabilities. It can be viewed as a scientific computing framework.

Vertica

A big data analytics platform with a distributed architecture and columnar compression for reliability and speed. Vertica is designed for use in big data workloads where speed, simplicity and scalability are essential. It offers SQL and geospatial analysis functions, machine learning models and Hadoop integration.
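
To illustrate why storing data by column helps compression, here is a small Python sketch of run-length encoding, one common technique for compressing columns that contain long runs of repeated values. It is a simplified illustration, not Vertica's actual storage format. The function names and the sample column are made up for the example.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse each run of equal adjacent values into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(encoded):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in encoded for _ in range(count)]


# A sorted or low-cardinality column compresses well, since equal values sit together.
country = ["US", "US", "US", "DE", "DE", "US"]
encoded = rle_encode(country)
print(encoded)
print(rle_decode(encoded) == country)
```

In a row-oriented layout, values from different columns are interleaved, so runs like these rarely appear. Keeping each column together is what makes this kind of encoding effective.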

WEKA

Stands for Waikato Environment for Knowledge Analysis. WEKA is machine learning software that provides algorithms and tools for data analysis, visualization and predictive modeling, along with a user-friendly interface. It also supports pre-processing, clustering, regression, classification and association rule mining.

YARN

The architectural center of Hadoop that allows multiple data processing engines to handle data stored in a single platform. 

