Data Science

Big Data tools

Hadoop, Hive, Pig, Apache HBase, Cassandra, MapReduce (a programming model), and Spark.
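MapReduce, in particular, is a simple programming model: a map step emits key/value pairs and a reduce step aggregates the values for each key. A toy pure-Python sketch of the classic word-count job (the names `map_phase` and `reduce_phase` are illustrative, not part of any framework):

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce step: sum the values emitted for each key."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["big data tools", "big data frameworks"]
result = reduce_phase(map_phase(docs))
print(result)  # {'big': 2, 'data': 2, 'tools': 1, 'frameworks': 1}
```

In a real framework such as Hadoop, the map and reduce steps run in parallel across a cluster, with the framework shuffling pairs to the right reducer by key.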

Caffe

A deep learning framework, best suited to image classification and segmentation, designed with expression, speed, and modularity in mind. Caffe can be used in projects of different scales, from academic to industrial, as it can process more than 60 million images per day.

Computer vision

Computer vision is a field of computer science that enables computers to see, identify, and process images in much the same way that human vision does, and then produce appropriate output.

Convolutional neural network

A convolutional neural network (CNN, or ConvNet) is a class of deep neural networks most commonly applied to analyzing visual imagery. CNNs have applications in image and video recognition, recommender systems, image classification, medical image analysis, and natural language processing.
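The core operation in a CNN is the convolution: a small kernel of learned weights slides over the input and computes a weighted sum at each position, producing a feature map. A toy pure-Python sketch of a 2D convolution (no padding, stride 1); real networks use optimized library implementations:

```python
def conv2d(image, kernel):
    """Slide a kernel over a 2D image, taking a weighted sum at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = sum(image[i + a][j + b] * kernel[a][b]
                        for a in range(kh) for b in range(kw))
            row.append(total)
        output.append(row)
    return output

# A 3x3 image convolved with a 2x2 kernel yields a 2x2 feature map.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
print(conv2d(image, kernel))  # [[-4, -4], [-4, -4]]
```

A CNN stacks many such convolution layers (with nonlinearities and pooling in between) so that later layers detect increasingly abstract features.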

Data Set

A collection of data that is organized into some type of data structure.

Data Extraction

The process of retrieving data from data sources for further data processing or storage.
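For example, extracting records from a CSV source using only Python's standard library (the field names and values here are made up; the in-memory string stands in for a file or API response):

```python
import csv
import io

# A CSV data source, as it might arrive from a file download or an API.
raw = "name,age\nAda,36\nAlan,41\n"

# Extract the rows into a list of dictionaries for further processing or storage.
rows = list(csv.DictReader(io.StringIO(raw)))
print(rows)  # [{'name': 'Ada', 'age': '36'}, {'name': 'Alan', 'age': '41'}]
```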

PyTorch

An open-source machine learning library for Python that provides a seamless path from research prototyping to production deployment. It is based on the Torch library.

Data Mining

The computational process of discovering patterns in large data sets and transforming the extracted information into an understandable structure for further use. Data mining is used to develop and improve marketing strategies, increase sales, and decrease costs.
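A toy sketch of one classic data mining task: finding item pairs that are frequently bought together in transaction data (the transactions and the support threshold are invented for illustration):

```python
from collections import Counter
from itertools import combinations

# Each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]

# Count every item pair that appears together in a transaction.
pair_counts = Counter()
for items in transactions:
    for pair in combinations(sorted(items), 2):
        pair_counts[pair] += 1

# Keep pairs seen in at least 3 transactions -- a simple frequent-pattern rule.
frequent = {pair: n for pair, n in pair_counts.items() if n >= 3}
print(frequent)  # {('bread', 'milk'): 3}
```

Real data mining systems apply the same idea at scale with algorithms such as Apriori or FP-Growth.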

Data Model

A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities.

Data Modeling

Data modeling is the process of documenting a complex software system design as an easily understood diagram, using text and symbols to represent the way data needs to flow. 

Data Structure

A specialized format for organizing and storing data. Serves as the basis for abstract data types. General data structure types include the array, the file, the record, the table, the tree, and so on.
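In Python, for instance, several of these general-purpose structures are built in or available in the standard library:

```python
from collections import namedtuple

# Array-like: an ordered, indexable sequence.
scores = [87, 91, 78]

# Record-like: fixed named fields grouped into one value.
Person = namedtuple("Person", ["name", "age"])
p = Person("Ada", 36)

# Table-like: a key-to-value mapping.
capitals = {"France": "Paris", "Japan": "Tokyo"}

print(scores[1], p.name, capitals["Japan"])  # 91 Ada Tokyo
```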

Data Visualization

Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.
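Dedicated tools such as Matplotlib or Tableau are used in practice; as a toy illustration of the idea, here is a text-based bar chart of made-up monthly sales, where the bar lengths make the pattern visible at a glance:

```python
sales = {"Jan": 4, "Feb": 7, "Mar": 3}

# Render each value as a bar of '#' characters proportional to its size.
chart = "\n".join(f"{month} {'#' * value}" for month, value in sales.items())
print(chart)
# Jan ####
# Feb #######
# Mar ###
```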

Data Warehousing

A system that pulls together data from many different sources within an organization for reporting and analysis. A data warehouse is used for online analytical processing, which applies complex queries for analysis rather than transaction processing.

Decision Tree

A decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements.
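Because a decision tree contains only conditional control statements, it maps directly onto nested `if` statements. A toy sketch deciding whether to play outside (the features and thresholds are invented):

```python
def decide(weather, temperature):
    """Walk a hand-built decision tree: each branch is a conditional test."""
    if weather == "rainy":
        return "stay in"
    # Not rainy: descend to the temperature branch.
    if temperature >= 15:
        return "play outside"
    return "stay in"

print(decide("sunny", 20))  # play outside
print(decide("rainy", 20))  # stay in
```

Machine learning libraries learn such trees automatically by choosing, at each node, the test that best separates the training data.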

Hadoop

An open-source software framework, maintained as an Apache project, that is used for distributed storage and processing of big data sets across clusters of computers using simple programming models.

scikit-learn

A Python module for machine learning built on top of SciPy and distributed under the 3-clause BSD license. It is an efficient tool for data mining and data analysis.
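A minimal usage sketch, assuming scikit-learn is installed: fit a decision tree classifier on a tiny made-up dataset, then predict the label of a new sample. The data here is purely illustrative (the label equals the first feature).

```python
from sklearn.tree import DecisionTreeClassifier

# Toy training data: two features per sample, binary labels.
X = [[0, 0], [1, 1], [1, 0], [0, 1]]
y = [0, 1, 1, 0]

# Fit the model, then predict the class of an unseen sample.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)
print(clf.predict([[1, 1]]))  # [1]
```

All scikit-learn estimators follow this same `fit`/`predict` interface, which makes it easy to swap models.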

HBase

A distributed, versioned, non-relational database modeled after Google's Bigtable. It is built on top of HDFS and allows users to perform real-time read/write operations on large datasets using key/value data. HBase is written in Java, and today it is an Apache Software Foundation project and an integral part of the Hadoop ecosystem.

HDFS

Hadoop Distributed File System, HDFS for short, is a Java-based distributed file system that reliably stores large data sets (files in the terabyte and petabyte range). HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It is the primary storage used by Hadoop applications.

HDP

Hortonworks Data Platform is a secure, enterprise-ready open source Hadoop distribution based on a centralized architecture (YARN). HDP enables enterprises to deploy, integrate and work with unprecedented volumes of structured and unstructured data.

Image Processing

The application of a set of techniques and algorithms to a digital image to analyze, enhance, or optimize image characteristics such as sharpness and contrast.

Impala

A modern, massively distributed SQL query engine for Apache Hadoop. It allows you to analyze, transform and combine data from a variety of data sources. With Impala, you can query data, whether stored in HDFS or HBase, in real time. 

TensorFlow

A software library for numerical computation using dataflow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them.
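The dataflow idea can be sketched in plain Python: each node holds an operation plus the nodes that feed it, and values flow along the edges when the graph is evaluated. This toy evaluator is purely illustrative and is not TensorFlow's API:

```python
class Node:
    """A graph node: an operation plus the nodes whose outputs feed into it."""
    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def evaluate(self):
        # Values flow along the edges: evaluate inputs, then apply this node's op.
        return self.op(*(node.evaluate() for node in self.inputs))

# Build the graph for (2 + 3) * 4, then run it.
const = lambda v: Node(lambda: v)
add = Node(lambda a, b: a + b, const(2), const(3))
mul = Node(lambda a, b: a * b, add, const(4))
print(mul.evaluate())  # 20
```

Representing the computation as a graph, rather than running it eagerly, is what lets TensorFlow optimize it and distribute it across devices.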

Jaql

A functional language designed for processing data and JSON queries on big data. It is suitable for any volume of data, both structured and unstructured. Jaql also works on other data formats, such as XML and CSV, and it is compatible with SQL structured data.

Kinesis

An Amazon Web Services offering for processing big data in real time. Kinesis enables applications to obtain timely information and react to it quickly, and it simplifies writing apps that rely on data that must be processed in real time.

KSQL

A streaming SQL engine for Apache Kafka that provides an interactive SQL interface for stream processing on Kafka. KSQL supports a wide range of streaming operations, including data filtering, transformations, aggregations, joins, windowing, sessionization, and much more.
