Data Science

Data Analysis

A process of analyzing data that uses analytical and statistical tools to examine each component of the data provided and discover useful information.

Apache Mahout

A machine learning framework for building scalable applications that data scientists, mathematicians, and statisticians can use to implement their own algorithms. Mahout also provides core algorithms for classification, clustering, and batch-based collaborative filtering.

Apache Pig

A programming framework used to analyze and transform large data sets. Apache Pig provides a high-level language known as Pig Latin which helps Hadoop developers write data analysis programs. By using various operators provided by the Pig Latin language, programmers can develop their own functions for reading, writing, and processing data.

Apache Spark

An open-source cluster computing engine designed for fast computation. Spark keeps working data in memory across the cluster, which increases the processing speed of an application.

Apache Flume

A distributed, reliable, and highly available service for collecting, aggregating, and moving large amounts of streaming data from multiple sources to a centralized repository.

Apache Zeppelin

A web-based notebook tool for interactive data analytics. Zeppelin supports different technologies that aid in analytics, such as SQL, Python and Apache Spark. Aside from analytics, it can also perform discovery, ingestion, collaboration and visualisation.

Applied Mathematics

The application of mathematical methods by different fields such as science, engineering, business, computer science, and industry. Thus, applied mathematics is a combination of mathematical science and specialized knowledge.

R language

A programming language and software environment commonly used for statistical computing in data-heavy work such as data mining, statistics, and graphics. R was designed as a language similar to S.

Artificial Intelligence

A field of study that looks for ways to make a computer, a computer-controlled robot, or a piece of software think intelligently, in a manner similar to how humans think. It draws on many disciplines, including those that study how the human brain works and how people learn, make decisions, and behave while trying to solve a problem.

Artificial Neural Networks

In machine learning and data mining, ANNs are computer systems that resemble the neural networks that constitute a biological brain. They are designed to learn tasks based on examples. For example, they can learn to identify certain types of images based on analyzing pre-labeled examples that define images as "this is a ..." and "this is not a ...".
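Learning from labeled examples can be sketched with a single artificial neuron (a perceptron), the simplest building block of such a network. This is a minimal illustration, not a full ANN: the data set, the `train_perceptron` and `predict` names, and the learning rate are all assumptions made for the example.

```python
def predict(weights, bias, x):
    """Return 1 ("this is a ...") or 0 ("this is not a ...") for input x."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0

def train_perceptron(examples, labels, epochs=20, lr=1.0):
    """Adjust weights whenever a prediction disagrees with its label."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            error = y - predict(weights, bias, x)
            # No change when the prediction is already correct (error == 0).
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Hypothetical pre-labeled examples: label is 1 only when both features are 1.
examples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(examples, labels)
predictions = [predict(weights, bias, x) for x in examples]
```

After training, `predictions` matches `labels` for this linearly separable toy data. Real networks stack many such units in layers and use gradient-based training instead of this update rule.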

NumPy

The fundamental package for scientific computing with Python. NumPy is the core Python library for array manipulation, and therefore underlies a large part of the numerical and scientific computation done in the language.
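A brief sketch of what array manipulation looks like in practice, assuming NumPy is installed. The array values are invented for illustration.

```python
import numpy as np

# Hypothetical sample data: a 2x3 array of measurements.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Arithmetic applies element-wise to the whole array, with no explicit loop.
scaled = a * 2.0

# Reduction along an axis: the mean of each column.
col_means = a.mean(axis=0)

# Reshape the same six values into a one-dimensional array.
flat = a.reshape(-1)
```

Here `col_means` holds one value per column and `flat` has shape `(6,)`.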

Bayesian statistics

Bayesian statistics is a theory in the field of statistics based on the Bayesian interpretation of probability where probability expresses a degree of belief in an event.
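The idea of probability as a degree of belief that is revised by evidence can be sketched with Bayes' theorem. The `posterior` function name and all the numbers below are assumptions chosen for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) using Bayes' theorem.

    prior:               P(H), belief in the hypothesis before seeing evidence
    likelihood:          P(E | H)
    false_positive_rate: P(E | not H)
    """
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% prior belief, 90% chance of the evidence if the
# hypothesis is true, 5% chance of the same evidence if it is false.
belief = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(belief, 3))  # 0.154

# A second, independent observation of the same evidence updates the belief
# again, using the previous posterior as the new prior.
belief_after_two = posterior(prior=belief, likelihood=0.9, false_positive_rate=0.05)
```

Each observation moves the degree of belief; the posterior from one step becomes the prior for the next.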

Big Data

A term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. Can be analyzed for insights that lead to better decisions and strategic business moves.

Pandas

A BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for Python. Pandas adds two data containers to Python, Series and DataFrame, as well as data processing functionality for handling missing data, set comparisons, and vectorization.
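A short sketch of the two containers and of missing-data handling, assuming pandas is installed. The column names and values are invented for illustration.

```python
import pandas as pd

# Series: a one-dimensional labeled array. None becomes a missing value (NaN).
temps = pd.Series([21.5, None, 19.0], index=["mon", "tue", "wed"])

# DataFrame: a two-dimensional table made of labeled columns.
df = pd.DataFrame({"city": ["Oslo", "Lima", "Pune"],
                   "temp_c": [4.0, None, 31.0]})

# Missing data can be counted and filled, here with the column mean.
n_missing = int(df["temp_c"].isna().sum())
filled = df["temp_c"].fillna(df["temp_c"].mean())

# Vectorized arithmetic on a whole column at once; NaN stays NaN.
df["temp_f"] = df["temp_c"] * 9 / 5 + 32
```

Filling with the mean is only one possible strategy; dropping rows with `dropna()` is another common choice.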

Keras

A neural networks API. It can run on top of TensorFlow, CNTK, or Theano. The library allows easy and fast prototyping, supports both convolutional and recurrent networks, and runs seamlessly on CPU and GPU.

Big Data tools

Hadoop, Hive, Pig, Apache HBase, Cassandra, MapReduce (method), Spark.

Caffe

A deep learning framework, best suited to image classification and segmentation, where speed, modularity, and expression are important. Caffe can be used in projects of different scales, from academic to industrial, as it can process more than 60 million images per day.

Computer vision

Computer vision is a field of computer science that works on enabling computers to see, identify and process images in the same way that human vision does, and then provide appropriate output.

Convolutional neural network

A convolutional neural network (CNN, or ConvNet) is a class of deep neural network most commonly applied to analyzing visual imagery. CNNs have applications in image and video recognition, recommender systems, image classification, medical image analysis, and natural language processing.
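The operation that gives these networks their name can be sketched in plain Python. This is a minimal illustration of a single filter sliding over an image, not a trainable network: the `conv2d` name, the toy image, and the hand-picked kernel are assumptions made for the example. In a real CNN the kernel values are learned from data.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products.

    No padding and stride 1, so the output is smaller than the input.
    (Deep learning libraries compute this cross-correlation form.)
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = sum(image[i + di][j + dj] * kernel[di][dj]
                        for di in range(kh) for dj in range(kw))
            row.append(total)
        out.append(row)
    return out

# Hypothetical 4x4 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]

# Hand-picked kernel that responds to left-to-right increases in intensity.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The nonzero middle column of the feature map marks where the edge sits, which is the kind of local visual pattern a convolutional layer detects.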

Data Set

A collection of data that is organized into some type of data structure.

Data Extraction

The process of retrieving data from data sources for further data processing or storage.

PyTorch

An open-source machine learning library for Python, based on Torch. Provides a seamless path from research prototyping to production deployment.

Data Mining

A computational process of discovering patterns in large data sets in order to extract information and transform it into an understandable structure for further use. Data mining is used, for example, to develop and improve marketing strategies, increase sales, and reduce costs.

Data Model

A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities.

Data Modeling

Data modeling is the process of documenting a complex software system design as an easily understood diagram, using text and symbols to represent the way data needs to flow. 

Development by Synergize.digital
