Data Science |
|
Amazon Elastic MapReduce. A tool for big data processing and analysis. Provides a managed Hadoop and Apache Spark framework that makes it easier to process vast amounts of data across dynamically scalable Amazon EC2 instances. |
|
A software project for easier Hadoop management. It enables system administrators to provision, manage, and monitor Hadoop clusters. It also provides features for installing and configuring Hadoop services, and a dashboard for monitoring cluster status. |
|
Analysis is the process of breaking a complex topic or substance into smaller parts in order to gain a better understanding of it. |
|
A data serialization system that relies on schemas for reading and writing data. Because the schema is always available when data is read, values can be written without per-value overhead, which keeps the serialized form small. Avro also provides rich data structures, a compact binary data format, remote procedure calls (RPC), and integration with dynamic languages. |
|
The Microsoft Cognitive Toolkit (formerly CNTK) is a back-end framework used for deep learning. |
|
Tableau's in-memory data engine technology, designed for fast data ingest and analytical query processing on large or complex data sets. |
|
A machine learning technique that groups data points. Given a set of data points, a clustering algorithm assigns each point to a group so that points in the same group are more similar to each other than to points in other groups. |
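As an illustration of assigning points to groups, here is a minimal sketch of k-means, one common clustering algorithm. The glossary entry does not name a specific algorithm, so k-means is an assumption made for the example, as are the function names `squared_distance`, `nearest`, and `kmeans`. Seeding the centroids with the first `k` points is a simplification; real implementations usually choose starting centroids more carefully.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    distances = [squared_distance(point, c) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, k, max_iterations=100):
    """Group points into k clusters; returns (centroids, labels)."""
    # Simplification: use the first k points as the starting centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for index in range(k):
            members = [p for p, label in zip(points, labels) if label == index]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep the old centroid if no point was assigned to it.
                new_centroids.append(centroids[index])
        if new_centroids == centroids:
            break  # Nothing moved, so the grouping is stable.
        centroids = new_centroids
    return centroids, labels


# Two visibly separate groups, interleaved in the input list.
points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Each entry in `labels` is the group index for the point at the same position in `points`, so points that share a label belong to the same cluster.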
|
A unified model for defining both batch and streaming data-parallel processing pipelines. Provides a general approach to expressing embarrassingly parallel data processing pipelines and supports end users, SDK writers, and runner writers. |
|
A general-purpose data processing platform and a top-level Apache project. It provides efficient, fast, accurate, and fault-tolerant handling of massive streams of events. Flink suits a wide range of big data scenarios and can run in standalone mode. Its defining feature is its ability to process streaming data in real time. |
|
A data warehouse infrastructure built on top of Hadoop. It provides tools for easy data extract, transform, and load (ETL), a mechanism for imposing structure on the data, and the capability to query and analyze large datasets stored in Hadoop files. |
|
A process of analyzing data that uses analytical and statistical tools to examine each component of the data provided and discover useful information. |
|
A machine learning framework for creating scalable applications that data scientists, mathematicians, and statisticians can use to implement their algorithms. Mahout also offers core algorithms for classification, clustering, and batch-based collaborative filtering. |
|
A programming framework used to analyze and transform large data sets. Apache Pig provides a high-level language known as Pig Latin which helps Hadoop developers write data analysis programs. By using various operators provided by the Pig Latin language, programmers can develop their own functions for reading, writing, and processing data. |
|
An open-source cluster computing technology designed for fast computation. Its in-memory cluster computing increases the processing speed of an application. |
|
A distributed, reliable, and highly available service for collecting, aggregating, and moving large amounts of streaming data from multiple sources to a centralized repository. |
|
A web-based notebook tool for interactive data analytics. Zeppelin supports different technologies that aid in analytics, such as SQL, Python and Apache Spark. Aside from analytics, it can also perform discovery, ingestion, collaboration and visualisation. |
|
The application of mathematical methods by different fields such as science, engineering, business, computer science, and industry. Thus, applied mathematics is a combination of mathematical science and specialized knowledge. |
|
A programming language and software environment commonly used for statistical computing in data-heavy work such as data mining, statistics, and graphics. It was created as a language similar to S. |
|
A field of study that looks for ways to make a computer, a computer-controlled robot, or a piece of software think intelligently, in a manner similar to how humans think. It draws on many disciplines, including those that study how the human brain works and how people learn, make decisions, and behave while trying to solve a problem. |
|
In machine learning and data mining, artificial neural networks (ANNs) are computer systems that resemble the neural networks that constitute a biological brain. They are designed to learn tasks based on examples. For example, they can learn to identify certain types of images by analyzing pre-labeled examples that mark images as "this is a ..." and "this is not a ...". |
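To make "learning from labeled examples" concrete, here is a minimal sketch of a perceptron, a single artificial neuron, which is far simpler than the image-recognition networks described above. The names `predict` and `train_perceptron` and the toy data set (the logical AND of two inputs, labeled 1 for "this is a ..." and 0 for "this is not a ...") are assumptions chosen for illustration.

```python
def predict(weights, bias, inputs):
    """Fire (return 1) when the weighted sum of the inputs exceeds zero."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(examples, epochs=20, learning_rate=1):
    """Adjust weights and bias from (inputs, label) pairs."""
    weights = [0] * len(examples[0][0])
    bias = 0
    for _ in range(epochs):
        for inputs, label in examples:
            # The error is 0 when the guess is right, +1 or -1 otherwise.
            error = label - predict(weights, bias, inputs)
            # Nudge each weight toward the answer the label asked for.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Pre-labeled examples: label 1 only when both inputs are 1.
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(examples)
predictions = [predict(weights, bias, inputs) for inputs, _ in examples]
print(predictions)
```

After training, the neuron reproduces the labels it was shown. A single perceptron can only learn patterns that a straight line can separate, which is why practical networks stack many such units in layers.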
|
The fundamental package for scientific computing with Python. NumPy is Python's core library for array manipulation and therefore underpins a large part of the numerical and scientific computation done in the language. |
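A short sketch of what array manipulation looks like in practice, assuming NumPy is installed. The variable names and sample values are made up for illustration.

```python
import numpy as np

# Build a 2x3 array holding the values 0..5.
table = np.arange(6).reshape(2, 3)

# Reduce along axis 0 to get one mean per column.
column_means = table.mean(axis=0)

# Broadcasting subtracts the row of column means from every row at once,
# with no explicit Python loop.
centered = table - column_means

print(table.shape)
print(column_means)
print(centered)
```

Operations such as `mean` and the subtraction act on whole arrays at once, which is the style most NumPy-based numerical code is written in.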
|
Bayesian statistics is a theory in the field of statistics based on the Bayesian interpretation of probability where probability expresses a degree of belief in an event. |
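As an illustration of probability as a degree of belief that is revised by evidence, here is a minimal sketch of a Beta-Binomial update for a coin's chance of landing heads. The choice of model, the function names `update_beta` and `beta_mean`, and the sample counts are all assumptions made for the example.

```python
from fractions import Fraction


def update_beta(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing new outcomes."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return Fraction(alpha, alpha + beta)


# Prior belief: Beta(1, 1), meaning every heads probability is equally plausible.
prior = (1, 1)

# Evidence: 7 heads and 3 tails in 10 flips.
posterior = update_beta(*prior, successes=7, failures=3)

print(beta_mean(*prior))      # belief before seeing any data
print(beta_mean(*posterior))  # belief after seeing the data
```

The expected heads probability moves from 1/2 under the prior to 2/3 under the posterior, showing how the belief shifts toward what the data suggests without jumping all the way to the observed frequency of 7/10.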
|
A term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. Can be analyzed for insights that lead to better decisions and strategic business moves. |
|
A BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for Python. Pandas adds two data containers to Python, Series and DataFrame, as well as useful data processing functionality for handling missing data, set comparisons, and vectorization. |
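A short sketch of the two containers and of missing-data handling, assuming pandas is installed. The column names and values are invented for illustration.

```python
import pandas as pd

# A Series is a labeled one-dimensional column; None becomes a missing value (NaN).
readings = pd.Series([1.0, None, 3.0], name="reading")

# A DataFrame is a table whose columns are Series.
frame = pd.DataFrame({"sensor": ["a", "b", "c"], "reading": readings})

# Count the missing values, then fill them with the column mean.
# mean() skips missing values by default.
missing_count = int(frame["reading"].isna().sum())
filled = frame["reading"].fillna(frame["reading"].mean())

print(missing_count)
print(filled.tolist())
```

Filling with the mean is only one possible strategy; dropping the affected rows with `dropna()` is another common choice.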
|
A neural networks API. It can run on top of TensorFlow, CNTK, or Theano. The library allows easy and fast prototyping, supports both convolutional and recurrent networks, and runs seamlessly on CPU and GPU. |