Data Science |
|
Hadoop, Hive, Pig, Apache HBase, Cassandra, MapReduce (method), Spark. |
|
A deep learning framework best suited to image classification and segmentation, where speed, modularity, and expression matter. Caffe can be used in projects of any scale, from academic to industrial, as it can process more than 60M images per day on a single GPU. |
|
Computer vision is a field of computer science that works on enabling computers to see, identify and process images in the same way that human vision does, and then provide appropriate output. |
|
A convolutional neural network (CNN, or ConvNet) is a class of deep neural networks most commonly applied to analyzing visual imagery. CNNs have applications in image and video recognition, recommender systems, image classification, medical image analysis, and natural language processing. |
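The core operation of a convolutional layer can be sketched in a few lines. This is a minimal pure-Python illustration, not a trainable network: the function name `conv2d`, the sample image, and the `vertical_edge` kernel are all assumptions made for the example. Strictly speaking it computes a cross-correlation, which is what the convolution layers in most deep learning frameworks compute.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 3x4 grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical kernel that responds to dark-to-bright vertical edges.
vertical_edge = [[-1, 1], [-1, 1]]

feature_map = conv2d(image, vertical_edge)
print(feature_map)  # [[0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only where the kernel straddles the edge, which is the sense in which a learned kernel "detects" a visual pattern.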
|
A collection of data that is organized into some type of data structure. |
|
The process of retrieving data from data sources for further data processing or storage. |
|
An open source machine learning library for Python. Provides a seamless path from research prototyping to production deployment. Based on Torch. |
|
The computational process of discovering patterns in large data sets, extracting information, and transforming it into an understandable structure for further use. Data mining is used to develop and improve marketing strategies, increase sales, and decrease costs. |
|
A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities. |
|
Data modeling is the process of documenting a complex software system design as an easily understood diagram, using text and symbols to represent the way data needs to flow. |
|
A specialized format for organizing and storing data. Serves as the basis for abstract data types. General data structure types include the array, the file, the record, the table, the tree, and so on. |
|
Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data. |
|
A system that pulls together data from many different sources within an organization for reporting and analysis. Used for online analytical processing (OLAP), which runs complex analytical queries rather than processing transactions. |
|
A decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements. |
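The point that a decision tree is an algorithm made only of conditional statements can be shown directly. In this sketch the function name `decide_activity`, the features, and the humidity threshold of 70 are hypothetical values chosen for illustration, not taken from any real model.

```python
def decide_activity(outlook, humidity, windy):
    """Walk a small, hand-written decision tree and return the leaf label."""
    if outlook == "sunny":
        # Internal node: split on humidity (threshold is an assumption).
        if humidity > 70:
            return "stay in"
        return "play"
    if outlook == "rainy":
        # Internal node: split on wind.
        if windy:
            return "stay in"
        return "play"
    # Any other outlook (for example "overcast") goes straight to a leaf.
    return "play"


print(decide_activity("sunny", humidity=85, windy=False))  # stay in
print(decide_activity("rainy", humidity=60, windy=False))  # play
```

Each `if` corresponds to an internal node of the tree and each `return` to a leaf.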
|
An open-source Apache software framework for distributed storage and processing of big data sets across clusters of computers using simple programming models. |
|
A Python module for machine learning built on top of SciPy and distributed under the 3-Clause BSD license. An efficient tool for data mining and data analysis. |
|
A distributed, versioned, non-relational database modeled after Google's Bigtable. It is built on top of HDFS and supports real-time read/write operations on large datasets stored as key/value data. HBase is written in Java. Today HBase is an Apache Software Foundation project and an integral part of the Hadoop ecosystem. |
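The "versioned key/value" idea can be sketched with an in-memory dictionary. This is only a toy model of the data layout, under the assumption that each cell is addressed by row and column and keeps one value per timestamp. The class `VersionedStore` and its methods are hypothetical and are not the HBase client API.

```python
class VersionedStore:
    """Toy in-memory store: (row, column) -> {timestamp: value}."""

    def __init__(self):
        self._cells = {}

    def put(self, row, column, value, timestamp):
        # Each write adds a new version instead of overwriting the old one.
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value, or the newest one at or before `timestamp`."""
        versions = self._cells.get((row, column), {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]


store = VersionedStore()
store.put("user1", "info:city", "Kyiv", timestamp=100)
store.put("user1", "info:city", "Lviv", timestamp=200)

print(store.get("user1", "info:city"))                 # Lviv
print(store.get("user1", "info:city", timestamp=150))  # Kyiv
```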
|
Hadoop Distributed File System, HDFS for short, is a Java-based distributed file system for reliably storing large data sets (files in the range of terabytes to petabytes). HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It is the primary storage used by Hadoop applications. |
|
Hortonworks Data Platform is a secure, enterprise-ready open source Hadoop distribution based on a centralized architecture (YARN). HDP enables enterprises to deploy, integrate and work with unprecedented volumes of structured and unstructured data. |
|
The application of a set of techniques and algorithms to a digital image to analyze, enhance, or optimize image characteristics such as sharpness and contrast. |
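One simple contrast enhancement technique is linear contrast stretching, which rescales pixel intensities so that they span the full output range. The sketch below assumes a grayscale image stored as a list of rows of integers; the function name `stretch_contrast` and the sample pixels are made up for illustration.

```python
def stretch_contrast(pixels, out_min=0, out_max=255):
    """Linearly rescale pixel values so they span [out_min, out_max]."""
    flat = [p for row in pixels for p in row]
    lo, hi = min(flat), max(flat)
    if lo == hi:
        # A flat image has no contrast to stretch; return a copy unchanged.
        return [row[:] for row in pixels]
    span = out_max - out_min
    # Integer arithmetic; fractional results are rounded down.
    return [[out_min + (p - lo) * span // (hi - lo) for p in row] for row in pixels]


# Hypothetical low-contrast image using only the 100..200 intensity range.
dull = [
    [100, 150],
    [125, 200],
]
print(stretch_contrast(dull))  # [[0, 127], [63, 255]]
```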
|
A modern, massively distributed SQL query engine for Apache Hadoop. It allows you to analyze, transform and combine data from a variety of data sources. With Impala, you can query data, whether stored in HDFS or HBase, in real time. |
|
Software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. |
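A data flow graph of this kind can be sketched in plain Python. The classes `Node` and `Placeholder` and the helper functions `add` and `mul` are hypothetical names for this illustration and are not the API of any particular library. Plain lists stand in for tensors.

```python
class Node:
    """A graph node: an operation applied to the outputs of its input nodes."""

    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def evaluate(self, feed):
        # Values flow along the edges: evaluate inputs first, then apply the op.
        args = [node.evaluate(feed) for node in self.inputs]
        return self.op(*args)


class Placeholder(Node):
    """A source node whose value is supplied when the graph is evaluated."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, feed):
        return feed[self.name]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def mul(a, b):
    return [x * y for x, y in zip(a, b)]


# Build the graph for y = x * w + b; nothing is computed yet.
x, w, b = Placeholder("x"), Placeholder("w"), Placeholder("b")
y = Node(add, Node(mul, x, w), b)

# Run the graph by feeding concrete arrays into the placeholders.
result = y.evaluate({"x": [1, 2, 3], "w": [2, 2, 2], "b": [1, 1, 1]})
print(result)  # [3, 5, 7]
```

Separating graph construction from evaluation is the design choice that lets such libraries analyze or distribute the computation before running it.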
|
A functional language designed for querying and processing JSON data at big data scale. It is suitable for any volume of data, both structured and unstructured. Jaql also works with other data formats, such as XML and CSV, and is compatible with SQL structured data. |
|
An Amazon Web Services (AWS) service for processing big data in real time. It lets you get timely information and react to it quickly, and it simplifies writing applications that depend on data processed in real time. |
|
A streaming SQL engine for Apache Kafka. Provides an interactive SQL interface for stream processing on Kafka. Supports a wide range of streaming operations, including data filtering, transformations, aggregations, joins, windowing, sessionization, and much more. |
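Windowed aggregation, one of the operations listed above, can be illustrated without a streaming engine. The sketch below groups events into fixed-size, non-overlapping (tumbling) windows and counts them per key. The function `tumbling_window_counts` and the sample events are hypothetical, and unlike a real streaming engine it works on a finite list rather than an unbounded stream.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size):
    """Count events per (window start, key) for fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for timestamp, key in events:
        # Every timestamp belongs to exactly one window.
        window_start = (timestamp // window_size) * window_size
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical (timestamp in seconds, event type) pairs.
events = [(1, "click"), (4, "view"), (7, "click"), (12, "click"), (14, "view")]

counts = tumbling_window_counts(events, window_size=10)
print(counts)
# {(0, 'click'): 2, (0, 'view'): 1, (10, 'click'): 1, (10, 'view'): 1}
```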