Data Science
PySpark is the Python API for Apache Spark. Apache Spark is a distributed framework that can handle Big Data analysis. Spark is written in Scala and can be used from Python, Scala, Java, R, and SQL.
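A minimal PySpark sketch, assuming a local Spark installation: build a small DataFrame and filter it on the Spark engine.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("pyspark-example").getOrCreate()
    # A tiny in-memory DataFrame; real workloads would read from distributed storage instead.
    df = spark.createDataFrame([(1, "a"), (2, "b"), (3, "c")], ["id", "label"])
    df.filter(df.id > 1).show()   # the filter is executed by the Spark engine
    spark.stop()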
An open source machine learning library for Python. Provides a seamless path from research prototyping to production deployment. Based on Torch.
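This description matches PyTorch. A minimal sketch, assuming PyTorch is installed, of a tensor computation with automatic differentiation:

    import torch

    x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
    y = (x ** 2).sum()   # a simple scalar function of x
    y.backward()         # autograd computes dy/dx
    print(x.grad)        # tensor([2., 4., 6.])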
An in-memory business discovery tool. Provides self-service BI for all business users in organizations. Enables users to conduct direct and indirect searches across all data anywhere in the application.
Quantitative analytics (QA) is a technique that seeks to understand behavior by using mathematical and statistical modeling, measurement, and research. Quantitative analysts aim to represent a given reality in terms of a numerical value.
Quantitative finance is the use of mathematical models and extremely large datasets to analyze financial markets and securities. Common examples include (1) the pricing of derivative securities such as options, and (2) risk management, especially as it relates to portfolio management applications.
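As a small illustration of the option-pricing example above, a sketch of the standard Black-Scholes formula for a European call option (the input values below are hypothetical):

    from math import log, sqrt, exp, erf

    def norm_cdf(x):
        # cumulative distribution function of the standard normal
        return 0.5 * (1.0 + erf(x / sqrt(2.0)))

    def black_scholes_call(S, K, T, r, sigma):
        # S: spot price, K: strike, T: years to expiry, r: risk-free rate, sigma: volatility
        d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
        d2 = d1 - sigma * sqrt(T)
        return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

    print(black_scholes_call(S=100, K=105, T=1.0, r=0.02, sigma=0.2))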
An open, simple, and secure data lake platform for machine learning, streaming and ad-hoc analytics.
A supervised learning algorithm that builds many decision trees on random subsets of the data and combines their predictions into a single “forest.”
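A short sketch of the idea, using scikit-learn's RandomForestClassifier as one common implementation (scikit-learn assumed installed):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)          # each tree is trained on a random bootstrap sample
    print(clf.score(X_test, y_test))   # accuracy of the combined ensemble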
RapidMiner is a data science platform used for data preparation, machine learning, and advanced analytics.
Data that has not been processed for use. A distinction is sometimes made between data and information: information is the end product of data processing.
A programming language and software environment commonly used for statistical computing in data-heavy roles such as data mining, statistics, and working with graphics. It was created as a language similar to S.
Stands for Recurrent Neural Network, a class of artificial neural networks commonly used with sequential data. RNNs can retain information from previous inputs because their connections form loops, which let information be passed from one step of the sequence to the next. RNNs can be used in conjunction with NLP methods and in machine translation.
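A minimal sketch of a recurrent layer processing a batch of sequences, assuming PyTorch (the layer sizes are arbitrary):

    import torch
    import torch.nn as nn

    rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
    x = torch.randn(4, 10, 8)           # 4 sequences, 10 time steps, 8 features per step
    output, hidden = rnn(x)             # the hidden state carries information across time steps
    print(output.shape, hidden.shape)   # torch.Size([4, 10, 16]) torch.Size([1, 4, 16])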
A distributed stream processing framework. It has been developed in conjunction with Apache Kafka. Allows building stateful applications that process data in real time from multiple sources. Continuously computes results as data arrives, which makes sub-second response times possible.
Stands for Statistical Analysis System. A data analysis tool for data management, data mining, statistical analysis, data warehousing and more. It also provides features for writing reports, developing business models and applications.
Python module for machine learning built on top of SciPy and distributed under the 3-Clause BSD license. Efficient tool for data mining and data analysis.
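A short sketch, assuming scikit-learn is installed, of a typical preprocessing-plus-model pipeline evaluated with cross-validation:

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validated accuracy
    print(scores.mean())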
A data visualization tool that makes data easy to digest by letting users crunch large data sets and visualize them with pictures, graphs, charts, maps, and more.
Small data refers to datasets that are relatively modest in size, and often manageable without the need for complex or big data processing techniques. These datasets are typically small enough to be processed, analyzed, and interpreted by traditional data analysis methods and tools, making them suitable for smaller-scale projects and applications. |
SnapLogic encapsulates integration complexity with “Snaps,” converting integration tasks and subtasks into modular, pluggable pieces of logic.
A big data transformation tool that can process large volumes of information and doesn't require a special set of skills. The tool assists in cost and risk reduction and in data analytics. It can be integrated with other Big Data tools and frameworks, including Amazon Elastic MapReduce and Azure HDInsight.
A software library for advanced Natural Language Processing. Written in Python and Cython. Helps to build applications that process large volumes of text.
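This description matches spaCy. A minimal sketch, assuming spaCy and its small English pipeline en_core_web_sm are installed:

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Apache Spark was originally developed at the University of California, Berkeley.")
    for ent in doc.ents:
        print(ent.text, ent.label_)     # named entities found in the text
    for token in doc[:5]:
        print(token.text, token.pos_)   # part-of-speech tags for the first few tokens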
Spark MLlib is Apache Spark’s Machine Learning component. One of the major attractions of Spark is the ability to scale computation massively, and that is exactly what you need for machine learning algorithms.
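A short sketch, assuming PySpark, using MLlib's DataFrame-based API (pyspark.ml) to train a logistic regression model on a toy dataset:

    from pyspark.sql import SparkSession
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler

    spark = SparkSession.builder.appName("mllib-example").getOrCreate()
    data = spark.createDataFrame(
        [(0.0, 1.0, 0), (1.0, 0.0, 1), (0.5, 0.6, 1), (0.1, 0.9, 0)],
        ["f1", "f2", "label"],
    )
    # Assemble raw columns into the feature vector MLlib estimators expect.
    assembled = VectorAssembler(inputCols=["f1", "f2"], outputCol="features").transform(data)
    model = LogisticRegression(featuresCol="features", labelCol="label").fit(assembled)
    model.transform(assembled).select("label", "prediction").show()
    spark.stop()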
Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams.
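A minimal sketch of the classic streaming word count, assuming PySpark and a live text source on a local socket (for testing, such a source can be provided with a tool like netcat):

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="streaming-example")
    ssc = StreamingContext(sc, batchDuration=1)        # process data in 1-second micro-batches

    lines = ssc.socketTextStream("localhost", 9999)    # live text stream on a local socket
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()                                    # print each batch's word counts

    ssc.start()
    ssc.awaitTermination()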
A set of software programs combined in a single package. Its basic application is to analyze scientific data related to the social sciences. This data can be used for market research, surveys, data mining, etc.
A Java-based tool used for transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
Statistical modeling is a simplified, mathematically formalized way to approximate reality (i.e., what generates your data) and, optionally, to make predictions from this approximation. The statistical model is the mathematical equation that is used.
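A small sketch of the idea, assuming NumPy: fit a straight-line model to noisy data and use the fitted equation to make a prediction (the numbers are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 50)
    y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)   # "reality": a line plus noise

    slope, intercept = np.polyfit(x, y, deg=1)   # the model approximates the data-generating process
    print(slope, intercept)                      # estimates close to 2.0 and 1.0
    print(slope * 12.0 + intercept)              # a prediction from the fitted approximation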
Statistics is a mathematical science pertaining to data collection, analysis, interpretation, and presentation. Statistics can be used to derive meaningful insights from data by performing mathematical computations on it.
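A tiny illustration of such computations using Python's standard-library statistics module (the sample values are illustrative):

    import statistics

    data = [2, 4, 4, 4, 5, 5, 7, 9]      # a small illustrative sample
    print(statistics.mean(data))         # 5.0
    print(statistics.median(data))       # 4.5
    print(statistics.stdev(data))        # sample standard deviation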