Data Science

PySpark

PySpark is the Python API for Apache Spark. Apache Spark is a distributed computing framework for large-scale data analysis. It is written in Scala and offers APIs for Python, Scala, Java, R, and SQL.

PyTorch

An open source machine learning library for Python. Provides a seamless path from research prototyping to production deployment. Based on Torch.

QlikView

An in-memory, business discovery tool. Provides self-service BI for all business users in organizations. Enables users to conduct direct and indirect searches across all data anywhere in the application.

Quantitative analytics

Quantitative analytics (QA) is a technique that seeks to understand behavior by using mathematical and statistical modeling, measurement, and research. Quantitative analysts aim to represent a given reality in terms of a numerical value.

Quantitative finance

Quantitative finance is the use of mathematical models and extremely large datasets to analyze financial markets and securities. Common examples include (1) the pricing of derivative securities such as options, and (2) risk management, especially as it relates to portfolio management applications.
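
As an illustration of the first example above, here is a minimal sketch of pricing a European call option with the Black-Scholes formula. The choice of Black-Scholes, the function names, and the input values are assumptions made for this example; the definition does not prescribe a specific model.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, volatility, years):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    sigma_sqrt_t = volatility * math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * volatility ** 2) * years) / sigma_sqrt_t
    d2 = d1 - sigma_sqrt_t
    # Expected discounted payoff under the model's assumptions.
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


# Hypothetical inputs: at-the-money option, 5% rate, 20% volatility, 1 year.
price = black_scholes_call(spot=100.0, strike=100.0, rate=0.05, volatility=0.2, years=1.0)
print(round(price, 2))
```

Real pricing work layers calibration, dividends, and market data on top of a formula like this; the sketch only shows the shape of the calculation.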

Qubole

An open, simple, and secure data lake platform for machine learning, streaming and ad-hoc analytics.

Random Forest

A supervised learning algorithm that builds multiple decision trees on random subsets of the data and features, then combines their predictions (by majority vote or averaging) into one “forest.”
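
The idea can be sketched in plain Python. This is a deliberately simplified illustration, not a production implementation: each "tree" is a one-split decision stump, trained on a bootstrap sample and one randomly chosen feature, and the forest predicts by majority vote. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Pick the threshold on one feature that misclassifies the fewest rows."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))           # random feature choice
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Majority vote over all stumps in the forest."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Toy data: two well-separated clusters with labels 0 and 1.
rows = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 9), (9, 8), (8, 8), (9, 9)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
print(predict(forest, (1, 1)), predict(forest, (9, 9)))
```

Libraries such as scikit-learn grow full-depth trees and sample features at every split; the stump version only shows the bootstrap-and-vote structure.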

RapidMiner

RapidMiner is a data science platform used for data preparation, machine learning, and advanced analytics.

Raw Data

Data that has not been processed for use. A distinction is sometimes made between data and information to the effect that information is the end product of data processing.

R language

A programming language and software environment commonly used for statistical computing and graphics in data-heavy work such as data mining and statistics. The language was modeled on the S language.

RNN

Stands for Recurrent Neural Network. A class of artificial neural networks commonly used with sequential data. RNNs retain information from previous inputs because their connections form loops that pass information along from one step to the next. RNNs can be used with NLP methods, for example in machine translation.
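
The loop can be sketched as a single scalar recurrence, where the hidden state at each step depends on the current input and the previous hidden state. This is a minimal illustration with one neuron and hand-picked weights (the names `w_x`, `w_h`, and `b` are assumptions for the example); real RNNs use weight matrices learned from data.

```python
import math


def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-neuron recurrent cell over a sequence and return all hidden states."""
    h = 0.0  # initial hidden state: no memory yet
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# Only the first input is non-zero, yet later states stay non-zero:
# the recurrent weight w_h carries information forward.
states = rnn_forward([1.0, 0.0, 0.0])
print(states)
```

Setting `w_h=0.0` removes the loop, so each state depends only on the current input.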

Samza

A distributed stream processing framework developed in conjunction with Apache Kafka. Allows building stateful applications that process data in real time from multiple sources. Continuously computes results as data arrives, which makes sub-second response times possible.

SAS

Stands for Statistical Analysis System. A data analysis tool for data management, data mining, statistical analysis, data warehousing and more. It also provides features for writing reports, developing business models and applications.

scikit-learn

A Python module for machine learning built on top of SciPy and distributed under the 3-Clause BSD license. An efficient tool for data mining and data analysis.

Sisense

A data visualization tool that makes data easy to digest by letting users crunch large data sets and visualize them with pictures, graphs, charts, maps and more.

Small data

Small data refers to datasets that are relatively modest in size, and often manageable without the need for complex or big data processing techniques. These datasets are typically small enough to be processed, analyzed, and interpreted by traditional data analysis methods and tools, making them suitable for smaller-scale projects and applications.

SnapLogic

SnapLogic encapsulates integration task complexity with “Snaps” to convert integration tasks and subtasks into modular, pluggable pieces of logic.

SnapLogic eXtreme

A big data transformation tool that can process large volumes of information and doesn't require a special set of skills. The tool assists in cost and risk reduction and in data analytics. It can be integrated with other Big Data tools and frameworks, including Amazon Elastic MapReduce and Azure HDInsight.

spaCy

A software library for advanced Natural Language Processing. Written in Python and Cython. Helps to build applications that process large volumes of text.

Spark MLlib

Spark MLlib is Apache Spark’s Machine Learning component. One of the major attractions of Spark is its ability to scale computation massively, which is what machine learning algorithms need when working with large datasets.

Spark Streaming

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams.

SPSS

A set of software programs combined in a single package. Its basic application is the analysis of scientific data related to the social sciences. This data can be used for market research, surveys, data mining, etc.

Sqoop

A Java-based tool used for transferring bulk data between Apache Hadoop and structured datastores such as relational databases.

Statistical modeling

Statistical modeling is a simplified, mathematically-formalized way to approximate reality (i.e. what generates your data) and optionally to make predictions from this approximation. The statistical model is the mathematical equation that is used.
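
As a small illustration, the sketch below fits one of the simplest statistical models, a straight line y = slope * x + intercept, by ordinary least squares and then uses it to predict. The choice of a linear model, the function names, and the data points are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Use the fitted equation to estimate y for a new x."""
    return slope * x + intercept


# Hypothetical observations that happen to lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, predict(10, slope, intercept))
```

Here the fitted equation is the statistical model, and `predict` is the optional prediction step built on that approximation.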

Statistics

Statistics is a mathematical science concerned with the collection, analysis, interpretation and presentation of data. Statistics can be used to derive meaningful insights from data by performing mathematical computations on it.
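
A few of the most common computations can be sketched with Python's standard `statistics` module. The `summarize` helper and the sample values are hypothetical, chosen only to illustrate the idea.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),      # arithmetic average
        "median": statistics.median(values),  # middle value of the sorted data
        "stdev": statistics.pstdev(values),   # population standard deviation
    }


# Hypothetical sample of eight measurements.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
summary = summarize(scores)
print(summary)
```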
