Large Language Models (LLMs) are a class of artificial intelligence models used for natural language processing tasks. They are characterized by having a large number of parameters, enabling them to generate human-like text, understand context, and perform various language-related tasks, such as text generation, translation, summarization, and more.
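For illustration, here is a minimal text-generation sketch; the Hugging Face transformers library and the small GPT-2 model (standing in for a full-scale LLM) are assumptions, not part of the entry.

```python
# A minimal sketch of LLM-style text generation, assuming the Hugging Face
# "transformers" library and the small GPT-2 model as a stand-in for an LLM.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

result = generator("Large Language Models are", max_length=30)
print(result[0]["generated_text"])   # prompt continued with model-generated text
```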
LingPipe is a toolkit for processing text using computational linguistics. It is used for tasks such as finding the names of people, organizations, or locations in news, automatically classifying Twitter search results into categories, and suggesting correct spellings of queries.
Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). It is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval, or ratio-level independent variables.
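For illustration, a minimal sketch of fitting a logistic regression to a binary outcome; the use of scikit-learn and the synthetic data are assumptions, not part of the entry.

```python
# A small sketch of logistic regression on a dichotomous (0/1) outcome,
# assuming scikit-learn and synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                   # two interval-level predictors
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # binary dependent variable

model = LogisticRegression().fit(X, y)
print(model.coef_, model.intercept_)            # estimated relationship
print(model.predict_proba(X[:3]))               # predicted probabilities of class 1
```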
Long short-term memory (LSTM) is an artificial recurrent neural network architecture used in the field of deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections. It can process not only single data points (such as images) but also entire sequences of data (such as speech or video).
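A minimal sketch of an LSTM consuming an entire sequence; the choice of PyTorch and the tensor shapes are assumptions, since the entry names no framework.

```python
# A minimal sketch of an LSTM processing whole sequences, assuming PyTorch;
# the sizes are arbitrary.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

batch = torch.randn(4, 10, 8)        # 4 sequences, 10 time steps, 8 features each
output, (h_n, c_n) = lstm(batch)     # recurrent connections carry state across steps

print(output.shape)  # torch.Size([4, 10, 16]) -- one hidden state per time step
print(h_n.shape)     # torch.Size([1, 4, 16])  -- final hidden state per sequence
```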
Luigi is a Python package for building complex pipelines, developed at Spotify. In Luigi, as in Airflow, you specify workflows as tasks and the dependencies between them. The two building blocks of Luigi are Tasks and Targets.
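A minimal sketch of those two building blocks, with one Task depending on another and each Task writing to a Target; the file names and the word-count logic are illustrative assumptions.

```python
# A minimal sketch of Luigi's Tasks and Targets; file names and the
# word-count logic are made up for illustration.
import luigi


class DownloadText(luigi.Task):
    def output(self):
        return luigi.LocalTarget("raw.txt")          # Target: where the result lives

    def run(self):
        with self.output().open("w") as f:
            f.write("hello luigi hello spotify\n")


class CountWords(luigi.Task):
    def requires(self):
        return DownloadText()                        # dependency between Tasks

    def output(self):
        return luigi.LocalTarget("word_count.txt")

    def run(self):
        with self.input().open() as f:
            n = len(f.read().split())
        with self.output().open("w") as f:
            f.write(str(n))


if __name__ == "__main__":
    luigi.build([CountWords()], local_scheduler=True)
```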
Machine learning is a field of study that explores the capability of a computer to learn, akin to a human brain. It also studies the construction of algorithms that can learn from and make predictions on data. In essence, the intention is to teach a computer to learn on its own rather than only do what it has been explicitly programmed to do.
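As a minimal illustration of learning from data and making predictions, here is a short sketch; the use of scikit-learn, its bundled iris dataset, and the k-nearest-neighbors model are assumptions not made by the entry.

```python
# A tiny sketch of "learning from data and making predictions",
# assuming scikit-learn and its bundled iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)  # learn from examples
print(clf.score(X_test, y_test))  # accuracy on data the model has not seen
```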
MapReduce is the heart of Apache Hadoop: a software framework for easily writing applications that process vast amounts of data (multi-terabyte datasets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.
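To make the map and reduce phases concrete, here is a single-machine Python sketch in the style of a Hadoop Streaming word count; on a real cluster the framework runs many mappers and reducers in parallel and shuffles pairs by key. The word-count example is an assumption, not part of the entry.

```python
# A local simulation of the MapReduce idea: a mapper emits key/value pairs
# and a reducer aggregates the values for each key.
import sys
from itertools import groupby

def mapper(lines):
    for line in lines:
        for word in line.split():
            yield word, 1                      # map: emit (word, 1) for every word

def reducer(pairs):
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)   # reduce: sum counts per word

if __name__ == "__main__":
    pairs = mapper(sys.stdin)                  # on Hadoop, mappers run on many nodes
    for word, total in reducer(pairs):         # and the framework shuffles by key
        print(f"{word}\t{total}")
```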
Mathematics is the science of numbers and their operations, interrelations, combinations, generalizations, and abstractions and of space configurations and their structure, measurement, transformations, and generalizations.
Microsoft Excel is a spreadsheet developed by Microsoft for Windows, macOS, Android, and iOS. It features calculation, graphing tools, pivot tables, and a macro programming language called Visual Basic for Applications.
Minitab is a statistical software package used for quality improvement and statistical data analysis. It provides a range of tools for data analysis, statistical graphics, and hypothesis testing, making it a valuable tool for professionals working in quality control and process improvement.
MLBase.jl is a machine learning framework for the Julia programming language. It provides a set of tools and utilities for building machine learning models.
MLlib is Apache Spark's machine learning library of high-quality algorithms. It supports the R, Python, Java, and Scala programming languages. It can run on Mesos, Hadoop, and Kubernetes, and can read data from a number of sources, such as Hive, Cassandra, HDFS, and HBase.
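For illustration, a short sketch of using MLlib from Python; the local Spark session, the toy data, and the choice of logistic regression are assumptions, not part of the entry.

```python
# A small sketch of MLlib via PySpark, assuming a local Spark session and toy data.
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.master("local[*]").appName("mllib-demo").getOrCreate()

df = spark.createDataFrame(
    [(0.0, Vectors.dense(0.0, 1.1)), (1.0, Vectors.dense(2.0, 1.0)),
     (1.0, Vectors.dense(2.2, -1.5)), (0.0, Vectors.dense(-1.0, 0.5))],
    ["label", "features"],
)

model = LogisticRegression(maxIter=10).fit(df)          # distributed training
model.transform(df).select("label", "prediction").show()

spark.stop()
```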
MLOps, short for Machine Learning Operations, is a set of practices and tools that combine machine learning (ML) and software engineering to automate and streamline the deployment, management, and monitoring of machine learning models in production environments. It aims to bridge the gap between data science and IT operations to ensure that machine learning models are effectively integrated into real-world applications.
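As one concrete MLOps practice, the sketch below records a trained model together with its parameters and a metric so it can later be deployed and monitored; the use of MLflow and scikit-learn, and the iris data, are assumptions not made by the entry.

```python
# A small sketch of experiment/model tracking, one common MLOps practice,
# assuming the MLflow and scikit-learn libraries.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_param("max_iter", 200)                       # record the configuration
    mlflow.log_metric("train_accuracy", model.score(X, y))  # record a metric
    mlflow.sklearn.log_model(model, "model")                # store the model artifact
```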
MXNet is a deep learning library for GPU and cloud computing developers. It is an acceleration library that helps save time when building and deploying large-scale DNNs. It also offers predefined layers and tools for coding your own, for specifying data placement, and for automating calculations.
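To illustrate the predefined layers, here is a minimal sketch using MXNet's Gluon API; the layer sizes and the random input are arbitrary assumptions.

```python
# A minimal sketch of MXNet's predefined layers via the Gluon API.
import mxnet as mx
from mxnet.gluon import nn

net = nn.Sequential()
net.add(nn.Dense(64, activation="relu"),   # predefined fully connected layers
        nn.Dense(10))
net.initialize()

x = mx.nd.random.uniform(shape=(1, 100))   # a dummy input batch
print(net(x).shape)                        # (1, 10)
```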
Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling or clustering raw input.
NLP stands for Natural Language Processing, a field of computer science and artificial intelligence that studies ways for computers to analyze and understand human language and derive meaning from it, especially the programming aspects involved. The goal of NLP is to facilitate computer-human interaction.
NLTK (the Natural Language Toolkit) is a platform for building Python programs that work with human language data, used in applied statistical natural language processing.
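A small sketch of what working with human language data in NLTK looks like; the example sentence and the specific downloaded resources are assumptions (resource names vary slightly between NLTK versions).

```python
# A small sketch of tokenizing and part-of-speech tagging with NLTK,
# assuming its tokenizer and tagger resources have been downloaded.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("NLTK makes it easy to work with human language data.")
print(nltk.pos_tag(tokens))   # e.g. [('NLTK', 'NNP'), ('makes', 'VBZ'), ...]
```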
NumPy is the fundamental package for scientific computing with Python. It is the core Python library for array manipulation and thus underpins a large part of the numerical and scientific computation done in the language.
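A quick sketch of the kind of array manipulation NumPy provides; the example arrays are made up.

```python
# A quick sketch of NumPy array creation and vectorized operations.
import numpy as np

a = np.arange(6).reshape(2, 3)      # 2x3 array: [[0 1 2], [3 4 5]]
b = np.ones((2, 3))

print(a + b)                        # elementwise (vectorized) arithmetic
print(a.mean(axis=0))               # column means: [1.5 2.5 3.5]
print(a @ b.T)                      # matrix product, shape (2, 2)
```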
Oozie is a workflow scheduler system designed to manage Hadoop jobs, and it lets you automate commonly performed tasks. With Oozie you can describe workflows to be performed on a Hadoop cluster, schedule those workflows to execute under specified conditions, and even combine multiple workflows and schedules into a package to manage their full lifecycle.
Apache OpenNLP is a machine learning-based toolkit for the processing of natural language text. The toolkit provides support for the most common NLP tasks, such as language detection, parsing, tokenization, sentence segmentation, and more.
Optimization is the process of modifying a system to make some of its features work more efficiently or use fewer resources. For instance, a computer program may be optimized so that it runs faster, requires less memory or other resources, or consumes less energy.
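As a tiny illustration of optimizing a program so it runs faster, the sketch below caches repeated work; the use of functools.lru_cache and the Fibonacci example are illustrative assumptions, not something the entry prescribes.

```python
# Optimizing a program for speed by caching repeated work (memoization).
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(80))  # near-instant with caching; the uncached recursion would not finish
```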
Pandas is a BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for Python. It adds two data containers to Python (Series and DataFrame), as well as useful data processing functionality for handling missing data, set comparisons, and vectorization.
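A short sketch of those two containers and the missing-data handling mentioned above; the example data is made up.

```python
# A short sketch of pandas' Series and DataFrame and its missing-data handling.
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.5, np.nan], name="score")          # 1-D labeled container
df = pd.DataFrame({"city": ["Oslo", "Lima", "Kyoto"],     # 2-D labeled container
                   "score": s})

print(df["score"].isna())         # flag missing values
print(df.fillna({"score": 0.0}))  # fill missing values
print(df.describe())              # quick numeric summary
```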
Pattern recognition is the process of recognizing patterns by using machine learning algorithms. It can be defined as the classification of data based on knowledge already gained or on statistical information extracted from patterns and/or their representations.
Pentaho is business intelligence software that provides data integration, OLAP services, reporting, information dashboards, data mining, and extract, transform, load (ETL) capabilities.
Predictive analytics is the process of using data and statistical techniques to forecast future outcomes.