Classification:
Classification algorithms learn to predict the class or category of an instance of data. The input of a classification algorithm is a set of labeled examples. Each example is represented as a feature vector, and each label is an integer between 0 and k-1, where k is the number of classes. If k=2, the task is called binary classification, whereas if k>2, it is called multi-class classification. The output of a classification algorithm is a classifier, which can be used to predict the label of a new (unlabeled) instance.
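As a rough illustration of the input and output described above, the following sketch trains a nearest-centroid classifier on a handful of labeled feature vectors. The choice of nearest-centroid, the function names, and the toy data are all assumptions made for this example; they are not taken from any particular library.

```python
from collections import defaultdict
from math import dist


def train_nearest_centroid(examples, labels):
    """Return one centroid per class label (labels are ints in 0..k-1)."""
    grouped = defaultdict(list)
    for features, label in zip(examples, labels):
        grouped[label].append(features)
    # Average each feature column within a class to get that class's centroid.
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Predict the label whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical binary (k=2) training set: feature vectors and integer labels.
X = [[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 7.5]]
y = [0, 0, 1, 1]

classifier = train_nearest_centroid(X, y)
predicted = predict(classifier, [2.0, 1.0])  # label for a new, unlabeled instance
```

Here the trained "classifier" is simply a dictionary of centroids; real classifiers are usually more elaborate, but the contract is the same: labeled examples in, a label-predicting object out.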
Regression:
Regression algorithms learn to predict the value of a real-valued function on an instance of data. Their input is a set of labeled examples. Each example is represented by a feature vector, and each label is a real number. A regression algorithm uses the training examples to train a regressor, which can then be used to predict the value of the function on new unlabeled instances.
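The sketch below shows one possible regressor: ordinary least squares on a single feature. Restricting the feature vector to one value, along with the function name and the toy data, is an assumption made to keep the example short.

```python
def train_linear_regressor(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x

    def regressor(x):
        # Predict the real-valued label for a new instance.
        return slope * x + intercept

    return regressor


# Hypothetical training examples: one feature per example, real-valued labels.
features = [1.0, 2.0, 3.0, 4.0]
labels = [3.0, 5.0, 7.0, 9.0]

regressor = train_linear_regressor(features, labels)
prediction = regressor(5.0)  # value predicted for an unseen instance
```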
Ranking:
Ranking is a problem in which the goal is to automatically construct a ranker from a set of labeled examples. This set consists of groups of instances, with some order specified between the instances in each group. This order is typically induced by giving a numerical or ordinal score or a judgment (e.g. degrees of relevance: "perfect", "good", "fair", "bad") for each instance. The purpose of ranking algorithms is to train a ranker that can rank new groups of instances for which the score of each instance is unknown.
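To make the data layout concrete, the sketch below maps the ordinal judgments to numbers, fits a scoring function to them, and uses that function to order a new group. This is a simple pointwise approach chosen only for illustration (other approaches compare pairs or whole lists); the single-feature instances, the numeric mapping, and all names are assumptions.

```python
# Assumed mapping from ordinal judgments to numeric relevance scores.
RELEVANCE = {"bad": 0, "fair": 1, "good": 2, "perfect": 3}

# Hypothetical training set: each group holds (feature, judgment) pairs.
training_groups = [
    [(0.9, "perfect"), (0.4, "fair"), (0.1, "bad")],
    [(0.8, "good"), (0.7, "good"), (0.2, "bad")],
]


def train_pointwise_ranker(groups):
    """Fit a linear score on the single feature, then rank by that score."""
    pairs = [(x, RELEVANCE[judgment]) for group in groups for x, judgment in group]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        / sum((x - mean_x) ** 2 for x, _ in pairs)
    )
    intercept = mean_y - slope * mean_x

    def score(x):
        return slope * x + intercept

    def rank(instances):
        # Highest predicted relevance first.
        return sorted(instances, key=score, reverse=True)

    return rank


ranker = train_pointwise_ranker(training_groups)
ranked = ranker([0.3, 0.95, 0.6])  # a new group whose scores are unknown
```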
Clustering:
Clustering algorithms group a set of items together based on a set of features. The algorithm can be used to cluster unlabeled data or to create a model that predicts which cluster an instance of data belongs to.
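Both uses mentioned above (grouping unlabeled data and predicting the cluster of a new instance) can be sketched with k-means. K-means is just one clustering method, picked here as an example; the naive initialization from the first k points, the fixed iteration count, and the toy data are assumptions.

```python
from math import dist


def assign(centroids, point):
    """Return the index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: dist(centroids[i], point))


def kmeans(points, k, iterations=10):
    """Cluster the points and return the k learned centroids (the model)."""
    # Naive, deterministic initialization: use the first k points.
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[assign(centroids, p)].append(p)
        # Move each centroid to the mean of its cluster; keep it if empty.
        centroids = [
            [sum(column) / len(cluster) for column in zip(*cluster)]
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids


# Hypothetical unlabeled data with two visible groups.
points = [[1.0, 1.0], [9.0, 9.0], [1.2, 0.8], [8.8, 9.2], [0.9, 1.1]]

model = kmeans(points, k=2)
cluster_of_new_point = assign(model, [1.0, 1.2])  # predict a cluster index
```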
Recommendation:
Recommendation is an ML problem that can be phrased like this: "For a given user, predict the ratings this user would give to the items that they have not explicitly rated yet", or "For a given user, suggest items that this user will most likely be interested in, given the user's prior history".
The major flavors of recommender systems are:
- Collaborative filtering: predict ratings based on previously observed ratings.
- Content-based recommendations: predict ratings based on knowledge (features) of the user and items.
- Mixed: apply both of the above techniques to provide the best recommendations.
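
The collaborative filtering flavor can be sketched as follows: a missing rating is predicted as a similarity-weighted average of the ratings other users gave to that item. The user-based scheme, the cosine similarity over co-rated items, and the sample ratings are assumptions made for this illustration.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two users over the items both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(a[item] ** 2 for item in shared))
    norm_b = sqrt(sum(b[item] ** 2 for item in shared))
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Predict a rating from other users' ratings, weighted by similarity."""
    weighted_sum, total_weight = 0.0, 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += similarity * other_ratings[item]
        total_weight += similarity
    # None signals that no similar user has rated the item.
    return weighted_sum / total_weight if total_weight else None


# Hypothetical previously observed ratings (user -> item -> rating, 1 to 5).
ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "D": 4},
    "carol": {"A": 1, "C": 5, "D": 2},
}

predicted = predict_rating(ratings, "alice", "D")  # alice has not rated "D"
```

In this toy data alice's tastes are closer to bob's than to carol's, so the prediction leans toward bob's rating of the item.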
Cross Validation:
Cross validation is a technique for training and testing a model when only one dataset is available. The dataset is partitioned into k parts (k is specified by the user) called folds. Each fold, in turn, is used as the test set, while the rest of the data is used as the training set. The result is k separate models. The metrics for each model are reported separately, along with the average of each metric across all models.
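The procedure above can be sketched in a few lines. The round-robin fold assignment, the placeholder "model" (which simply predicts the mean of its training labels), and the mean absolute error metric are assumptions chosen to keep the example self-contained.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    # Round-robin assignment of the n example indices to k folds.
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Hypothetical dataset: real-valued labels only, for brevity.
values = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

fold_errors = []
for train_idx, test_idx in k_fold_indices(len(values), k=3):
    # Train: the placeholder model is the mean of the training labels.
    mean = sum(values[i] for i in train_idx) / len(train_idx)
    # Test: mean absolute error on the held-out fold.
    error = sum(abs(values[i] - mean) for i in test_idx) / len(test_idx)
    fold_errors.append(error)

# Report each fold's metric separately, plus the average over all folds.
average_error = sum(fold_errors) / len(fold_errors)
```

Each of the k iterations trains a separate model, so `fold_errors` holds one metric per model and `average_error` is the summary figure.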