= Machine Learning =

== K-means clustering ==
* https://en.wikipedia.org/wiki/K-means_clustering

K-means clustering aims to partition n observations into k clusters. It is an unsupervised learning algorithm.

* PSPP contains k-means; the QUICK CLUSTER command performs k-means clustering on the dataset.
* Weka contains k-means and x-means.
* Octave contains k-means.
* OpenCV contains a k-means implementation.
* Spark MLlib implements a distributed k-means algorithm.

== K-NN classifier ==
* https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm

The k-nearest neighbors algorithm can be used for both classification and regression.

A confusion matrix, or "matching matrix", is often used as a tool to validate the accuracy of k-NN classification.
* https://en.wikipedia.org/wiki/Confusion_matrix

== Decision trees ==
* https://en.wikipedia.org/wiki/Decision_tree_learning

Decision tree learning creates a model that predicts the value of a target variable based on several input variables.

* Classification tree: the predicted outcome is the (discrete) class to which the data belongs.
* Regression tree: the predicted outcome can be considered a real number.

Notable decision tree algorithms include:
* ID3 (Iterative Dichotomiser 3)
* C4.5 (successor of ID3)
* CART (Classification And Regression Tree)
* CHAID (Chi-square automatic interaction detection)
* MARS (multivariate adaptive regression splines)

=== ID3 ===
* https://en.wikipedia.org/wiki/ID3_algorithm

An algorithm invented by Ross Quinlan, used to generate a decision tree from a dataset.

== Naive Bayes classifier ==
* https://en.wikipedia.org/wiki/Naive_Bayes_classifier

Document classification is a typical application of naive Bayesian classification: consider the problem of classifying documents by their content, for example into spam and non-spam e-mails.
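
The spam / non-spam case can be sketched in a few lines of plain Python. This is only a minimal illustration, not a reference implementation: the training messages are made up, the names <code>train</code> and <code>classify</code> are hypothetical, and the sketch assumes a multinomial word-count model with add-one (Laplace) smoothing, which is one common variant of naive Bayes.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Build a model from (text, label) pairs.

    Returns (log_priors, word_counts, vocab).
    """
    class_docs = Counter()              # number of documents per class
    word_counts = defaultdict(Counter)  # per-class word frequencies
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    n_docs = sum(class_docs.values())
    log_priors = {c: math.log(k / n_docs) for c, k in class_docs.items()}
    return log_priors, word_counts, vocab


def classify(text, model):
    """Return the class with the highest log-posterior score."""
    log_priors, word_counts, vocab = model
    scores = {}
    for label, log_prior in log_priors.items():
        total = sum(word_counts[label].values())
        score = log_prior
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen during training
            # Add-one smoothing avoids a zero probability for words
            # that did not occur in this class.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up training data, for illustration only.
TRAINING = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda for monday", "non-spam"),
    ("lunch on monday", "non-spam"),
]

model = train(TRAINING)
print(classify("cheap money", model))
print(classify("monday meeting", model))
```

Log-probabilities are summed instead of multiplying raw probabilities, which avoids numeric underflow on longer documents. The "naive" part is the assumption that words occur independently of each other given the class.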
== Apriori algorithm ==
* https://en.wikipedia.org/wiki/Apriori_algorithm

The Apriori algorithm is used for association rule learning, for example in market basket analysis.

== Libraries/frameworks ==
* scikit-learn
* R (an open-source software environment for statistical computing, which includes several CART implementations such as the rpart, party and randomForest packages)
* Weka (a free and open-source data-mining suite, contains many decision tree algorithms)
* Orange
* KNIME
* OpenCV

=== w3schools python ML ===
* https://www.w3schools.com/python/python_ml_getting_started.asp

Functions and modules used in the tutorial:
* matplotlib.pyplot.scatter
* matplotlib.pyplot.hist
* numpy.mean
* numpy.median
* numpy.std
* numpy.var
* numpy.percentile
* numpy.random.uniform
* numpy.random.normal
* numpy.poly1d
* numpy.polyfit
* pandas.read_csv
* scipy.stats.mode
* scipy.stats.linregress
* scipy.cluster.hierarchy.dendrogram
* scipy.cluster.hierarchy.linkage
* sklearn.metrics.r2_score
* sklearn.linear_model
* sklearn.preprocessing.StandardScaler
* sklearn.tree
* sklearn.tree.DecisionTreeClassifier
* sklearn.metrics.confusion_matrix
* sklearn.metrics.accuracy_score
* sklearn.metrics.precision_score
* sklearn.metrics.recall_score
* sklearn.metrics.f1_score
* sklearn.cluster.AgglomerativeClustering
* sklearn.linear_model.LogisticRegression
* sklearn.cluster.KMeans
* sklearn.neighbors.KNeighborsClassifier
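
A few of the NumPy functions listed above can be combined into a small sketch. The data is made up (it lies exactly on the line y = 2x + 1), and R² is computed by hand here as an assumption-free stand-in for <code>sklearn.metrics.r2_score</code>, so that the example needs NumPy only.

```python
import numpy as np

# Hypothetical sample data: y = 2x + 1 exactly.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# Descriptive statistics.
mean = np.mean(y)
median = np.median(y)
std = np.std(y)              # population standard deviation (ddof=0)
p75 = np.percentile(y, 75)   # 75th percentile

# Fit a degree-1 polynomial (a straight line) and wrap it as a callable.
coeffs = np.polyfit(x, y, 1)  # [slope, intercept]
line = np.poly1d(coeffs)

# Coefficient of determination, computed by hand.
ss_res = np.sum((y - line(x)) ** 2)
ss_tot = np.sum((y - mean) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(mean, median, std, p75)
print(coeffs, line(6.0), r2)
```

Raising the degree passed to <code>numpy.polyfit</code> gives a polynomial regression instead of a straight line; the resulting <code>numpy.poly1d</code> object is called in the same way.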