Machine Learning
K-means clustering
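K-means clustering aims to partition n observations into k clusters, in which each observation belongs to the cluster with the nearest mean (cluster centroid). It is an unsupervised learning algorithm.
- PSPP contains k-means: the QUICK CLUSTER command performs k-means clustering on the dataset.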
- Weka contains k-means and x-means.
- Octave contains k-means.
- OpenCV contains a k-means implementation.
- Spark MLlib implements a distributed k-means algorithm.
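The assign/update loop behind k-means (Lloyd's algorithm) can be sketched in plain Python. This is a minimal sketch under stated assumptions: points are tuples of numbers, the initial centroids are k observations picked at random (library implementations such as sklearn.cluster.KMeans use smarter initialisation like k-means++), and the example data is hypothetical.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and update steps until stable."""
    rng = random.Random(seed)
    # Assumption: initialise centroids with k distinct observations chosen at random
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids (and hence assignments) no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two well-separated groups of 2-D points
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

Because the result depends on the initial centroids, k-means is usually run several times with different seeds and the solution with the smallest within-cluster sum of squares is kept.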
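k-NN classifier
The k-nearest neighbors (k-NN) algorithm can be used for both classification and regression. In classification, an object is assigned to the class most common among its k nearest neighbors; in regression, the output is the average of the values of its k nearest neighbors.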
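Both uses of k-NN can be sketched in a few lines of standard-library Python. This assumes Euclidean distance and numeric feature tuples; the function names and toy data are hypothetical. There is no training step: the "model" is simply the stored training set.

```python
from collections import Counter
from math import dist

def nearest_indices(train_X, query, k):
    """Indices of the k training points closest to `query` (Euclidean distance)."""
    return sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))[:k]

def knn_classify(train_X, train_y, query, k=3):
    # Classification: plurality vote among the labels of the k nearest neighbours
    votes = Counter(train_y[i] for i in nearest_indices(train_X, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k=3):
    # Regression: average of the target values of the k nearest neighbours
    return sum(train_y[i] for i in nearest_indices(train_X, query, k)) / k

# Hypothetical training data
X = [(1, 1), (2, 1), (1, 2), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]
targets = [1.0, 1.2, 0.8, 5.0, 5.5, 4.5]

print(knn_classify(X, labels, (1.5, 1.5)))
print(knn_regress(X, targets, (6.5, 6.5)))
```

With two classes, an odd k avoids tied votes.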
A confusion matrix or "matching matrix" is often used as a tool to validate the accuracy of k-NN classification.
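A confusion matrix counts, for each actual class, how often each class was predicted. The sketch below uses the same orientation as sklearn.metrics.confusion_matrix (rows are actual classes, columns are predicted classes) and derives accuracy, precision and recall from it. The spam/ham labels and predictions are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """matrix[i][j] = number of samples of actual class i predicted as class j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of all predictions that lie on the diagonal (correct)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def precision_recall(matrix, i):
    """Precision and recall for class i."""
    tp = matrix[i][i]
    predicted_i = sum(row[i] for row in matrix)  # column sum: predicted as class i
    actual_i = sum(matrix[i])                    # row sum: actually class i
    return tp / predicted_i, tp / actual_i

y_true = ["spam", "spam", "spam", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "ham", "spam", "spam"]
cm = confusion_matrix(y_true, y_pred, labels=["spam", "ham"])
print(cm)
print(accuracy(cm), precision_recall(cm, 0))
```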
Decision trees
Creates a model that predicts the value of a target variable based on several input variables. In a classification tree, the outcome is the (discrete) class to which the data belongs; in a regression tree, the outcome can be considered a real number.
Notable decision tree algorithms include:
- ID3 (Iterative Dichotomiser 3)
- C4.5 (successor of ID3)
- CART (Classification And Regression Tree)
- Chi-square automatic interaction detection (CHAID)
- MARS (Multivariate Adaptive Regression Splines)
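Tree learners such as CART grow a tree by repeatedly choosing the split that makes the child nodes as "pure" as possible. For classification trees a common impurity measure is Gini impurity (it is also the default criterion of sklearn.tree.DecisionTreeClassifier). The sketch below finds the best threshold on one numeric feature only; it is a single split, not a full tree-building algorithm, and the data is hypothetical.

```python
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Try each threshold on a numeric feature and return (threshold, impurity)
    for the split with the lowest weighted Gini impurity of the two children."""
    best = (None, float("inf"))
    # Skip the largest value so the right-hand child is never empty
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

ages = [22, 25, 30, 35, 40, 45]
bought = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split(ages, bought))  # (30, 0.0): "age <= 30" separates the classes perfectly
```

A full tree applies the same search recursively to each child until the nodes are pure or a stopping rule (such as a maximum depth) is reached.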
ID3
An algorithm invented by Ross Quinlan[1], used to generate a decision tree from a dataset.
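At each node, ID3 splits on the attribute with the highest information gain, that is, the largest reduction in entropy of the class labels. The sketch below computes entropy and information gain for categorical attributes and picks the attribute for the root split; it assumes rows are dicts, and the weather-style data is hypothetical. The full algorithm repeats this choice recursively on each subset.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(S) = -sum over classes of p_c * log2(p_c)."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy of the labels minus the weighted entropy after splitting on `attribute`."""
    n = len(labels)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [y for row, y in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "overcast", "windy": False},
    {"outlook": "rain", "windy": False},
    {"outlook": "rain", "windy": True},
    {"outlook": "overcast", "windy": True},
]
play = ["no", "no", "yes", "yes", "no", "yes"]

# ID3 chooses the attribute with the highest information gain for this node
best = max(["outlook", "windy"], key=lambda a: information_gain(rows, play, a))
print(best)  # outlook
```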
Naive Bayes classifier
Document classification
Naive Bayes classification can be applied to the document classification problem: classifying documents by their content, for example into spam and non-spam e-mails.
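A spam filter of this kind can be sketched as a multinomial naive Bayes classifier over word counts. This is one common variant, chosen here as an assumption: it uses add-one (Laplace) smoothing and sums log-probabilities to avoid numeric underflow. The function names and the four-document corpus are hypothetical.

```python
from collections import Counter
from math import log

def train_naive_bayes(docs, labels):
    """Count documents per class and word occurrences per class; docs are token lists."""
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for tokens, c in zip(docs, labels):
        word_counts[c].update(tokens)
    vocab = {w for tokens in docs for w in tokens}
    return class_counts, word_counts, vocab

def classify(model, tokens):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for c in class_counts:
        score = log(class_counts[c] / total_docs)  # log prior P(c)
        total_words = sum(word_counts[c].values())
        for w in tokens:
            if w not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps P(w | c) > 0 for words unseen in class c
            score += log((word_counts[c][w] + 1) / (total_words + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

docs = [
    "win money now".split(),
    "cheap money offer".split(),
    "meeting schedule today".split(),
    "project meeting notes".split(),
]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(docs, labels)
print(classify(model, "cheap money".split()))    # spam
print(classify(model, "meeting today".split()))  # ham
```

The "naive" assumption is that words are conditionally independent given the class, which is why the per-word log-probabilities can simply be added.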
Apriori algorithm
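https://en.wikipedia.org/wiki/Apriori_algorithm
Used for association rule learning, for example market basket analysis: it finds frequent itemsets by extending them one item at a time, as long as those itemsets appear sufficiently often in the dataset.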
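The level-by-level search can be sketched as follows. Support is taken to be the fraction of transactions containing an itemset, and candidates are pruned using the Apriori property: every subset of a frequent itemset must itself be frequent. Generating association rules from the frequent itemsets is a separate step not shown here, and the shopping baskets are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    # Level 1: frequent single items
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates with any infrequent (k-1)-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
result = apriori(baskets, min_support=0.6)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```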
Libraries/frameworks
- scikit-learn
- R (an open-source software environment for statistical computing, which includes several CART implementations such as the rpart, party and randomForest packages)
- Weka (a free and open-source data-mining suite, contains many decision tree algorithms)
- Orange
- KNIME
- OpenCV
W3Schools Python ML
https://www.w3schools.com/python/python_ml_getting_started.asp
- matplotlib.pyplot.scatter
- matplotlib.pyplot.hist
- numpy.mean
- numpy.median
- numpy.std
- numpy.var
- numpy.percentile
- numpy.random.uniform
- numpy.random.normal
- numpy.poly1d
- numpy.polyfit
- pandas.read_csv
- scipy.stats.mode
- scipy.stats.linregress
- scipy.cluster.hierarchy.dendrogram
- scipy.cluster.hierarchy.linkage
- sklearn.metrics.r2_score
- sklearn.linear_model
- sklearn.preprocessing.StandardScaler
- sklearn.tree
- sklearn.tree.DecisionTreeClassifier
- sklearn.metrics.confusion_matrix
- sklearn.metrics.accuracy_score
- sklearn.metrics.precision_score
- sklearn.metrics.recall_score
- sklearn.metrics.f1_score
- sklearn.cluster.AgglomerativeClustering
- sklearn.linear_model.LogisticRegression
- sklearn.cluster.KMeans
- sklearn.neighbors.KNeighborsClassifier
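Several entries above (numpy.polyfit, scipy.stats.linregress, sklearn.metrics.r2_score) revolve around fitting a least-squares line and measuring how well it fits. The dependency-free sketch below shows the underlying arithmetic only; it does not reproduce those libraries' APIs, and the function names and data points are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = linear_fit(x, y)
predicted = [slope * xi + intercept for xi in x]
print(slope, intercept, r2(y, predicted))
```

For a simple linear regression like this one, R² equals the square of the Pearson correlation coefficient between x and y.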