Unsupervised learning
Unsupervised machine learning is the machine learning task of inferring a function to describe hidden structure from unlabeled data. Since the examples given to the learner are unlabeled, there is no error or reward signal to evaluate a potential solution – this distinguishes unsupervised learning from supervised learning and reinforcement learning.
Unsupervised learning is closely related to the problem of density estimation in statistics.^{[1]} However, unsupervised learning also encompasses many other techniques that seek to summarize and explain key features of the data.
Approaches to unsupervised learning include:
 clustering
 anomaly detection
 neural networks
 approaches for learning latent variable models, such as the expectation–maximization algorithm and the method of moments
Unsupervised learning in neural networks
The classical example of unsupervised learning in the study of both natural and artificial neural networks is captured by Donald Hebb's principle: neurons that fire together wire together. In Hebbian learning, a connection is reinforced irrespective of any error signal; the change is exclusively a function of the coincidence of action potentials in the two neurons. A related rule, spike-timing-dependent plasticity (STDP), modifies synaptic weights according to the time between the action potentials. Hebbian learning has been hypothesized to underlie a range of cognitive functions, such as pattern recognition and experiential learning.
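The update described above can be sketched for a single linear neuron. This is a minimal illustration, not a biological model: the function name `hebbian_step`, the learning rate, and the toy input stream are all assumptions made for the example.

```python
import random

def hebbian_step(weights, x, learning_rate=0.01):
    """One plain Hebbian update for a linear neuron: dw_i = lr * y * x_i."""
    # Post-synaptic activity is the weighted sum of the inputs.
    y = sum(w * xi for w, xi in zip(weights, x))
    # Each weight grows in proportion to the coincidence of input and output.
    # No target value or error term appears anywhere in the rule.
    return [w + learning_rate * y * xi for w, xi in zip(weights, x)]

rng = random.Random(0)
weights = [0.1, 0.1, 0.1]
for _ in range(200):
    s = rng.gauss(0.0, 1.0)
    # Inputs 0 and 1 always fire together (both equal s);
    # input 2 is independent noise.
    x = [s, s, rng.gauss(0.0, 1.0)]
    weights = hebbian_step(weights, x)

print(weights)
```

Because inputs 0 and 1 are identical and start from the same weight, their weights stay equal throughout. Note that the plain rule lets weights grow without bound; normalized variants such as Oja's rule are commonly used to keep them finite.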
Among neural network models, the self-organizing map (SOM) and adaptive resonance theory (ART) are commonly used unsupervised learning algorithms. The SOM is a topographic organization in which nearby locations in the map represent inputs with similar properties. The ART model allows the number of clusters to vary with problem size and lets the user control the degree of similarity between members of the same cluster by means of a user-defined constant called the vigilance parameter. ART networks are also used for many pattern recognition tasks, such as automatic target recognition and seismic signal processing. The first version of ART was "ART1", developed by Carpenter and Grossberg (1988).^{[4]}
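The topographic idea behind the SOM can be illustrated with a small one-dimensional map. The sketch below is a simplified version under stated assumptions: the linearly decaying learning rate and neighborhood width, the Gaussian neighborhood function, and the names `train_som` and `best_matching_unit` are choices made for this example rather than a reference implementation.

```python
import math
import random

def best_matching_unit(units, x):
    """Index of the map unit whose weight vector is closest to x."""
    return min(range(len(units)),
               key=lambda j: sum((w - xi) ** 2 for w, xi in zip(units[j], x)))

def train_som(data, n_units=10, epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Train a 1-D chain of units on a list of equal-length input vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood
            bmu = best_matching_unit(units, x)
            for j, w in enumerate(units):
                # Units near the winner on the map are pulled toward x too,
                # which is what produces the topographic ordering.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return units

rng = random.Random(1)
data = [[rng.random(), rng.random()] for _ in range(200)]
units = train_som(data)
print(best_matching_unit(units, [0.2, 0.8]))
```

After training, inputs that are close to each other tend to select the same or adjacent units, so the unit index acts as a coarse one-dimensional coordinate for the data.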
Method of moments
One of the statistical approaches to unsupervised learning is the method of moments. In the method of moments, the unknown parameters of interest in the model are related to the moments of one or more random variables, so these parameters can be estimated given the moments. The moments are usually estimated empirically from samples. The basic moments are the first- and second-order moments. For a random vector, the first-order moment is the mean vector, and the second-order moment is the covariance matrix (when the mean is zero). Higher-order moments are usually represented using tensors, which generalize matrices to higher orders as multidimensional arrays.
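As a small worked case, consider a gamma distribution with shape k and scale θ, whose mean is kθ and whose variance is kθ². Solving these two moment equations gives θ = variance / mean and k = mean² / variance. The sketch below applies this to simulated data; the choice of the gamma distribution and the function name `gamma_method_of_moments` are assumptions made for illustration.

```python
import random

def gamma_method_of_moments(samples):
    """Estimate gamma (shape, scale) by matching the first two moments."""
    n = len(samples)
    m1 = sum(samples) / n                  # empirical first moment
    m2 = sum(x * x for x in samples) / n   # empirical second moment
    var = m2 - m1 * m1                     # variance from the two moments
    scale = var / m1                       # theta = variance / mean
    shape = m1 * m1 / var                  # k = mean^2 / variance
    return shape, scale

rng = random.Random(0)
# random.gammavariate(alpha, beta) uses alpha as shape and beta as scale.
samples = [rng.gammavariate(2.0, 3.0) for _ in range(20000)]
shape_hat, scale_hat = gamma_method_of_moments(samples)
print(shape_hat, scale_hat)
```

With enough samples the estimates should land near the true values used in the simulation (shape 2, scale 3), since the empirical moments converge to the population moments.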
In particular, the method of moments has been shown to be effective for learning the parameters of latent variable models.^{[5]} Latent variable models are statistical models in which, in addition to the observed variables, there is a set of latent variables that are not observed. A highly practical example in machine learning is topic modeling, a statistical model that generates the words (observed variables) in a document based on the topic (latent variable) of the document. In topic modeling, the words in a document are generated according to different statistical parameters when the topic of the document changes. It has been shown that the method of moments (via tensor decomposition techniques) consistently recovers the parameters of a large class of latent variable models under some assumptions.^{[5]}
The expectation–maximization (EM) algorithm is also one of the most practical methods for learning latent variable models. However, it can get stuck in local optima, and it is not guaranteed to converge to the true unknown parameters of the model. In contrast, for the method of moments, global convergence is guaranteed under some conditions.^{[5]}
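To make the EM iteration concrete, the following sketch fits a mixture of two one-dimensional Gaussians, where the latent variable is the component that generated each point. It is a minimal illustration under stated assumptions: the two-component model, the function names, the fixed iteration count, and the variance floor are choices for this example, and no other numerical safeguards are included.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_two_gaussians(xs, mu_init, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture, started from mu_init."""
    mu = list(mu_init)
    var = [1.0, 1.0]
    weight = [0.5, 0.5]
    n = len(xs)
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from each component.
        resp = []
        for x in xs:
            p = [weight[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: re-estimate parameters from the responsibility-weighted data.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            spread = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            var[k] = max(spread, 1e-6)  # floor keeps a component from collapsing
            weight[k] = nk / n
    return weight, mu, var

rng = random.Random(0)
xs = ([rng.gauss(-2.0, 1.0) for _ in range(500)]
      + [rng.gauss(3.0, 1.0) for _ in range(500)])
weight, mu, var = em_two_gaussians(xs, mu_init=(-1.0, 1.0))
print(weight, mu, var)
```

The result depends on `mu_init`: on well-separated data like this, a reasonable start recovers both components, but on harder problems different starting points can end at different local optima, which is the limitation noted above.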
Examples
Behavioral-based detection in network security has become a good application area for a combination of supervised and unsupervised machine learning. This is because the volume of data (measured in terabytes per day) is too large for a human security analyst to review for patterns and anomalies. According to Giora Engel, co-founder of LightCyber, in a Dark Reading article, "The great promise machine learning holds for the security industry is its ability to detect advanced and unknown attacks – particularly those leading to data breaches."^{[6]} The basic premise is that a motivated attacker will find their way into a network (generally by compromising a user's computer or network account through phishing, social engineering or malware). The security challenge then becomes finding the attacker by their operational activities, which include reconnaissance, lateral movement, command and control, and exfiltration. These activities, especially reconnaissance and lateral movement, stand in contrast to an established baseline of "normal" or "good" activity for each user and device on the network. The role of machine learning is to create ongoing profiles for users and devices and then find meaningful anomalies. Darktrace also uses some form of unsupervised machine learning to find behavioral anomalies, although the system is not fully self-contained, and a team of analysts in Cambridge, UK review results in order to create security alerts.
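The profile-then-compare idea can be sketched with a deliberately simple baseline: a per-device mean and standard deviation, with new observations flagged when they deviate by more than a threshold. This is only an illustration of the general principle and not a description of how any product mentioned above works; the device name, the traffic figures, the threshold, and the function names are all hypothetical.

```python
import statistics

def build_profile(history):
    """Summarize past activity as (mean, sample standard deviation)."""
    return statistics.mean(history), statistics.stdev(history)

def is_anomalous(value, profile, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from baseline."""
    mean, std = profile
    if std == 0:
        return value != mean
    return abs(value - mean) / std > threshold

# Hypothetical daily outbound traffic in megabytes for one device.
baseline = {"laptop-17": [120, 135, 110, 128, 140, 125, 131]}
profiles = {device: build_profile(h) for device, h in baseline.items()}

print(is_anomalous(130, profiles["laptop-17"]))  # close to the usual range
print(is_anomalous(900, profiles["laptop-17"]))  # far outside the usual range
```

No labels are needed to build the profile, which is what makes the approach unsupervised; deciding which flagged deviations are meaningful is a separate step.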
See also
 Cluster analysis
 Anomaly detection
 Expectation–maximization algorithm
 Generative topographic map
 Multivariate analysis
 Radial basis function network
 Hebbian theory
Notes
 [1] Jordan, Michael I.; Bishop, Christopher M. (2004). "Neural Networks". In Allen B. Tucker. Computer Science Handbook, Second Edition (Section VII: Intelligent Systems). Boca Raton, FL: Chapman & Hall/CRC Press LLC. ISBN 158488360X.
 [2] Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer. pp. 485–586. ISBN 9780387848570.
 [3] Acharyya, Ranjan (2008). A New Approach for Blind Source Separation of Convolutive Sources. ISBN 9783639077971. (This book focuses on unsupervised learning with blind source separation)
 [4] Carpenter, G.A.; Grossberg, S. (1988). "The ART of adaptive pattern recognition by a self-organizing neural network" (PDF). Computer. 21: 77–88. doi:10.1109/2.33.
 [5] Anandkumar, Animashree; Ge, Rong; Hsu, Daniel; Kakade, Sham; Telgarsky, Matus (2014). "Tensor Decompositions for Learning Latent Variable Models" (PDF). Journal of Machine Learning Research (JMLR). 15: 2773–2832.
 [6] Engel, Giora (February 11, 2016). "3 Flavors of Machine Learning: Who, What & Where". Dark Reading. Retrieved 2016-11-21.
Further reading
 Bousquet, O.; von Luxburg, U.; Raetsch, G., eds. (2004). Advanced Lectures on Machine Learning. Springer-Verlag. ISBN 9783540231226.
 Duda, Richard O.; Hart, Peter E.; Stork, David G. (2001). "Unsupervised Learning and Clustering". Pattern classification (2nd ed.). Wiley. ISBN 0471056693.
 Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer. pp. 485–586. doi:10.1007/978-0-387-84858-7_14. ISBN 9780387848570.
 Hinton, Geoffrey; Sejnowski, Terrence J., eds. (1999). Unsupervised Learning: Foundations of Neural Computation. MIT Press. ISBN 026258168X. (This book focuses on unsupervised learning in neural networks)