{{Short description|Paradigm in machine learning that uses no classification labels}} {{Machine learning|Paradigms}} '''Unsupervised learning''' is a framework in [[machine learning]] where, in contrast to [[supervised learning]], algorithms learn patterns exclusively from unlabeled data.<ref name="WeiWu">{{Cite web |last=Wu |first=Wei |title=Unsupervised Learning |url=https://na.uni-tuebingen.de/ex/ml_seminar_ss2022/Unsupervised_Learning%20Final.pdf |access-date=26 April 2024 |archive-date=14 April 2024 |archive-url=https://web.archive.org/web/20240414213810/https://na.uni-tuebingen.de/ex/ml_seminar_ss2022/Unsupervised_Learning%20Final.pdf |url-status=live }}</ref> Other frameworks in the spectrum of supervision include [[Weak supervision|weak- or semi-supervision]], where a small portion of the data is tagged, and [[Self-supervised learning|self-supervision]]. Some researchers consider self-supervised learning a form of unsupervised learning.<ref>{{Cite journal |last1=Liu |first1=Xiao |last2=Zhang |first2=Fanjin |last3=Hou |first3=Zhenyu |last4=Mian |first4=Li |last5=Wang |first5=Zhaoyu |last6=Zhang |first6=Jing |last7=Tang |first7=Jie |date=2021 |title=Self-supervised Learning: Generative or Contrastive |url=https://ieeexplore.ieee.org/document/9462394 |journal=IEEE Transactions on Knowledge and Data Engineering |pages=1 |doi=10.1109/TKDE.2021.3090866 |issn=1041-4347|arxiv=2006.08218 }}</ref> Conceptually, unsupervised learning divides into the aspects of data, training, algorithm, and downstream applications. Typically, the dataset is harvested cheaply "in the wild", such as a massive [[text corpus]] obtained by [[Web crawler|web crawling]], with only minor filtering (such as [[Common Crawl]]). This compares favorably to supervised learning, where the dataset (such as [[ImageNet|ImageNet1000]]) is typically constructed manually, which is much more expensive.
Some algorithms were designed specifically for unsupervised learning, such as [[Cluster analysis|clustering algorithms]] like [[K-means clustering|k-means]], [[dimensionality reduction]] techniques like [[Principal component analysis|principal component analysis (PCA)]], [[Boltzmann machine|Boltzmann machine learning]], and [[autoencoder]]s. Since the rise of deep learning, most large-scale unsupervised learning has been done by training general-purpose neural network architectures with [[gradient descent]], adapted to unsupervised learning through the design of an appropriate training procedure. Sometimes a trained model can be used as-is, but more often it is modified for downstream applications. For example, the generative pretraining method trains a model to generate a textual dataset before fine-tuning it for other applications, such as text classification.<ref name="gpt1paper">{{cite web |last1=Radford |first1=Alec |last2=Narasimhan |first2=Karthik |last3=Salimans |first3=Tim |last4=Sutskever |first4=Ilya |date=11 June 2018 |title=Improving Language Understanding by Generative Pre-Training |url=https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf |url-status=live |archive-url=https://web.archive.org/web/20210126024542/https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf |archive-date=26 January 2021 |access-date=23 January 2021 |publisher=[[OpenAI]] |page=12}}</ref><ref>{{Cite journal |last1=Li |first1=Zhuohan |last2=Wallace |first2=Eric |last3=Shen |first3=Sheng |last4=Lin |first4=Kevin |last5=Keutzer |first5=Kurt |last6=Klein |first6=Dan |last7=Gonzalez |first7=Joey |date=2020-11-21 |title=Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers |url=https://proceedings.mlr.press/v119/li20m.html |journal=Proceedings of the 37th International Conference on Machine Learning |language=en |publisher=PMLR |pages=5958–5968}}</ref> As another example, autoencoders are trained to learn [[Feature learning|good features]], which can then be used as a module for other models, such as in a [[latent diffusion model]].
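As a concrete illustration of the classical algorithms mentioned above, the following is a minimal k-means sketch in Python. It is not a production implementation: the toy dataset and the deterministic "first k points" initialization are assumptions made for the example (practical implementations typically use random or k-means++ initialization). The algorithm receives no labels; it discovers cluster structure purely from the geometry of the data.

```python
def kmeans(points, k, iters=10):
    """Minimal k-means: alternate assignment and centroid-update steps."""
    # Deterministic initialization for illustration only:
    # real implementations use random or k-means++ seeding.
    centroids = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return centroids, clusters

# Two well-separated blobs of unlabeled 2-D points.
data = [(0.0, 0.0), (0.1, 0.2), (-0.1, 0.1),
        (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, clusters = kmeans(data, k=2)
```

On this data the algorithm recovers the two blobs, with one centroid near (0, 0.1) and the other near (5, 5), despite never being told which points belong together.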