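Latent variable

In statistics, a latent variable^{[note 1]} is a variable that is not observed directly but is inferred, through a mathematical model, from other variables that are observed (directly measured). Latent variables are contrasted with observed variables.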

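A mathematical model that aims to explain observed variables in terms of latent variables is called a latent variable model. Latent variable models are used in many fields, including psychology, demography, economics, engineering, medicine, physics, machine learning and artificial intelligence, bioinformatics, chemometrics, natural language processing, econometrics, management, and the social sciences.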

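Latent variables may correspond to aspects of reality. When a variable could in principle be measured but cannot be observed in practice, the term "hidden variable" is commonly used, reflecting the fact that the variable is meaningful but not observable. Latent variables that correspond to abstract concepts such as categories, behavioral or mental states, or data structures are sometimes called "hypothetical variables" or "hypothetical constructs".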

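Using latent variables can help reduce the dimensionality of data. Many observable variables can be aggregated in a model to represent an underlying concept, which makes the data easier to understand. In this sense, latent variables serve a function similar to that of scientific theories. At the same time, they link observable (sub-symbolic) data in the real world to symbolic data in the modeled world.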

In formal treatments, the symbol $z$ is often used to denote a latent variable^[1].

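Examples

Psychology

Latent variables created by factor analysis generally represent "shared" variance, that is, the degree to which variables "move" together. Variables that are uncorrelated cannot give rise to a latent structure based on the common factor model.^[3]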
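The notion of shared variance can be illustrated with a small simulation. The sketch below assumes a one-factor model with hypothetical loadings (0.8, 0.7 and 0.0) chosen only for illustration: each observed variable is a loading times a single latent factor plus independent noise, scaled to unit variance. Under that assumption the correlation between two observed variables should be close to the product of their loadings, and a variable with zero loading shares nothing with the others.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def simulate_one_factor(loadings, n, seed=0):
    """Draw n rows of x_j = loading_j * z + noise_j with a single latent z."""
    rng = random.Random(seed)
    columns = [[] for _ in loadings]
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # latent factor, never stored or observed
        for column, loading in zip(columns, loadings):
            # Noise variance 1 - loading^2 keeps every x_j at unit variance.
            noise_sd = math.sqrt(1.0 - loading ** 2)
            column.append(loading * z + rng.gauss(0.0, noise_sd))
    return columns


# Hypothetical loadings: x1 and x2 load on the factor, x3 does not.
x1, x2, x3 = simulate_one_factor([0.8, 0.7, 0.0], n=20000)
r12 = pearson(x1, x2)  # theoretical value: 0.8 * 0.7 = 0.56
r13 = pearson(x1, x3)  # theoretical value: 0
print(f"corr(x1, x2) = {r12:.3f}, corr(x1, x3) = {r13:.3f}")
```

Factor analysis works in the opposite direction: it starts from correlations like these and estimates the loadings, which is why uncorrelated variables give it nothing to work with.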

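The "Big Five personality traits", which are inferred using factor analysis
Extraversion^[4]
Spatial ability
Wisdom: the means of assessing wisdom include wisdom-related performance and latent variable measures^[5]
Spearman's g, the general intelligence factor in psychometrics^[6]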

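Economics

Examples of latent variables from economics include quality of life, business confidence, morale, happiness, and conservatism. None of these can be measured directly. However, once a latent variable is linked to observable variables, its value can be inferred from measurements of those observable variables. Quality of life, for example, is inferred from observable variables such as wealth, employment, environment, physical and mental health, education, recreation and leisure time, and social belonging.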

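Medicine

Latent variable methodology is used in many branches of medicine. One class of problems that lends itself naturally to latent variable approaches is longitudinal studies in which the time scale (for example, the participant's age or the time since study baseline) is not synchronized with the trait being studied. In such studies, an unobserved time scale that is synchronized with the trait can be modeled, using latent variables, as a transformation of the observed time scale. Examples include disease progression modeling and the modeling of growth.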
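As one hedged illustration of such a transformation (the notation is an assumption introduced here, not taken from the cited sources), let $y_{ij}$ be the $j$-th measurement of participant $i$ at observed time $t_{ij}$, let $f$ be a progression curve shared by all participants, and let $s_i$ be a latent per-participant time shift. A simple time-shift model is then:

```latex
y_{ij} = f(t_{ij} + s_i) + \varepsilon_{ij},
\qquad s_i \sim \mathcal{N}(0, \sigma_s^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $t_{ij} + s_i$ plays the role of the unobserved time scale that is synchronized with the trait. A shift is only the simplest choice; other transformations of the observed time scale fit the same pattern.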

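Latent variable models

Latent variable model is a general term for probabilistic models expressed as a joint distribution over observed variables and latent variables^[7].

A latent variable model represents the distribution $p^{*}(x)$ of an observed variable $x$ by a joint distribution $p_{\theta }(x,z)$ that involves a latent variable $z$ and model parameters $\theta$. Marginalizing the joint distribution over $z$ gives: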

$$p_{\theta}(x)=\int p_{\theta}(x,z)\,dz$$

Because this can be regarded as the likelihood $p(x|\theta )$ marginalized over the latent variable, it is called, as a function of $\theta$, the marginal likelihood or the model evidence^[8].
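The marginalization can be checked numerically in a case where the answer is known. The sketch below assumes a toy linear-Gaussian model that is not part of the text above: $z \sim \mathcal{N}(0,1)$ and $x \mid z \sim \mathcal{N}(z,\sigma^{2})$, for which the marginal is $x \sim \mathcal{N}(0, 1+\sigma^{2})$. It integrates the joint density over $z$ with the trapezoidal rule and compares the result with that closed form. The integration limits and step count are arbitrary choices for illustration.

```python
import math


def normal_pdf(v, mean, sd):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * ((v - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def joint(x, z, sigma):
    """p(x, z) = p(x | z) p(z) for the assumed toy model."""
    return normal_pdf(x, z, sigma) * normal_pdf(z, 0.0, 1.0)


def marginal_likelihood(x, sigma, lo=-10.0, hi=10.0, steps=4000):
    """Approximate the integral of p(x, z) over z with the trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.5 * (joint(x, lo, sigma) + joint(x, hi, sigma))
    for k in range(1, steps):
        total += joint(x, lo + k * h, sigma)
    return total * h


x, sigma = 1.3, 0.5
numeric = marginal_likelihood(x, sigma)
closed_form = normal_pdf(x, 0.0, math.sqrt(1.0 + sigma ** 2))
print(f"numeric = {numeric:.8f}, closed form = {closed_form:.8f}")
```

A one-dimensional grid works here only because $z$ is a scalar. The cost of grid integration grows exponentially with the dimension of $z$, which is one way to see why the integral becomes a bottleneck in larger models.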

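One example of such a model is a Bayesian network that contains latent variables. That is, the joint distribution is factorized and modeled as a product of conditional probability models, $p_{\theta }(x,z)=p_{\theta }(x|z)p_{\theta }(z)$ ^[9]. Here $p_{\theta }(z)$ is often called the "prior distribution over $z$", since it is not conditioned on any observations^[10].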
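This factorization also describes how to draw samples: first draw $z$ from the prior, then draw $x$ from the conditional given that $z$ (ancestral sampling). The sketch below assumes a hypothetical two-component Gaussian mixture, with a binary latent $z$ and made-up weights, means and standard deviations, purely to make the two steps concrete.

```python
import random

# Hypothetical parameters theta of a two-component Gaussian mixture.
PRIOR = [0.3, 0.7]   # p(z = 0), p(z = 1)
MEANS = [-2.0, 3.0]  # mean of p(x | z)
SDS = [0.5, 1.0]     # standard deviation of p(x | z)


def sample_joint(rng):
    """Draw (x, z) from p(x, z) = p(x | z) p(z) by ancestral sampling."""
    z = 0 if rng.random() < PRIOR[0] else 1  # step 1: z ~ p(z)
    x = rng.gauss(MEANS[z], SDS[z])          # step 2: x ~ p(x | z)
    return x, z


rng = random.Random(0)
draws = [sample_joint(rng) for _ in range(50000)]
xs = [x for x, _ in draws]  # in practice only these would be observed
frac_z1 = sum(z for _, z in draws) / len(draws)
mean_x = sum(xs) / len(xs)
print(f"fraction with z = 1: {frac_z1:.3f}, mean of x: {mean_x:.3f}")
```

Discarding the `z` values and keeping only `xs` mirrors the situation of a real dataset: the observations follow the marginal $p_{\theta}(x)$, and the component that generated each one is hidden.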

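Parameter estimation in latent variable models poses a particular difficulty, and methods exist to address it. A latent variable is internal to the model and, by definition, is not given as an observed value. The parameters are therefore optimized through the marginal likelihood $p_{\theta }(x)$ rather than the joint distribution $p_{\theta }(x,z)$. In maximum likelihood estimation, however, the integral that appears in the marginalization is often intractable, so neither an analytic solution for the likelihood nor an efficient estimator is available^[11].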

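When the model is a Bayesian network, each conditional probability model is chosen to be tractable, so the joint distribution is tractable. By Bayes' theorem, it follows that the intractability lies in the marginal likelihood $p_{\theta }(x)$ and the posterior distribution $p_{\theta }(z|x)$ ^[12]: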

$$p_{\theta}(z|x)=\frac{p_{\theta}(x,z)}{p_{\theta}(x)}$$

Methods that address this include the EM algorithm and the auto-encoding variational Bayes algorithm.
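As a minimal sketch of the EM algorithm, the example below fits a two-component one-dimensional Gaussian mixture. This is an assumed toy model in which the sum over the binary latent $z$ is cheap, so it illustrates the mechanics rather than a genuinely intractable case. The E-step computes the posterior $p_{\theta}(z=1|x)$ for every data point by Bayes' theorem, and the M-step re-estimates the parameters from those posteriors. The data-generating values, the initialization and the iteration count are all illustrative choices.

```python
import math
import random


def normal_pdf(v, mean, sd):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * ((v - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def em_two_gaussians(xs, iters=50):
    """EM for a two-component 1-D Gaussian mixture; returns parameters and log-likelihoods."""
    n = len(xs)
    pi1, mu0, mu1, sd0, sd1 = 0.5, min(xs), max(xs), 1.0, 1.0  # crude initialization
    log_liks = []
    for _ in range(iters):
        # E-step: posterior p(z = 1 | x) = p(x, z = 1) / p(x) for each point.
        resp, log_lik = [], 0.0
        for x in xs:
            joint0 = (1.0 - pi1) * normal_pdf(x, mu0, sd0)
            joint1 = pi1 * normal_pdf(x, mu1, sd1)
            log_lik += math.log(joint0 + joint1)  # log marginal likelihood of x
            resp.append(joint1 / (joint0 + joint1))
        log_liks.append(log_lik)
        # M-step: weighted maximum-likelihood updates.
        n1 = sum(resp)
        n0 = n - n1
        mu0 = sum((1.0 - r) * x for r, x in zip(resp, xs)) / n0
        mu1 = sum(r * x for r, x in zip(resp, xs)) / n1
        sd0 = math.sqrt(sum((1.0 - r) * (x - mu0) ** 2 for r, x in zip(resp, xs)) / n0)
        sd1 = math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, xs)) / n1)
        pi1 = n1 / n
    return pi1, (mu0, sd0), (mu1, sd1), log_liks


# Synthetic data from a mixture with means -2 and 3 (the labels are discarded).
rng = random.Random(0)
data = [rng.gauss(-2.0, 0.5) if rng.random() < 0.3 else rng.gauss(3.0, 1.0)
        for _ in range(2000)]
pi1, (mu0, sd0), (mu1, sd1), log_liks = em_two_gaussians(data)
print(f"pi1 = {pi1:.2f}, mu0 = {mu0:.2f}, mu1 = {mu1:.2f}")
```

The recorded log-likelihoods should never decrease from one iteration to the next, which is the defining guarantee of EM and a convenient check on an implementation.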

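Deep latent variable models

A deep latent variable model (DLVM) is a kind of latent variable model in which the conditioning inputs of a Bayesian network are transformed by neural networks^[13]. Its joint distribution is given by: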

$$p_{\theta}(z_{0}=x,z_{1},\dots,z_{N})=\prod_{i=0}^{N}p_{\theta}(z_{i}\mid \mathrm{pa}(z_{i}))=\prod_{i=0}^{N}p_{\theta}(z_{i};\ \eta=\mathrm{NeuralNet}_{\theta}(\mathrm{pa}(z_{i})))$$

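Because a DLVM transforms latent variables with neural networks, which are universal approximators, it can represent a complex marginal distribution $p_{\theta }(x)$ even when each conditional probability model $p_{\theta }(z_{i}|\mathrm{pa}(z_{i}))$ is a simple distribution^[14].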
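A toy simulation can make this point visible. The sketch below assumes a single latent $z \sim \mathcal{N}(0,1)$ and a conditional $x \mid z \sim \mathcal{N}(\mathrm{net}(z), 0.1^{2})$, where `decoder_mean` is a one-hidden-unit tanh network with hand-picked, hypothetical weights (a real DLVM would learn them). Both factors are plain Gaussians, yet the marginal of $x$ comes out clearly bimodal.

```python
import math
import random

# Hypothetical hand-picked weights for a one-hidden-unit network;
# in an actual DLVM these parameters (theta) would be learned from data.
W1, B1, W2, B2 = 4.0, 0.0, 3.0, 0.0
NOISE_SD = 0.1


def decoder_mean(z):
    """Tiny neural network mapping the latent z to the mean of p(x | z)."""
    hidden = math.tanh(W1 * z + B1)
    return W2 * hidden + B2


def sample_x(rng):
    """Draw x by sampling z from a standard normal prior, then x given z."""
    z = rng.gauss(0.0, 1.0)
    return rng.gauss(decoder_mean(z), NOISE_SD)


rng = random.Random(0)
xs = [sample_x(rng) for _ in range(20000)]
mean_x = sum(xs) / len(xs)
# Mass near zero versus mass near the two modes at about -3 and +3.
frac_center = sum(1 for x in xs if abs(x) < 1.0) / len(xs)
frac_modes = sum(1 for x in xs if abs(abs(x) - 3.0) < 0.5) / len(xs)
print(f"mean = {mean_x:.3f}, near 0: {frac_center:.3f}, near +/-3: {frac_modes:.3f}")
```

Most samples land near the two modes and few near zero, a shape that no single Gaussian has. The nonlinearity between $z$ and the parameters of $p_{\theta}(x|z)$ is what produces it.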

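Because a DLVM is a latent variable model, plain maximum likelihood estimation cannot be applied directly to estimate its parameters. The variational autoencoder is one method that makes training a DLVM possible.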
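The idea behind this approach can be summarized by a standard identity. The approximate posterior $q_{\phi}(z|x)$ and its parameters $\phi$ are notation introduced here for illustration and do not appear in the text above. For any such $q_{\phi}$:

```latex
\log p_{\theta}(x)
  = \underbrace{\mathbb{E}_{q_{\phi}(z|x)}\!\left[\log p_{\theta}(x,z) - \log q_{\phi}(z|x)\right]}_{\text{evidence lower bound (ELBO)}}
  + D_{\mathrm{KL}}\!\left(q_{\phi}(z|x)\,\middle\|\,p_{\theta}(z|x)\right)
```

Because the KL divergence is non-negative, the first term is a lower bound on the log marginal likelihood. It involves only the tractable joint distribution and $q_{\phi}$, so it can be maximized with respect to both $\theta$ and $\phi$ in place of the intractable $\log p_{\theta}(x)$.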

Inferring latent variables

A variety of model classes and methodologies make use of latent variables and allow inference in their presence.

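Bayesian algorithms and methods

Bayesian statistics is often used to infer latent variables.

Latent Dirichlet allocation
The Chinese restaurant process is often used to provide a prior distribution over assignments of objects to latent categories.
The Indian buffet process is often used to provide a prior distribution over assignments of latent binary features to objects.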
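To show how such a prior over category assignments behaves, the sketch below samples from the Chinese restaurant process in its usual sequential form: customer $i$ joins an existing table $k$ with probability proportional to the number of customers already seated there, or opens a new table with probability proportional to a concentration parameter. The function name, the parameter name `alpha` and the value 1.5 are illustrative choices, not taken from the text above.

```python
import random


def chinese_restaurant_process(n, alpha, seed=0):
    """Sample table assignments for n customers under a CRP with concentration alpha."""
    rng = random.Random(seed)
    counts = []       # counts[k] = number of customers at table k
    assignments = []  # assignments[i] = table index (latent category) of customer i
    for i in range(n):
        # i customers are already seated, so the total weight is i + alpha.
        r = rng.random() * (i + alpha)
        acc = 0.0
        for k, count in enumerate(counts):
            acc += count
            if r < acc:  # join existing table k with probability count / (i + alpha)
                counts[k] += 1
                assignments.append(k)
                break
        else:
            # Remaining probability alpha / (i + alpha): open a new table.
            counts.append(1)
            assignments.append(len(counts) - 1)
    return assignments, counts


assignments, counts = chinese_restaurant_process(n=200, alpha=1.5)
print(f"{len(counts)} tables, sizes: {sorted(counts, reverse=True)}")
```

The number of tables is not fixed in advance and grows slowly with the number of customers, which is why the process is convenient when the number of latent categories is unknown.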

Footnotes


Notes

^ From the Latin lateo ("to lie hidden").

Citations

^ "We typically use $z$ to denote such latent variables." Kingma. (2019). An Introduction to Variational Autoencoders. Foundations and Trends in Machine Learning.
^ Raket et al. (2014). “A nonlinear mixed-effects model for simultaneous smoothing and registration of functional data”. Pattern Recognition Letters 38: 1–7. doi:10.1016/j.patrec.2013.10.018.
^ Tabachnick, B.G.; Fidell, L.S. (2001). Using Multivariate Analysis. Boston: Allyn and Bacon. ISBN 978-0-321-05677-1 [page needed]
^ Borsboom, D.; Mellenbergh, G.J.; van Heerden, J. (2003). “The Theoretical Status of Latent Variables”. Psychological Review 110 (2): 203–219. doi:10.1037/0033-295X.110.2.203. PMID 12747522. Retrieved 8 April 2008.
^ Greene, Jeffrey A.; Brown, Scott C. (2009). “The Wisdom Development Scale: Further Validity Investigations”. International Journal of Aging and Human Development 68 (4): 289–320 (at p. 291). doi:10.2190/AG.68.4.b. PMID 19711618.
^ Spearman, C. (1904). “"General Intelligence," Objectively Determined and Measured”. The American Journal of Psychology 15 (2): 201–292. doi:10.2307/1412107. JSTOR 1412107.
^ "a latent variable model $p_{\theta }(x,z)$ " Kingma. (2019). An Introduction to Variational Autoencoders. Foundations and Trends in Machine Learning.
^ "This is also called the (single datapoint) marginal likelihood or the model evidence, when taken as a function of θ." Kingma. (2019). An Introduction to Variational Autoencoders. Foundations and Trends in Machine Learning.
^ "Perhaps the simplest, and most common, DLVM is one that is specified as factorization" Kingma. (2019). An Introduction to Variational Autoencoders. Foundations and Trends in Machine Learning.
^ "The distribution $p(z)$ is often called the prior distribution over $z$ , since it is not conditioned on any observations." Kingma. (2019). An Introduction to Variational Autoencoders. Foundations and Trends in Machine Learning.
^ "This is due to the integral ... for computing the marginal likelihood ..., not having an analytic solution or efficient estimator." Kingma. (2019). An Introduction to Variational Autoencoders. Foundations and Trends in Machine Learning.
^ "The intractability of pθ(x), is related to the intractability of the posterior distribution pθ(z|x). ... Since pθ(x, z) is tractable to compute, a tractable marginal likelihood pθ(x) leads to a tractable posterior pθ(z|x), and vice versa. Both are intractable in DLVMs." Kingma. (2019). An Introduction to Variational Autoencoders. Foundations and Trends in Machine Learning.
^ "We use the term deep latent variable model (DLVM) to denote a latent variable model pθ(x, z) whose distributions are parameterized by neural networks." Kingma. (2019). An Introduction to Variational Autoencoders. Foundations and Trends in Machine Learning.
^ "One important advantage of DLVMs, is that even when each factor (prior or conditional distribution) in the directed model is relatively simple (such as conditional Gaussian), the marginal distribution pθ(x) can be very complex"

Bibliography

Kmenta, Jan (1986). “Latent Variables”. Elements of Econometrics (Second ed.). New York: Macmillan. pp. 581–587. ISBN 978-0-02-365070-3
