Machine Learning by Hung-yi Lee (8-2)

Flow-based model

Generative Model

  • Component-by-component (Auto-regressive Model)
    • What is the best order for the components?
      • Pixels are usually generated starting from the top-left corner
    • Slow generation
  • Variational Auto-encoder
    • Maximizing a lower bound
  • GAN
    • Unstable training

Generator

  • A generator G is a network. The network defines a probability distribution \(P_G\)

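  • We want \(P_G\) and \(P_{data}\) to be as close as possible. Specifically:

  • \[ \{x^1, x^2, x^3, \ldots, x^m\} \text{ sampled from } P_{data} \]

    \[ G^* = \arg\max_{G} \sum_{i=1}^{m} \log P_G(x^i) \]

  • This is equivalent to minimizing the KL divergence (see 8-1 for the derivation): \[ \arg\min_{G} KL(P_{data} \| P_G) \]

  • \(P_G\) is usually so complex that it is hard to maximize this likelihood directly; a flow-based model can optimize the objective directly.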

Flow-based model

Math Background

Jacobian Matrix

\[ x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad z = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} \]

\[ x = f(z), \quad z = f^{-1}(x) \]

\[ J_f = \begin{bmatrix} \frac{\partial x_1}{\partial z_1} & \frac{\partial x_1}{\partial z_2} \\ \frac{\partial x_2}{\partial z_1} & \frac{\partial x_2}{\partial z_2} \end{bmatrix}, \quad J_{f^{-1}} = \begin{bmatrix} \frac{\partial z_1}{\partial x_1} & \frac{\partial z_1}{\partial x_2} \\ \frac{\partial z_2}{\partial x_1} & \frac{\partial z_2}{\partial x_2} \end{bmatrix} \]

\[ J_f J_{f^{-1}} = I \]
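The identity \(J_f J_{f^{-1}} = I\) can be checked numerically. The sketch below uses an illustrative map \(x_1 = z_1 + z_2\), \(x_2 = 2z_1\) (an assumption chosen for this example, not taken from the notes) and a hypothetical helper `numerical_jacobian` based on finite differences.

```python
import numpy as np

def f(z):
    # Illustrative map (assumed for this example): x1 = z1 + z2, x2 = 2 * z1
    return np.array([z[0] + z[1], 2.0 * z[0]])

def f_inv(x):
    # Inverse of f: z1 = x2 / 2, z2 = x1 - x2 / 2
    return np.array([x[1] / 2.0, x[0] - x[1] / 2.0])

def numerical_jacobian(func, point, eps=1e-6):
    # Central finite differences; entry (i, j) approximates d func_i / d point_j
    point = np.asarray(point, dtype=float)
    n = point.size
    jac = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        jac[:, j] = (func(point + step) - func(point - step)) / (2.0 * eps)
    return jac

z0 = np.array([0.3, -1.2])
J_f = numerical_jacobian(f, z0)          # approximately [[1, 1], [2, 0]]
J_f_inv = numerical_jacobian(f_inv, f(z0))
product = J_f @ J_f_inv                  # approximately the 2x2 identity
```

Note that \(J_{f^{-1}}\) is evaluated at \(x = f(z)\), the point that corresponds to the \(z\) where \(J_f\) is evaluated.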

Determinant

Determinant: det(Matrix)


Change of Variable Theorem

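Suppose the generator is \(x=f(z)\), the input distribution is \(\pi(z)\), and the output distribution is \(p(x)\).

The area under any distribution is always 1; with this constraint, calculus gives the relationship between the two.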

A small square of area \(\Delta z_1 \Delta z_2\) around \(z'\) is mapped to a parallelogram around \(x' = f(z')\), where \(\Delta x_{ij}\) is the change in \(x_i\) when \(z_j\) moves by \(\Delta z_j\). Both regions carry the same probability mass:

\[ p\left(x^{\prime}\right)\left|\operatorname{det}\left[\begin{array}{ll}\Delta x_{11} & \Delta x_{21} \\ \Delta x_{12} & \Delta x_{22}\end{array}\right]\right|=\pi\left(z^{\prime}\right) \Delta z_{1} \Delta z_{2} \]

\[ p\left(x^{\prime}\right)\left|\frac{1}{\Delta z_{1} \Delta z_{2}}\operatorname{det}\left[\begin{array}{ll}\Delta x_{11} & \Delta x_{21} \\ \Delta x_{12} & \Delta x_{22}\end{array}\right]\right|=\pi\left(z^{\prime}\right) \]

\[ p\left(x^{\prime}\right)\left|\operatorname{det}\left[\begin{array}{ll}\frac{\Delta x_{11}}{\Delta z_1} & \frac{\Delta x_{21}}{\Delta z_1} \\ \frac{\Delta x_{12}}{\Delta z_2} & \frac{\Delta x_{22}}{\Delta z_2}\end{array}\right]\right|=\pi\left(z^{\prime}\right) \]

Taking the limit \(\Delta z_1, \Delta z_2 \to 0\):

\[ p\left(x^{\prime}\right)\left|\operatorname{det}\left[\begin{array}{ll}\partial x_{1} / \partial z_{1} & \partial x_{2} / \partial z_{1} \\ \partial x_{1} / \partial z_{2} & \partial x_{2} / \partial z_{2} \end{array}\right]\right|=\pi\left(z^{\prime}\right) \]

Transposing a matrix does not change its determinant:

\[ p\left(x^{\prime}\right)\left|\operatorname{det}\left[\begin{array}{ll}\partial x_{1} / \partial z_{1} & \partial x_{1} / \partial z_{2} \\ \partial x_{2} / \partial z_{1} & \partial x_{2} / \partial z_{2} \end{array}\right]\right|=\pi\left(z^{\prime}\right) \]

\[ p\left(x^{\prime}\right)\left|\operatorname{det}[J_f]\right|=\pi\left(z^{\prime}\right) \]

Since \(\operatorname{det}[J_{f^{-1}}] = 1/\operatorname{det}[J_f]\):

\[ p\left(x^{\prime}\right)=\pi\left(z^{\prime}\right)\left|\operatorname{det}[J_{f^{-1}}]\right|, \quad x^{\prime}=f(z^{\prime}) \]
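As a sanity check of the change of variable formula, the sketch below assumes a linear map \(f(z) = Az + b\) with a standard normal \(\pi(z)\). In that case \(p(x)\) is known in closed form as a Gaussian with mean \(b\) and covariance \(AA^T\), so the two values can be compared. The matrix `A`, the vector `b`, and the helper names are illustrative choices, not part of the notes.

```python
import numpy as np

def standard_normal_pdf(z):
    # Density of N(0, I) at the point z
    d = z.size
    return np.exp(-0.5 * (z @ z)) / np.sqrt((2.0 * np.pi) ** d)

def gaussian_pdf(x, mean, cov):
    # Density of N(mean, cov) at the point x
    d = x.size
    diff = x - mean
    quad = diff @ np.linalg.solve(cov, diff)
    return np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))

# Assumed invertible linear generator f(z) = A z + b
A = np.array([[2.0, 1.0], [0.0, 0.5]])
b = np.array([1.0, -1.0])

x_prime = np.array([0.7, -0.4])
z_prime = np.linalg.solve(A, x_prime - b)   # z' = f^{-1}(x')
J_f_inv = np.linalg.inv(A)                  # Jacobian of f^{-1} is A^{-1}

# p(x') = pi(z') * |det J_{f^{-1}}|
p_change_of_var = standard_normal_pdf(z_prime) * abs(np.linalg.det(J_f_inv))

# Closed form for comparison: x ~ N(b, A A^T)
p_closed_form = gaussian_pdf(x_prime, b, A @ A.T)
```

The two numbers agree up to floating-point error, which is what the theorem predicts for this special case.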

Flow-based model

\[ G^* = \arg\max_{G} \sum_{i=1}^{m} \log P_G(x^i), \quad x = G(z) \]

\[ p_G\left(x^{i}\right)=\pi\left(z^{i}\right)\left|\operatorname{det}[J_{G^{-1}}]\right|, \quad z^i=G^{-1}(x^i) \]

\[ \log p_G\left(x^{i}\right)=\log\pi\left(G^{-1}(x^i)\right)+\log\left|\operatorname{det}[J_{G^{-1}}]\right| \]

  • We need to compute \(\operatorname{det}[J_{G^{-1}}]\) and \(G^{-1}\).

  • G must be invertible, so x and z must have the same dimension; this places strong restrictions on G.
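To make the objective concrete, here is a minimal sketch with a one-dimensional affine generator \(G(z) = az + b\) and a standard normal \(\pi(z)\). Both choices are assumptions made for illustration. Here \(G^{-1}(x) = (x - b)/a\) and \(\left|\operatorname{det}[J_{G^{-1}}]\right| = 1/|a|\), so the log-likelihood can be written out directly.

```python
import numpy as np

def log_standard_normal(z):
    # log pi(z) for a 1-D standard normal prior
    return -0.5 * z ** 2 - 0.5 * np.log(2.0 * np.pi)

def avg_log_likelihood(x, a, b):
    # G(z) = a * z + b, so G^{-1}(x) = (x - b) / a and |dG^{-1}/dx| = 1 / |a|
    z = (x - b) / a
    return np.mean(log_standard_normal(z) - np.log(abs(a)))

rng = np.random.default_rng(0)
data = 3.0 + 2.0 * rng.standard_normal(5000)   # samples standing in for P_data

ll_matched = avg_log_likelihood(data, a=2.0, b=3.0)     # G matches the data
ll_mismatched = avg_log_likelihood(data, a=1.0, b=0.0)  # G does not
```

The parameters that match the data-generating process give the higher average log-likelihood, which is the quantity \(G^*\) maximizes.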

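Several generators \(G_1, G_2, \ldots, G_k\) can be composed, each contributing one determinant factor:

\[ p_1\left(x^{i}\right)=\pi\left(z^{i}\right)\left|\operatorname{det}[J_{G_1^{-1}}]\right| \]

\[ p_2\left(x^{i}\right)=\pi\left(z^{i}\right)\left|\operatorname{det}[J_{G_1^{-1}}]\right|\left|\operatorname{det}[J_{G_2^{-1}}]\right| \]

\[ p_k\left(x^{i}\right)=\pi\left(z^{i}\right)\left|\operatorname{det}[J_{G_1^{-1}}]\right|\cdots\left|\operatorname{det}[J_{G_k^{-1}}]\right| \]

\[ \log p_k\left(x^{i}\right)=\log\pi\left(z^{i}\right)+\sum_{h=1}^{k}\log\left|\operatorname{det}[J_{G_h^{-1}}]\right| \]

\[ z^i=G_1^{-1}\left(\cdots G_k^{-1}(x^i)\right) \]

\(\pi(z)\) attains its maximum at \(z = 0\) (for example, when \(\pi\) is a standard normal distribution), so the first term pushes \(z^i\) toward 0. If \(G^{-1}\) mapped every \(x^i\) to 0, however, its Jacobian would be the zero matrix and the log-determinant terms would go to \(-\infty\), so the second term prevents this collapse.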
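The sum over layers can be sketched as follows. The example assumes each \(G_h\) is an invertible linear map \(G_h(v) = W_h v\) and that \(\pi\) is a standard normal; both are simplifications for illustration. As a cross-check, the stacked result is compared with a single layer whose matrix is the product \(W_k \cdots W_1\).

```python
import numpy as np

rng = np.random.default_rng(1)
D, k = 3, 4

# Assumed layers: identity plus a small perturbation. Each matrix is strictly
# diagonally dominant (diagonal >= 0.8, off-diagonal row sum <= 0.4), hence invertible.
weights = [np.eye(D) + rng.uniform(-0.2, 0.2, size=(D, D)) for _ in range(k)]

def log_standard_normal(z):
    # log pi(z) for a D-dimensional standard normal prior
    return -0.5 * (z @ z) - 0.5 * z.size * np.log(2.0 * np.pi)

def log_prob_stacked(x, layer_weights):
    # x = G_k(...G_1(z)), so undo G_k first and G_1 last,
    # accumulating log|det J_{G_h^{-1}}| = -log|det W_h| along the way.
    v = x
    log_det_sum = 0.0
    for W in reversed(layer_weights):
        v = np.linalg.solve(W, v)
        log_det_sum += -np.linalg.slogdet(W)[1]
    return log_standard_normal(v) + log_det_sum

x = np.array([0.5, -1.0, 2.0])
log_p_stacked = log_prob_stacked(x, weights)

# Cross-check: collapse all layers into one matrix W_k ... W_1
W_total = np.eye(D)
for W in weights:
    W_total = W @ W_total
log_p_single = log_prob_stacked(x, [W_total])
```

Summing log-determinants layer by layer avoids forming the full product, and working in log space avoids underflow when many determinants are multiplied.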


How to design G

  • Input: z vector
  • Output: x vector
  • Generator: G

\[ x=G(z) \]

\[ J_{G}=\left[ \begin{array}{cccc}\partial x_{1} / \partial z_{1} & \partial x_{1} / \partial z_{2} & \cdots & \partial x_{1} / \partial z_{D} \\ \partial x_{2} / \partial z_{1} & \partial x_{2} / \partial z_{2} & \cdots & \partial x_{2} / \partial z_{D} \\ \vdots & \vdots & \ddots & \vdots \\ \partial x_{D} / \partial z_{1} & \partial x_{D} / \partial z_{2} & \cdots & \partial x_{D} / \partial z_{D}\end{array} \right] \]

Coupling Layer and computing \(G^{-1}\), \(J_{G}\)

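A coupling layer splits z into two vectors: the first d dimensions and the remaining D-d dimensions. The first d dimensions are copied to the output unchanged and are also fed into F and H, which produce \(\beta\) and \(\gamma\); the remaining dimensions are scaled by \(\beta\) and shifted by \(\gamma\).

Here F and H can be any kind of model or function.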

  • How to compute \(G^{-1}\)

\[ z_{i\le d} = x_{i\le d} \quad \text{(copy the first d dimensions directly)} \]

\[ \beta_{d< i \le D} = F(z_{i\le d}), \quad \gamma_{d< i\le D}=H(z_{i\le d}) \]

\[ z_{d< i\le D}=\frac{x_{d< i\le D}-\gamma_{d< i\le D}}{\beta_{d< i\le D}} \]
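The forward pass and its inverse can be sketched in a few lines. Since F and H may be any function, the ones below are hypothetical stand-ins built from fixed random matrices; the exponential in `F` is an added assumption that keeps \(\beta\) strictly positive so the division in the inverse is always safe.

```python
import numpy as np

D, d = 6, 3
rng = np.random.default_rng(0)
A_F = rng.standard_normal((D - d, d))   # parameters of the stand-in F
A_H = rng.standard_normal((D - d, d))   # parameters of the stand-in H

def F(z_first):
    # Hypothetical scale function; exp keeps beta > 0
    return np.exp(np.tanh(A_F @ z_first))

def H(z_first):
    # Hypothetical shift function
    return A_H @ z_first

def coupling_forward(z):
    z_first, z_rest = z[:d], z[d:]
    beta, gamma = F(z_first), H(z_first)
    # Copy the first d dimensions, scale and shift the rest
    return np.concatenate([z_first, beta * z_rest + gamma])

def coupling_inverse(x):
    x_first, x_rest = x[:d], x[d:]
    # x_first equals z_first, so beta and gamma can be recomputed exactly
    beta, gamma = F(x_first), H(x_first)
    return np.concatenate([x_first, (x_rest - gamma) / beta])

z = rng.standard_normal(D)
x = coupling_forward(z)
z_recovered = coupling_inverse(x)
```

The inverse never needs to invert F or H themselves, which is why they can be arbitrarily complex networks.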

  • How to compute \(J_{G}\)?

\[ p\left(x^{\prime}\right)\left|\operatorname{det}[J_G]\right|=\pi\left(z^{\prime}\right) \]

\[ x_{i\le d} = z_{i\le d} \quad \Rightarrow \quad J_{G_{d\times d}} = I_{d\times d} \]

\[ x_{d< i\le D} = \beta_{d< i\le D} \cdot z_{d< i\le D}+\gamma_{d< i\le D} \]

\[ J_{G_{(D-d) \times (D-d)}}=Diagonal=\left[ \begin{array}{cccc}\beta_{d+1} & 0 & \cdots & 0 \\ 0 & \beta_{d+2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \beta_D \end{array} \right] \]

The top-right block is O because \(x_{i\le d}\) does not depend on \(z_{i>d}\). The bottom-left block (marked ?) does not affect the determinant of a block-triangular matrix:

\[ \operatorname{det}(J_G) = \operatorname{det} \left( \begin{array}{cc} I_{d\times d} & O \\ ? & Diagonal \end{array} \right) = \operatorname{det}(Diagonal) = \beta_{d+1}\beta_{d+2}\cdots\beta_{D} \]
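The block structure of \(J_G\) can be verified numerically. The sketch below rebuilds a small coupling layer with hypothetical stand-ins for F and H (fixed random matrices, with an exponential so that \(\beta > 0\)), estimates the full Jacobian by finite differences, and compares its determinant with the product of the \(\beta\) entries.

```python
import numpy as np

D, d = 4, 2
rng = np.random.default_rng(0)
A_F = rng.standard_normal((D - d, d))
A_H = rng.standard_normal((D - d, d))

def F(z_first):
    # Hypothetical scale function (assumption for this example)
    return np.exp(np.tanh(A_F @ z_first))

def H(z_first):
    # Hypothetical shift function (assumption for this example)
    return A_H @ z_first

def coupling_forward(z):
    z_first, z_rest = z[:d], z[d:]
    return np.concatenate([z_first, F(z_first) * z_rest + H(z_first)])

def numerical_jacobian(func, point, eps=1e-6):
    # Central finite differences; entry (i, j) approximates d x_i / d z_j
    n = point.size
    jac = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        jac[:, j] = (func(point + step) - func(point - step)) / (2.0 * eps)
    return jac

z = rng.standard_normal(D)
J = numerical_jacobian(coupling_forward, z)

det_numeric = np.linalg.det(J)       # determinant of the full D x D Jacobian
det_formula = np.prod(F(z[:d]))      # product of the beta entries
```

Computing the product of \(D-d\) numbers is far cheaper than a general \(D \times D\) determinant, which is the point of the coupling design.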


  • Stacking

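If coupling layers are stacked naively, the first d dimensions of the input and the first d dimensions of the final output are exactly the same, so that part of the output is just the noise sampled from \(\pi(z)\).

So the direction of F and H can be reversed in alternating layers: one layer copies the first part and transforms the second, and the next layer copies the second part and transforms the first.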
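The effect of stacking can be illustrated with a small experiment. The sketch assumes \(D = 2d\) so both halves have the same size, uses hypothetical random stand-ins for F and H, and assumes the common remedy of alternating which half is copied from one layer to the next.

```python
import numpy as np

D, d = 4, 2          # D = 2d keeps both halves the same size (assumption)
rng = np.random.default_rng(0)

def make_layer(copy_first):
    # Each layer gets its own hypothetical F and H parameters
    A_F = rng.standard_normal((d, d))
    A_H = rng.standard_normal((d, d))

    def layer(v):
        kept, other = (v[:d], v[d:]) if copy_first else (v[d:], v[:d])
        beta = np.exp(np.tanh(A_F @ kept))
        gamma = A_H @ kept
        changed = beta * other + gamma
        if copy_first:
            return np.concatenate([kept, changed])
        return np.concatenate([changed, kept])

    return layer

def run(layers, z):
    for layer in layers:
        z = layer(z)
    return z

naive = [make_layer(True) for _ in range(3)]               # always copy the first half
alternating = [make_layer(i % 2 == 0) for i in range(3)]   # swap roles every layer

z = rng.standard_normal(D)
x_naive = run(naive, z)
x_alt = run(alternating, z)
```

With the naive stack, `x_naive[:d]` is still the raw noise `z[:d]`; with alternating layers every dimension of the output has been transformed at least once.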

1×1 Convolution

Used in the GLOW model.

  • Theory

The RGB vector of each pixel is multiplied by a 3×3 matrix W to produce the new pixel.


W is learned and can, for example, shuffle the channels.

If W is invertible, \(W^{-1}\) is easy to compute. A matrix is non-invertible only when its determinant is 0, which hardly ever happens for a learned W. \[ x = f(z) = Wz \]

\[ J_{f}=\left[ \begin{array}{ccc}\partial x_{1} / \partial z_{1} & \partial x_{1} / \partial z_{2} & \partial x_{1} / \partial z_{3} \\ \partial x_{2} / \partial z_{1} & \partial x_{2} / \partial z_{2} & \partial x_{2} / \partial z_{3} \\ \partial x_{3} / \partial z_{1} & \partial x_{3} / \partial z_{2} & \partial x_{3} / \partial z_{3}\end{array} \right] = \left[ \begin{array}{ccc} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33}\end{array} \right]=W \]

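Suppose the input is a d × d RGB image. Every pixel is transformed by the same W, so the Jacobian of the whole image is block diagonal, with one copy of W per pixel.

The determinant of the Jacobian \(J\) for the whole image is therefore: \[ \operatorname{det}(J) = (\operatorname{det}(W))^{d\times d} \]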
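A small numerical check of this result, assuming a 2 × 2 image and a randomly chosen invertible W (both are illustrative choices). The full Jacobian of the flattened image is built explicitly as a block-diagonal matrix so its determinant can be compared with \((\operatorname{det}(W))^{d \times d}\).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2   # image is d x d pixels with 3 channels

# Assumed W: identity plus a small perturbation. It is strictly diagonally
# dominant (diagonal >= 0.8, off-diagonal row sum <= 0.4), hence invertible.
W = np.eye(3) + rng.uniform(-0.2, 0.2, size=(3, 3))

z_img = rng.standard_normal((d, d, 3))

# 1x1 convolution: multiply every pixel's channel vector by W
x_img = np.einsum('ij,hwj->hwi', W, z_img)

# The inverse applies W^{-1} to every pixel
z_back = np.einsum('ij,hwj->hwi', np.linalg.inv(W), x_img)

# Jacobian of the flattened image: d*d copies of W on the diagonal
J_full = np.kron(np.eye(d * d), W)

det_full = np.linalg.det(J_full)
det_formula = np.linalg.det(W) ** (d * d)
```

In practice only \(\operatorname{det}(W)\) of the 3 × 3 matrix needs to be computed; the large block-diagonal matrix is built here only to confirm the formula.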

Application
