[Translation] TensorFlow Tutorial #13 - Visual Analysis

Header image from: toyota.csail.mit.edu
This article covers visual analysis of convolutional neural networks.

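01 - Simple Linear Model | 02 - Convolutional Neural Network | 03 - PrettyTensor | 04 - Save & Restore
05 - Ensemble Learning | 06 - CIFAR-10 | 07 - Inception Model | 08 - Transfer Learning
09 - Video Data | 11 - Adversarial Examples | 12 - Adversarial Noise for MNIST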

by Magnus Erik Hvass Pedersen / GitHub / Videos on YouTube
Chinese translation by thrillerist / GitHub

If you repost this article, please include a link to it.


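Introduction

In some of the previous tutorials on convolutional neural networks, such as Tutorials #02 and #06, we showed the convolutional filter weights. But from the filter weights alone it is impossible to determine what the convolutional filters might recognize in the input image.

In this tutorial we present a basic method for visually analysing the inner workings of a neural network. The idea is to generate an image that maximizes an individual feature inside the neural network. The image is initialized with a little random noise and then gradually changed using the gradient of the given feature with respect to the input image.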
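
As a minimal sketch of this idea (not the Inception code used later), the NumPy example below maximizes a single made-up "feature", defined here as tanh(w · x) for a fixed random weight vector w, by repeatedly adding its gradient with respect to the input x. The feature definition, the names `feature_value` and `feature_gradient`, and the step size of 0.1 are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical stand-in for a network feature: a fixed weight vector.
w = rng.normal(size=16)

def feature_value(x):
    # The "feature" responds strongly when x aligns with w.
    return np.tanh(np.dot(w, x))

def feature_gradient(x):
    # d/dx tanh(w.x) = (1 - tanh(w.x)^2) * w
    return (1.0 - np.tanh(np.dot(w, x)) ** 2) * w

# Start from small random noise, as the tutorial does for images.
x = rng.uniform(-0.01, 0.01, size=16)
initial_value = feature_value(x)

# Gradient ascent: move the input in the direction that
# increases the feature.
for _ in range(50):
    x += 0.1 * feature_gradient(x)

final_value = feature_value(x)
print(initial_value, final_value)
```

The input drifts toward the direction of w, which is the toy analogue of an image that the feature "likes to see".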

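This method of visual analysis is also known as feature maximization or activation maximization.

This tutorial builds on the previous tutorials. You should be roughly familiar with neural networks (see Tutorials #01 and #02), and knowing about the Inception model is also helpful (Tutorial #07).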

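Flowchart

We will use the Inception model from Tutorial #07. We want to find an image that maximizes a given feature inside the neural network. The input image is initialized with a little noise and then updated using the gradient of the given feature. After a number of these optimization iterations we get an image that this particular feature "likes to see".

Because the Inception model is built from many basic mathematical operations that have been combined, TensorFlow can use the chain rule of differentiation to find the gradient of the loss function efficiently.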
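
To make the chain-rule remark concrete, here is a small hand-worked sketch that does not depend on TensorFlow. It differentiates the composite function y = sin(x²) by multiplying the derivatives of its two elementary steps, then compares the result with a finite-difference estimate. The function and all names are illustrative assumptions; TensorFlow automates the same bookkeeping across every operation in the graph.

```python
import math

def forward(x):
    # Two elementary operations chained together.
    u = x * x          # u = x^2
    y = math.sin(u)    # y = sin(u)
    return y

def gradient_chain_rule(x):
    # dy/dx = dy/du * du/dx = cos(x^2) * 2x
    u = x * x
    dy_du = math.cos(u)
    du_dx = 2.0 * x
    return dy_du * du_dx

def gradient_numeric(x, eps=1e-6):
    # Central finite difference, used only as a sanity check.
    return (forward(x + eps) - forward(x - eps)) / (2.0 * eps)

x0 = 0.7
analytic = gradient_chain_rule(x0)
numeric = gradient_numeric(x0)
print(analytic, numeric)
```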

from IPython.display import Image, display
Image('images/13_visual_analysis_flowchart.png')

Imports

%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np

# Functions and classes for loading and using the Inception model.
import inception

This was developed using Python 3.5.2 (Anaconda) and TensorFlow version:

tf.__version__

'1.1.0'

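Inception Model

Download the Inception model from the internet

The Inception model is downloaded from the internet. The commented-out line below shows how to set the directory where the data files are saved; leave it as is to use the default. The directory is created automatically if it does not exist.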

# inception.data_dir = 'inception/'

Download the data for the Inception model if it does not already exist in the directory. It is 85 MB.

inception.maybe_download()

Downloading Inception v3 Model ...
Download progress: 100.0%
Download finished. Extracting files.
Done.

Names of convolutional layers

This function returns a list of names for the convolutional layers in the Inception model.

def get_conv_layer_names():
    # Load the Inception model.
    model = inception.Inception()

    # Create a list of names for the operations in the graph
    # for the Inception model where the operator-type is 'Conv2D'.
    names = [op.name for op in model.graph.get_operations() if op.type=='Conv2D']

    # Close the TensorFlow session inside the model-object.
    model.close()

    return names

conv_names = get_conv_layer_names()

There are a total of 94 convolutional layers in the Inception model.

len(conv_names)

94

Print the names of the first 5 convolutional layers.

conv_names[:5]

['conv/Conv2D',
'conv_1/Conv2D',
'conv_2/Conv2D',
'conv_3/Conv2D',
'conv_4/Conv2D']

Print the names of the last 5 convolutional layers.

conv_names[-5:]

['mixed_10/tower_1/conv/Conv2D',
'mixed_10/tower_1/conv_1/Conv2D',
'mixed_10/tower_1/mixed/conv/Conv2D',
'mixed_10/tower_1/mixed/conv_1/Conv2D',
'mixed_10/tower_2/conv/Conv2D']
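
The layer names appear to encode where each convolution sits in the network: the first path component names the block (for example `mixed_10`) and the following components name the sub-scopes inside it. As a small sketch, the hypothetical helper `group_by_block` below groups list indices by that first component, which can help when choosing a `conv_id`. It is run here on the ten names printed above, so the indices refer to this short list; in the notebook you could pass `conv_names` instead.

```python
from collections import OrderedDict

# The ten layer names printed above (first five and last five).
names = [
    'conv/Conv2D', 'conv_1/Conv2D', 'conv_2/Conv2D',
    'conv_3/Conv2D', 'conv_4/Conv2D',
    'mixed_10/tower_1/conv/Conv2D',
    'mixed_10/tower_1/conv_1/Conv2D',
    'mixed_10/tower_1/mixed/conv/Conv2D',
    'mixed_10/tower_1/mixed/conv_1/Conv2D',
    'mixed_10/tower_2/conv/Conv2D',
]

def group_by_block(layer_names):
    # Map the first path component (e.g. 'mixed_10') to the
    # indices of the layers that belong to it.
    groups = OrderedDict()
    for index, name in enumerate(layer_names):
        block = name.split('/')[0]
        groups.setdefault(block, []).append(index)
    return groups

groups = group_by_block(names)
for block, indices in groups.items():
    print(block, indices)
```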

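Helper-function for finding the input image

This function finds the input image that maximizes a given feature in the network. It essentially performs optimization with gradient ascent. The image is initialized with small random values and is then iteratively updated using the gradient of the given feature with respect to the input image.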

def optimize_image(conv_id=None, feature=0,
                   num_iterations=30, show_progress=True):
    """
    Find an image that maximizes the feature
    given by the conv_id and feature number.

    Parameters:
    conv_id: Integer identifying the convolutional layer to
             maximize. It is an index into conv_names.
             If None then use the last fully-connected layer
             before the softmax output.
    feature: Index into the layer for the feature to maximize.
    num_iterations: Number of optimization iterations to perform.
    show_progress: Boolean whether to show the progress.
    """

    # Load the Inception model. This is done for each call of
    # this function because we will add a lot to the graph
    # which will cause the graph to grow and eventually the
    # computer will run out of memory.
    model = inception.Inception()

    # Reference to the tensor that takes the raw input image.
    resized_image = model.resized_image

    # Reference to the tensor for the predicted classes.
    # This is the output of the final layer's softmax classifier.
    y_pred = model.y_pred

    # Create the loss-function that must be maximized.
    if conv_id is None:
        # If we want to maximize a feature on the last layer,
        # then we use the fully-connected layer prior to the
        # softmax-classifier. The feature no. is the class-number
        # and must be an integer between 1 and 1000.
        # The loss-function is just the value of that feature.
        loss = model.y_logits[0, feature]
    else:
        # If instead we want to maximize a feature of a
        # convolutional layer inside the neural network.

        # Get the name of the convolutional operator.
        conv_name = conv_names[conv_id]

        # Get a reference to the tensor that is output by the
        # operator. Note that ":0" is added to the name for this.
        tensor = model.graph.get_tensor_by_name(conv_name + ":0")

        # Set the Inception model's graph as the default
        # so we can add an operator to it.
        with model.graph.as_default():
            # The loss-function is the average of all the
            # tensor-values for the given feature. This
            # ensures that we generate the whole input image.
            # You can try and modify this so it only uses
            # a part of the tensor.
            loss = tf.reduce_mean(tensor[:,:,:,feature])

    # Get the gradient for the loss-function with regard to
    # the resized input image. This creates a mathematical
    # function for calculating the gradient.
    gradient = tf.gradients(loss, resized_image)

    # Create a TensorFlow session so we can run the graph.
    session = tf.Session(graph=model.graph)

    # Generate a random image of the same size as the raw input.
    # Each pixel is a small random value between 128 and 129,
    # which is about the middle of the colour-range.
    image_shape = resized_image.get_shape()
    image = np.random.uniform(size=image_shape) + 128.0

    # Perform a number of optimization iterations to find
    # the image that maximizes the loss-function.
    for i in range(num_iterations):
        # Create a feed-dict. This feeds the image to the
        # tensor in the graph that holds the resized image, because
        # this is the final stage for inputting raw image data.
        feed_dict = {model.tensor_name_resized_image: image}

        # Calculate the predicted class-scores,
        # as well as the gradient and the loss-value.
        pred, grad, loss_value = session.run([y_pred, gradient, loss],
                                             feed_dict=feed_dict)

        # Squeeze the dimensionality for the gradient-array.
        grad = np.array(grad).squeeze()

        # The gradient now tells us how much we need to change the
        # input image in order to maximize the given feature.

        # Calculate the step-size for updating the image.
        # This step-size was found to give fast convergence.
        # The addition of 1e-8 is to protect from div-by-zero.
        step_size = 1.0 / (grad.std() + 1e-8)

        # Update the image by adding the scaled gradient
        # This is called gradient ascent.
        image += step_size * grad

        # Ensure all pixel-values in the image are between 0 and 255.
        image = np.clip(image, 0.0, 255.0)

        if show_progress:
            print("Iteration:", i)

            # Convert the predicted class-scores to a one-dim array.
            pred = np.squeeze(pred)

            # The predicted class for the Inception model.
            pred_cls = np.argmax(pred)

            # Name of the predicted class.
            cls_name = model.name_lookup.cls_to_name(pred_cls,
                                               only_first_name=True)

            # The score (probability) for the predicted class.
            cls_score = pred[pred_cls]

            # Print the predicted score etc.
            msg = "Predicted class-name: {0} (#{1}), score: {2:>7.2%}"
            print(msg.format(cls_name, pred_cls, cls_score))

            # Print statistics for the gradient.
            msg = "Gradient min: {0:>9.6f}, max: {1:>9.6f}, stepsize: {2:>9.2f}"
            print(msg.format(grad.min(), grad.max(), step_size))

            # Print the loss-value.
            print("Loss:", loss_value)

            # Newline.
            print()

    # Close the session created above, and then the
    # TensorFlow session inside the model-object.
    session.close()
    model.close()

    return image.squeeze()
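
The step-size rule in the function above is worth a closer look. The raw gradients are tiny (the logs further down show values around 1e-4 with step sizes around 7e4), and dividing by the gradient's standard deviation rescales every update to roughly unit spread regardless of the gradient's magnitude. The sketch below demonstrates this with made-up gradient arrays; the helper name `scaled_step` is an assumption for illustration only.

```python
import numpy as np

rng = np.random.RandomState(1)

def scaled_step(grad):
    # Same rule as in optimize_image(): divide by the standard
    # deviation so the update has roughly unit spread.
    step_size = 1.0 / (grad.std() + 1e-8)
    return step_size * grad

# Two made-up gradients with the same pattern but very
# different magnitudes.
base = rng.normal(size=(8, 8, 3))
tiny_grad = 1e-5 * base
large_grad = 1e-2 * base

tiny_update = scaled_step(tiny_grad)
large_update = scaled_step(large_grad)

print(tiny_update.std(), large_update.std())
```

Both updates end up nearly identical, so each iteration changes typical pixels by about one intensity level no matter how small the raw gradient is.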

Helper-functions for plotting image and noise

This function normalizes an image so its pixel-values are between 0.0 and 1.0.

def normalize_image(x):
    # Get the min and max values for all pixels in the input.
    x_min = x.min()
    x_max = x.max()

    # Normalize so all values are between 0.0 and 1.0
    x_norm = (x - x_min) / (x_max - x_min)

    return x_norm
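
One edge case worth noting: if every pixel has the same value, `x_max - x_min` is zero and the division above yields NaN values. This is unlikely for the optimized images in this tutorial, but a guarded variant could look like the sketch below. The name `normalize_image_safe` and the `eps` threshold are assumptions, not part of the original notebook.

```python
import numpy as np

def normalize_image_safe(x, eps=1e-12):
    # Rescale to [0.0, 1.0]; fall back to all zeros when the
    # image is (almost) constant to avoid dividing by zero.
    x = np.asarray(x, dtype=np.float64)
    x_min = x.min()
    x_range = x.max() - x_min
    if x_range < eps:
        return np.zeros_like(x)
    return (x - x_min) / x_range

flat = np.full((4, 4, 3), 128.0)
ramp = np.linspace(0.0, 255.0, num=48).reshape(4, 4, 3)

flat_norm = normalize_image_safe(flat)
ramp_norm = normalize_image_safe(ramp)
print(flat_norm.max(), ramp_norm.min(), ramp_norm.max())
```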

This function plots a single image.

def plot_image(image):
    # Normalize the image so pixels are between 0.0 and 1.0
    img_norm = normalize_image(image)

    # Plot the image.
    plt.imshow(img_norm, interpolation='nearest')
    plt.show()

This function plots 6 images in a grid.

def plot_images(images, show_size=100):
    """
    The show_size is the number of pixels to show for each image.
    The max value is 299.
    """

    # Create figure with sub-plots.
    fig, axes = plt.subplots(2, 3)

    # Adjust vertical spacing.
    fig.subplots_adjust(hspace=0.1, wspace=0.1)

    # Use interpolation to smooth pixels?
    smooth = True

    # Interpolation type.
    if smooth:
        interpolation = 'spline16'
    else:
        interpolation = 'nearest'

    # For each entry in the grid.
    for i, ax in enumerate(axes.flat):
        # Get the i'th image and only use the desired pixels.
        img = images[i, 0:show_size, 0:show_size, :]

        # Normalize the image so its pixels are between 0.0 and 1.0
        img_norm = normalize_image(img)

        # Plot the image.
        ax.imshow(img_norm, interpolation=interpolation)

        # Remove ticks.
        ax.set_xticks([])
        ax.set_yticks([])

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()

Helper-function for optimizing and plotting images

This function optimizes multiple images and plots them.

def optimize_images(conv_id=None, num_iterations=30, show_size=100):
    """
    Find 6 images that maximize the 6 first features in the layer
    given by the conv_id.

    Parameters:
    conv_id: Integer identifying the convolutional layer to
             maximize. It is an index into conv_names.
             If None then use the last layer before the softmax output.
    num_iterations: Number of optimization iterations to perform.
    show_size: Number of pixels to show for each image. Max 299.
    """

    # Which layer are we using?
    if conv_id is None:
        print("Final fully-connected layer before softmax.")
    else:
        print("Layer:", conv_names[conv_id])

    # Initialize the array of images.
    images = []

    # For each feature do the following. Note that the
    # last fully-connected layer only supports numbers
    # between 1 and 1000, while the convolutional layers
    # support numbers between 0 and some other number.
    # So we just use the numbers between 1 and 6.
    for feature in range(1,7):
        print("Optimizing image for feature no.", feature)

        # Find the image that maximizes the given feature
        # for the network layer identified by conv_id (or None).
        image = optimize_image(conv_id=conv_id, feature=feature,
                               show_progress=False,
                               num_iterations=num_iterations)

        # Squeeze the dim of the array.
        image = image.squeeze()

        # Append to the list of images.
        images.append(image)

    # Convert to numpy-array so we can index all dimensions easily.
    images = np.array(images)

    # Plot the images.
    plot_images(images=images, show_size=show_size)

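Results

Optimize an image for a shallow convolutional layer

As an example, find an input image that maximizes feature no. 2 of the convolutional layer named conv_names[conv_id], where conv_id=5.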

image = optimize_image(conv_id=5, feature=2,
                       num_iterations=30, show_progress=True)

Iteration: 0
Predicted class-name: dishwasher (#667), score: 4.81%
Gradient min: -0.000083, max: 0.000100, stepsize: 76290.32
Loss: 4.83793

Iteration: 1
Predicted class-name: kite (#397), score: 15.12%
Gradient min: -0.000142, max: 0.000126, stepsize: 71463.42
Loss: 5.59611

Iteration: 2
Predicted class-name: wall clock (#524), score: 6.85%
Gradient min: -0.000119, max: 0.000121, stepsize: 80427.39
Loss: 6.91725

...
Iteration: 28
Predicted class-name: bib (#941), score: 19.26%
Gradient min: -0.000043, max: 0.000043, stepsize: 214742.82
Loss: 17.7469

Iteration: 29
Predicted class-name: bib (#941), score: 18.87%
Gradient min: -0.000047, max: 0.000059, stepsize: 218511.00
Loss: 17.9321

plot_image(image)

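Optimize multiple images for convolutional layers

In the following we optimize and plot multiple images for convolutional layers inside the Inception model. These images show what the layers "like to see". Note how the patterns become increasingly complex for deeper layers.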

optimize_images(conv_id=0, num_iterations=10)

Layer: conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=3, num_iterations=30)

Layer: conv_3/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=4, num_iterations=30)

Layer: conv_4/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=5, num_iterations=30)

Layer: mixed/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=6, num_iterations=30)

Layer: mixed/tower/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=7, num_iterations=30)

Layer: mixed/tower/conv_1/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=8, num_iterations=30)

Layer: mixed/tower_1/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=9, num_iterations=30)

Layer: mixed/tower_1/conv_1/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=10, num_iterations=30)

Layer: mixed/tower_1/conv_2/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=20, num_iterations=30)

Layer: mixed_2/tower/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=30, num_iterations=30)

Layer: mixed_4/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=40, num_iterations=30)

Layer: mixed_5/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=50, num_iterations=30)

Layer: mixed_6/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=60, num_iterations=30)

Layer: mixed_7/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=70, num_iterations=30)

Layer: mixed_8/tower/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=80, num_iterations=30)

Layer: mixed_9/tower_1/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=90, num_iterations=30)

Layer: mixed_10/tower_1/conv_1/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

optimize_images(conv_id=93, num_iterations=30)

Layer: mixed_10/tower_2/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6

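Final fully-connected layer before Softmax

Now we optimize and plot images for the final layer in the Inception model. This is the fully-connected layer right before the softmax classifier. The features in this layer correspond to the output classes.

We might have hoped to see recognizable patterns in these images, such as monkeys or birds corresponding to the output classes, but the images only show complex, abstract patterns.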

optimize_images(conv_id=None, num_iterations=30)

Final fully-connected layer before softmax.
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6


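The images above only show 100x100 pixels, but the images are actually 299x299 pixels. Perhaps there would be recognizable patterns if we performed more optimization iterations and plotted the full image. So let us optimize the first image again and plot it in full resolution.

The Inception model classifies the resulting image as a "kit fox" with about 100% certainty, but to the human eye the image is just abstract patterns.

If you want to test another feature number, note that it must be between 1 and 1000, because it must correspond to a valid class number for the final output layer.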

image = optimize_image(conv_id=None, feature=1,
                       num_iterations=100, show_progress=True)

Iteration: 0
Predicted class-name: dishwasher (#667), score: 4.98%
Gradient min: -0.006252, max: 0.004451, stepsize: 3734.48
Loss: -0.837608

Iteration: 1
Predicted class-name: ballpoint (#907), score: 8.52%
Gradient min: -0.007303, max: 0.006427, stepsize: 2152.89
Loss: -0.416723
...
Iteration: 98
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.007732, max: 0.010692, stepsize: 1286.44
Loss: 67.5603

Iteration: 99
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.005850, max: 0.006159, stepsize: 1863.65
Loss: 75.6356

plot_image(image=image)
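
The 100.00% score follows from how softmax works: the loss being maximized here is the logit for class 1, and once it exceeds the other logits by a wide margin the softmax output saturates. The sketch below illustrates this with made-up logits (1000 values near zero and one set to 75, loosely mirroring the final loss value above). The actual Inception logits are not shown in this tutorial, so these numbers are assumptions.

```python
import numpy as np

def softmax(logits):
    # Subtract the maximum for numerical stability.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

rng = np.random.RandomState(2)

# Made-up logits for 1000 classes, all close to zero ...
logits = rng.normal(scale=1.0, size=1000)

# ... except the class being maximized.
logits[1] = 75.0

probs = softmax(logits)
print("score: {0:.2%}".format(probs[1]))
```

This also explains why the optimizer can keep increasing the loss long after the printed score has stopped changing.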

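Close TensorFlow Session

The TensorFlow session has already been closed inside the functions above that use the Inception model. This is done to save memory, so the computer does not crash when many gradient functions are added to the computational graph.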

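Conclusion

This tutorial showed how to optimize input images so as to maximize features inside a neural network. Because a given feature (or neuron) inside the network responds most strongly to particular images, this lets us visualize what it "likes to see".

For the lower layers of the neural network, the images contain simple patterns such as different kinds of wavy lines. The patterns become increasingly complex deeper into the network. We might have hoped that the patterns for deep layers would be recognizable, such as monkeys, foxes, or cars, but in practice they are even more complex and abstract.

Why is that? Recall from Tutorial #11 that the Inception model can easily be fooled by a little adversarial noise into classifying any input image as another target class. So it is not hard to imagine that the Inception model recognizes abstract image patterns that are unclear to the human eye. There are probably infinitely many images that maximize a feature inside the neural network, and humans can recognize only a small fraction of them. This may be why the optimization process finds only abstract patterns.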

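Other Methods

The research literature contains many proposals for guiding the optimization process so that it finds image patterns that are easier for humans to recognize.

This paper proposes a combination of heuristics for guiding the optimization of the image patterns. The paper shows example images for several classes, such as flamingo, pelican, and black swan, all of which are more or less recognizable to the human eye. An implementation of the method is available here (the exact line numbers may change in the future). The method requires a combination of heuristics and fine-tuned parameters to generate these images, but the parameter choices are not made clear in the paper. Despite some attempts, I could not reproduce their results. Perhaps I have misunderstood the paper, or perhaps the heuristics were tuned to work well for their network architecture (a variant of AlexNet), whereas this tutorial uses the more advanced Inception model.

Another paper proposes a different method for generating images that are more recognizable to the human eye. However, the method in effect cheats, because it goes through all the images in the training set (e.g. ImageNet) and finds those that maximally activate a given feature in the neural network. It then clusters and averages similar images and uses the result as the initial image for the optimization procedure. So it is no surprise that the method gives better results when it starts from an image constructed from real photos.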

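Exercises

Below are a few suggested exercises that may help improve your skills with TensorFlow. Hands-on experience is important for learning how to use TensorFlow properly.

You may want to back up this Notebook before making any changes.

  • Try running the optimization several times for features in the lower layers of the network. Are the resulting images always the same?
  • Try using fewer or more optimization iterations. How does it affect the image quality?
  • Try changing the loss function for the convolutional features. This can be done in several ways. How does it affect the image patterns? Why?
  • Do you think the optimizer also increases other features besides the one we want to maximize? How can you measure this? Are you sure the optimizer only maximizes one feature at a time?
  • Try maximizing several features simultaneously.
  • Train a smaller network on the MNIST data set and try visualizing its features and layers. Is it easier to see patterns in the images?
  • Try implementing the methods from the papers above.
  • Try your own ideas for improving the optimized images.
  • Explain to a friend how the program works.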
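
As a possible starting point for the exercises on changing the loss function and on maximizing several features at once, the NumPy sketch below shows a few candidate reductions over a feature map of shape [batch, height, width, channels]. The array is random stand-in data and the function names are hypothetical. In the notebook, the equivalent TensorFlow operations would have to be built inside `model.graph` in place of the `tf.reduce_mean` line so that gradients can still be computed.

```python
import numpy as np

rng = np.random.RandomState(3)

# Stand-in for a convolutional layer's output tensor.
tensor = rng.normal(size=(1, 8, 8, 4))

def loss_whole_map(t, feature):
    # Mean over the full feature map, as in optimize_image().
    return t[:, :, :, feature].mean()

def loss_center_patch(t, feature, size=2):
    # Mean over a central size x size patch only.
    h, w = t.shape[1], t.shape[2]
    top, left = (h - size) // 2, (w - size) // 2
    patch = t[:, top:top + size, left:left + size, feature]
    return patch.mean()

def loss_multiple(t, features):
    # Sum of the per-feature means, to maximize several at once.
    return sum(loss_whole_map(t, f) for f in features)

print(loss_whole_map(tensor, 1))
print(loss_center_patch(tensor, 1))
print(loss_multiple(tensor, [1, 2]))
```

Restricting the loss to a patch should, in principle, concentrate the generated pattern in the corresponding region of the input image, which is one way to explore the third exercise.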