Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN) 0.13
A Performance Library for Deep Learning

The Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN) is an open-source performance library for Deep Learning (DL) applications intended to accelerate DL frameworks on Intel(R) architecture. Intel MKL-DNN includes highly vectorized and threaded building blocks for implementing convolutional neural networks (CNNs) with C and C++ interfaces. The project was created to help the DL community innovate on the Intel(R) processor family.

The library supports the most commonly used primitives necessary to accelerate bleeding-edge image recognition topologies, including Cifar*, AlexNet*, VGG*, GoogleNet*, and ResNet*. The primitives include convolution, inner product, pooling, normalization, and activation, with support for inference operations.

Intel MKL-DNN primitives implement a plain C/C++ application programming interface (API) that can be used in existing C/C++ DNN frameworks as well as in custom DNN applications.

Programming Model

Intel MKL-DNN models memory as a primitive, similar to an operation primitive. This allows the graph of computations to be reconstructed at run time.

Basic Terminology

Intel MKL-DNN operates on the following main objects: the primitive, which represents either an operation (such as convolution) or a memory object; the engine, which represents the execution device (for example, a CPU); and the stream, an execution context to which primitives are submitted.

A typical workflow is to create a set of primitives to run, push them to a stream all at once or one at a time, and wait for completion.

Creating Primitives

In Intel MKL-DNN, creating primitives involves three levels of abstraction: the operation or memory descriptor, the primitive descriptor, and the primitive itself.

To create a memory primitive:

  1. Create a memory descriptor. The memory descriptor contains the dimensions, precision, and format of the data layout in memory. The data layout can be either user-specified or set to any. The any format allows the operation primitives (convolution and inner product) to choose the best memory format for optimal performance.
  2. Create a memory primitive descriptor. The memory primitive descriptor contains the memory descriptor and the target engine.
  3. Create a memory primitive. The memory primitive combines the memory primitive descriptor with a user-allocated buffer (the data handle). Note: with the C++ API, when creating an output memory primitive you do not need to allocate a buffer unless the output is needed in a user-defined format.

To create an operation primitive:

  1. Create a logical description of the operation. For example, the description of a convolution operation contains parameters such as sizes, strides, and propagation type. It also contains the input and output memory descriptors.
  2. Create a primitive descriptor by attaching the target engine to the logical description.
  3. Create an instance of the primitive and specify the input and output primitives.

Examples

A walk-through example that implements an AlexNet topology using the C++ API is provided with the library, along with an introductory example of low-precision 8-bit computations. These and other examples are available in the /examples directory and provide more details about the API.

Performance Considerations

Intel MKL-DNN provides a verbose mode for profiling execution: set the MKLDNN_VERBOSE environment variable to enable it.
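As a configuration sketch, verbose mode is controlled entirely through the environment (the application name below is hypothetical):

```shell
# MKLDNN_VERBOSE levels: 0 = no output (default), 1 = trace primitive
# execution, 2 = also trace primitive creation.
export MKLDNN_VERBOSE=1
# Then run the application as usual, e.g. (hypothetical binary name):
#   ./my-mkldnn-app
# Each executed primitive prints a line to stdout beginning with
# "mkldnn_verbose," including the primitive kind and execution time.
```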

Operational Details

Auxiliary Types

Legal information