Using the GPU for custom function evaluation without writing a Metal shader.


MPSGraph lets us run math on the GPU without writing C++-based Metal shaders. Instead, we combine logical and arithmetic nodes into a graph, which the framework stitches together and runs on the GPU.


In this post, we will go through the steps needed to rapidly evaluate a polynomial function of variable size on the GPU. The same approach can easily be adapted to other arithmetic functions. In a future post, I plan to document how we can use the new control flow nodes to build GPU-based solvers that find approximate numerical solutions to computationally complex functions.


Building the graph


import Metal
import MetalPerformanceShaders
import MetalPerformanceShadersGraph


let graph = MPSGraph()

// Create a placeholder for the exponents of the polynomial
let exponents = graph.placeholder(
  shape: nil,
  dataType: .float32,
  name: "exponents"
)
     
// Create a placeholder for the coefficients of the polynomial
let coefficients = graph.placeholder(
  shape: nil,
  dataType: .float32,
  name: "coefficients"
)

// Create placeholder for the x value that we will evaluate
let location = graph.placeholder(
  shape: nil,
  dataType: .float32,
  name: "location"
)

These are the inputs to our function. The exponents and coefficients will be used in the function f(location) = coefficients[0]*(location^exponents[0]) + coefficients[1]*(location^exponents[1]) + ...


At this point, we are not constraining the shape of the values since we would like to be able to re-use this code for many different sizes of the polynomial.


Creating the first node

The first thing we need to do is raise the location to each of the powers in the exponents vector.

let locationHat = graph.power(
  location,
  exponents,
  name: "power"
)

The framework provides a power(_:_:name:) method that creates a node to do this. Since the location is a single value, this node returns a new vector of the same length as the exponents, in which each component is the location raised to the corresponding exponent.

let values = graph.multiplication(
  coefficients,
  locationHat,
  name: "multiplication"
)

Next, we create a multiplication(_:_:name:) node that multiplies the coefficients vector element-wise with the locationHat vector.

let value = graph.reductionSum(
  with: values,
  axes: nil,
  name: "sum"
)

Finally, we apply reductionSum(with:axes:name:) to this output to reduce it to a single value. Passing nil for axes, as we do here, sums over every axis.
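To make it easier to see what each node computes, the three steps can be mirrored on the CPU in plain Swift. This is only a sketch for checking the logic, not how MPSGraph implements it; the function name evaluateOnCPU and the sample polynomial are assumptions made for illustration.

```swift
import Foundation

// Hypothetical CPU mirror of the graph (illustration only, not part of MPSGraph).
func evaluateOnCPU(
  location: Float32,
  exponents: [Float32],
  coefficients: [Float32]
) -> Float32 {
  precondition(exponents.count == coefficients.count, "one coefficient per exponent")

  // Mirrors the "power" node: the single location is paired with every exponent.
  let locationHat = exponents.map { pow(location, $0) }

  // Mirrors the "multiplication" node: element-wise product of the two vectors.
  let values = zip(coefficients, locationHat).map { $0 * $1 }

  // Mirrors the "sum" node: reduce all the terms to a single value.
  return values.reduce(0, +)
}

// Assumed sample polynomial 1 + 3x^2 evaluated at x = 2: 1 + 3 * 4 = 13
let cpuValue = evaluateOnCPU(location: 2, exponents: [0, 2], coefficients: [1, 3])
print(cpuValue) // prints 13.0
```

A helper like this can also serve as a reference value when checking the output of the graph.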

Preparing data to evaluate

There are multiple ways to do this, and depending on how frequently you need to change the data, you might need to consider a more involved method. For simplicity, here we will create the exponents and coefficients in Swift on the CPU and copy/share them with the GPU.

Consider the simple example of f(x) = 2 + 2x + 2x^2 + 2x^3:

let exponentValues: [Float32] = [0, 1, 2, 3]
let coefficientValues: [Float32] = [2, 2, 2, 2]

With these values defined in Swift, we need to copy them to the GPU. To do so, create an extension on Array that produces a matching MTLBuffer on a target Metal device.

extension Array where Element == Float32 {   
  func asBuffer(on device: MTLDevice) -> MTLBuffer? {
    self.withContiguousStorageIfAvailable { pointer -> MTLBuffer? in
      guard let baseAddress = pointer.baseAddress else {
        return nil
      }
       
      return device.makeBuffer(
        bytes: baseAddress,
        length: pointer.count * MemoryLayout<Float32>.size,
        options: MTLResourceOptions.storageModeShared
      )
    } ?? nil
  }
}

This buffer lacks information about the data type stored within it, so we need to create an MPSVector that wraps the buffer and describes its value type.

extension Array where Element == Float32 {   
  func asVector(on device: MTLDevice) -> MPSVector? {
    guard let buffer = self.asBuffer(on: device) else {
      return nil
    }
    return MPSVector(
      buffer: buffer,
      descriptor: MPSVectorDescriptor(
        length: self.count,
        dataType: MPSDataType.float32
      )
    )
  }
}

We can now create instances of MPSGraphTensorData that the graph takes as input.

// Get your GPU
let device = MTLCreateSystemDefaultDevice()!

let exponentsTensorData = MPSGraphTensorData(
  exponentValues.asVector(on: device)!
)

let coefficientsTensorData = MPSGraphTensorData(
  coefficientValues.asVector(on: device)!
)

We still need to create one more input: the value of x at which we want to evaluate the function.

let locationTensorData = MPSGraphTensorData(
  MPSNDArray(device: device, scalar: 2.0)
)

For simplicity right now we are just going to evaluate a single location.

Running the Graph

How you run the graph depends on how you plan to use it. Here we use the simple run method. Alternatively, you can pre-compile the graph and run it as part of a larger Metal compute pipeline, which could be useful if you need to pre-compute or post-compute some of the data using other Metal shaders.

let results = graph.run(
  feeds: [
    location : locationTensorData,
    exponents: exponentsTensorData,
    coefficients: coefficientsTensorData
  ],
  targetTensors: [value],
  targetOperations: nil
) 

This will (on the first call) compile your graph into a Metal shader and then dispatch it for computation.

Retrieving the result

To retrieve the computed result from our graph we need to copy the value back from the GPU to the CPU so that we can use the value within our regular Swift code.

// We can index the results returned by the run method using the `value` tensor.
let result = results[value]

let outputNDArray = result?.mpsndarray()

// Allocate a local array of the correct size so we can copy the values in
var outputValues: [Float32] = [0]

// Copy all the values from the Metal Array to the swift Array
outputNDArray?.readBytes(&outputValues, strideBytes: nil)

The outputValues array will now contain the function evaluation. This method of evaluating polynomials is only worth it if you have a high-order polynomial with hundreds, if not thousands, of terms.
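For small polynomials whose exponents are the consecutive integers 0, 1, 2, ..., a plain CPU loop is a reasonable baseline to compare against. The sketch below uses Horner's method, which is a different technique from the graph above and is not part of MPSGraph; the function name hornerEvaluate is an assumption made for illustration.

```swift
// Hypothetical CPU baseline (illustration only): Horner's method for a dense
// polynomial whose coefficients are ordered from the constant term upwards.
func hornerEvaluate(_ coefficients: [Float32], at x: Float32) -> Float32 {
  // Start from the highest-order coefficient and fold in one term per step:
  // ((c3 * x + c2) * x + c1) * x + c0
  return coefficients.reversed().reduce(0) { accumulator, coefficient in
    accumulator * x + coefficient
  }
}

// f(x) = 2 + 2x + 2x^2 + 2x^3 at x = 2: 2 + 4 + 8 + 16 = 30
let baseline = hornerEvaluate([2, 2, 2, 2], at: 2)
print(baseline) // prints 30.0
```

Unlike the graph, this sketch does not handle arbitrary or fractional exponents, so it only applies when the exponents vector is exactly 0, 1, 2, and so on.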

Evaluating multiple locations at once

We can modify the above code a little bit to enable us to evaluate many locations at once.

First, we need to create a matrix of our locations with the values saved in the first column. An extension on Array provides an easy way to construct this.

extension Array where Element == Float32 {
  func asMatrix(on device: MTLDevice) -> MPSMatrix? {
    guard let buffer = self.asBuffer(on: device) else {
      return nil
    }

    // One row per location, with a single tightly packed Float32 column
    return MPSMatrix(
      buffer: buffer,
      descriptor: .init(
        rows: self.count,
        columns: 1,
        rowBytes: MemoryLayout<Float32>.size,
        dataType: .float32
      )
    )
  }
}

We then use this when we declare the locationTensorData:

let locationValues: [Float32] = [-1, 0, 1]
let locationTensorData = MPSGraphTensorData(   
  locationValues.asMatrix(on: device)!
)

We also need to update the reductionSum to correctly sum up the values for each location rather than all the values over all locations.

let value = graph.reductionSum(
  with: values,
  axes: [NSNumber(1)],
  name: "sum"
)

And finally, we need to ensure that the outputValues array is large enough to hold a result for each input location before we read the results back.

var outputValues: [Float32] = [0, 0, 0]

With these changes, we can evaluate many points at once on the GPU in parallel.
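The batched evaluation can likewise be mirrored on the CPU, which makes the per-location sum explicit. This is a sketch for checking the logic only; the function name evaluateBatchOnCPU is an assumption made for illustration, and the values it prints are CPU results, not output captured from the graph.

```swift
import Foundation

// Hypothetical CPU mirror of the batched graph (illustration only).
func evaluateBatchOnCPU(
  locations: [Float32],
  exponents: [Float32],
  coefficients: [Float32]
) -> [Float32] {
  precondition(exponents.count == coefficients.count, "one coefficient per exponent")

  // Each location becomes one row; each (coefficient, exponent) pair is one column.
  return locations.map { location in
    // Summing along the row corresponds to reducing over axis 1 in the graph.
    zip(coefficients, exponents)
      .map { coefficient, exponent in coefficient * pow(location, exponent) }
      .reduce(0, +)
  }
}

// f(x) = 2 + 2x + 2x^2 + 2x^3 at x = -1, 0 and 1
let batch = evaluateBatchOnCPU(
  locations: [-1, 0, 1],
  exponents: [0, 1, 2, 3],
  coefficients: [2, 2, 2, 2]
)
print(batch) // prints [0.0, 2.0, 8.0]
```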