Machine Learning models are becoming increasingly popular. There have been implementations both in the browser and on the server (like this fake-person generator).
Typically, deploying Machine Learning models at scale involves virtualization, networking, and infrastructure-related know-how. It also comes at a steep cost — even when no data is being processed, servers need to be kept running.
In this blog post, we explore how we can take advantage of the serverless paradigm to deploy our Machine Learning models using Now. This approach gives us simplified deployments, no infrastructure to maintain, and built-in scalability and cost control.

An Object-Detection API

To demonstrate serverless Machine Learning, we will build an API that detects objects within the images supplied to it. We will use TensorFlow to get predictions from a pre-trained model. The API should accept an image of our choosing and return the bounding boxes for the detected objects.

An illustration of our API returning the bounding boxes for a plane detected within the supplied image.

[
  {
    "bbox": [{ "x": 205.61, "y": 315.36 }, { "x": 302.98, "y": 345.72 }],
    "class": "airplane",
    "score": 0.736
  }
]

The expected API response structure for the illustration above.

Getting TensorFlow Ready for Serverless

Now that we know what we want to build, let's start on the implementation. As mentioned earlier, we're going to be using TensorFlow. TensorFlow is available in both Python and JavaScript. We will choose JavaScript for this example, but the same approach could be used with Python as well.
After looking it up, we find that there are two packages available to help run TensorFlow in JavaScript:
  • @tensorflow/tfjs, a JavaScript implementation of TensorFlow, accelerated with WebGL
  • @tensorflow/tfjs-node, a version with native C++ bindings
Both packages offer the same API and work in a Node environment, but they differ in performance. @tensorflow/tfjs-node binds to a native C++ library, which makes it fast when used with Node. @tensorflow/tfjs replaces the C++ library with WebGL for in-browser performance, which is not a great fit for Node.
Therefore, we pick @tensorflow/tfjs-node. Unfortunately, it comes with a large bundle size of approximately 140 MB, since it includes the compiled TensorFlow C++ library. The serverless lambda size limit available to us is 50 MB, so our first challenge is to fit @tensorflow/tfjs-node within that.
To overcome that, we can do the following:
  1. Download all TensorFlow dependencies into a replicated lambda environment
  2. Compress and package the dependencies with Brotli
  3. Decompress the package and require it at runtime, during a cold start (sketched below)
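To make steps 2 and 3 concrete, here is a rough sketch of what the runtime side could look like. It assumes the dependencies were packed at build time into a Brotli-compressed tarball named tf.tar.br that ships alongside the lambda code; the file names, paths, and the tar helper are illustrative and not the actual internals of tensorflow-lambda.
const { promisify } = require('util')
const zlib = require('zlib')
const fs = require('fs')
const path = require('path')
const tar = require('tar')

const brotliDecompress = promisify(zlib.brotliDecompress)

async function loadTensorflowFromArchive() {
  // /tmp is the only writable location inside a lambda
  const target = '/tmp/tfjs-node'

  if (!fs.existsSync(target)) {
    // Undo the Brotli compression applied at build time (step 2)
    const compressed = fs.readFileSync(path.join(__dirname, 'tf.tar.br'))
    const tarball = await brotliDecompress(compressed)

    // Unpack into /tmp so the package can be required like any other module
    fs.mkdirSync(target, { recursive: true })
    fs.writeFileSync('/tmp/tf.tar', tarball)
    await tar.x({ file: '/tmp/tf.tar', cwd: target })
  }

  // Step 3: require the freshly unpacked package at runtime, during a cold start
  return require(path.join(target, 'node_modules/@tensorflow/tfjs-node'))
}
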
To abstract this process away for future use, we created tensorflow-lambda. It allows us to forget about the complexity of decompressing packages at runtime.
const loadTf = require('tensorflow-lambda')

const main = async () => {
  const tf = await loadTf()
  // At this point, TensorFlow is ready to use
}

main()

By decompressing and requiring the packages at runtime, we are able to overcome the 50 MB size limit of serverless lambdas.

We can now run TensorFlow code within a serverless lambda.

Completing the API

The next step is to create a simple structure for our API. To deal with requests and responses, we will introduce micro and content-type into the mix.
const loadTf = require('tensorflow-lambda')
const { send, buffer } = require('micro')
const contentType = require('content-type')

module.exports = async (req, res) => {
  const tf = await loadTf()
  const tfModel = await loadModel()
  const { type: mimeType } = contentType.parse(req)

  if (['image/jpeg', 'image/png'].includes(mimeType)) {
    const buf = await buffer(req, { limit: '5mb' })
    const { tensor, width, height } = await imgToTensor(buf)
    const { scores, boxes } = await predict(tfModel, tensor)
    const bboxes = await tensorsToBBoxes({ scores, boxes, width, height })
    return send(res, 200, bboxes)
  }

  return send(res, 400, {
    code: 'bad_request',
    message: 'Only png and jpg images are supported'
  })
}

A simple structure for our API that checks whether the supplied image is a PNG or JPG.

To complete this example, we need four functions: loadModel, predict, imgToTensor, and tensorsToBBoxes.
Since several pre-trained models are already available, we will make use of a suitable one: ssdlite_mobilenet_v2. TensorFlow has built-in functions to download a model from a URL, so we can keep our code simple and terse.
let tfModelCache

async function loadModel() {
  if (tfModelCache) return tfModelCache

  try {
    tfModelCache = await tf.loadGraphModel(`${TF_MODEL_URL}/model.json`)
    return tfModelCache
  } catch (err) {
    console.log(err)
  }
}

We only want to download the model on the first lambda invocation (the cold start), and reuse it for subsequent invocations. Here, we define the loadModel function using a singleton design pattern.

async function predict(tfModel, tensor) {
  const result = await tfModel.executeAsync(tensor)
  // The model returns two tensors: per-class scores for each box and the raw box coordinates
  const scores = result[0].arraySync()[0]
  const boxes = result[1].dataSync()
  return { scores, boxes }
}

The predict function feeds our model with a tensor and returns the result.

The final step in completing our API is to add normalization functions around our model. We want to give our API a PNG or JPG image and, in return, get the bounding box coordinates along with their labels.
The remaining functions, imgToTensor and tensorsToBBoxes, do exactly this.
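Their full implementations are part of the open-sourced demo; here is a minimal sketch of what they could look like. It assumes the model follows the usual SSD output format (normalized [ymin, xmin, ymax, xmax] boxes plus per-class scores), that a CLASSES lookup maps a class index to its label, and that a 0.5 score threshold filters out weak detections; these details are illustrative rather than the exact implementation.
async function imgToTensor(buf) {
  // Decode the PNG/JPG buffer into a [height, width, 3] tensor
  const img = tf.node.decodeImage(buf, 3)
  const [height, width] = img.shape

  // The model expects a batch dimension: [1, height, width, 3]
  const tensor = img.expandDims(0)
  return { tensor, width, height }
}

async function tensorsToBBoxes({ scores, boxes, width, height }) {
  return scores
    .map((classScores, i) => {
      // Keep the best-scoring class for each candidate box
      const score = Math.max(...classScores)
      const classIdx = classScores.indexOf(score)

      // Boxes come back normalized as [ymin, xmin, ymax, xmax]
      const [ymin, xmin, ymax, xmax] = boxes.slice(i * 4, i * 4 + 4)

      return {
        bbox: [
          { x: xmin * width, y: ymin * height },
          { x: xmax * width, y: ymax * height }
        ],
        class: CLASSES[classIdx],
        score
      }
    })
    .filter(({ score }) => score > 0.5)
}
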

A Web App to Try the API

Now that we have our API ready, we can show off the demo using a neat frontend built with Next.js. We will skip the creation of the frontend in this blog post for brevity, but you can take a look at the source code here. Thanks to Now's monorepo support, we can deploy and maintain both the API and frontend together.
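For reference, such a monorepo deployment can be described in a single now.json. The snippet below is only a sketch: it assumes the lambda lives under api/ and the Next.js app is built from the repository root, which may differ from the actual demo configuration.
{
  "version": 2,
  "builds": [
    { "src": "api/*.js", "use": "@now/node" },
    { "src": "package.json", "use": "@now/next" }
  ]
}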

object-detection.now.sh: drag and upload any image, or choose one of the example images. The detected objects are marked with bounding boxes.

Optimization

We have everything working now, but there is still something slightly upsetting: on a cold start, the function can take several seconds to make a prediction.
To alleviate this, we create a new route in the same lambda, called /warm, and move our existing prediction API to /predict. The /warm route is responsible for loading TensorFlow and then downloading the model.
if (req.url === '/api/warm') {
  await loadTf()
  await loadModel()
  return send(res, 200)
}

The /warm endpoint decompresses the TensorFlow dependencies and downloads the model.

Now for a trick: we call /warm when our front-end starts. The new React Hooks API makes this easy:
useEffect(() => {
  // Fire-and-forget request to warm up the lambda
  fetch('/api/warm')
}, []) // the empty dependency array runs the effect only once, on mount

Using the React Hooks API, we call /warm when our front-end starts.

While our demo visitor is still deciding which image to analyse, our lambda is warmed up and ready, since it has enough time to decompress the TensorFlow dependencies and download the model.

Conclusion

In this post, we explored how to deploy Machine Learning models in a serverless fashion using TensorFlow and Now. Once the initial challenges are overcome, we get an application that is reliably fast, scalable, and cost effective, making the exercise worth it. This post highlights that the serverless paradigm is versatile and supports a powerful set of use cases.
The entire source code for this demo is open source and can be found on GitHub. We look forward to seeing what you make with tensorflow-lambda. If you have any questions or feedback, please reach out to us via Twitter or chat.