TensorFlow vs. PyTorch vs. JAX: Deep Learning Frameworks Compared
TensorFlow, PyTorch or JAX: which deep learning framework holds the key to your success depends on various factors.

Deep learning is changing our lives in many ways: whether it’s Siri or Alexa following our voice commands, real-time translators on our smartphones, or the computer vision technology that enables intelligent robots and autonomous driving. What all these deep learning use cases have in common is that they are built on one of the three leading frameworks: TensorFlow, PyTorch, or JAX.

This article takes a comparative look at these three frameworks and outlines their respective strengths and weaknesses.

“Nobody was ever fired for buying IBM” was the mantra of the IT industry in the 1970s and 1980s. The same could be said of TensorFlow, which set the standard for deep learning in the 2010s. As is well known, however, IBM ran into rougher waters in the 1990s. And TensorFlow? Almost seven years after its release in 2015, is the framework still competitive?

To make it short: yes. After all, TensorFlow has evolved since its debut: TensorFlow 1.x was essentially about building static graphs in a very “un-Python-esque” way. Since TensorFlow 2.x, eager mode has been available to build models and evaluate their operations immediately, which makes working with TensorFlow feel much more like PyTorch.
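To illustrate what that feels like in practice, here is a minimal sketch of eager execution in TensorFlow 2.x; the tensor shapes and values are made up for illustration:

```python
import tensorflow as tf

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
w = tf.Variable(tf.random.normal((2, 1)))

with tf.GradientTape() as tape:
    y = tf.reduce_sum(tf.matmul(x, w))   # runs immediately, no session or graph build step

grad = tape.gradient(y, w)               # gradients computed on the fly
print(y.numpy(), grad.numpy())
```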

At the high level, TensorFlow offers Keras for simplified development; at the low level, it offers the optimizing XLA (Accelerated Linear Algebra) compiler for speed. XLA works wonders for GPU performance and is the primary way to take advantage of Google’s TPUs (Tensor Processing Units), which offer unprecedented performance when it comes to training models at large scale (a short sketch of the combination follows the list below). Then there are all the things TensorFlow has been doing well for years:

  • TensorFlow Serving ensures that models can be served in a well-defined and repeatable manner on a mature platform.

  • TensorFlow.js and TensorFlow Lite make it possible to redeploy models for the web, for low-power devices such as smartphones, or for resource-constrained hardware such as IoT devices.

  • With Google still running all of its production deployments on TensorFlow, you can rest assured that TensorFlow will scale with you.
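As a rough illustration of how Keras and XLA fit together, the sketch referenced above compiles a small Keras model with XLA enabled. The jit_compile argument is available in recent TensorFlow 2.x releases, and actual speedups depend heavily on the model and hardware:

```python
import tensorflow as tf

# A small model defined with the high-level Keras API...
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(10),
])

# ...while jit_compile=True asks TensorFlow to route the training step
# through the low-level XLA compiler.
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    jit_compile=True,
)
```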

However, in recent years TensorFlow has suffered from a certain “lack of energy” that is difficult to ignore. Upgrading from TensorFlow 1.x to TensorFlow 2.x was, to put it bluntly, brutal – so brutal that some companies switched to PyTorch rather than invest the effort needed to make their code work properly on the new major version. TensorFlow also lost ground in the research community to the more flexible PyTorch.

Ultimately, the “Keras affair” did not help TensorFlow either: Keras was integrated into the TensorFlow releases two years ago, but was recently spun off again into a separate library with its own release schedule. The Keras spin-off hardly changes a developer’s daily life, but such a public about-face is not exactly a confidence-building measure. That aside, TensorFlow is a reliable framework and hosts an extensive deep learning ecosystem: you can build applications and models on top of TensorFlow that scale to almost any size, and you will be in good company. Nevertheless, TensorFlow is no longer necessarily the first choice in 2022.

As the above suggests, PyTorch is no longer the upstart nipping at TensorFlow’s heels. Instead, the framework has become a driving force in the world of deep learning – primarily in research, but increasingly in production environments as well.

Since eager mode has become the default development approach for both TensorFlow and PyTorch, PyTorch’s autograd approach seems to have won the “war” against static graphs. Unlike TensorFlow, PyTorch has not experienced any major breaks in its core code since the Variable API was removed in version 0.4: previously, a Variable was required to use autograd with tensors; now everything is a tensor. That’s not to say there haven’t been a few missteps here and there. For example, if you have used PyTorch to train a model on multiple GPUs, you have likely encountered the differences between DataParallel and the newer DistributedDataParallel. You should almost always use DistributedDataParallel, even though DataParallel is not actually deprecated.
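For context, the difference shows up mostly in setup rather than in model code. Here is a minimal DistributedDataParallel sketch, assuming the script is launched with torchrun (which sets LOCAL_RANK and the other environment variables); layer sizes are made up:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")          # one process per GPU
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(128, 10).cuda(local_rank)
# DataParallel would be a one-liner on a single machine, but DDP scales
# across processes and nodes and is the recommended default.
model = DDP(model, device_ids=[local_rank])
```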

  • PyTorch long lagged behind TensorFlow and JAX when it comes to XLA/TPU support – that changed in 2022: PyTorch now supports TPU VMs as well as TPU Node deployments.

  • All of this is rounded out by a simple deployment option via the command line.

  • If you don’t want to fiddle with low-level details, high-level add-ons like PyTorch Lightning let you focus on your actual work instead of rewriting training loops (see the sketch below this list).
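As a rough sketch of what that looks like, the module below delegates the training loop and device handling to Lightning’s Trainer; the layer sizes and hyperparameters are made up for illustration:

```python
import torch
from torch import nn
import pytorch_lightning as pl

class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.cross_entropy(self.net(x.view(x.size(0), -1)), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# The Trainer takes over the loop, logging and hardware placement:
# trainer = pl.Trainer(max_epochs=3, accelerator="auto")
# trainer.fit(LitClassifier(), train_dataloader)  # train_dataloader: your DataLoader
```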

On the downside, work on PyTorch Mobile continues, but it is still far less mature than TensorFlow Lite. In terms of production, PyTorch offers integrations with framework-agnostic platforms like Kubeflow, while the TorchServe project handles deployment details such as scaling, metrics, and batch inference – all the MLOps benefits in a small package maintained by the PyTorch developers themselves. Anyone who claims PyTorch isn’t scalable is wrong: Meta has been running PyTorch in production for years. Still, there are arguments that PyTorch is not quite as well suited as JAX for very, very large training runs that require a multitude of GPUs or TPUs.

PyTorch’s popularity in recent years has also been tied to the success of Hugging Face’s Transformers library. Yes, Transformers now also supports TensorFlow and JAX, but it started out as a PyTorch project and remains closely associated with the framework. With the rise of the Transformer architecture, the flexibility of PyTorch for research, and the ability to source so many new models within days or hours of their release via Hugging Face’s model hub, it’s no wonder that PyTorch resonates everywhere.
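For example, pulling a pretrained model from the hub takes only a few lines with the Transformers pipeline API; this sketch downloads a default sentiment model on first use and runs it with the PyTorch backend:

```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis")      # fetches a default pretrained model from the hub
print(classifier("PyTorch and Transformers work well together."))
```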

If you can do without TensorFlow, Google might have something else for you: JAX is a deep learning framework that is developed, maintained, and used by Google, but it is not an official Google product. A look at the publications from Google and DeepMind over the past year makes it clear, however, that much of Google’s research has moved to JAX.

The simplest way to think of JAX is this: a GPU/TPU-accelerated version of NumPy that can magically vectorize a Python function and compute all of the derivatives of those functions. JAX also has a JIT (just-in-time) component that optimizes your code for the XLA compiler, which can bring significant performance gains over TensorFlow and PyTorch: code can potentially run four to five times faster simply by being reimplemented in JAX, without any real optimization work (a short sketch of these transformations follows the list below). Because JAX works at the NumPy level, JAX code is written at a much lower level than TensorFlow/Keras and, yes, PyTorch too. Fortunately, there is a small but growing ecosystem of surrounding projects that fill in the gaps:

  • Neural network libraries are available in the form of Flax or Haiku, for example.

  • Optax is recommended for all optimization needs and PIX for image processing.
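The sketch referenced above shows JAX’s three core transformations on an ordinary NumPy-style function; the toy loss function and shapes are made up for illustration:

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    pred = jnp.dot(x, w)                     # plain NumPy-style code
    return jnp.mean((pred - y) ** 2)

grad_fn = jax.jit(jax.grad(loss))            # differentiate w.r.t. w, then compile via XLA
batched_dot = jax.vmap(jnp.dot, in_axes=(None, 0))  # vectorize over a batch of x

w = jnp.ones(3)
x = jnp.arange(12.0).reshape(4, 3)
y = jnp.ones(4)
print(grad_fn(w, x, y))                      # gradients with respect to w
print(batched_dot(w, x))                     # per-example dot products
```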

Once you’ve worked with a library like Flax, it’s relatively easy to create neural networks. Be aware, though, that JAX also has a few rough edges: for example, it handles random numbers differently than many other frameworks (see the sketch below). If you are deep into a research project with large models that require massive resources to train, converting to JAX is worth considering – the advances the framework offers in areas like deterministic training could be worth the switch alone.
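A minimal sketch of that difference: random numbers in JAX come from explicit, splittable keys instead of hidden global state, which is part of what makes runs reproducible:

```python
import jax

key = jax.random.PRNGKey(42)
key, init_key, dropout_key = jax.random.split(key, 3)   # derive independent keys

w = jax.random.normal(init_key, (128, 10))               # same key -> same weights, every run
mask = jax.random.bernoulli(dropout_key, 0.9, (128, 10)) # separate key for dropout noise
```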

So which deep learning framework should you use? It is not possible to give a general answer to this question. It all depends on the type of problem you want to solve and the scale at which you want to use your models. The computing platforms you are targeting also play a role.

  • If you work on text and image problems and do small- or medium-scale research with the aim of putting your models into production, PyTorch is probably the best choice at the moment.

  • However, if you want to squeeze the last bit of performance out of low-power or compute-limited devices, TensorFlow is recommended.

  • If you train models with tens or hundreds of billions of parameters or more, primarily for research purposes, then you should give JAX a chance.

(FM)

This post is based on an article from our US sister publication InfoWorld.



