MLOps basics: 5 formats for transferring ML models
For ML systems, portability across the stages of the life cycle, from development to production deployment, is important. For example, a Data Scientist typically writes code in a notebook environment such as Jupyter Notebook or Google Colab. When this work is moved to production, the trained model should be serialized into a lightweight, compact interchange format, ideally one that is independent of the development language. Five common formats are:
• Pickle is Python's built-in binary serialization format: it converts a hierarchy of Python objects into a stream of bytes and back. Unlike the other formats listed here, it is Python-specific.
• ONNX (Open Neural Network Exchange) is an open-source format for ML models that defines a common set of operators and a universal file format for various platforms and tools. An ONNX file describes a computation graph (inputs, outputs, and operations) and is self-contained. It is primarily deep-learning focused, was originally introduced by Microsoft and Facebook, and models can be exported to it from PyTorch and, through converters, from TensorFlow.
• PMML (Predictive Model Markup Language) is an XML-based exchange format for predictive models. A model developed in one system can be exported as a PMML file and deployed in another application that reads it.
• PFA (Portable Format for Analytics) is a standard for describing statistical models and data transformations so that they can be moved between systems. Pre-processing and post-processing functions can be chained together and composed into complex workflows. A PFA document, written in JSON or YAML, can be anything from a simple raw data transformation to a complex set of parallel data mining models.
• NNEF (Neural Network Exchange Format) is a format, developed by the Khronos Group, that simplifies deployment by letting networks produced by a range of training tools run in applications on various devices and platforms.
There are also framework-specific formats, such as POJO/MOJO for the H2O platform and the MLWritable interface for saving models in Apache Spark.
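To make the Pickle entry in the list above concrete, here is a minimal sketch of a round trip through Python's standard `pickle` module. The `model` dictionary and the `predict` helper are hypothetical stand-ins for a trained model, not part of any library. A real project would usually pickle an estimator object produced by its ML framework.

```python
import pickle

# Hypothetical stand-in for a trained model: the parameters of a small
# linear model held in plain Python objects (names are illustrative only).
model = {
    "feature_names": ["x1", "x2"],
    "weights": [0.5, -1.0],
    "bias": 2.0,
}


def predict(params, features):
    """Compute a linear prediction from a parameter dict (illustrative helper)."""
    weighted = sum(w * x for w, x in zip(params["weights"], features))
    return weighted + params["bias"]


# Serialize: hierarchy of Python objects -> stream of bytes.
payload = pickle.dumps(model, protocol=pickle.HIGHEST_PROTOCOL)

# Deserialize: stream of bytes -> an equal, but separate, object hierarchy.
restored = pickle.loads(payload)

print(type(payload).__name__, len(payload) > 0)
print(restored == model, restored is model)
print(predict(restored, [4.0, 1.0]))
```

In practice the bytes would be written to a file with `pickle.dump` and read back with `pickle.load` in the serving process. Because unpickling can execute arbitrary code, pickle files should only be loaded from trusted sources, which is one reason the language-independent formats in the list are often preferred for production hand-off.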