Metaprogramming @ Evertz

Posted on Apr 01, 2022 by Akshay Kumar

Creating client libraries with Metaprogramming

Evertz develops a wide range of products and services for the media and entertainment industry, and these products expose APIs of different types. These APIs allow our customers to interact with our services and build their own automation on top of them. We document these APIs and, when appropriate, create client libraries to make integration even easier. Our APIs may follow standards like REST, JSON-RPC, SOAP, GraphQL, or gRPC, but some of our products have custom APIs that don’t strictly follow any standard. In this post we will look at one of these non-standard APIs and how metaprogramming can be used to generate a client library for it.

What is metaprogramming?

Metaprogramming is a programming technique in which computer programs have the ability to treat other programs as their data. It means that a program can be designed to read, generate, analyze or transform other programs, and even modify itself while running.
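
As a tiny illustration of a program treating code as data, the following sketch builds the source text of a function and then executes it. The names make_getter_source and get_title are made up for this example.

```python
def make_getter_source(field_name: str) -> str:
    """Return Python source code for a function that reads one dictionary key."""
    return (
        f"def get_{field_name}(record):\n"
        f"    return record[{field_name!r}]\n"
    )


# The generated program is ordinary text until we choose to run it.
source = make_getter_source("title")
print(source)

# exec() compiles the text and defines get_title inside `namespace`.
namespace = {}
exec(source, namespace)
get_title = namespace["get_title"]

print(get_title({"title": "The Great Gatsby"}))
```

A client library generator works the same way in principle, except that the generated text is written to files instead of being executed immediately.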

Compilers, transpilers, assemblers and interpreters are examples of metaprograms. They take programs in one form and transform them into machine code, bytecode or even source code in another language. A popular example of metaprogramming is the open-source project openapi-generator, which lets developers generate client libraries from a REST API description written in the OpenAPI Specification format. Similarly, SmartBear offers a proprietary code generation tool for SOAP APIs. These tools are great for REST and SOAP APIs, but some of our products have custom APIs that do not follow any standard.

Non-standard API

Most of our APIs follow open standards like REST or gRPC, while others follow in-house standards. For the sake of this post, let’s assume that in our API every call is an HTTP POST to the same URL. Instead of the HTTP verb and path describing what the request is, a field in the payload does. For example, let’s say we have a book management system called the library-service, and it manages domain objects like books and bookstores. Its API has a method to get a book by its ISBN, which works like this:

Request:

POST /library-service/api HTTP/1.1
Content-Type: application/json

{
    "Method": "getBook",
    "Arguments": {
        "ISBN": "9780743273565"
    }
}

Response:

HTTP/1.1 200 OK
Content-Type: application/json

{
    "Output": {
        "ISBN": "9780743273565",
        "Title": "The Great Gatsby",
        "Author": "F. Scott Fitzgerald"
    }
}
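
The envelope above can be captured in two small helpers. This is a sketch based only on the example request and response shown here; the function names build_request and parse_response are hypothetical, and the real service may return additional fields.

```python
import json


def build_request(method: str, arguments: dict) -> str:
    """Serialize an API call into the JSON body expected by the service."""
    return json.dumps({"Method": method, "Arguments": arguments})


def parse_response(body: str) -> dict:
    """Extract the "Output" object from a JSON response body."""
    return json.loads(body).get("Output")


request_body = build_request("getBook", {"ISBN": "9780743273565"})
print(request_body)

response_body = '{"Output": {"ISBN": "9780743273565", "Title": "The Great Gatsby", "Author": "F. Scott Fitzgerald"}}'
book = parse_response(response_body)
print(book["Title"])
```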

Basic Client

At Evertz we love Python and use it whenever appropriate, so for this post let’s assume we want to make a simple Python client library for the library-service. We can use the amazing requests library to write a simple client. We can wrap it up in a class called BasicLibraryService that has a generic_call method. This method takes the name of the API method and its arguments as keyword arguments, and returns the JSON response parsed into a dictionary. The expectation is that this client should be usable in the following way:

import os
from evertz_library_service.base import BasicLibraryService

library_service = BasicLibraryService('192.0.2.46', os.getenv('username'), os.getenv('password'))
book = library_service.generic_call("getBook", ISBN="9780743273565")
print(book["Author"])

Using the requests library, this simple client can be written like this:

import requests

class BasicLibraryService:

    def __init__(self, server, user: str, password: str):
        self.server = server
        self.user = user
        self.password = password
        self.url = f'http://{self.server}/library-service/api'
        self._headers = {"Content-type": "application/json", "Accept": "application/json"}
        # Exchanging the credentials for a session key is service-specific and omitted here.
        self.session_key = None

    def generic_call(self, method, **arguments):
        # Every call is a POST to the same URL; the payload names the method.
        cmd = dict(Method=method, Arguments=dict(**arguments))
        result = requests.post(self.url, json=cmd, headers=self._headers)
        result.raise_for_status()
        root = result.json()
        output = root.get('Output')
        return output

But this requires the user to know the structure of the dictionaries, and these structures can get quite complex if your domain objects are mature.

Advanced client

Writing a basic client was simple, but can we create a better client library that lets users work with native Python objects rather than dictionaries? Our connection object could have a separate method for each API call instead of just the generic_call method, and these methods should accept and return native Python objects. Like this:

import os
from evertz_library_service import LibraryService

library_service = LibraryService('192.0.2.46', os.getenv('username'), os.getenv('password'))
book = library_service.get_book(isbn="9780743273565")
print(book.author)

The new client has a dedicated method for getBook called get_book, and it returns a native Python Book object rather than a dictionary. With proper type hints, these native domain object classes can improve the development experience.

Our advanced client library needs to include three pieces which are explained in later sections.

  1. Domain object classes
  2. Serialization code
  3. Connection object

Domain object classes

Our client library needs to include simple data holder classes like Book or Bookstore. A simple book class can be written like this:

class Book:

    def __init__(self, isbn:str=None, title: str=None, author: str=None):
        self.isbn = isbn
        self.title = title
        self.author = author

    def set_isbn(self, isbn: str) -> 'Book':
        self.isbn = isbn
        return self

    def set_title(self, title: str) -> 'Book':
        self.title = title
        return self

    def set_author(self, author: str) -> 'Book':
        self.author = author
        return self

Serialization

Since our library-service API accepts the payload in JSON format, we need a way to convert our new native python objects into JSON, so they can be sent over in an API call. Similarly, we also need a way to convert the JSON response of our API calls back to native python objects. These processes are called serialization and deserialization respectively.

One of the most stable and feature-rich Python libraries for serialization is marshmallow. Marshmallow lets you write schema classes that serialize objects to dictionaries and deserialize dictionaries back into native Python objects. The official quickstart guide has a great example that explains how schemas work in marshmallow. In our case, the marshmallow schema for the Book class can look like this (the dump_to and load_from arguments are the marshmallow 2 API; marshmallow 3 replaces both with data_key):

from marshmallow import fields, post_load
from evertz_library_service.domain.base_domain_schema import BaseDomainSchema
from evertz_library_service.domain import book

class BookSchema(BaseDomainSchema):
    load_return_class_name = 'Book'

    isbn = fields.Str(required=False, allow_none=True, dump_to="ISBN", load_from="ISBN")
    title = fields.Str(required=False, allow_none=True, dump_to="Title", load_from="Title")
    author = fields.Str(required=False, allow_none=True, dump_to="Author", load_from="Author")

    @post_load
    def make_type(self, data):
        return book.Book(**data)

book_schema = BookSchema()

We can now use the schema in the get_book method explained below.
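
To make explicit what the schema does, here is a dependency-free sketch of the same mapping between API field names and Python attribute names. It is not how marshmallow is implemented; FIELD_MAP, dump_book and load_book are illustrative names, and the Book class is a trimmed copy of the one above.

```python
class Book:
    def __init__(self, isbn=None, title=None, author=None):
        self.isbn = isbn
        self.title = title
        self.author = author


# Python attribute name -> key used in the API payload.
FIELD_MAP = {"isbn": "ISBN", "title": "Title", "author": "Author"}


def dump_book(book: Book) -> dict:
    """Serialize: Book object -> dictionary with API key names."""
    return {api_key: getattr(book, attr) for attr, api_key in FIELD_MAP.items()}


def load_book(data: dict) -> Book:
    """Deserialize: dictionary with API key names -> Book object."""
    return Book(**{attr: data.get(api_key) for attr, api_key in FIELD_MAP.items()})


payload = dump_book(Book(isbn="9780743273565", title="The Great Gatsby"))
print(payload)

book = load_book({"ISBN": "9780743273565", "Author": "F. Scott Fitzgerald"})
print(book.author, book.title)
```

A schema library adds validation, nested objects and type conversion on top of this basic renaming, which is why we rely on one instead of writing the mapping by hand.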

Client object with API methods

Finally, our client needs a service class that has a separate method for each API call. For this, we can extend the BasicLibraryService class we wrote above. For example, the get_book method below calls the generic_call method inherited from BasicLibraryService and then uses the BookSchema to convert the JSON response into a Book object.

from evertz_library_service.base import BasicLibraryService
from evertz_library_service.domain.book_schema import book_schema

class LibraryService(BasicLibraryService):

    def __init__(self, server, user:str, password:str):
        super(LibraryService, self).__init__(server, user=user, password=password)
        
    def get_book(self, isbn: str = None) -> 'Book':
        """
        :param isbn: The book ISBN to get
        """
        params = dict()

        if isbn is not None:
            params['ISBN'] = isbn

        response_raw = self.generic_call('getBook', **params)
        # marshmallow 2 returns a (data, errors) tuple from load()
        response, errors = book_schema.load(response_raw)
        if errors:
            raise ValueError(errors)
        return response

We could write these three pieces by hand, but for large and mature services there may be a lot of API calls and domain objects to cover. For example, we have more than 1200 API calls and 2500 domain objects in just one of our products. This is where we can use metaprogramming to generate this client library.

Client Library Generation with Metaprogramming

Let’s say we want to create the client library generator in Python as well. For that we need three pieces.

  1. Custom API Specification
  2. Templating
  3. Entrypoint

Custom API Specification

The first thing we need to decide is where the information about our API will come from. We need a machine-readable description of our API that can act as the input to the generation process. We can get it in a few ways:

  • Use a code syntax parsing library to inspect the codebase of the service and generate a JSON file that describes the domain objects and the API method arguments and responses. Libraries like antlr4, javalang, pycparser, and lark are good starting points for parsing code.
  • Add a build step in your service to export a custom API spec and domain object structure in JSON format.

Essentially, we need a document that describes our API and domain objects in an in-house specification format. Just as the OpenAPI Specification is used to describe REST APIs and WSDL is used to describe SOAP APIs, we need our own specification format to describe our non-standard API. For example, we can describe our API in this format:

{
    "DomainObjects": [
        {
            "Name": "Book",
            "Fields": [
                {"Name": "ISBN", "Type": "string"},
                {"Name": "Title", "Type": "string"},
                {"Name": "Author", "Type": "string"}
            ]
        }
    ],
    "Methods": [
        {
            "Name": "getBook",
            "Arguments": [
                {
                    "Name": "ISBN",
                    "Type": "string",
                    "Description": "The ISBN of the book you want to get"
                }
            ],
            "ResponseType": "Book"
        }
    ]
}
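
A generator would typically load this document and check it before rendering anything. The sketch below assumes the format shown above; SPEC_TEXT is a trimmed inline copy, and check_spec is a hypothetical helper, not part of any library.

```python
import json

SPEC_TEXT = """
{
    "DomainObjects": [
        {"Name": "Book", "Fields": [{"Name": "ISBN", "Type": "string"}]}
    ],
    "Methods": [
        {
            "Name": "getBook",
            "Arguments": [{"Name": "ISBN", "Type": "string"}],
            "ResponseType": "Book"
        }
    ]
}
"""


def check_spec(spec: dict) -> list:
    """Return a list of problems, e.g. methods whose ResponseType is not defined."""
    known_types = {obj["Name"] for obj in spec["DomainObjects"]}
    problems = []
    for method in spec["Methods"]:
        if method["ResponseType"] not in known_types:
            problems.append(f'{method["Name"]}: unknown type {method["ResponseType"]}')
    return problems


spec = json.loads(SPEC_TEXT)
print(check_spec(spec))  # an empty list means every response type is defined
```

Failing early on an inconsistent specification is cheaper than debugging a generated library that imports a class that was never generated.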

Templating

We found templating to be the best approach for generating code. We can use a templating library called Jinja2. For example, in Jinja2 you can create a template of an email body like this:


from jinja2 import Template

intro_message_template_text = """
Hello {{ candidate_name }}, 

Your experience at {{ current_company }} matches very well with our role for {{ target_role }}.
Please check out the job posting and let me know if you are interested.

{{ job_link }}
"""

intro_message_template = Template(intro_message_template_text)

message_to_send = intro_message_template.render(candidate_name="John Doe",
                                                current_company="Decepticons Inc.",
                                                target_role="Front End Developer",
                                                job_link="https://example.com/careers/front-end.html")
print(message_to_send)

As you can see, intro_message_template_text has placeholders such as candidate_name and current_company, which are filled in with values at the render stage. Jinja2 also supports more advanced templating logic, such as for loops and if conditions.

In Jinja2 the template for domain object classes could look like:


{%- for used_type_name, used_type in type_context.used_types|dictsort -%}
from {{ used_type.source }} import {{ used_type.snake }} as {{ used_type.snake }}_
{% endfor -%}
from typing import Iterator, List, Callable, Union, Dict


class {{ type_context.name_pascal }}:

    def __init__(self{% for field_name, field in type_context.fields|dictsort %},
                 {{ field.name_snake }}: '{{ field.python_type }}' = None{% endfor %}{% if type_context.fields|length == 0 %}, *args{% endif %}):
{%- for field_name, field in type_context.fields|dictsort %}
        self.{{ field.name_snake }} = {{ field.name_snake }}
{%- endfor %}

{%- for field_name, field in type_context.fields|dictsort %}
    def set_{{ field.name_snake }}(self, {{ field.name_snake }}: '{{ field.python_type }}') -> '{{ type_context.name_pascal }}':
        self.{{ field.name_snake }} = {{ field.name_snake }}
        return self
{%- endfor %}

We can write Jinja2 templates to generate marshmallow Schema classes as well.
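
For instance, a much simplified schema template might look like the sketch below. It assumes a type_context dictionary with name_pascal and a list of fields (each with name_snake and api_name); those names are assumptions for this example, every field is treated as a string, and the generated class uses the marshmallow 2 style dump_to/load_from arguments.

```python
from jinja2 import Environment

# Simplified, hypothetical template: one string field per entry in type_context.fields.
SCHEMA_TEMPLATE_TEXT = '''\
from marshmallow import Schema, fields


class {{ type_context.name_pascal }}Schema(Schema):
{% for field in type_context.fields %}
    {{ field.name_snake }} = fields.Str(allow_none=True, dump_to="{{ field.api_name }}", load_from="{{ field.api_name }}")
{% endfor %}
'''

# trim_blocks/lstrip_blocks keep the {% ... %} lines from leaving blank lines in the output.
env = Environment(trim_blocks=True, lstrip_blocks=True)
schema_template = env.from_string(SCHEMA_TEMPLATE_TEXT)

type_context = {
    "name_pascal": "Book",
    "fields": [
        {"name_snake": "isbn", "api_name": "ISBN"},
        {"name_snake": "title", "api_name": "Title"},
    ],
}

schema_source = schema_template.render(type_context=type_context)
print(schema_source)
```

The rendered text is plain Python source, so it can be written to a module file in the same way as the domain object classes.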

Templating considerations

  • Alphabetical order: We realized that if we generate the methods and arguments in alphabetical order, the diff between two generated versions contains only the real API changes and is much easier to review.
  • Reserved words: Python keywords such as class or from cannot be used as variable or method names, and builtin names such as id (or file in Python 2) are best not shadowed. To resolve these conflicts, you can append an underscore to any generated name that collides with a reserved word. Here are some helper functions we found useful:
import keyword
import re

# Assumed definition: Python keywords plus builtin names we do not want to shadow.
PYTHON_RESERVED_WORDS = set(keyword.kwlist) | {'id', 'file'}

def camel_to_snake(name):
    s1 = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
    s2 = re.sub('([a-z0-9])([A-Z])', r'\1_\2', s1).lower()
    s3 = s2 + '_' if s2 in PYTHON_RESERVED_WORDS else s2
    return s3.replace('-', '__').replace(' ', '')

def camel_to_pascal(name):
    pascal = (name[0].upper() + name[1:]).replace('-', '_').replace('/', '_').replace('.', '_').replace(':', '_').replace(' ', '')
    pascal = '_' + pascal if pascal[0].isdigit() or pascal in PYTHON_RESERVED_WORDS else pascal
    return pascal

def value_to_enum(name):
    if name == '':
        return '_'
    enum = camel_to_snake(name)
    pascal = enum.upper().replace('-', '_').replace('/', '_').replace('.', '_').replace(':', '_').replace(' ','').replace(',', '').replace('(', '').replace(')', '').replace('+', 'PLUS')
    pascal = 'E' + pascal if pascal[0].isdigit() or pascal in PYTHON_RESERVED_WORDS else pascal
    return pascal

Entrypoint

Now, we just need to use the templates and the API specification to render the code for the classes:

import json

from jinja2 import Template

with open('templates/domain_class_py3.7.jinja2') as file_:
    domain_class_template = Template(file_.read())

with open('library_service_spec.json', 'r') as api_spec_file:
    api_spec = json.load(api_spec_file)

for domain_object_spec in api_spec['DomainObjects']:
    # build_type_context is a separate function you will need to write to convert/massage
    # the API specification into the Jinja2 template context.
    type_context = build_type_context(domain_object_spec)
    domain_python_text = domain_class_template.render(type_context=type_context)
    # Assumed layout: one module per domain object, named after it in snake_case.
    domain_file_path = f"evertz_library_service/domain/{camel_to_snake(domain_object_spec['Name'])}.py"
    with open(domain_file_path, 'w') as f:
        f.write(domain_python_text)
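
The build_type_context function is left to the reader above, so here is one possible sketch of it. Everything in it is an assumption made for illustration: the mapping from specification types to Python types, the rule that any unknown type names another domain object, the module path evertz_library_service.domain, and the Bookstore fields used in the example.

```python
import re

# Assumed mapping from specification type names to Python type names.
SPEC_TO_PYTHON_TYPE = {"string": "str"}


def to_snake(name: str) -> str:
    """Simplified camelCase/PascalCase -> snake_case conversion."""
    partly = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partly).lower()


def build_type_context(domain_object_spec: dict) -> dict:
    """Convert one DomainObjects entry into the context used by the class template."""
    fields = {}
    used_types = {}
    for field_spec in domain_object_spec["Fields"]:
        spec_type = field_spec["Type"]
        python_type = SPEC_TO_PYTHON_TYPE.get(spec_type)
        if python_type is None:
            # Not a primitive: assume it names another domain object that must be imported.
            snake = to_snake(spec_type)
            used_types[spec_type] = {"source": "evertz_library_service.domain", "snake": snake}
            python_type = f"{snake}_.{spec_type}"
        fields[field_spec["Name"]] = {
            "name_snake": to_snake(field_spec["Name"]),
            "python_type": python_type,
        }
    return {
        "name_pascal": domain_object_spec["Name"],
        "fields": fields,
        "used_types": used_types,
    }


# Hypothetical Bookstore object that refers to the Book domain object.
context = build_type_context({
    "Name": "Bookstore",
    "Fields": [
        {"Name": "StoreName", "Type": "string"},
        {"Name": "FeaturedBook", "Type": "Book"},
    ],
})
print(context["fields"]["StoreName"])
print(context["used_types"])
```

Keeping this conversion separate from the templates means the templates stay simple and the naming rules live in one place.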

Continuous Integration Considerations

As we add more code to our product, we add new domain objects and new API calls. So, naturally, we need to generate and release a new version of the library every time we release our product.

  • Monorepo: One way to achieve continuous integration is to use a monorepo and move the code that generates the client library into the same repo as the service code. This way we can add a build stage/target that generates the library and runs its tests as well.
  • Linting: Using a standard linting tool like pylint in the build/CI steps would be very beneficial to make sure the generated code follows your style guide.
  • Tests: Writing unit tests using frameworks like pytest and running them after every build is also a good way to make sure you didn’t break anything.
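
As one example of such a test, the sketch below checks that generated source code at least parses as valid Python. The helper name find_syntax_errors and the inline sample source are assumptions for illustration; in a real build the sources would be read from the generated package.

```python
import ast


def find_syntax_errors(generated_sources: dict) -> dict:
    """Map each file name to its syntax error message; valid files are omitted."""
    errors = {}
    for file_name, source in generated_sources.items():
        try:
            ast.parse(source, filename=file_name)
        except SyntaxError as exc:
            errors[file_name] = exc.msg
    return errors


def test_generated_code_parses():
    # In a real build these strings would be read from the generated package.
    generated_sources = {
        "book.py": "class Book:\n    def __init__(self, isbn=None):\n        self.isbn = isbn\n",
    }
    assert find_syntax_errors(generated_sources) == {}
```

pytest collects any function whose name starts with test_, so this check runs with the rest of the suite after every generation step.
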
Akshay Kumar
Software Engineering Manager