Alessandro Mangone

Data Engineer

29th September 2019 #hue  #code  #refactoring  #python 5 min read

Changing code (almost) without editing it: the magic of decorators

As a Data Engineer, one of your common tasks is supporting the Data Science team. Sometimes you are asked to customize one of the products they use daily. How can you make these customizations minimally invasive, so that the product stays easy to update?

Problem

Let's say your data science team uses Hue to explore the datasets in your data lake. They find the dataset they need and work out how to extract the relevant information from it, so they write a wonderful query that they want to save to a file, for example to schedule it.

Here comes the pain:

Queries are saved in Hue's internal database: there is no option to store them as files somewhere else (e.g. S3 or the local filesystem).

It is true that one could simply copy-paste the query into a local .sql file and version it with git or any other VCS, but that's not the best user experience, especially for someone who is not super technical.

Exploring the Hue codebase

You put aside your data engineer-y skills and start digging into the Hue codebase, trying to find the code that saves queries in the Hue database. Eventually you find that it lives in the notebook library.

After some further digging you discover how Hue saves a notebook:

def _save_notebook(notebook, user):
  notebook_type = notebook.get('type', 'notebook')
  save_as = False

  if notebook.get('parentSavedQueryUuid'): # We save into the original saved query, not into the query history
    notebook_doc = Document2.objects.get_by_uuid(user=user, uuid=notebook['parentSavedQueryUuid'])
  elif notebook.get('id'):
    notebook_doc = Document2.objects.get(id=notebook['id'])
  else:
    notebook_doc = Document2.objects.create(name=notebook['name'], uuid=notebook['uuid'], type=notebook_type, owner=user)
    Document.objects.link(notebook_doc, owner=notebook_doc.owner, name=notebook_doc.name, description=notebook_doc.description, extra=notebook_type)
    save_as = True

    if notebook.get('directoryUuid'):
      notebook_doc.parent_directory = Document2.objects.get_by_uuid(user=user, uuid=notebook.get('directoryUuid'), perm_type='write')
    else:
      notebook_doc.parent_directory = Document2.objects.get_home_directory(user)

  notebook['isSaved'] = True
  notebook['isHistory'] = False
  notebook['id'] = notebook_doc.id
  _clear_sessions(notebook)
  notebook_doc1 = notebook_doc._get_doc1(doc2_type=notebook_type)
  notebook_doc.update_data(notebook)
  notebook_doc.search = _get_statement(notebook)
  notebook_doc.name = notebook_doc1.name = notebook['name']
  notebook_doc.description = notebook_doc1.description = notebook['description']
  notebook_doc.save()
  notebook_doc1.save()

  return notebook_doc, save_as


@api_error_handler
@require_POST
@check_document_modify_permission()
def save_notebook(request):
  response = {'status': -1}

  notebook = json.loads(request.POST.get('notebook', '{}'))

  notebook_doc, save_as = _save_notebook(notebook, request.user)

  response['status'] = 0
  response['save_as'] = save_as
  response.update(notebook_doc.to_dict())
  response['message'] = request.POST.get('editorMode') == 'true' and _('Query saved successfully') or _('Notebook saved successfully')

  return JsonResponse(response)

Interesting! By observing what Hue passes in the request, you discover that the notebook object received by _save_notebook contains all the data you need to save the query to a file:

  • the name selected when saving the query
  • the query
  • the query type
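To make the shape of that payload concrete, here is a minimal sketch of a decoded notebook dictionary and a helper that pulls out those three fields. The key names (name, type, snippets[0]['raw_statement']) are the ones used by the code later in this post; the sample values and the extract_query_info helper are hypothetical, and the exact payload may differ between Hue versions.

```python
import json

# Hypothetical example of the JSON that Hue posts in the 'notebook' form field.
# Key names follow the code in this post; the values are made up.
SAMPLE_NOTEBOOK_JSON = json.dumps({
    "name": "Daily active users",
    "type": "hive-query",
    "snippets": [
        {"raw_statement": "SELECT COUNT(DISTINCT user_id) FROM events"},
    ],
})


def extract_query_info(notebook):
    """Return (name, query_type, statement) from a decoded notebook dict.

    Hypothetical helper: like the code in this post, it only looks at the
    first snippet of the notebook.
    """
    name = notebook["name"]
    query_type = notebook["type"]
    statement = notebook["snippets"][0]["raw_statement"]
    return name, query_type, statement


notebook = json.loads(SAMPLE_NOTEBOOK_JSON)
print(extract_query_info(notebook))
```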

Time to customize Hue

Now that you feel like a hacker because you discovered how Hue works under the hood, you are tempted to change the function code by adding some lines to save your query to a local file.

Something like this (this code is untested and doesn't validate the notebook object before using its properties: don't use it in production!):

@api_error_handler
@require_POST
@check_document_modify_permission()
def save_notebook(request):
  response = {'status': -1}

  notebook = json.loads(request.POST.get('notebook', '{}'))

  notebook_doc, save_as = _save_notebook(notebook, request.user)

  # Save query to a local file
  query_name = notebook['name'].replace(' ', '_')
  query_type = notebook['type'] # e.g. 'hive-query'
  destination = '/path/to/queries/{}/{}/{}.sql'.format(request.user, query_type, query_name)
  with open(destination, 'w') as output_file:
    output_file.write(str(notebook['snippets'][0]['raw_statement']))

  response['status'] = 0
  response['save_as'] = save_as
  response.update(notebook_doc.to_dict())
  response['message'] = request.POST.get('editorMode') == 'true' and _('Query saved successfully') or _('Notebook saved successfully')

  return JsonResponse(response)

This code saves your query to a local file, so you can reference it from external tools.
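One weakness of the snippet above is that open() fails if the destination directory doesn't exist yet, and a query name containing "/" would point outside the intended folder. Below is a minimal sketch of a more defensive helper, assuming Python 3 (Hue releases from 2019 still ran on Python 2.7, where os.makedirs has no exist_ok argument). The names write_query_file, safe_path_component and base_dir are hypothetical, not part of Hue.

```python
import os
import re
import tempfile


def safe_path_component(value):
    """Make `value` safe to use as a single file or directory name.

    Spaces become underscores (as in the snippet above); any other character
    that is not a letter, digit, '_', '-' or '.' is also replaced by '_'.
    """
    name = str(value).strip().replace(" ", "_")
    name = re.sub(r"[^A-Za-z0-9_.-]", "_", name)
    # Reject empty names and names made only of dots, such as '..'.
    if not name.strip("."):
        name = "unnamed"
    return name


def write_query_file(base_dir, user, query_type, query_name, statement):
    """Write `statement` to <base_dir>/<user>/<query_type>/<query_name>.sql.

    Missing directories are created first. Returns the path of the written file.
    """
    directory = os.path.join(base_dir, safe_path_component(user),
                             safe_path_component(query_type))
    os.makedirs(directory, exist_ok=True)
    destination = os.path.join(directory, safe_path_component(query_name) + ".sql")
    with open(destination, "w") as output_file:
        output_file.write(statement)
    return destination


# Usage example against a throw-away directory.
demo_dir = tempfile.mkdtemp()
demo_path = write_query_file(demo_dir, "alice", "hive-query",
                             "my daily/report", "SELECT 1")
print(demo_path)
```

Sanitizing every path component, not just the query name, keeps a crafted query type from escaping base_dir as well.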

Good job team, let's go grab some coffee ☕!

Not that fast!

The Product Owner of your team rushes into the office and tells you that you need to update to the latest Hue version that has just been released.

It's time to:

  • roll out the update
  • re-apply your customization, along with any other customizations that may have been made to Hue.

For some reason you cannot make a Pull Request to the Hue project and you have to maintain this customization on your own.

Wouldn't it be great if there was a way to re-apply your customization to any new version of Hue?

Enter Python Decorators

According to Wikipedia:

A decorator is any callable Python object that is used to modify a function, method or class definition.

A decorator is passed the original object being defined and returns a modified object, which is then bound to the name in the definition.

Decorators are an easy way to extend the behavior of a function by doing something before or after the function is called:

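import functools

def my_decorator(function_to_decorate):
    # functools.wraps keeps the original function's name and docstring
    @functools.wraps(function_to_decorate)
    def decoration_function(*args, **kwargs):
        print('Before function call')
        result = function_to_decorate(*args, **kwargs)
        print('After function call')
        # Pass the original return value back to the caller
        return result
    return decoration_function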

@my_decorator
def decorable_function():
    print('Inside function call')

decorable_function()

Executing this code produces the following output:

Before function call
Inside function call
After function call
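As the Wikipedia definition says, the decorated object is bound back to the original name: the @ line is shorthand for an ordinary assignment. The small sketch below (function names are illustrative only) shows both forms side by side, and uses functools.wraps from the standard library so that the wrapper keeps the original function's name and docstring.

```python
import functools


def shout(function_to_decorate):
    """Decorator that upper-cases the string returned by the wrapped function."""
    @functools.wraps(function_to_decorate)
    def wrapper(*args, **kwargs):
        return function_to_decorate(*args, **kwargs).upper()
    return wrapper


@shout
def greet(name):
    """Return a greeting."""
    return "hello, " + name


def greet_plain(name):
    """Return a greeting."""
    return "hello, " + name


# Equivalent to decorating with @shout: wrap the function and rebind the name.
greet_plain = shout(greet_plain)

print(greet("hue"))        # HELLO, HUE
print(greet_plain("hue"))  # HELLO, HUE
print(greet.__name__)      # greet (preserved by functools.wraps)
```

This rebinding is what makes the approach below work: once _save_notebook is decorated, every caller that looks up that name gets the wrapper instead.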

With this powerful Python feature, we can save our query to a local file right after Hue has stored it in its database, without changing what the original function does.

Time to decorate!

With all this information, we can write a decorator that takes our notebook object and saves the query to a local file.

Therefore we create a query_decorators.py file:

import functools


def save_query_locally(function_to_decorate):
    @functools.wraps(function_to_decorate)
    def decoration_function(*args, **kwargs):
        # Before function call: nothing to do, let Hue save the notebook
        # as usual and keep its return value (notebook_doc, save_as)
        result = function_to_decorate(*args, **kwargs)
        # After function call: save the query to a local file
        notebook = args[0]
        user = args[1]
        query_name = notebook['name'].replace(' ', '_')
        query_type = notebook['type'] # e.g. 'hive-query'
        destination = '/path/to/queries/{}/{}/{}.sql'.format(user, query_type, query_name)
        with open(destination, 'w') as output_file:
            output_file.write(str(notebook['snippets'][0]['raw_statement']))
        # save_notebook unpacks this value, so it must be returned
        return result
    return decoration_function
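Two details matter when wrapping _save_notebook: the wrapper must return the original result, because save_notebook unpacks it into notebook_doc, save_as; and any exception raised while writing the file would make the request fail even though the query is already stored in the Hue database. Below is a minimal sketch of a more defensive variant, assuming you prefer to log and swallow file-system errors, and assuming Python 3 for os.makedirs(exist_ok=True). It also takes the destination folder as an argument (a decorator factory) instead of hard-coding it. The names save_query_locally_to and fake_save_notebook are hypothetical: the stub only imitates the (notebook, user) signature and the (notebook_doc, save_as) return value of Hue's function.

```python
import functools
import logging
import os
import tempfile

LOG = logging.getLogger(__name__)


def save_query_locally_to(base_dir):
    """Decorator factory: after the wrapped call, save the query under base_dir."""
    def decorator(function_to_decorate):
        @functools.wraps(function_to_decorate)
        def wrapper(notebook, user, *args, **kwargs):
            # 1. Let the original function save the notebook; keep its result.
            result = function_to_decorate(notebook, user, *args, **kwargs)
            # 2. Best effort: write the query file, but never fail the request.
            try:
                statement = str(notebook["snippets"][0]["raw_statement"])
                query_name = notebook["name"].replace(" ", "_")
                directory = os.path.join(base_dir, str(user), notebook["type"])
                os.makedirs(directory, exist_ok=True)
                destination = os.path.join(directory, query_name + ".sql")
                with open(destination, "w") as output_file:
                    output_file.write(statement)
            except (KeyError, IndexError, OSError):
                LOG.exception("Could not save query %r to a local file",
                              notebook.get("name"))
            # 3. Hand the original result back to the caller unchanged.
            return result
        return wrapper
    return decorator


QUERY_DIR = tempfile.mkdtemp()


@save_query_locally_to(QUERY_DIR)
def fake_save_notebook(notebook, user):
    """Hypothetical stand-in for Hue's _save_notebook: returns (doc, save_as)."""
    return {"id": 1, "name": notebook["name"]}, True


notebook_doc, save_as = fake_save_notebook(
    {"name": "my query", "type": "hive-query",
     "snippets": [{"raw_statement": "SELECT 1"}]},
    "alice",
)
```

In Hue, the equivalent would be decorating with something like @save_query_locally_to('/path/to/queries'); whether swallowing errors is acceptable depends on how visible you want failed file writes to be.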

We copy this file into the desktop/libs/notebook/src/notebook/ directory of the Hue installation.

Then, in the api.py file of the same directory, we import the newly created decorator by adding a new import statement:

from notebook.query_decorators import save_query_locally

and decorate the _save_notebook function:

@save_query_locally
def _save_notebook(notebook, user):
    # ...

Every time a user saves a query, the new decorator will also take care of saving it to a local file.

If needed, the decorator can go even further, for example by committing and pushing the query to a Git repository to version it automatically: imagination is the limit.

N.B. All these steps can be automated, even by a simple bash script, for example by using cp or sed commands.
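The steps above can also be scripted in Python instead of cp and sed. This is a minimal sketch, assuming api.py defines _save_notebook with "def _save_notebook(" at the start of a line; patch_api_source and patch_api_file are hypothetical helpers, and copying query_decorators.py next to api.py is still up to you. The import is inserted right before the function rather than at the top of the file, so it can never land above a "from __future__" import, and the patch is idempotent: running it twice does not decorate the function twice.

```python
import re

IMPORT_LINE = "from notebook.query_decorators import save_query_locally"
DECORATOR_LINE = "@save_query_locally"


def patch_api_source(source):
    """Return `source` with the import and decorator added before _save_notebook.

    Already-patched source is returned unchanged.
    """
    if DECORATOR_LINE in source:
        return source  # already patched
    match = re.search(r"^def _save_notebook\(", source, re.MULTILINE)
    if match is None:
        raise ValueError("'def _save_notebook(' not found: has api.py changed?")
    insertion = IMPORT_LINE + "\n" + DECORATOR_LINE + "\n"
    return source[:match.start()] + insertion + source[match.start():]


def patch_api_file(path):
    """Patch the api.py file at `path` in place, if it needs patching."""
    with open(path) as api_file:
        source = api_file.read()
    patched = patch_api_source(source)
    if patched != source:
        with open(path, "w") as api_file:
            api_file.write(patched)
```

Failing loudly with a ValueError when the function cannot be found is deliberate: after an upgrade, that is the signal that the customization needs a human look.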

Conclusions

Decorators are a powerful Python feature that can help you achieve many goals when writing code, including extending a product's features without heavily modifying a codebase that you cannot directly change or maintain.

Useful Resources

If you want to learn more about decorators, have a look at PEP 318 and the Python Decorator Wiki: both contain historical discussions and guidelines on how decorators should be implemented. You can find some examples of decorators in the Python Decorator Library.