STFN

Python: Dependency injection for cleaner I/O handling

Apart from these few posts about simple things in MicroPython, this is the first post on this blog in which I talk about programming in Python in a slightly more advanced way.

My official job title is “Senior Python Developer”, and I feel that I do have some experience in working with Python, but I’ve never before felt that I had something interesting enough to share on this blog in that area. And of course there is the whole matter of impostor syndrome, making me doubt whether what I know and use is actually the best thing that should be shared with others. But there can always be a first time, right?

In this blog post I want to share a way of writing code that I was taught by a super cool senior developer. She taught me a lot of things, mostly in terms of ideas of how software should be written. I think this is one of the things that differentiates junior developers from senior developers. Juniors think in code, like “I’ll put a for loop here” or “I should do a class here”, whereas seniors know and think in ideas, like “our software needs better separation of concerns” or “we are breaking the single responsibility principle in this place”. Learning to think in abstract ideas and knowing ways to implement them is imho an important part of one’s programming journey.

Fans of Clean Architecture should recognize some concepts from it in this blog post. CA is something I have barely scraped the surface of, but I am already a big fan of it and I try to implement it as best I can in my software. Maybe one day, when I get better at it, I will write another blog post focusing on it.

OK, the intro is done, let’s go to the main part.

Setting the stage

Let’s imagine a scenario in which you are at work, and your boss comes to you and says:

You need to create a piece of software that will download a joke about Chuck Norris, and count the number of times a specific word is used in that joke. That functionality is absolutely crucial for our business!

Weird, but ok. The actual usefulness of this code is irrelevant; what we should focus on is how it works. What is important is that the code needs to have two distinct functionalities: business logic (counting the number of occurrences of a word) and input/output handling (external API calls).

You are a good developer, so you sit down to work and create a class that does what it should, and add tests. You should always have tests for your code.

The code is available at codeberg.org

First solution

main.py

import httpx


CHUCK_NORRIS_API_URL = "https://api.chucknorris.io/jokes/"

ID1 = "a4aNCYsKQu-4LKDLkTQLSA"


class ChuckNorrisJokeWordCounter:
    def _download_joke(self, id: str) -> dict:
        url = CHUCK_NORRIS_API_URL + id
        return httpx.get(url=url).json()

    def word_counter(self, word: str, id: str) -> int:
        print(f"fetching joke with id {id}...")
        joke_data = self._download_joke(id=id)
        joke_text = joke_data["value"]
        print(f"downloaded joke with content: {joke_text}")
        return joke_text.count(word)


if __name__ == "__main__":
    chuck_word_counter = ChuckNorrisJokeWordCounter()
    word_count = chuck_word_counter.word_counter("Chuck", ID1)
    print(f"in that joke, the word Chuck was found {word_count} times")

test.py

from httpx import Response

from main import CHUCK_NORRIS_API_URL, ChuckNorrisJokeWordCounter


def test_counting_words_chuck(respx_mock):
    id = "123qwe"
    response_data = {
        "categories": [],
        "created_at": "2020-01-05 13:42:26.766831",
        "icon_url": "https://api.chucknorris.io/img/avatar/chuck-norris.png",
        "id": id,
        "updated_at": "2020-01-05 13:42:26.766831",
        "url": f"https://api.chucknorris.io/jokes/{id}",
        "value": "Chuck Norris once cast a fishing line into the Atlantic Ocean and caught 243 fish...then the hook hit the water",
    }
    mock_response = respx_mock.get(CHUCK_NORRIS_API_URL + id)
    mock_response.return_value = Response(200, json=response_data)

    chuck_word_counter = ChuckNorrisJokeWordCounter()
    word_count = chuck_word_counter.word_counter("Chuck", id)

    assert word_count == 1


def test_counting_words_norris(respx_mock):
    id = "123qwe"
    response_data = {
        "categories": [],
        "created_at": "2020-01-05 13:42:26.766831",
        "icon_url": "https://api.chucknorris.io/img/avatar/chuck-norris.png",
        "id": id,
        "updated_at": "2020-01-05 13:42:26.766831",
        "url": f"https://api.chucknorris.io/jokes/{id}",
        "value": "Chuck Norris once cast a fishing line into the Atlantic Ocean and caught 243 fish...then the hook hit the water",
    }
    mock_response = respx_mock.get(CHUCK_NORRIS_API_URL + id)
    mock_response.return_value = Response(200, json=response_data)

    chuck_word_counter = ChuckNorrisJokeWordCounter()
    word_count = chuck_word_counter.word_counter("Norris", id)

    assert word_count == 1

Let’s run it:

python main.py
>>> fetching joke with id a4aNCYsKQu-4LKDLkTQLSA...
>>> downloaded joke with content: Chuck Norris once cast a fishing line 
>>> into the Atlantic Ocean and caught 243 fish...then the hook hit the water
>>> in that joke, the word Chuck was found 1 times

The code works, but there are a number of issues with it, which are most visible when reading the tests.

The class ChuckNorrisJokeWordCounter is doing two things: downloading the joke from some HTTP API, and doing the business logic, which is counting the number of times a certain word is used in the joke. Therefore, each time we want to test the business logic, we need to mock the API call. This results in a lot of boilerplate code, and worsens the readability of the tests. You can reduce the amount of code using fixtures, but still, this is not great.

And what if one day the Chuck Norris API moves from HTTP to, say, downloading jokes as a txt file from S3? Or to being sent by a Matrix-style neural link? Then we would need to rewrite the download logic inside the class, modify all the tests, etc. Not good. This can be done better.

Second solution

And here is another way to do it:

main.py

from typing import Protocol
import httpx


ID1 = "a4aNCYsKQu-4LKDLkTQLSA"


class ChuckNorrisJokeClientProtocol(Protocol):
    def get_joke(self, id: str) -> dict: ...


class ChuckNorrisJokeHTTPClient:
    CHUCK_NORRIS_API_URL = "https://api.chucknorris.io/jokes/"

    def get_joke(self, id: str) -> dict:
        url = self.CHUCK_NORRIS_API_URL + id
        return httpx.get(url=url).json()


class ChuckNorrisJokeWordCounter:
    def __init__(self, client: ChuckNorrisJokeClientProtocol):
        self.client = client

    def word_counter(self, word: str, id: str) -> int:
        print(f"fetching joke with id {id}...")
        joke_data = self.client.get_joke(id=id)
        joke_text = joke_data["value"]
        print(f"downloaded joke with content: {joke_text}")
        return joke_text.count(word)


if __name__ == "__main__":
    chuck_word_counter = ChuckNorrisJokeWordCounter(client=ChuckNorrisJokeHTTPClient())
    word_count = chuck_word_counter.word_counter("Chuck", ID1)
    print(f"in that joke, the word Chuck was found {word_count} times")

test.py

from httpx import Response

from main import ChuckNorrisJokeWordCounter, ChuckNorrisJokeHTTPClient


def test_chuck_norris_http_client(respx_mock):
    id = "123qwe"
    response_data = {
        "categories": [],
        "created_at": "2020-01-05 13:42:26.766831",
        "icon_url": "https://api.chucknorris.io/img/avatar/chuck-norris.png",
        "id": id,
        "updated_at": "2020-01-05 13:42:26.766831",
        "url": f"https://api.chucknorris.io/jokes/{id}",
        "value": "Chuck Norris once cast a fishing line into the Atlantic Ocean and caught 243 fish...then the hook hit the water",
    }
    mock_response = respx_mock.get(ChuckNorrisJokeHTTPClient.CHUCK_NORRIS_API_URL + id)
    mock_response.return_value = Response(200, json=response_data)
    http_client = ChuckNorrisJokeHTTPClient()

    response = http_client.get_joke(id=id)

    assert response == response_data
    assert mock_response.called


class ChuckNorrisJokeMockClient:
    def get_joke(self, id) -> dict:
        return {
            "categories": [],
            "created_at": "2020-01-05 13:42:26.766831",
            "icon_url": "https://api.chucknorris.io/img/avatar/chuck-norris.png",
            "id": id,
            "updated_at": "2020-01-05 13:42:26.766831",
            "url": f"https://api.chucknorris.io/jokes/{id}",
            "value": "Chuck Norris once cast a fishing line into the Atlantic Ocean and caught 243 fish...then the hook hit the water",
        }


def test_counting_words_chuck():
    id = "123qwe"

    chuck_word_counter = ChuckNorrisJokeWordCounter(ChuckNorrisJokeMockClient())
    word_count = chuck_word_counter.word_counter("Chuck", id)

    assert word_count == 1


def test_counting_words_norris():
    id = "123qwe"
    chuck_word_counter = ChuckNorrisJokeWordCounter(ChuckNorrisJokeMockClient())
    word_count = chuck_word_counter.word_counter("Norris", id)

    assert word_count == 1

A lot has changed, so let’s go through the code slowly.

class ChuckNorrisJokeClientProtocol(Protocol):
    def get_joke(self, id: str) -> dict: ...


class ChuckNorrisJokeHTTPClient:
    CHUCK_NORRIS_API_URL = "https://api.chucknorris.io/jokes/"

    def get_joke(self, id: str) -> dict:
        url = self.CHUCK_NORRIS_API_URL + id
        return httpx.get(url=url).json()


class ChuckNorrisJokeWordCounter:
    def __init__(self, client: ChuckNorrisJokeClientProtocol):
        self.client = client

There are now two classes, ChuckNorrisJokeWordCounter and ChuckNorrisJokeHTTPClient. The HTTPClient is only responsible for doing the HTTP API calls. The CHUCK_NORRIS_API_URL constant was moved inside that class, as it is only applicable to it.

The ChuckNorrisJokeWordCounter class accepts a client object as an argument, and that object needs to adhere to the ChuckNorrisJokeClientProtocol. This is exactly the Dependency Injection part. The WordCounter class depends on a client that will provide it with the joke, and that dependency is injected through the class’s constructor. The WordCounter does not care where the joke is fetched from, as long as it can fetch it using the public get_joke() method. The WordCounter knows that this method is available because it is declared in the protocol.
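To make the “adheres to the protocol” part a bit more concrete, here is a minimal sketch of how Python’s structural typing behaves. The names JokeClient, InMemoryClient and NotAClient are made up for this illustration, and the runtime_checkable decorator is added only so that isinstance() can be used in the demo; the code in this post does not need it, because a static type checker such as mypy is what would normally do the check.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable  # only needed for the isinstance() demo below
class JokeClient(Protocol):
    def get_joke(self, id: str) -> dict: ...


class InMemoryClient:
    # No inheritance from JokeClient: having a matching method is enough.
    def get_joke(self, id: str) -> dict:
        return {"id": id, "value": "Chuck Norris can divide by zero."}


class NotAClient:
    # Wrong method name, so this class does not satisfy the protocol.
    def fetch(self, id: str) -> dict:
        return {}


print(isinstance(InMemoryClient(), JokeClient))  # True
print(isinstance(NotAClient(), JokeClient))  # False
```

Keep in mind that the runtime check only looks at whether a get_joke attribute exists; argument and return types are verified by the static type checker, not at runtime.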

    def word_counter(self, word: str, id: str) -> int:
        print(f"fetching joke with id {id}...")
        joke_data = self.client.get_joke(id=id)
        joke_text = joke_data["value"]
        print(f"downloaded joke with content: {joke_text}")
        return joke_text.count(word)

And now in the business logic part, the WordCounter can just use the client’s get_joke() method by calling self.client.get_joke().

The code could be further improved by defining a Pydantic model called ChuckNorrisJoke into which the get_joke() output would be parsed, but that is outside the scope of this post. For this example let’s just assume that the get_joke() method always returns a dict with a key named “value”.
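Just to hint at what such a model could look like without pulling in another dependency, here is a rough sketch that uses a standard-library dataclass as a stand-in for Pydantic. This is not Pydantic and does no real validation; the from_api() helper and the choice of keeping only the id and value fields are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChuckNorrisJoke:
    id: str
    value: str

    @classmethod
    def from_api(cls, data: dict) -> "ChuckNorrisJoke":
        # Fail early, at the I/O boundary, if the keys we rely on are missing.
        return cls(id=data["id"], value=data["value"])


raw = {
    "id": "123qwe",
    "value": "Chuck Norris counted to infinity. Twice.",
    "categories": [],
}
joke = ChuckNorrisJoke.from_api(raw)
print(joke.value.count("Chuck"))  # 1
```

With something like this in place, the business logic would work with joke.value instead of reaching into a dict, and a malformed reply would blow up in the client rather than deep inside the word counting.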

By doing this change we have fixed the code in two ways:

First, it now adheres to the first principle of SOLID: Single responsibility principle. Each class does only one thing.

But wait, there’s more! The code now also adheres to the second principle of SOLID: Open–closed principle.

The code is now “open for extension, but closed for modification”. The WordCounter class and its word_counter method can be left unmodified, and if at some point we need to fetch the jokes from another source, we can just write another client, for example a ChuckNorrisFTPClient that exposes a get_joke() method downloading jokes from an FTP server.
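An FTP server is hard to show in a blog post, so as a sketch of the same idea here is a hypothetical ChuckNorrisJokeFileClient that reads jokes from JSON files on disk. The class name and the one-file-per-id layout are assumptions made for this example; the WordCounter is the same as in main.py, only with the prints left out.

```python
import json
import tempfile
from pathlib import Path
from typing import Protocol


class ChuckNorrisJokeClientProtocol(Protocol):
    def get_joke(self, id: str) -> dict: ...


class ChuckNorrisJokeFileClient:
    """Hypothetical client that reads a joke from <directory>/<id>.json."""

    def __init__(self, directory: Path):
        self.directory = directory

    def get_joke(self, id: str) -> dict:
        path = self.directory / f"{id}.json"
        return json.loads(path.read_text(encoding="utf-8"))


class ChuckNorrisJokeWordCounter:
    # Same logic as in main.py (prints left out), not modified for the new client.
    def __init__(self, client: ChuckNorrisJokeClientProtocol):
        self.client = client

    def word_counter(self, word: str, id: str) -> int:
        return self.client.get_joke(id=id)["value"].count(word)


with tempfile.TemporaryDirectory() as tmp:
    joke = {
        "id": "123qwe",
        "value": "Chuck Norris can unit test an entire application with a single assert.",
    }
    (Path(tmp) / "123qwe.json").write_text(json.dumps(joke), encoding="utf-8")

    counter = ChuckNorrisJokeWordCounter(client=ChuckNorrisJokeFileClient(Path(tmp)))
    file_word_count = counter.word_counter("assert", "123qwe")

print(f"the word assert was found {file_word_count} times")
```

The only thing that changed is which client gets passed to the constructor; the counting code did not have to be touched.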

And now the tests.

The tests are now in two groups. The first group (here there’s only one test for it, but in real life there would of course be a lot more) is responsible for testing the ChuckNorrisJokeHTTPClient itself, and those tests still need to mock out external API calls.

The second group is the cool part:

class ChuckNorrisJokeMockClient:
    def get_joke(self, id) -> dict:
        return {
            "categories": [],
            "created_at": "2020-01-05 13:42:26.766831",
            "icon_url": "https://api.chucknorris.io/img/avatar/chuck-norris.png",
            "id": id,
            "updated_at": "2020-01-05 13:42:26.766831",
            "url": f"https://api.chucknorris.io/jokes/{id}",
            "value": "Chuck Norris once cast a fishing line into the Atlantic Ocean and caught 243 fish...then the hook hit the water",
        }


def test_counting_words_chuck():
    id = "123qwe"

    chuck_word_counter = ChuckNorrisJokeWordCounter(ChuckNorrisJokeMockClient())
    word_count = chuck_word_counter.word_counter("Chuck", id)

    assert word_count == 1

No need to mock HTTP calls anymore. The WordCounter accepts any client as long as it exposes a get_joke() method, right? So we can write a mock client that just returns a dictionary. The WordCounter can then take that client, and it will have no idea, nor will it care, that we are handing it just a dict hard-coded in the test file. And we can have as many mock clients as we want, providing different jokes, or providing broken replies, etc.
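As a sketch of the “as many mock clients as we want” idea, here are two more test doubles: one that returns whatever text it was given, and one that imitates a broken reply. FakeJokeClient and BrokenJokeClient are hypothetical names, and the tests use a plain try/except so the example runs without pytest; in a real suite pytest.raises(KeyError) would be the more usual choice.

```python
class FakeJokeClient:
    """Hypothetical test double returning whatever joke text it was given."""

    def __init__(self, text: str):
        self.text = text

    def get_joke(self, id: str) -> dict:
        return {"id": id, "value": self.text}


class BrokenJokeClient:
    """Hypothetical test double imitating a reply without the "value" key."""

    def get_joke(self, id: str) -> dict:
        return {"id": id, "error": "not found"}


class ChuckNorrisJokeWordCounter:
    # Same logic as in main.py, with the prints left out.
    def __init__(self, client):
        self.client = client

    def word_counter(self, word: str, id: str) -> int:
        return self.client.get_joke(id=id)["value"].count(word)


def test_word_repeated():
    counter = ChuckNorrisJokeWordCounter(FakeJokeClient("Chuck Chuck Norris"))
    assert counter.word_counter("Chuck", "any-id") == 2


def test_word_absent():
    counter = ChuckNorrisJokeWordCounter(FakeJokeClient("no names here"))
    assert counter.word_counter("Chuck", "any-id") == 0


def test_broken_reply_raises_key_error():
    counter = ChuckNorrisJokeWordCounter(BrokenJokeClient())
    try:
        counter.word_counter("Chuck", "any-id")
    except KeyError:
        return
    raise AssertionError("expected a KeyError for a reply without 'value'")
```

Each test states the input joke right where the assertion is, so there is no need to scroll to a shared fixture to understand why the expected count is what it is.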

The tests are now much cleaner, easier and quicker to write. There is much less underlying logic (Explicit is better than implicit).

Summing up

This concept can be applied whenever there is code that needs to fetch data from somewhere and then run business logic on it.

A very good example of it, about which I might write in the future, is doing database calls, for example in Django.

In that case, a ChuckNorrisJokeDBClient could be written that does something like

from .models import ChuckNorrisJoke


class ChuckNorrisJokeDBClient:
    def model_to_dict(self, model_instance) -> dict:
        # some logic to serialize a model instance to a dictionary
        # (the "value" field name is an assumption about the model);
        # again, Pydantic would also be great here
        return {"id": model_instance.id, "value": model_instance.value}

    def get_joke(self, id: str) -> dict:
        return self.model_to_dict(ChuckNorrisJoke.objects.get(id=id))

And then in the test we would again use a ChuckNorrisJokeMockClient which does not use the DB at all, making the test suite much quicker to run.

And that’s it! My first post on “advanced Python”. I hope you liked it, I would love to receive feedback on it by any means possible, whether a comment below, a mention in the Fediverse, or via email. The links are in the footer.

Now that I have written this post, I feel like writing more posts like this, so there is a high possibility this will not be the last one :)

Thanks for reading!
