Python: Dependency injection for cleaner I/O handling
Apart from these few posts about simple things in MicroPython, this is the first post on this blog in which I talk about programming in Python in a slightly more advanced way.
My official job title is “Senior Python Developer”, and I feel that I do have some experience in working with Python, but I’ve never before felt that I had something interesting enough to share on this blog in that area. And of course there is the whole matter of impostor syndrome, making me doubt whether what I know and use is actually the best thing that should be shared with others. But there can always be a first time, right?
In this blog post I want to share a way of writing code that I was taught by a super cool senior developer. She taught me a lot of things, mostly ideas about how software should be written. I think this is one of the things that differentiates junior developers from senior developers. Juniors think in code, like “I’ll put a for loop here” or “I should do a class here”, whereas seniors think in ideas, like “our software needs better separation of concerns” or “we are breaking the single responsibility principle in this place”. Learning to think in abstract ideas and knowing ways to implement them is imho an important part of one’s programming journey.
Fans of Clean Architecture should recognize some concepts from it in this blog post. CA is something I have barely scraped the surface of, but I am already a big fan of it and I try to implement it as best I can in my software. Maybe one day, when I get better at it, I will write another blog post focusing on it.
OK, the intro is done, let’s go to the main part.
Setting the stage
Let’s imagine a scenario in which you are at work, and your boss comes to you and says:
You need to create a piece of software that will download a joke about Chuck Norris, and count the number of times a specific word is used in that joke. That functionality is absolutely crucial for our business!
Weird, but OK. The actual usefulness of this code is irrelevant; what we should focus on is how it works. What is important is that the code needs to have two distinct functionalities: business logic (counting the number of occurrences of a word) and input/output handling (external API calls).
You are a good developer, so you sit down to work and create a class that does what it should, and add tests. You should always have tests for your code.
The code is available at codeberg.org
First solution
main.py
import httpx

CHUCK_NORRIS_API_URL = "https://api.chucknorris.io/jokes/"
ID1 = "a4aNCYsKQu-4LKDLkTQLSA"


class ChuckNorrisJokeWordCounter:
    def _download_joke(self, id: str) -> dict:
        url = CHUCK_NORRIS_API_URL + id
        return httpx.get(url=url).json()

    def word_counter(self, word: str, id: str) -> int:
        print(f"fetching joke with id {id}...")
        joke_data = self._download_joke(id=id)
        joke_text = joke_data["value"]
        print(f"downloaded joke with content: {joke_text}")
        return joke_text.count(word)


if __name__ == "__main__":
    chuck_word_counter = ChuckNorrisJokeWordCounter()
    word_count = chuck_word_counter.word_counter("Chuck", ID1)
    print(f"in that joke, the word Chuck was found {word_count} times")
test.py
from httpx import Response

from main import CHUCK_NORRIS_API_URL, ChuckNorrisJokeWordCounter


def test_counting_words_chuck(respx_mock):
    id = "123qwe"
    response_data = {
        "categories": [],
        "created_at": "2020-01-05 13:42:26.766831",
        "icon_url": "https://api.chucknorris.io/img/avatar/chuck-norris.png",
        "id": id,
        "updated_at": "2020-01-05 13:42:26.766831",
        "url": f"https://api.chucknorris.io/jokes/{id}",
        "value": "Chuck Norris once cast a fishing line into the Atlantic Ocean and caught 243 fish...then the hook hit the water",
    }
    mock_response = respx_mock.get(CHUCK_NORRIS_API_URL + id)
    mock_response.return_value = Response(200, json=response_data)
    chuck_word_counter = ChuckNorrisJokeWordCounter()
    word_count = chuck_word_counter.word_counter("Chuck", id)
    assert word_count == 1


def test_counting_words_norris(respx_mock):
    id = "123qwe"
    response_data = {
        "categories": [],
        "created_at": "2020-01-05 13:42:26.766831",
        "icon_url": "https://api.chucknorris.io/img/avatar/chuck-norris.png",
        "id": id,
        "updated_at": "2020-01-05 13:42:26.766831",
        "url": f"https://api.chucknorris.io/jokes/{id}",
        "value": "Chuck Norris once cast a fishing line into the Atlantic Ocean and caught 243 fish...then the hook hit the water",
    }
    mock_response = respx_mock.get(CHUCK_NORRIS_API_URL + id)
    mock_response.return_value = Response(200, json=response_data)
    chuck_word_counter = ChuckNorrisJokeWordCounter()
    word_count = chuck_word_counter.word_counter("Norris", id)
    assert word_count == 1
Let’s run it:
python main.py
>>> fetching joke with id a4aNCYsKQu-4LKDLkTQLSA...
>>> downloaded joke with content: Chuck Norris once cast a fishing line
>>> into the Atlantic Ocean and caught 243 fish...then the hook hit the water
>>> in that joke, the word Chuck was found 1 times
The code works, but there are a number of issues with it, which are most visible when reading the tests.
The class ChuckNorrisJokeWordCounter is doing two things: downloading the joke from some HTTP API, and doing the business logic, which is counting the number of times a certain word is used in the joke. Therefore, each time we want to test the business logic, we need to mock the API call. This results in a lot of boilerplate code, and worsens the readability of the tests. You can reduce the amount of code using fixtures, but still, this is not great.
And what if one day the Chuck Norris API moves from HTTP to, say, downloading jokes as a txt file from S3? Or to sending them over a Matrix-style neural link? Then we would need to rewrite the download logic inside the class, modify all the tests, etc. Not good. This can be done better.
Second solution
And here is another way to do it:
main.py
from typing import Protocol

import httpx

ID1 = "a4aNCYsKQu-4LKDLkTQLSA"


class ChuckNorrisJokeClientProtocol(Protocol):
    def get_joke(self, id: str) -> dict: ...


class ChuckNorrisJokeHTTPClient:
    CHUCK_NORRIS_API_URL = "https://api.chucknorris.io/jokes/"

    def get_joke(self, id: str) -> dict:
        url = self.CHUCK_NORRIS_API_URL + id
        return httpx.get(url=url).json()


class ChuckNorrisJokeWordCounter:
    def __init__(self, client: ChuckNorrisJokeClientProtocol):
        self.client = client

    def word_counter(self, word: str, id: str) -> int:
        print(f"fetching joke with id {id}...")
        joke_data = self.client.get_joke(id=id)
        joke_text = joke_data["value"]
        print(f"downloaded joke with content: {joke_text}")
        return joke_text.count(word)


if __name__ == "__main__":
    chuck_word_counter = ChuckNorrisJokeWordCounter(client=ChuckNorrisJokeHTTPClient())
    word_count = chuck_word_counter.word_counter("Chuck", ID1)
    print(f"in that joke, the word Chuck was found {word_count} times")
test.py
from httpx import Response

from main import ChuckNorrisJokeWordCounter, ChuckNorrisJokeHTTPClient


def test_chuck_norris_http_client(respx_mock):
    id = "123qwe"
    response_data = {
        "categories": [],
        "created_at": "2020-01-05 13:42:26.766831",
        "icon_url": "https://api.chucknorris.io/img/avatar/chuck-norris.png",
        "id": id,
        "updated_at": "2020-01-05 13:42:26.766831",
        "url": f"https://api.chucknorris.io/jokes/{id}",
        "value": "Chuck Norris once cast a fishing line into the Atlantic Ocean and caught 243 fish...then the hook hit the water",
    }
    mock_response = respx_mock.get(ChuckNorrisJokeHTTPClient.CHUCK_NORRIS_API_URL + id)
    mock_response.return_value = Response(200, json=response_data)
    http_client = ChuckNorrisJokeHTTPClient()
    response = http_client.get_joke(id=id)
    assert response == response_data
    assert mock_response.called


class ChuckNorrisJokeMockClient:
    def get_joke(self, id: str) -> dict:
        return {
            "categories": [],
            "created_at": "2020-01-05 13:42:26.766831",
            "icon_url": "https://api.chucknorris.io/img/avatar/chuck-norris.png",
            "id": id,
            "updated_at": "2020-01-05 13:42:26.766831",
            "url": f"https://api.chucknorris.io/jokes/{id}",
            "value": "Chuck Norris once cast a fishing line into the Atlantic Ocean and caught 243 fish...then the hook hit the water",
        }


# no respx_mock fixture here: these tests never touch HTTP
def test_counting_words_chuck():
    id = "123qwe"
    chuck_word_counter = ChuckNorrisJokeWordCounter(ChuckNorrisJokeMockClient())
    word_count = chuck_word_counter.word_counter("Chuck", id)
    assert word_count == 1


def test_counting_words_norris():
    id = "123qew"
    chuck_word_counter = ChuckNorrisJokeWordCounter(ChuckNorrisJokeMockClient())
    word_count = chuck_word_counter.word_counter("Norris", id)
    assert word_count == 1
A lot has changed, so let’s go through the code slowly.
class ChuckNorrisJokeClientProtocol(Protocol):
    def get_joke(self, id: str) -> dict: ...


class ChuckNorrisJokeHTTPClient:
    CHUCK_NORRIS_API_URL = "https://api.chucknorris.io/jokes/"

    def get_joke(self, id: str) -> dict:
        url = self.CHUCK_NORRIS_API_URL + id
        return httpx.get(url=url).json()


class ChuckNorrisJokeWordCounter:
    def __init__(self, client: ChuckNorrisJokeClientProtocol):
        self.client = client
There are now two classes, ChuckNorrisJokeWordCounter and ChuckNorrisJokeHTTPClient. The HTTPClient is only responsible for making the HTTP API calls. The CHUCK_NORRIS_API_URL constant was moved inside that class, as it is only applicable to it.
The ChuckNorrisJokeWordCounter class accepts a client object as an argument, an object which needs to adhere to the ChuckNorrisJokeClientProtocol. This is exactly the Dependency Injection part. The WordCounter class depends on a client that will provide it with the joke, and that dependency is injected in the class’ constructor. The WordCounter does not care where the joke is fetched from, as long as it can fetch it using the public get_joke() method. The WordCounter knows that this method is available because it is in the protocol.
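To make the protocol part a bit more tangible, here is a minimal sketch that runs without any HTTP at all. The names JokeClientProtocol, InMemoryJokeClient, NotAClient and WordCounter are shortened stand-ins I made up for this illustration, the joke text is invented sample data, and the runtime_checkable decorator is added only so that isinstance() works in the demo; the code in this post does not need it.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable  # assumption: only needed so isinstance() works in this demo
class JokeClientProtocol(Protocol):
    def get_joke(self, id: str) -> dict: ...


class InMemoryJokeClient:
    """Hypothetical client; note that it does not inherit from the protocol."""

    def get_joke(self, id: str) -> dict:
        # made-up sample data
        return {"id": id, "value": "Chuck Norris can divide by zero."}


class NotAClient:
    """Has no get_joke() method, so it does not satisfy the protocol."""


class WordCounter:
    def __init__(self, client: JokeClientProtocol):
        self.client = client  # the injected dependency

    def word_counter(self, word: str, id: str) -> int:
        return self.client.get_joke(id=id)["value"].count(word)


counter = WordCounter(client=InMemoryJokeClient())
chuck_count = counter.word_counter("Chuck", "abc")
```

The client satisfies the protocol purely by having a matching get_joke() method. Python itself does not enforce the annotation at runtime; a static type checker such as mypy is what would complain about WordCounter(client=NotAClient()).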
    def word_counter(self, word: str, id: str) -> int:
        print(f"fetching joke with id {id}...")
        joke_data = self.client.get_joke(id=id)
        joke_text = joke_data["value"]
        print(f"downloaded joke with content: {joke_text}")
        return joke_text.count(word)
And now in the business logic part, the WordCounter can just use the client’s get_joke() method by calling self.client.get_joke().
The code could be further improved by defining a Pydantic model called ChuckNorrisJoke into which the get_joke() output would be parsed, but that is outside the scope of this post. For this example let’s just assume that we know that the get_joke() method always returns a dict with a key value.
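Just to hint at what a typed joke object could look like, here is a rough sketch that uses a standard-library dataclass as a stand-in for Pydantic, so it runs without installing anything. The from_api_dict() helper and the StubJokeClient are hypothetical names, and the joke text is made-up sample data. Unlike Pydantic, a plain dataclass does not validate field types; it only fails when a required key is missing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChuckNorrisJoke:
    id: str
    value: str

    @classmethod
    def from_api_dict(cls, data: dict) -> "ChuckNorrisJoke":
        # Raises KeyError early if the payload lacks the fields we rely on;
        # extra keys such as "categories" are simply ignored.
        return cls(id=data["id"], value=data["value"])


class StubJokeClient:
    """Hypothetical client that returns a typed joke instead of a raw dict."""

    def get_joke(self, id: str) -> ChuckNorrisJoke:
        payload = {
            "id": id,
            "value": "Chuck Norris counted to infinity. Twice.",
            "categories": [],
        }
        return ChuckNorrisJoke.from_api_dict(payload)


joke = StubJokeClient().get_joke("abc")
twice_count = joke.value.count("Twice")
```

With something like this in place, the business logic would read joke.value instead of joke_data["value"], and a malformed reply would fail at the client boundary rather than somewhere inside the counting code.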
By making this change we have improved the code in two ways:
First, it now adheres to the first principle of SOLID: Single responsibility principle. Each class does only one thing.
But wait, there’s more! The code now also adheres to the second principle of SOLID: Open–closed principle.
The code is now “open for extension, but closed for modification”. The WordCounter class’s word_counter method can be left unmodified, and if at some point we need to fetch the jokes from another source, we can just write another client, for example ChuckNorrisFTPClient, that will expose a get_joke() method downloading jokes from an FTP server.
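An FTP client is hard to demo in a blog post, so here is a sketch of the same idea with a client that reads jokes from local JSON files instead. ChuckNorrisJokeFileClient, the one-file-per-joke layout and the joke text are all assumptions made up for this example; the WordCounter is a trimmed copy of the one above (prints omitted) and is not modified in any way to support the new source.

```python
import json
import tempfile
from pathlib import Path


class ChuckNorrisJokeFileClient:
    """Hypothetical client reading jokes from <directory>/<id>.json."""

    def __init__(self, directory: Path):
        self.directory = directory

    def get_joke(self, id: str) -> dict:
        joke_file = self.directory / f"{id}.json"
        return json.loads(joke_file.read_text(encoding="utf-8"))


class ChuckNorrisJokeWordCounter:
    # Trimmed copy of the class from the post; unchanged for the new client.
    def __init__(self, client):
        self.client = client

    def word_counter(self, word: str, id: str) -> int:
        return self.client.get_joke(id=id)["value"].count(word)


with tempfile.TemporaryDirectory() as tmp:
    joke_dir = Path(tmp)
    # made-up sample data written to a temporary directory
    (joke_dir / "123qwe.json").write_text(
        json.dumps({"id": "123qwe", "value": "Chuck Norris can slam a revolving door."}),
        encoding="utf-8",
    )
    counter = ChuckNorrisJokeWordCounter(client=ChuckNorrisJokeFileClient(joke_dir))
    door_count = counter.word_counter("door", "123qwe")
```

The only place that knows about files is the new client; swapping it in is a one-line change where the WordCounter is constructed.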
And now the tests.
The tests are now in two groups. The first group (here there’s only one test for it, but in real life there would of course be a lot more) is responsible for testing the ChuckNorrisJokeHTTPClient itself, and those tests still need to mock out external API calls.
The second group is the cool part:
class ChuckNorrisJokeMockClient:
    def get_joke(self, id: str) -> dict:
        return {
            "categories": [],
            "created_at": "2020-01-05 13:42:26.766831",
            "icon_url": "https://api.chucknorris.io/img/avatar/chuck-norris.png",
            "id": id,
            "updated_at": "2020-01-05 13:42:26.766831",
            "url": f"https://api.chucknorris.io/jokes/{id}",
            "value": "Chuck Norris once cast a fishing line into the Atlantic Ocean and caught 243 fish...then the hook hit the water",
        }


# no respx_mock fixture here: this test never touches HTTP
def test_counting_words_chuck():
    id = "123qwe"
    chuck_word_counter = ChuckNorrisJokeWordCounter(ChuckNorrisJokeMockClient())
    word_count = chuck_word_counter.word_counter("Chuck", id)
    assert word_count == 1
No need to mock HTTP calls anymore. The WordCounter accepts any client as long as it exposes a get_joke() method, right? So we can write a mock client that just returns a dictionary. The WordCounter can then take that client, and it will have no idea, nor will it care, that we are handing it just a dict defined in the test file. And we can have as many mock clients as we want, providing different jokes, or providing broken replies, etc.
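As a sketch of the "as many mock clients as we want" idea, one option is a single configurable stub that returns whatever payload it was given. FixedJokeClient is a hypothetical helper, the payloads are made-up sample data, and the WordCounter is a trimmed copy of the class above with the prints omitted. In a real pytest suite the second test would more naturally use pytest.raises; plain try/except is used here only to keep the snippet free of third-party imports.

```python
class ChuckNorrisJokeWordCounter:
    # Trimmed copy of the class from the post (prints omitted).
    def __init__(self, client):
        self.client = client

    def word_counter(self, word: str, id: str) -> int:
        return self.client.get_joke(id=id)["value"].count(word)


class FixedJokeClient:
    """Hypothetical stub returning whatever payload it was constructed with."""

    def __init__(self, payload: dict):
        self.payload = payload

    def get_joke(self, id: str) -> dict:
        return self.payload


def test_word_appearing_twice():
    client = FixedJokeClient({"value": "Chuck Norris beat Chuck Norris at chess"})
    assert ChuckNorrisJokeWordCounter(client).word_counter("Chuck", "any-id") == 2


def test_broken_reply_without_value_key():
    client = FixedJokeClient({"error": "Not Found"})
    try:
        ChuckNorrisJokeWordCounter(client).word_counter("Chuck", "any-id")
    except KeyError:
        pass  # current behaviour: the missing "value" key surfaces as KeyError
    else:
        raise AssertionError("expected KeyError")
```

The second test documents what the WordCounter currently does with a broken reply, which is a useful starting point if we later decide it should handle that case more gracefully.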
The tests are now much cleaner, easier and quicker to write. There is much less underlying logic (Explicit is better than implicit).
Summing up
This concept can be applied whenever there is code that needs to fetch data from somewhere and then run business logic on it.
A very good example of it, about which I might write in the future, is doing database calls, for example in Django.
In that case, a ChuckNorrisJokeDBClient could be written that does something like
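from django.forms.models import model_to_dict as django_model_to_dict

from .models import ChuckNorrisJoke


class ChuckNorrisJokeDBClient:
    def model_to_dict(self, model_instance) -> dict:
        # some logic to convert the model instance to a dictionary;
        # Django's built-in helper is one option (it skips non-editable fields)
        # again, Pydantic would also be great here
        return django_model_to_dict(model_instance)

    def get_joke(self, id: str) -> dict:
        return self.model_to_dict(ChuckNorrisJoke.objects.get(id=id))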
And then in the test we would again use a ChuckNorrisJokeMockClient which does not use the DB at all, making the test suite much quicker to run.
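If you want to play with the database idea without setting up a Django project, here is a small sketch in which the standard-library sqlite3 module stands in for the Django ORM. The ChuckNorrisJokeSQLiteClient name, the jokes table, its columns and the joke text are all assumptions made up for this example.

```python
import sqlite3


class ChuckNorrisJokeSQLiteClient:
    """Hypothetical DB-backed client; table and column names are assumptions."""

    def __init__(self, connection: sqlite3.Connection):
        self.connection = connection

    def get_joke(self, id: str) -> dict:
        row = self.connection.execute(
            "SELECT id, value FROM jokes WHERE id = ?", (id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"no joke with id {id!r}")
        return {"id": row[0], "value": row[1]}


# An in-memory database keeps the example self-contained.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE jokes (id TEXT PRIMARY KEY, value TEXT NOT NULL)")
connection.execute(
    "INSERT INTO jokes VALUES (?, ?)",
    ("123qwe", "Chuck Norris can unscramble an egg."),  # made-up sample data
)
db_client = ChuckNorrisJokeSQLiteClient(connection)
joke = db_client.get_joke("123qwe")
```

Because this client exposes the same get_joke() method, it could be passed to the WordCounter exactly like the HTTP client, while the business logic tests keep using the mock client and never open a database connection.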
And that’s it! My first post on “advanced Python”. I hope you liked it. I would love to receive feedback on it by any means possible, whether a comment below, a mention in the Fediverse, or via email. The links are in the footer.
Now that I have written this post, I feel like writing more posts like this, so there is a high possibility this will not be the last one :)
Thanks for reading!