For large Python projects, maintaining architectural cleanliness is a significant challenge, primarily reflected in the difficulty of maintaining simple and clear dependency relationships between packages and modules.
A large project typically contains hundreds of submodules, each implementing specific functionalities and depending on each other. Without architectural design and development practice constraints, these inter-module dependencies often evolve into a tangled mess, making it difficult to discern any order.
This leads to several problems:
- High cost of architectural understanding: When newcomers join the project, they may have many questions about the architecture. For example, why does the low-level common.validators utility module need to reference the ConfigVar model that resides in the high-level workloads module?
- Reduced development efficiency: Developers have difficulty deciding where new code belongs when developing features. Asked "which module should this go in?", different people give different answers, and consensus is hard to reach.
- Chaotic module responsibilities: Chaotic dependencies essentially mean that module responsibilities are also chaotic, indicating that some modules may be taking on too much, handling abstractions they shouldn't.
If we were to graph dependency relationships, the architecture of a healthy project would appear neatly organized, with all dependencies flowing in one direction and no circular dependencies. Healthy dependencies promote "separation of concerns" between modules, and each module is more likely to keep a single responsibility.
This article introduces a tool for governing inter-module dependencies: import-linter.
Introduction to import-linter
import-linter is an open source code linter tool developed by seddonym.
To use import-linter for dependency checking, you first need to define some "contracts" in the configuration file. Here is an example configuration file:
# file: .importlinter
[importlinter]
root_packages =
    foo_proj
include_external_packages = True

[importlinter:contract:layers-main]
name=the main layers
type=layers
layers =
    foo_proj.client
    foo_proj.lib
The section [importlinter:contract:layers-main] defines a "layers" type of contract named "the main layers". A "layers" contract means that higher-level modules can freely import content from lower-level modules, but not vice versa.
"the main layers" defines a hierarchical relationship: the module foo_proj.client is at a higher level, while foo_proj.lib is at a lower level.
By running the lint-imports command, the tool scans all the import statements in the current project, builds a module dependency graph, and checks if the dependencies match all the contracts in the configuration.
Let's suppose the foo_proj/lib.py file of the project contains the following content:
from foo_proj.client import Client
This would cause the lint-imports command to fail:
$ lint-imports
# ...
Broken contracts
----------------
the main layers
---------------
foo_proj.lib is not allowed to import foo_proj.client:
- foo_proj.lib -> foo_proj.client (l.1)
It won't pass until we remove the import statement.
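To make the idea of a "layers" check more concrete, here is a toy sketch of what such a check amounts to. It is not import-linter's actual implementation: the find_imports, belongs_to and check_layers helpers and the in-memory sources mapping are hypothetical names made up for this illustration, and the sketch only looks at direct, absolute imports.

```python
import ast
from typing import Dict, List, Tuple


def find_imports(source: str) -> List[Tuple[str, int]]:
    """Return (imported module name, line number) pairs found in the source."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.extend((alias.name, node.lineno) for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.append((node.module, node.lineno))
    return found


def belongs_to(module: str, package: str) -> bool:
    """True if `module` is `package` itself or one of its submodules."""
    return module == package or module.startswith(package + ".")


def check_layers(sources: Dict[str, str], layers: List[str]) -> List[str]:
    """Report imports that point from a lower layer to a higher one.

    `layers` is ordered from highest to lowest, as in the contract.
    """
    violations = []
    for importer, source in sources.items():
        for imported, lineno in find_imports(source):
            for low_index, low in enumerate(layers):
                if not belongs_to(importer, low):
                    continue
                # Every layer listed before `low` is higher and must not be imported.
                for high in layers[:low_index]:
                    if belongs_to(imported, high):
                        violations.append(f"{importer} -> {imported} (l.{lineno})")
    return violations


# Hypothetical module sources, keyed by module name.
sources = {
    "foo_proj.client": "from foo_proj.lib import helper\n",
    "foo_proj.lib": "from foo_proj.client import Client\n",
}
print(check_layers(sources, ["foo_proj.client", "foo_proj.lib"]))
```

Only the import inside foo_proj.lib is reported, because it points upward in the layer list; the import from client to lib flows in the allowed direction.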
In addition to the "layers" contract type, import-linter also includes two other contract types:
- Forbidden: Prevents specified modules from importing certain other modules.
- Independence: Marks a list of modules as independent, meaning that they will not import any content from each other.
If the built-in contracts do not meet your needs, you can also write custom contracts. See the official documentation for more details.
Introducing import-linter to your Project
To introduce import-linter into your project, you must first define all the contracts you expect. You can start by asking yourself a few questions:
- Looking at the project from a high level, what are the major layers and how do they relate to each other? Many projects have a layered structure such as application -> services -> common -> utils; document those layers as contracts.
- For complex submodules, is there a clear layering within them? If you find a layering such as views -> models -> utils, document it as a corresponding contract.
- Are there any submodules that fulfill "forbidden" or "independence" contracts? If so, make a note of them.
After writing these contracts into the configuration file, run lint-imports, and you'll most likely be greeted with a barrage of error messages (if there are no errors, congratulations: your project is in good shape, and you can stop reading here). These errors indicate which import relationships have violated the contracts you set.
Analyze these error messages one by one, adding the inappropriate import relationships to each contract's ignore_imports option:
[importlinter:contract:layers-main]
name=the main layers
type=layers
layers =
    foo_proj.client
    foo_proj.lib
ignore_imports =
    # Temporarily ignore imports of the client in the lib module to prevent errors.
    foo_proj.lib -> foo_proj.client
Once all error messages have been addressed, ignore_imports may contain hundreds of import entries. At this point, running lint-imports again should produce no errors (a false sense of tidiness).
Now comes the main event: actually fixing those import relationships.
Rome wasn't built in a day, and dependencies should be fixed one at a time. Start by removing a single entry from ignore_imports, then run lint-imports, analyze the error message, and try to find the most reasonable way to fix it.
Repeat this process until you've fixed all the dependency problems in the project.
Tip: As you trim down the ignore_imports configuration, you'll find that some relationships are harder to fix than others. This is normal. Fixing dependency problems is often a long process that requires continuous effort from the entire team.
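One way to keep track of progress during this cleanup is to count how many ignore_imports entries remain in each contract. The sketch below assumes the INI-style configuration shown earlier; the count_ignored_imports helper is a hypothetical name for this illustration and is not part of import-linter.

```python
import configparser
from typing import Dict


def count_ignored_imports(config_text: str) -> Dict[str, int]:
    """Count the remaining ignore_imports entries of each contract section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    counts = {}
    for section in parser.sections():
        if not section.startswith("importlinter:contract:"):
            continue
        raw = parser.get(section, "ignore_imports", fallback="")
        # Each entry has the form "importer -> imported" on its own line.
        entries = [line.strip() for line in raw.splitlines() if "->" in line]
        counts[section] = len(entries)
    return counts


# Made-up configuration used only to exercise the helper.
example = """
[importlinter]
root_packages =
    foo_proj

[importlinter:contract:layers-main]
name = the main layers
type = layers
layers =
    foo_proj.client
    foo_proj.lib
ignore_imports =
    foo_proj.lib -> foo_proj.client
    foo_proj.lib.cache -> foo_proj.client.http
"""
print(count_ignored_imports(example))
```

Running a script like this in CI and watching the numbers shrink can make the long-running cleanup visible to the whole team.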
Common methods for fixing dependency issues
The following are some common methods for fixing dependency issues.
For the sake of explanation, let's assume that in all of the following scenarios, a project has defined a "layers" type of contract, and the lower-level module violates the contract by depending on a higher-level module.
1. Merging and splitting modules
The most direct way to adjust dependencies is to merge modules.
Suppose there is a lower-level module called clusters that improperly imports some code from a sub-module cluster_utils of the higher-level module resources. Since the code of cluster_utils is somewhat related to clusters, you can move it directly into the clusters.utils sub-module, thereby eliminating this dependency.
As shown below:
# Layers:resources -> clusters
# Before
resources -> clusters
clusters -> resources.cluster_utils # This violates the contract!
# After
resources -> clusters
clusters -> clusters.utils
If the code being depended on is not closely related to either module, you can instead split it out into a new module.
For example, a lower-level module users depends on the SMS-sending code in the higher-level module marketing, which violates the contract. You could split that code out of marketing and place it in a new lower-level module, utils.messaging.
# Layers:marketing -> users
# Before
marketing -> users
users -> marketing # This violates the contract!
# Layers:marketing -> users -> utils
# After
marketing -> users
users -> utils.messaging
This will make the dependency relationship healthy again.
2. Dependency Injection
Dependency injection is a common technique for decoupling dependencies.
For example, a project has a layers contract: marketing -> users, but the users module directly imports the SmsSender class from the marketing module, violating the contract.
# file: users.py
from marketing import SmsSender  # This violates the contract!


class User:
    """A simple user type."""

    def __init__(self):
        self.sms_sender = SmsSender()

    def add_notification(self, message: str, send_sms: bool):
        """Adds a new notification to the user."""
        # ...
        if send_sms:
            self.sms_sender.send(message)
To fix this problem through "dependency injection", we can remove the dependency on SmsSender from the code, and instead require the caller to pass in an "SMS notifier" object (sms_sender) when instantiating a User.
# file: users.py
class User:
    """A simple user type.

    :param sms_sender: The notifier object for sending SMS notifications.
    """

    def __init__(self, sms_sender):
        self.sms_sender = sms_sender
After this change, the dependency of User on the "SMS notifier" is weakened and no longer violates the layers contract.
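For completeness, here is a sketch of what the calling side might look like after this change. Both modules are condensed into one runnable snippet; the body of SmsSender, its sent list, and the sample messages are assumptions made for the illustration.

```python
# --- users.py (lower layer): no import from marketing ---
class User:
    """A simple user type that receives its SMS sender from the caller."""

    def __init__(self, sms_sender):
        self.sms_sender = sms_sender

    def add_notification(self, message: str, send_sms: bool):
        """Adds a new notification and optionally sends it by SMS."""
        if send_sms:
            self.sms_sender.send(message)


# --- marketing.py (higher layer): allowed to import from users ---
class SmsSender:
    """Stand-in implementation that records messages instead of sending them."""

    def __init__(self):
        self.sent = []

    def send(self, message: str):
        self.sent.append(message)


# The higher layer creates the sender and injects it into the user.
sender = SmsSender()
user = User(sms_sender=sender)
user.add_notification("Your order has shipped", send_sms=True)
user.add_notification("Weekly digest", send_sms=False)
print(sender.sent)
```

The import arrow now points only from the higher layer to the lower one: whoever constructs the User decides which sender it gets.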
Adding type annotations
However, the above dependency injection solution is not perfect. If you want to add a type annotation to the sms_sender parameter, you'll quickly find yourself back at the beginning: you can't write def __init__(self, sms_sender: SmsSender), as that would require bringing back the deleted import statement.
# file: users.py
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # To add the annotation, the high-level module's SmsSender is imported again, which violates the contract!
    from marketing import SmsSender
Even if you put the import statement in a TYPE_CHECKING branch as shown above, import-linter will by default still treat it as a regular import (Note: This behavior may change in the future, see Add support for detecting whether an import is only made during type checking · Issue #64) and consider it a violation of the contract.
To make type annotations work properly, we need to introduce a new abstraction in the users module: the SmsSenderProtocol protocol, which replaces the concrete SmsSender type.
from typing import Protocol


class SmsSenderProtocol(Protocol):
    def send(self, message: str) -> None:
        ...


class User:
    def __init__(self, sms_sender: SmsSenderProtocol):
        self.sms_sender = sms_sender
This solves the type annotation problem.
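Because Protocol relies on structural typing, the concrete class in the higher-level module does not need to inherit from, or even know about, SmsSenderProtocol. The sketch below illustrates this in a single file; the runtime_checkable decorator and the ConsoleSmsSender class are additions for the demonstration and are not part of the original code.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class SmsSenderProtocol(Protocol):
    def send(self, message: str) -> None:
        ...


class User:
    def __init__(self, sms_sender: SmsSenderProtocol):
        self.sms_sender = sms_sender


class ConsoleSmsSender:
    """Hypothetical sender; note that it does not subclass the protocol."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def send(self, message: str) -> None:
        self.log.append(message)


sender = ConsoleSmsSender()
user = User(sender)
user.sms_sender.send("hello")

# runtime_checkable only checks that a `send` attribute exists, not its signature.
print(isinstance(sender, SmsSenderProtocol))
print(isinstance(object(), SmsSenderProtocol))
```

A static type checker accepts User(sender) here because ConsoleSmsSender has a compatible send method, so the users module never has to name the concrete class.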
Tip: Introducing a Protocol to decouple dependency relationships is actually an application of the Dependency Inversion Principle. The Dependency Inversion Principle states that high-level modules should not depend on low-level modules; both should depend on abstractions.
Instead of adding a protocol, you can also keep the TYPE_CHECKING import and set exclude_type_checking_imports = true in the import-linter configuration file, so that such imports are excluded from the check. See the related issue.
3. Use simpler dependency types
In the code below, the low-level module monitoring depends on the ProcService type from the high-level module processes:
# file: monitoring.py
from processes import ProcService  # This violates the contract!


def build_monitor_config(service: ProcService):
    """Build monitoring related configuration.

    :param service: a process service object
    """
    # ...
    # Use `service.port` and `service.host` to build the data
    # ...
After analysis, it was determined that the build_monitor_config function actually uses only two fields of the service object, host and port, and does not rely on any other attributes or methods. Therefore, we can change the function signature to accept only the two simple parameters it needs:
# file: monitoring.py
def build_monitor_config(host: str, port: int):
    """Build monitoring related configuration.

    :param host: The hostname.
    :param port: The port.
    """
    # ...
The calling code must also be modified accordingly:
# before
build_monitor_config(svc)
# after
build_monitor_config(svc.host, svc.port)
By simplifying the types of parameters the function receives, we eliminate the unreasonable dependencies between modules.
4. Delaying function implementation
Python is a very dynamic programming language, and we can take advantage of this to delay providing certain function implementations, thereby reversing the dependency relationship between modules.
Suppose the lower-level module users currently violates the contract by directly depending on the send_sms function of the higher-level module marketing. To reverse this dependency, the first step is to define a global variable in the lower-level module users to store the function, along with a setter function for replacing it.
The code is as follows:
# file: users.py
from typing import Callable, Optional

SendMsgFunc = Callable[[str], None]

# A global variable that stores the current implementation of the "SMS sender" function
_send_sms_func: Optional[SendMsgFunc] = None


def set_send_sms_func(func: SendMsgFunc):
    global _send_sms_func
    _send_sms_func = func
When calling send_sms(), we need to check if a concrete implementation has already been provided:
# file: users.py
def send_sms(message: str):
    """Send SMS notification."""
    if not _send_sms_func:
        raise RuntimeError("Must set the send_sms function")
    _send_sms_func(message)
After the changes above, users no longer needs to import the concrete implementation of the "SMS sender" from marketing. Instead, the higher-level module marketing can perform a "backward operation" by calling set_send_sms_func to inject the implementation into the lower-level module users:
# file: marketing.py
from users import set_send_sms_func


def _send_msg(message: str):
    """It provides the implementation of the SMS sending function."""
    ...


set_send_sms_func(_send_msg)
This completes the inversion of the dependency relationship.
Variant: a simple plugin mechanism
In addition to storing the specific implementation of a function in a global variable, you can go a step further and implement a "plugin-like" registration and invocation mechanism to meet a wider range of requirements.
For example, in the lower-level module, implement the abstract definition of a "plugin" and a registry to store specific plugins:
# file: users.py
from typing import Protocol


class SmsSenderPlugin(Protocol):
    """A plugin type which should be implemented by other modules."""

    def __call__(self, message: str):
        ...


class SmsSenderPluginCenter:
    """Manages all "SMS sender" plugins."""

    @classmethod
    def register(cls, name: str, plugin: SmsSenderPlugin):
        """Register a plugin."""
        # ...

    @classmethod
    def call(cls, name: str):
        """Call a plugin by name."""
        # ...
In other modules, call SmsSenderPluginCenter.register to register a specific plugin implementation:
# file: marketing.py
from users import SmsSenderPluginCenter

SmsSenderPluginCenter.register('default', DefaultSender())
SmsSenderPluginCenter.register('free', FreeSmsSender())
Like using a global variable, the plugin mechanism is also a concrete use of the Dependency Inversion Principle. The code above is only the simplest demonstration of the principle. A real implementation would be more complex and is not covered in this article.
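As a rough illustration, the elided bodies of register and call could be filled in along the following lines. This is only one possible sketch: the _plugins dictionary, the extra message argument to call, the error handling, and the RecordingSender plugin are all assumptions, not the author's implementation.

```python
from typing import Dict, List, Protocol


class SmsSenderPlugin(Protocol):
    """A plugin type which should be implemented by other modules."""

    def __call__(self, message: str) -> None:
        ...


class SmsSenderPluginCenter:
    """Manages all "SMS sender" plugins in a class-level registry."""

    _plugins: Dict[str, SmsSenderPlugin] = {}

    @classmethod
    def register(cls, name: str, plugin: SmsSenderPlugin) -> None:
        """Register a plugin under a unique name."""
        if name in cls._plugins:
            raise ValueError(f"plugin {name!r} is already registered")
        cls._plugins[name] = plugin

    @classmethod
    def call(cls, name: str, message: str) -> None:
        """Call a plugin by name (the `message` argument is an assumption)."""
        try:
            plugin = cls._plugins[name]
        except KeyError:
            raise LookupError(f"no plugin registered as {name!r}") from None
        plugin(message)


# A higher-level module would register its implementation at import time.
outbox: List[str] = []


class RecordingSender:
    """Hypothetical plugin that records messages instead of sending them."""

    def __call__(self, message: str) -> None:
        outbox.append(message)


SmsSenderPluginCenter.register("default", RecordingSender())
SmsSenderPluginCenter.call("default", "Welcome!")
print(outbox)
```

Rejecting duplicate names is a design choice made for this sketch; a project might prefer to let later registrations override earlier ones.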
5. Configuration driven
Let's assume that the lower-level module users improperly depends on a utility function send_sms from the higher-level module marketing. Besides the methods introduced above, we can also choose to define the import path of the utility function as a configuration item in the configuration file.
# file: settings.py
# The import path for the SMS sending function
SEND_SMS_FUNC = 'marketing.send_sms'
The users module no longer directly references the marketing module. Instead, it obtains the send_sms function by dynamically importing the path string specified in the configuration.
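# file: users.py
# import_string here is Django's helper; any equivalent implementation works
from django.utils.module_loading import import_string

from settings import SEND_SMS_FUNC


def send_sms(message: str):
    func = import_string(SEND_SMS_FUNC)
    return func(message)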
This approach also breaks the dependency.
Hint: For specific implementation details of the import_string function, see the Django framework.
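For readers who are not using Django, a simplified stand-in for import_string can be written with the standard library. This is a sketch of the general idea rather than a copy of Django's code, and its error handling is deliberately minimal.

```python
from importlib import import_module
from typing import Any


def import_string(dotted_path: str) -> Any:
    """Import a dotted path and return the attribute named by its last part."""
    try:
        module_path, attr_name = dotted_path.rsplit(".", 1)
    except ValueError:
        raise ImportError(f"{dotted_path!r} doesn't look like a module path") from None
    module = import_module(module_path)
    try:
        return getattr(module, attr_name)
    except AttributeError:
        raise ImportError(
            f"module {module_path!r} does not define {attr_name!r}"
        ) from None


# Usage with a standard-library function instead of marketing.send_sms:
join = import_string("os.path.join")
print(join("a", "b"))
```

Because the module is only resolved when the function runs, a static import scanner sees no import of marketing in the users module; the dependency still exists at runtime, it is just expressed through configuration.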
6. Replace function calls with event-driven approaches
For modules with inherently weaker coupling, you can opt for an event-driven approach instead of direct function calls.
For example, whenever the low-level networking module updates custom domain data, it must call the deploy_networking function from the high-level applications module to update the corresponding resources, which violates the layers contract.
# file: networking/main.py
from applications.utils import deploy_networking # This violates the contract!
deploy_networking(app)
This problem is well suited to be addressed with an event-driven solution (the following code is based on the Signal mechanism of the Django framework).
To adopt an event-driven architecture, the first step is to send events. We need to modify the networking module, removing the function call code and instead sending a signal of type custom_domain_updated:
# file: networking/main.py
from networking.signals import custom_domain_updated
custom_domain_updated.send(sender=app)
The second step is to add event listener code within the applications module to complete the resource update process:
# file: applications/main.py
from django.dispatch import receiver

from applications.utils import deploy_networking
from networking.signals import custom_domain_updated


@receiver(custom_domain_updated)
def on_custom_domain_updated(sender, **kwargs):
    """Triggers resource update operations"""
    deploy_networking(sender)
In this way, the modules are decoupled.
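If the project does not use Django, the same pattern can be approximated with a very small signal class. The following is a minimal sketch, not Django's Signal: the Signal class, its connect and send methods, and the deployed list are hypothetical and exist only for illustration.

```python
from typing import Any, Callable, List


class Signal:
    """A tiny synchronous signal: receivers are called in connection order."""

    def __init__(self) -> None:
        self._receivers: List[Callable[..., Any]] = []

    def connect(self, receiver: Callable[..., Any]) -> Callable[..., Any]:
        """Register a receiver; returning it lets `connect` act as a decorator."""
        self._receivers.append(receiver)
        return receiver

    def send(self, sender: Any, **kwargs: Any) -> List[Any]:
        """Call every receiver and collect the return values."""
        return [receiver(sender, **kwargs) for receiver in self._receivers]


# --- networking/signals.py (lower layer) ---
custom_domain_updated = Signal()

# --- applications/main.py (higher layer) ---
deployed: List[str] = []


@custom_domain_updated.connect
def on_custom_domain_updated(sender: str, **kwargs: Any) -> None:
    """Stand-in for deploy_networking(sender)."""
    deployed.append(sender)


# --- networking/main.py (lower layer): it only knows about the signal ---
custom_domain_updated.send(sender="my-app")
print(deployed)
```

The lower layer owns the signal definition and the higher layer subscribes to it, so the only import between the two points downward.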
Summary
Import-linter is an extremely useful tool for dependency governance. It allows us to explicitly articulate the implicit complex dependencies within a project through configuration files by offering various types of "contracts." Combined with engineering practices like CI, it can effectively help us maintain the cleanliness of our project architecture.
If you are planning to introduce import-linter into your project, the most important task is to fix any existing unreasonable dependencies. Common practices include merging and splitting, dependency injection, event-driven approaches, and so on. Although there are a variety of techniques, the key principle can be summarized in one sentence: Place each line of code in the most appropriate module, and if necessary, introduce a new abstraction into the current module to take advantage of its power to invert dependencies between modules.
May your project architecture stay clean!