Quickly master the Hook function in Python

This Python tutorial introduces hook functions.

1. What is Hook

I often hear about hook functions. Recently I was reading mmdetection, an open-source object detection framework, and found that it uses hooks extensively. So what exactly is a hook, and what is it for?

  • What is a hook? As the name suggests, a hook is something you hang things on when needed. More precisely, a hook lets us attach a function we implemented ourselves to a predefined mount point in the target program, so that it runs at a certain moment.

  • The role of hook functions. Hooks are very common in Windows desktop software development, especially in event-triggering mechanisms. For example, when a C++ MFC program needs to monitor the left mouse button being pressed, MFC provides an onLeftKeyDown hook function. The MFC framework does not implement the specific behavior of onLeftKeyDown for us; it only provides the hook. When we need to handle the event, we override this function and mount our operation on the hook. If we do not mount anything, the MFC event mechanism performs an empty operation.

From the above we can see that:

  • A hook function is predefined in the program and sits in the original program flow (the program exposes a hook).

  • We implement the specific details for the step defined by the hook, then mount or register our implementation on the hook so that the target program can call it.

  • Hooks are a programming mechanism and are not tied to any specific language.

  • From a design-pattern point of view, the hook pattern is an extension of the template method pattern.

  • A hook only does something once an implementation is registered. If nothing is registered or mounted, the original flow executes an empty operation at that point (that is, nothing happens).
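The points above can be sketched in a few lines of Python. This is only a minimal illustration and is not taken from any framework: the names Window, LoggingWindow, on_left_key_down, and dispatch_click are hypothetical, loosely mirroring the MFC example. The base class owns the flow and exposes a hook that does nothing by default; a subclass overrides the hook to mount its own behavior.

```python
class Window:
    """Hypothetical base class that owns the event flow."""

    def on_left_key_down(self, x, y):
        """Hook: intentionally empty, so an unmounted hook is a no-op."""

    def dispatch_click(self, x, y):
        # The fixed flow calls the hook at a predefined point.
        self.on_left_key_down(x, y)
        return "dispatched"


class LoggingWindow(Window):
    """Mounts custom behavior by overriding the hook."""

    def __init__(self):
        self.clicks = []

    def on_left_key_down(self, x, y):
        self.clicks.append((x, y))


plain = Window()
plain.dispatch_click(1, 2)   # the hook runs but does nothing

custom = LoggingWindow()
custom.dispatch_click(3, 4)  # the hook records the click
print(custom.clicks)
```

The flow in dispatch_click never changes; only the hook body differs between the two classes, which is the template-method relationship mentioned above.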

This article uses Python to explain how hooks are implemented and shows how hooks are used in open-source projects. A hook function plays a role similar to another term we often hear, the callback function, and the two can be understood with the same model.

2. Hook implementation example

As far as I know, hook functions are most commonly used in multi-step processing flows. Hooks are mounted at these steps to make it easy to add extra operations.

The following simple example implements a general-purpose function for inserting content into a queue. The flow has two steps:

  • Filter the data before it is inserted into the queue (input_filter_fn)

  • Insert the data into the queue (insert_queue)

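class ContentStash(object):
    """
    content stash for online operation
    pipeline is
    1. input_filter: filter some contents, no use to user
    2. insert_queue(redis or other broker): insert useful content to queue
    """

    def __init__(self):
        self.input_filter_fn = None
        self.broker = []

    def register_input_filter_hook(self, input_filter_fn):
        """
        register input filter function, parameter is content dict
        Args:
            input_filter_fn: input filter function
        """
        self.input_filter_fn = input_filter_fn

    def insert_queue(self, content):
        """
        insert content to queue
        Args:
            content: dict
        """
        self.broker.append(content)

    def input_pipeline(self, content, use=False):
        """
        pipeline of input for content stash
        Args:
            use: whether to run the pipeline, default False
            content: dict
        """
        if not use:
            return

        # input filter: stays None when no hook is registered
        _filter = None
        if self.input_filter_fn:
            _filter = self.input_filter_fn(content)

        # insert to queue unless the hook filtered the content out
        if not _filter:
            self.insert_queue(content)


# test
## Implement the hook you need: for example, filter out content that contains 'time', otherwise insert it into the queue
def input_filter_hook(content):
    """
    test input filter hook
    Args:
        content: dict

    Returns: None or content
    """
    # None means "keep"; returning the content means "filter out"
    if content.get('time') is None:
        return None
    return content


# Original program
content = {'filename': 'test.jpg', 'b64_file': "#test", 'data': {"result": "cat", "probility": 0.9}}
content_stash = ContentStash()

# Mount the hook function. Different hook implementations are possible, but their input and output must stay consistent with the original program (here, content)
content_stash.register_input_filter_hook(input_filter_hook)

# Run the flow
content_stash.input_pipeline(content, use=True)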
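Because the pipeline depends only on the hook's input and output, different hook implementations can be swapped in without touching the pipeline itself. The following is a simplified, self-contained sketch of that idea rather than the class above verbatim: MiniStash, drop_timestamped, and drop_low_confidence are hypothetical names, and the 0.5 threshold is an arbitrary assumption. The convention used in this sketch is that a hook returns the content when it should be dropped and None when it should be kept. The 'probility' key is spelled as in the example data.

```python
class MiniStash:
    """Simplified stand-in for the pipeline: filter, then enqueue."""

    def __init__(self):
        self.input_filter_fn = None
        self.broker = []

    def register_input_filter_hook(self, input_filter_fn):
        self.input_filter_fn = input_filter_fn

    def input_pipeline(self, content):
        # With no hook registered, nothing is filtered out.
        dropped = self.input_filter_fn(content) if self.input_filter_fn else None
        if not dropped:
            self.broker.append(content)


def drop_timestamped(content):
    """Drop any content that carries a 'time' key."""
    return content if content.get('time') is not None else None


def drop_low_confidence(content):
    """Drop content whose score is below an assumed 0.5 threshold."""
    score = content.get('data', {}).get('probility', 0.0)
    return content if score < 0.5 else None


items = [
    {'filename': 'a.jpg', 'data': {'probility': 0.9}},
    {'filename': 'b.jpg', 'time': '12:00', 'data': {'probility': 0.8}},
    {'filename': 'c.jpg', 'data': {'probility': 0.2}},
]

by_time = MiniStash()
by_time.register_input_filter_hook(drop_timestamped)
by_score = MiniStash()
by_score.register_input_filter_hook(drop_low_confidence)

for item in items:
    by_time.input_pipeline(item)
    by_score.input_pipeline(item)

print([c['filename'] for c in by_time.broker])
print([c['filename'] for c in by_score.broker])
```

Both stashes run the same pipeline code; only the registered hook decides which items reach the queue.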

3. Application of hook in open source framework

3.1 keras

The deep learning training process illustrates hook functions vividly.

A training run (excluding data preparation) iterates over the training set multiple times. Each pass is called an epoch, and each epoch is divided into multiple batches. The process breaks down into:

  • Start training

  • Before training an epoch

  • Before training a batch

  • After training a batch

  • After training an epoch

  • Evaluate the validation set

  • End training

These steps are interspersed throughout the training process and can be understood as hook functions. We may need to implement customized behavior in them. For example, after training an epoch we may want to save the trained model, and when training ends we may want to evaluate the best model on the test set.
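The ordering of these steps can be sketched with a toy training loop in plain Python. This is an illustration only and does not use Keras: ToyCallback, Recorder, and fit are hypothetical names, and the actual training step is left out.

```python
class ToyCallback:
    """Hypothetical base class: every hook defaults to a no-op."""

    def on_train_begin(self):
        pass

    def on_epoch_begin(self, epoch):
        pass

    def on_batch_begin(self, batch):
        pass

    def on_batch_end(self, batch):
        pass

    def on_epoch_end(self, epoch):
        pass

    def on_train_end(self):
        pass


class Recorder(ToyCallback):
    """Overrides only the hooks it cares about."""

    def __init__(self):
        self.events = []

    def on_epoch_end(self, epoch):
        self.events.append(f"save model after epoch {epoch}")

    def on_train_end(self):
        self.events.append("evaluate best model on test set")


def fit(callback, epochs, batches_per_epoch):
    """Toy loop that calls each hook at its predefined point."""
    callback.on_train_begin()
    for epoch in range(epochs):
        callback.on_epoch_begin(epoch)
        for batch in range(batches_per_epoch):
            callback.on_batch_begin(batch)
            # ... train on one batch here ...
            callback.on_batch_end(batch)
        callback.on_epoch_end(epoch)
    callback.on_train_end()


recorder = Recorder()
fit(recorder, epochs=2, batches_per_epoch=3)
print(recorder.events)
```

The hooks that Recorder does not override still run on every batch, but they do nothing.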

Keras implements hook functions through its callback classes. The parent Callback class is shown here; to customize, you only need to inherit from it and implement the hooks you care about.

class Callback(object):
  """Abstract base class used to build new callbacks.

  Attributes:
      params: Dict. Training parameters
          (eg. verbosity, batch size, number of epochs...).
      model: Instance of `keras.models.Model`.
          Reference of the model being trained.

  The `logs` dictionary that callback methods
  take as argument will contain keys for quantities relevant to
  the current batch or epoch (see method-specific docstrings).
  """

  def __init__(self):
    self.validation_data = None  # pylint: disable=g-missing-from-attributes
    self.model = None
    # Whether this Callback should only run on the chief worker in a
    # Multi-Worker setting.
    # TODO(omalleyt): Make this attr public once solution is stable.
    self._chief_worker_only = None
    self._supports_tf_logs = False

  def set_params(self, params):
    self.params = params

  def set_model(self, model):
    self.model = model

  def on_batch_begin(self, batch, logs=None):
    """A backwards compatibility alias for `on_train_batch_begin`."""

  def on_batch_end(self, batch, logs=None):
    """A backwards compatibility alias for `on_train_batch_end`."""

  def on_epoch_begin(self, epoch, logs=None):
    """Called at the start of an epoch.

    Subclasses should override for any actions to run. This function should only
    be called during TRAIN mode.

    Arguments:
        epoch: Integer, index of epoch.
        logs: Dict. Currently no data is passed to this argument for this method
          but that may change in the future.
    """

  def on_epoch_end(self, epoch, logs=None):
    """Called at the end of an epoch.

    Subclasses should override for any actions to run. This function should only
    be called during TRAIN mode.

    Arguments:
        epoch: Integer, index of epoch.
        logs: Dict, metric results for this training epoch, and for the
          validation epoch if validation is performed. Validation result keys
          are prefixed with `val_`.
    """

  def on_train_batch_begin(self, batch, logs=None):
    """Called at the beginning of a training batch in `fit` methods.

    Subclasses should override for any actions to run.

    Arguments:
        batch: Integer, index of batch within the current epoch.
        logs: Dict, contains the return value of `model.train_step`. Typically,
          the values of the `Model`'s metrics are returned.  Example:
          `{'loss': 0.2, 'accuracy': 0.7}`.
    """
    # For backwards compatibility.
    self.on_batch_begin(batch, logs=logs)

  def on_train_batch_end(self, batch, logs=None):
    """Called at the end of a training batch in `fit` methods.

    Subclasses should override for any actions to run.

    Arguments:
        batch: Integer, index of batch within the current epoch.
        logs: Dict. Aggregated metric results up until this batch.
    """
    # For backwards compatibility.
    self.on_batch_end(batch, logs=logs)

  def on_test_batch_begin(self, batch, logs=None):
    """Called at the beginning of a batch in `evaluate` methods.

    Also called at the beginning of a validation batch in the `fit`
    methods, if validation data is provided.

    Subclasses should override for any actions to run.

    Arguments:
        batch: Integer, index of batch within the current epoch.
        logs: Dict, contains the return value of `model.test_step`. Typically,
          the values of the `Model`'s metrics are returned.  Example:
          `{'loss': 0.2, 'accuracy': 0.7}`.
    """

  def on_test_batch_end(self, batch, logs=None):
    """Called at the end of a batch in `evaluate` methods.

    Also called at the end of a validation batch in the `fit`
    methods, if validation data is provided.

    Subclasses should override for any actions to run.

    Arguments:
        batch: Integer, index of batch within the current epoch.
        logs: Dict. Aggregated metric results up until this batch.
    """

  def on_predict_batch_begin(self, batch, logs=None):
    """Called at the beginning of a batch in `predict` methods.

    Subclasses should override for any actions to run.

    Arguments:
        batch: Integer, index of batch within the current epoch.
        logs: Dict, contains the return value of `model.predict_step`,
          it typically returns a dict with a key 'outputs' containing
          the model's outputs.
    """

  def on_predict_batch_end(self, batch, logs=None):
    """Called at the end of a batch in `predict` methods.

    Subclasses should override for any actions to run.

    Arguments:
        batch: Integer, index of batch within the current epoch.
        logs: Dict. Aggregated metric results up until this batch.
    """

  def on_train_begin(self, logs=None):
    """Called at the beginning of training.

    Subclasses should override for any actions to run.

    Arguments:
        logs: Dict. Currently no data is passed to this argument for this method
          but that may change in the future.
    """

  def on_train_end(self, logs=None):
    """Called at the end of training.

    Subclasses should override for any actions to run.

    Arguments:
        logs: Dict. Currently the output of the last call to `on_epoch_end()`
          is passed to this argument for this method but that may change in
          the future.
    """

  def on_test_begin(self, logs=None):
    """Called at the beginning of evaluation or validation.

    Subclasses should override for any actions to run.

    Arguments:
        logs: Dict. Currently no data is passed to this argument for this method
          but that may change in the future.
    """

  def on_test_end(self, logs=None):
    """Called at the end of evaluation or validation.

    Subclasses should override for any actions to run.

    Arguments:
        logs: Dict. Currently the output of the last call to
          `on_test_batch_end()` is passed to this argument for this method
          but that may change in the future.
    """

  def on_predict_begin(self, logs=None):
    """Called at the beginning of prediction.

    Subclasses should override for any actions to run.

    Arguments:
        logs: Dict. Currently no data is passed to this argument for this method
          but that may change in the future.
    """

  def on_predict_end(self, logs=None):
    """Called at the end of prediction.

    Subclasses should override for any actions to run.

    Arguments:
        logs: Dict. Currently no data is passed to this argument for this method
          but that may change in the future.
    """

  def _implements_train_batch_hooks(self):
    """Determines if this Callback should be called for each train batch."""
    return (not generic_utils.is_default(self.on_batch_begin) or
            not generic_utils.is_default(self.on_batch_end) or
            not generic_utils.is_default(self.on_train_batch_begin) or
            not generic_utils.is_default(self.on_train_batch_end))
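The last method, _implements_train_batch_hooks, lets the framework find out whether a callback has overridden any per-batch hook, so that per-batch calls can be skipped when nothing is mounted. The sketch below shows one way such a check could work in plain Python. It is a simplified stand-in, not the Keras implementation: default, is_default, BaseCallback, EpochOnly, and BatchLogger are names chosen for this illustration.

```python
def default(method):
    """Mark a method as the base-class (no-op) implementation."""
    method._is_default = True
    return method


def is_default(method):
    """Return True if the method still carries the base-class marker."""
    return getattr(method, "_is_default", False)


class BaseCallback:
    @default
    def on_batch_begin(self, batch, logs=None):
        pass

    @default
    def on_batch_end(self, batch, logs=None):
        pass

    def implements_batch_hooks(self):
        # An overriding method in a subclass does not carry the marker.
        return (not is_default(self.on_batch_begin)
                or not is_default(self.on_batch_end))


class EpochOnly(BaseCallback):
    """Overrides no batch hook."""


class BatchLogger(BaseCallback):
    def on_batch_end(self, batch, logs=None):
        print(f"batch {batch} done")


print(EpochOnly().implements_batch_hooks())
print(BatchLogger().implements_batch_hooks())
```

A training loop could consult this check once and skip the batch hooks entirely for callbacks like EpochOnly.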

These hooks are called from the model training loop.

Keras source code location: tensorflow\python\keras\engine\training.py

An excerpt follows; the lines marked ## I am hook show where hooks are called:

      # Container that configures and calls `tf.keras.Callback`s.
      if not isinstance(callbacks, callbacks_module.CallbackList):
        callbacks = callbacks_module.CallbackList(
            ...
            add_progbar=verbose != 0,
            ...)

      ## I am hook
      callbacks.on_train_begin()
      training_logs = None
      # Handle fault-tolerance for multi-worker.
      # TODO(omalleyt): Fix the ordering issues that mean this has to
      # happen after `callbacks.on_train_begin`.
      data_handler._initial_epoch = (  # pylint: disable=protected-access
          ...)
      for epoch, iterator in data_handler.enumerate_epochs():
        ...
        with data_handler.catch_stop_iteration():
          for step in data_handler.steps():
            with trace.Trace(...):
              ## I am hook
              callbacks.on_train_batch_begin(step)
              tmp_logs = train_function(iterator)
              if data_handler.should_sync:
                ...
              logs = tmp_logs  # No error, now safe to assign to logs.
              end_step = step + data_handler.step_increment
              callbacks.on_train_batch_end(end_step, logs)
        epoch_logs = copy.copy(logs)

        # Run validation.
        ...

        ## I am hook
        callbacks.on_epoch_end(epoch, epoch_logs)
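In the excerpt, callbacks is a container (CallbackList) rather than a single callback: calling a hook on it forwards the call to every registered callback. A minimal sketch of that dispatch follows, using hypothetical classes (SimpleCallbackList, LossHistory, EpochCounter) rather than the Keras ones, and made-up loss values.

```python
class SimpleCallbackList:
    """Forwards each hook call to every callback that defines it."""

    def __init__(self, callbacks):
        self.callbacks = list(callbacks)

    def _call(self, hook_name, *args):
        for cb in self.callbacks:
            hook = getattr(cb, hook_name, None)
            if hook is not None:
                hook(*args)

    def on_train_batch_end(self, batch, logs=None):
        self._call("on_train_batch_end", batch, logs)

    def on_epoch_end(self, epoch, logs=None):
        self._call("on_epoch_end", epoch, logs)


class LossHistory:
    """Records the loss reported after every batch."""

    def __init__(self):
        self.losses = []

    def on_train_batch_end(self, batch, logs=None):
        self.losses.append(logs["loss"])


class EpochCounter:
    """Counts how many epochs have finished."""

    def __init__(self):
        self.epochs_seen = 0

    def on_epoch_end(self, epoch, logs=None):
        self.epochs_seen += 1


history, counter = LossHistory(), EpochCounter()
callbacks = SimpleCallbackList([history, counter])

for epoch in range(2):
    for batch, loss in enumerate([0.5, 0.4]):
        callbacks.on_train_batch_end(batch, {"loss": loss})
    callbacks.on_epoch_end(epoch, {"loss": loss})

print(history.losses, counter.epochs_seen)
```

The training loop only ever talks to the container, so adding or removing a callback does not change the loop.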

3.2 mmdetection

mmdetection is an open-source object detection framework (based on PyTorch) that integrates many deep learning detection algorithms, such as Faster R-CNN, FPN, and RetinaNet. It also uses hooks extensively to expose specific steps of the training process to the application.

For details, see the train_detector function, excerpted here:



def train_detector(model,
                   ...):  # remaining parameters elided in this excerpt
    logger = get_root_logger(cfg.log_level)

    # prepare data loaders
    ...

    # put model on gpus
    ...

    # build runner
    optimizer = build_optimizer(model, cfg.optimizer)
    runner = EpochBasedRunner(
        ...)
    # an ugly workaround to make .log and .log.json filenames the same
    runner.timestamp = timestamp

    # fp16 setting
    ...

    # register hooks
    runner.register_training_hooks(cfg.lr_config, optimizer_config,
                                   cfg.checkpoint_config, cfg.log_config,
                                   cfg.get('momentum_config', None))
    if distributed:
        ...

    # register eval hooks
    if validate:
        # Support batch_size > 1 in validation
        ...
        eval_cfg = cfg.get('evaluation', {})
        eval_hook = DistEvalHook if distributed else EvalHook
        runner.register_hook(eval_hook(val_dataloader, **eval_cfg))

    # user-defined hooks
    if cfg.get('custom_hooks', None):
        custom_hooks = cfg.custom_hooks
        assert isinstance(custom_hooks, list), \
            f'custom_hooks expect list type, but got {type(custom_hooks)}'
        for hook_cfg in cfg.custom_hooks:
            assert isinstance(hook_cfg, dict), \
                'Each item in custom_hooks expects dict type, but got ' \
                f'{type(hook_cfg)}'
            hook_cfg = hook_cfg.copy()
            priority = hook_cfg.pop('priority', 'NORMAL')
            hook = build_from_cfg(hook_cfg, HOOKS)
            runner.register_hook(hook, priority=priority)
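The calls to register_hook(hook, priority=priority) suggest that the runner keeps an ordered collection of hook objects and invokes them at fixed stages of training. The toy runner below sketches that idea under stated assumptions: ToyRunner, call_hook, the numeric priority table, the stage names, and the two example hooks are all illustrative and are not the mmdetection implementation. In this sketch a smaller number means the hook runs earlier.

```python
PRIORITIES = {"HIGH": 30, "NORMAL": 50, "LOW": 70}  # assumed values


class ToyRunner:
    """Keeps hooks sorted by priority and calls them stage by stage."""

    def __init__(self):
        self._hooks = []  # list of (priority_value, hook), kept sorted
        self.log = []

    def register_hook(self, hook, priority="NORMAL"):
        value = PRIORITIES[priority]
        # Insert before the first hook with a strictly larger value,
        # so registration order is preserved among equal priorities.
        index = len(self._hooks)
        for i, (existing, _) in enumerate(self._hooks):
            if value < existing:
                index = i
                break
        self._hooks.insert(index, (value, hook))

    def call_hook(self, stage):
        for _, hook in self._hooks:
            # Hooks that do not define this stage are skipped via a no-op.
            getattr(hook, stage, lambda runner: None)(self)

    def run(self, epochs):
        for _ in range(epochs):
            self.call_hook("before_train_epoch")
            # ... train one epoch here ...
            self.call_hook("after_train_epoch")


class ToyCheckpointHook:
    def after_train_epoch(self, runner):
        runner.log.append("checkpoint")


class ToyEvalHook:
    def after_train_epoch(self, runner):
        runner.log.append("eval")


runner = ToyRunner()
runner.register_hook(ToyCheckpointHook(), priority="NORMAL")
runner.register_hook(ToyEvalHook(), priority="HIGH")
runner.run(epochs=1)
print(runner.log)
```

Although the checkpoint hook was registered first, the evaluation hook runs first because of its higher priority.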

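4. Summary

  • A hook function is a predefined step in a flow whose implementation is left empty.

  • Once an implementation is mounted or registered, the flow executes it when it reaches that step.

  • Callback functions and hook functions serve the same purpose.

  • The hook design brings flexibility: if there is a step in your flow that you want the caller to implement, you can use a hook function.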


