Decentralized Machine Learning

This documentation explains, from a data science point of view, the scope of each of the roles defined in the DecentralML pallet.

Each role participates in the machine learning pipeline in a different capacity, following the development cycle of a machine learning system described below.

Model creator

The model creator is responsible for starting the development of the machine learning system. The main tasks are:

  • Definition of the data required for the recognition task at hand. These data need to be annotated accurately (see Data annotation).

  • Definition of the model structure. The model structure depends on the task and can evolve over time according to the needs and the evaluation results.

  • Definition of the training procedure. The training procedure is what creates the actual model from the previously defined structure, using the annotated data.

  • Definition of the evaluation methods.
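The four deliverables above can be sketched with a toy example. This is an illustrative sketch only, not part of the DecentralML API: a linear model with a single weight stands in for a real model structure, and all names are hypothetical.

```python
# A minimal sketch of the four model-creator deliverables, using a toy
# linear model trained with gradient descent. Names are illustrative,
# not part of the DecentralML API.

# 1. Data: annotated (input, label) pairs -- here labels follow y = 2x.
data = [(x, 2.0 * x) for x in range(10)]

# 2. Model structure: a single learnable weight.
weight = 0.0

# 3. Training procedure: gradient descent on squared error.
def train(weight, data, lr=0.01, epochs=100):
    for _ in range(epochs):
        for x, y in data:
            error = weight * x - y
            weight -= lr * error * x
    return weight

# 4. Evaluation method: mean squared error over a data set.
def evaluate(weight, data):
    return sum((weight * x - y) ** 2 for x, y in data) / len(data)

weight = train(weight, data)
print(round(weight, 2))  # converges toward 2.0
```

In a real system the model structure would evolve over time according to the evaluation results, as noted above, but the separation of concerns stays the same.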

The model creator can also define the specific tasks required to complete each aspect of the development of the machine learning system. In the example provided for DecentralML (https://github.com/livetreetech/DecentralML), these tasks are uploaded by the model creator as assets for each role.

A person accepting a task for a given goal would perform that task offline.

Model engineer

The model engineer is a role delegated by the model creator to improve the structure of the model. She can redefine the model entirely, raising new training tasks and new model evaluation tasks.

This role is very similar to the model creator's, with the main difference being that her goals are focused on improving the model rather than defining its entire application.

Data annotator

A data annotator impacts the first step of the machine learning system's development by providing useful annotations for the data that will be used for training. Some examples of annotations are:

  • Bounding boxes for object detection. In this case, a data annotator is given a sample of an object (or its description) and a series of images. The goal of this annotation is to specify the coordinates of the bounding box that contains the sample object in each of the provided images. Multiple objects can be annotated at once, for example bounding boxes for a dog, a bicycle and a car in the same image.

  • Song recognition in noisy audio. If the goal of the machine learning system is song recognition, a possible annotation task would be to detect when a specific song is played within noisier audio or video. The annotation in this case includes not only the coordinates of the song (i.e. title, author, etc.) but, more importantly, the start and end times of the interval in which it appears in the audio/video being annotated.

  • Improving speech-to-text. In this case, the annotators are presented with a video or a podcast and the text automatically produced by a speech-to-text system. The goal of this annotation is to correct transcription mistakes due to accents, bad audio conditions, or specific words that are not part of common dictionaries (e.g. brand names).

The annotation tasks can be quite varied. The general idea is to provide the model creators, engineers and contributors with the best annotated data for the machine learning model.
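The three annotation types above could be recorded as structured data along the following lines. These field names and values are purely hypothetical illustrations, not a DecentralML schema.

```python
# Hypothetical annotation records for the three example tasks; the field
# names and values are illustrative, not a DecentralML schema.

# Bounding boxes: one record per image, one entry per annotated object.
bounding_box = {
    "image": "street.jpg",
    "objects": [
        {"label": "dog",     "box": [62, 145, 190, 288]},  # [x_min, y_min, x_max, y_max]
        {"label": "bicycle", "box": [120, 80, 410, 320]},
        {"label": "car",     "box": [350, 40, 610, 170]},
    ],
}

# Song recognition: the song's "coordinates" plus the time interval
# in which it is heard within the noisier recording.
song_recognition = {
    "audio": "podcast_042.mp3",
    "song": {"title": "Example Song", "author": "Example Artist"},
    "interval": {"start_s": 312.5, "end_s": 395.0},
}

# Speech-to-text correction: the automatic transcript paired with the
# annotator's fix for accent, audio-quality or brand-name errors.
speech_to_text_fix = {
    "audio": "interview.mp4",
    "asr_output": "the new fone from akme",
    "corrected": "the new phone from Acme",
}
```

Whatever the concrete format, the common pattern is a pointer to the raw data plus the human-provided ground truth that training will rely on.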

Model contributor

In the DecentralML structure, a model contributor is a person who, given a model structure and annotated data, offers computational power to train the model. Each model contributor can be tasked with training on a specific subset of the data. The contributions of each training run are then joined in a federated manner by the model creator.

Another possibility is that the model contributors are tasked with training a specific part of the machine learning system. This is especially useful when the whole system is composed of multiple parts, each aimed at a specific recognition, that are then ensembled for the final recognition (e.g. brand recognition using audio and video, where the sound and the video can be processed separately and joined after the individual recognitions).
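The brand-recognition example can be sketched as a simple ensemble: separately trained audio and video parts each emit per-brand scores, which are averaged into the final recognition. The models, brands and scores below are all hypothetical stand-ins.

```python
# A minimal ensembling sketch: hypothetical audio and video recognisers,
# trained by different contributors, each emit per-brand scores; a simple
# average joins them into the final recognition. All names are illustrative.

def audio_model(clip):
    # Stand-in for a part trained by one group of contributors.
    return {"acme": 0.7, "globex": 0.3}

def video_model(clip):
    # Stand-in for a part trained by another group of contributors.
    return {"acme": 0.4, "globex": 0.6}

def ensemble(clip, models):
    """Average the per-brand scores of all parts and pick the best brand."""
    scores = {}
    for model in models:
        for brand, p in model(clip).items():
            scores[brand] = scores.get(brand, 0.0) + p / len(models)
    return max(scores, key=scores.get)

print(ensemble("ad_clip.mp4", [audio_model, video_model]))
```

Averaging is only one joining strategy; weighted voting or a learned combiner could equally be used once the individual parts are trained.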
