Federated Learning Example
This page describes a simple example of federated learning that uses local storage as the sharing method for the contributors' models. In DecentralML federated learning, the model creator is in charge of creating the scripts for all the steps the contributors perform. These scripts must be provided as assets of each model contributor's task.
Here, we present an overview of each step required to complete a federated learning procedure using a recognition model for the MNIST dataset (https://www.tensorflow.org/datasets/catalog/mnist).
Strictly speaking, only the model creator and the model contributors participate in the federated learning process. The model engineer can also participate, but from the federated learning point of view, their role becomes similar to that of the model creator, with wide freedom to change the model structure, re-train it, and evaluate it. In this case, the model engineer can also create new tasks for the model contributors to re-train the new model.
Finally, the data annotators do not participate directly in the federated learning process. They annotate data that can then be used for federated learning, but the results of every annotation task are submitted to the model creator for analysis and preparation.
DecentralML integration
Before explaining each role, here is a brief description of how this example integrates with DecentralML tasks.
The federated learning process happens off-chain on each contributor's system. The model creator (or model engineer) provides the files required by each role for the federated learning process as assets of each task (i.e. annotation, training, etc.).
When a contributor accepts a task, the Substrate Python client downloads the corresponding assets. The contributor can then execute the task off-chain. Once the task is completed, the contributor executes the send-task-result procedure from the Python client, which uploads the results as assets of their result submission.
From the model creator's point of view, these are the steps to start the federated learning process with the model contributors:

1. Prepare the Python scripts for the model contributors as assets:
   - Create the model
   - Save the model
   - Prepare the function for loading the model contributor's data (`load_data`)
   - Prepare the function for the contributor's training procedure (`train_model`)
   - Package all the scripts as assets, including a `start_task.sh` script that executes the task
2. Create a task for the model contributors using the `create_task` client function, specifying the previously created files as assets.
3. Wait for the task to be completed.
4. Once the task is completed, use the `validate_task_result` function of the client to retrieve all the task results for validation. The model creator can automatically accept all the results by setting the corresponding policy, "AutoAccept". Otherwise, they can accept only the results they consider valuable after evaluation.
5. To evaluate the model contributors' results, the model creator can use the `load_contributors_model` function in this example to load all the contributors' models for validation.
6. Federate the weights using the `federated_contributors_model` function in this example to create a final model for deployment.
From the model contributor's point of view, these are the steps to participate in the federated learning process with other model contributors:

1. Accept a task using the `accept_task` function in the Python client. This function downloads the corresponding assets for that task.
2. Using a terminal in the assets folder, start the task with the `start_task.sh` script that the model creator must provide.
3. Once the task is completed, call the `send_task_result` function of the Substrate Python client to upload the results of the training procedure as assets.
4. The contributor gets paid according to the validation procedure decided by the model creator (see step 4 in the model creator's list of steps).
In this first implementation, all the steps in the previous lists are started manually by the corresponding role. For example, `accept_task`, the execution of the task, and `send_task_result` are kept as separate operations, for two reasons:

- It ensures that each role has the flexibility to perform the task offline and at their convenience. A contributor might accept a task but, given the duration of some of the steps (e.g. data annotation or model training), decide to actually perform the task at a later time.
- It allows the model creator to fully customise each task for different models, frameworks, or recognition tasks by providing self-contained code written for the specific needs. We acknowledge that this approach places more of the burden on the model creator, but the gain in flexibility justifies it.
The following example of federated learning explains the off-chain steps that each role would perform.
Model creator
The model creator first creates a model structure and compiles it. Once the first model is generated, it can be trained, evaluated and, more importantly, saved for the model contributors to use.
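These creator-side steps can be sketched with tf.keras as follows. The model structure, the `create_model` name, the file name, and the tiny synthetic batch are illustrative stand-ins; a real script would load the actual MNIST data (e.g. via `tf.keras.datasets.mnist.load_data()`):

```python
import numpy as np
import tensorflow as tf

def create_model():
    """Build and compile a small MNIST classifier (illustrative structure)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = create_model()

# Tiny random batch with MNIST's shape, standing in for the real dataset.
x = np.random.rand(64, 28, 28).astype("float32")
y = np.random.randint(0, 10, 64)

model.fit(x, y, epochs=1, batch_size=32, verbose=0)   # train
loss, accuracy = model.evaluate(x, y, verbose=0)      # evaluate
model.save("creator_model.keras")                     # save for the contributors
```

The saved file is what gets shipped to the contributors as a task asset.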
In a federated system, the training step is generally delegated to the model contributors, but the model creator could perform some training just to initiate the system. The contributors can then subsequently train the model on new data (see Model contributor).
After the training job is completed by the contributors, the creator can load the models generated and saved by the contributors. The results of each trained model are then federated by averaging the model weights, and the new weights are finally applied to the same model structure that was originally created.
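The averaging step itself is framework-agnostic: it is an element-wise mean over the corresponding weight tensors of the contributors' models. A minimal NumPy sketch, assuming the per-model weight lists come from Keras' `model.get_weights()` (the `federate_weights` name is illustrative; the example's own function is `federated_contributors_model`):

```python
import numpy as np

def federate_weights(contributor_weights):
    """Average the corresponding weight tensors of several contributor models.

    contributor_weights: a list with one entry per contributor, each entry
    being a list of NumPy arrays as returned by Keras' model.get_weights().
    """
    # zip(*...) groups the i-th tensor of every contributor together,
    # then each group is averaged element-wise.
    return [np.mean(np.stack(layer_group), axis=0)
            for layer_group in zip(*contributor_weights)]

# Two fake "contributor models", each with two weight tensors.
model_a = [np.ones((2, 2)), np.zeros(3)]
model_b = [3 * np.ones((2, 2)), 2 * np.ones(3)]

averaged = federate_weights([model_a, model_b])
```

The averaged list can then be applied to a fresh copy of the original structure with `model.set_weights(averaged)`.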
This final model is the one that can be deployed, or reused for future re-training.
Model contributor
A model contributor receives a model and trains it on a subset of data that can be provided either by the model creator or locally by the contributor. In this example, the data are provided by the model creator through a loading function. The contributor loads the model provided by the creator, trains it according to the creator's indications (i.e. epochs, batch_size, etc.), and finally saves the generated model for the model creator to federate the training results.
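A contributor-side sketch with tf.keras is shown below. `load_data` and `train_model` mirror the creator-provided functions named earlier, but their bodies here are illustrative stand-ins (random data instead of a real MNIST shard, and a locally built model instead of one loaded from the task assets with `tf.keras.models.load_model`):

```python
import uuid
import numpy as np
import tensorflow as tf

def load_data():
    """Stand-in for the creator-provided loading function; the real one
    would return (a shard of) MNIST data."""
    x = np.random.rand(64, 28, 28).astype("float32")
    y = np.random.randint(0, 10, 64)
    return x, y

def train_model(model, x, y, epochs=1, batch_size=32):
    """Train according to the creator's indications (epochs, batch_size, ...)."""
    model.fit(x, y, epochs=epochs, batch_size=batch_size, verbose=0)
    return model

# In the real flow this model is loaded from the downloaded assets, e.g.:
# model = tf.keras.models.load_model("creator_model.keras")
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

x, y = load_data()
model = train_model(model, x, y)

# A randomly generated contributor ID uniquely identifies this result.
contributor_id = uuid.uuid4().hex[:8]
model.save(f"model_{contributor_id}.keras")
```

The saved file is then uploaded as an asset of the result submission.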
The contributor ID in this example is a randomly generated string used to uniquely identify the different models generated by the contributors as part of the training.
Model engineer
As previously explained, in the federated learning procedure the model engineer has similar duties to the model creator. However, their files are still provided by the model creator as assets of the model engineer's task. Here is a simple example.
The model creator could provide a script with a redefine_model function in which the model engineer creates a new structure for the model. The engineer can train and evaluate the model similarly to the model creator. Finally, once the engineer is ready to submit the results of the task, they save the model.
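A sketch of such a script with tf.keras is shown below; the `redefine_model` body and the layer choices are purely illustrative (the point is only that the engineer returns a new structure in place of the original one):

```python
import uuid
import tensorflow as tf

def redefine_model():
    """Hypothetical hook exposed by the creator's script: the engineer
    returns a new model structure to replace the original one."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),  # wider layer
        tf.keras.layers.Dropout(0.2),                   # engineer's addition
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

new_model = redefine_model()

# As for contributors, a random contributor ID identifies the submission.
contributor_id = uuid.uuid4().hex[:8]
new_model.save(f"model_{contributor_id}.keras")
```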
Similarly to the model contributor, a contributor ID is required when saving the model. In this example it is a randomly generated string used to uniquely identify the different models generated as part of the training.