Federated Learning Example
This page describes a simple example of federated learning that uses local storage as the sharing method for the contributors' models. In DecentralML federated learning, the model creator is in charge of creating the scripts for all the steps the contributors perform. These scripts must be provided as assets of each model contributor's task.
Here, we present an overview of each step required to complete a federated learning procedure using a recognition model for the MNIST dataset (https://www.tensorflow.org/datasets/catalog/mnist).
Strictly speaking, only the model creator and the model contributors participate in the federated learning process. The model engineer can also participate, but from the federated learning point of view, their role becomes similar to that of the model creator, with wide freedom to change the model structure, re-train it, and evaluate it. In this case, the model engineer can also create new tasks for the model contributors to re-train the new model.
Finally, the data annotators do not participate directly in the federated learning process. They annotate data that can then be used for federated learning, but the results of every annotation task are submitted to the model creator for analysis and preparation.
DecentralML integration
Before explaining each role, here is a brief description of how this example integrates with DecentralML tasks.
The federated learning process happens off-chain on each contributor's system. The model creator (or model engineer) provides the files required by each role for the federated learning process as assets of each task (i.e. annotation, training, etc.).
When a contributor accepts a task, the Substrate Python client downloads the corresponding assets. The contributor can then execute the task off-chain. Once the task is completed, the contributor executes the send-task-result procedure from the Python client, which uploads the results as assets of their result submission.
From the model creator's point of view, these are the steps to start the federated learning process with the model contributors:

1. Prepare the Python scripts for the model contributors as assets:
   - Create the model
   - Save the model
   - Prepare the function for loading the model contributor's data (`load_data`)
   - Prepare the function for the contributor's training procedure (`train_model`)
   - Package all the scripts as assets, including a `start_task.sh` script that executes the task
2. Create a task for the model contributors using the `create_task` client function, specifying the previously created files as assets.
3. Wait for the task to be completed.
4. Once the task is completed, use the `validate_task_result` function of the client to retrieve all the task results for validation. The model creator can automatically accept all the results by setting the corresponding policy, "AutoAccept". Otherwise, they can accept only the results they consider valuable after evaluation.
5. To evaluate the model contributors' results, the model creator can use the `load_contributors_model` function in this example to load all the contributors' models for validation.
6. Federate the weights using the `federated_contributors_model` function in this example to create a final model for deployment.
From the model contributor's point of view, these are the steps to participate in the federated learning process with other model contributors:

1. Accept a task using the `accept_task` function in the Python client. This function downloads the corresponding assets for that task.
2. Using a terminal in the assets folder, start the task with the `start_task.sh` script that the model creator must provide.
3. Once the task is completed, call the `send_task_result` function of the Substrate Python client to upload the results of the training procedure as assets.
4. The contributor gets paid according to the validation procedure decided by the model creator (see step 4 in the model creator's list of steps).
In this first implementation, all the steps in the previous lists are started manually by the corresponding role. For example, `accept_task`, the execution of the task, and `send_task_result` are kept as separate operations, for two reasons:

- It ensures that each role has the flexibility to perform the task offline and at their convenience. A contributor might accept a task but, given the duration of some of the steps (e.g. data annotation or model training), decide to actually perform the task at a later time.
- It allows the model creator to fully customise each task for different models, frameworks, or recognition tasks by providing self-contained code written for the specific needs. We acknowledge that this approach places more of the burden on the model creator, but the gain in flexibility justifies it.
The following example of federated learning explains the off-chain steps that each role would perform.
Model creator
The model creator first creates a model structure and compiles it. Once the first model is generated, it can be trained, evaluated and, more importantly, saved for the model contributors to use.
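These creator-side steps can be sketched with tf.keras as follows. The model structure, the `create_model` name, the file name, and the tiny synthetic batch are illustrative stand-ins; a real script would load the actual MNIST data (e.g. via `tf.keras.datasets.mnist.load_data()`):

```python
import numpy as np
import tensorflow as tf

def create_model():
    """Build and compile a small MNIST classifier (illustrative structure)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = create_model()

# Tiny random batch with MNIST's shape, standing in for the real dataset.
x = np.random.rand(64, 28, 28).astype("float32")
y = np.random.randint(0, 10, 64)

model.fit(x, y, epochs=1, batch_size=32, verbose=0)   # train
loss, accuracy = model.evaluate(x, y, verbose=0)      # evaluate
model.save("creator_model.keras")                     # save for the contributors
```

The saved file is what gets shipped to the contributors as a task asset.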
In a federated system, the training step is generally delegated to the model contributors, but the model creator could perform some training just to initiate the system. The contributors can then subsequently train the model on new data (see Model contributor).
After the training job is completed by the contributors, the creator can load the models generated and saved by the contributors. The results of each trained model are then federated by averaging the model weights, and the new weights are finally applied to the same model structure that was originally created.
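The averaging step itself is framework-agnostic: it is an element-wise mean over the corresponding weight tensors of the contributors' models. A minimal NumPy sketch, assuming the per-model weight lists come from Keras' `model.get_weights()` (the `federate_weights` name is illustrative; the example's own function is `federated_contributors_model`):

```python
import numpy as np

def federate_weights(contributor_weights):
    """Average the corresponding weight tensors of several contributor models.

    contributor_weights: a list with one entry per contributor, each entry
    being a list of NumPy arrays as returned by Keras' model.get_weights().
    """
    # zip(*...) groups the i-th tensor of every contributor together,
    # then each group is averaged element-wise.
    return [np.mean(np.stack(layer_group), axis=0)
            for layer_group in zip(*contributor_weights)]

# Two fake "contributor models", each with two weight tensors.
model_a = [np.ones((2, 2)), np.zeros(3)]
model_b = [3 * np.ones((2, 2)), 2 * np.ones(3)]

averaged = federate_weights([model_a, model_b])
```

The averaged list can then be applied to a fresh copy of the original structure with `model.set_weights(averaged)`.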
This final model is the one that can be deployed, or reused for future re-training.
Model contributor
A model contributor receives a model and trains it on a subset of data that can be provided either by the model creator or locally by the contributor. In this example, the data are provided by the model creator through a loading function. The contributor loads the model provided by the creator, trains it according to the creator's indications (i.e. epochs, batch_size, etc.), and finally saves the generated model for the model creator to federate the training results.
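A contributor-side sketch with tf.keras is shown below. `load_data` and `train_model` mirror the creator-provided functions named earlier, but their bodies here are illustrative stand-ins (random data instead of a real MNIST shard, and a locally built model instead of one loaded from the task assets with `tf.keras.models.load_model`):

```python
import uuid
import numpy as np
import tensorflow as tf

def load_data():
    """Stand-in for the creator-provided loading function; the real one
    would return (a shard of) MNIST data."""
    x = np.random.rand(64, 28, 28).astype("float32")
    y = np.random.randint(0, 10, 64)
    return x, y

def train_model(model, x, y, epochs=1, batch_size=32):
    """Train according to the creator's indications (epochs, batch_size, ...)."""
    model.fit(x, y, epochs=epochs, batch_size=batch_size, verbose=0)
    return model

# In the real flow this model is loaded from the downloaded assets, e.g.:
# model = tf.keras.models.load_model("creator_model.keras")
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

x, y = load_data()
model = train_model(model, x, y)

# A randomly generated contributor ID uniquely identifies this result.
contributor_id = uuid.uuid4().hex[:8]
model.save(f"model_{contributor_id}.keras")
```

The saved file is then uploaded as an asset of the result submission.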
The contributor ID in this example is a randomly generated string used to uniquely identify the different models generated by the contributors as part of the training.
Model engineer
As previously explained, in the federated learning procedure the model engineer has similar duties to the model creator. However, their files are still provided by the model creator as assets of the model engineer's task. Here is a simple example.
The model creator could provide a script with a redefine_model function in which the model engineer creates a new structure for the model. The engineer can train and evaluate the model similarly to the model creator. Finally, once the engineer is ready to submit the results of the task, they save the model.
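A sketch of such a script with tf.keras is shown below; the `redefine_model` body and the layer choices are purely illustrative (the point is only that the engineer returns a new structure in place of the original one):

```python
import uuid
import tensorflow as tf

def redefine_model():
    """Hypothetical hook exposed by the creator's script: the engineer
    returns a new model structure to replace the original one."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),  # wider layer
        tf.keras.layers.Dropout(0.2),                   # engineer's addition
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

new_model = redefine_model()

# As for contributors, a random contributor ID identifies the submission.
contributor_id = uuid.uuid4().hex[:8]
new_model.save(f"model_{contributor_id}.keras")
```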
Similarly to the model contributor, a contributor ID is required when saving the model. In this example it is a randomly generated string used to uniquely identify the different models generated as part of the training.