Data annotation
This tutorial is based on the roles defined by DecentralML:
Model creator
Data annotator
Model engineer
Model contributor
and the corresponding tasks explained in the Decentralised Machine Learning documentation. Here's a summary:
Data annotation
Model definition and restructuring
Model training
For each of these tasks we present a tutorial covering all the roles involved and the corresponding functions that need to be executed. These functions are part of the Python substrate client.
All the tasks involve the model creator, who creates the task with its required files and assets, and validates the results.
This tutorial relies on separate scripts and files, provided as assets, to complete the machine learning tasks created by the model creator. The structure of the assets is described for each machine learning task.
Data annotation
This is the task of annotating data that will be used for training machine learning models. In this example, we describe the data annotation task for object detection.
Asset files
Asset files are required for this task. You can find some example assets at the path substrate-client-decentralml/assets/data_annotator.
annotation_files are the files that must be annotated.
annotation_samples are the sample images that the annotator has to find and label in the annotation_files.
start_task.sh is a script that the model_creator must create for the data_annotator to actually execute the task.
Procedure
The model_creator creates a task for data annotators using the create_task function (see the sketch below), in which:
annotation_type specifies the kind of annotation, in this case object_detection.
annotation_media_samples is the list of samples for the annotators to detect in the images.
annotation_files is the list of files to be annotated by finding the samples in them.
annotation_class_labels is the list of classes to be annotated in the files. At least one annotation_media_sample must be provided for each label.
annotation_json can include additional info about the annotation, or be used to specify the task execution script (i.e. start_task.sh from the assets).
For additional information on the substrate parameters (e.g. expiration block, substrate connection, etc.), consult the documentation of the Python client or view the example (https://github.com/livetreetech/DecentralML/blob/main/substrate-client-decentralml/src/decentralml/create_task.py).
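As a rough illustration, creating a data annotation task from the Python client might look like the sketch below. The parameter names follow the list above, but the import path, the full signature, and the sample file names are assumptions; refer to the create_task.py example linked above for the real call.

```python
# Illustrative sketch only: the import path, full signature, and the extra
# substrate parameters (expiration block, connection, funds, etc.) are
# assumptions; see the create_task.py example linked above for the real call.
from decentralml.create_task import create_task  # assumed import path

create_task(
    annotation_type="object_detection",
    annotation_media_samples=["assets/data_annotator/annotation_samples/dog.png"],
    annotation_files=["assets/data_annotator/annotation_files/street_scene.png"],
    annotation_class_labels=["dog"],                # at least one media sample per label
    annotation_json='{"script": "start_task.sh"}',  # optional extra info / execution script
)
```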
The data_annotator can then use list_task (see Listing objects) and accept a task by specifying its task_id, as sketched below. Assigning a task downloads the corresponding assets for the data annotation task.
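A minimal sketch of this step follows. list_task and task_id come from this guide, but the name of the accept function (accept_task here), the import paths, and the shape of the returned task list are assumptions; consult the Python client for the actual API.

```python
# Illustrative sketch: list open tasks, then accept one by its task_id.
# `accept_task` is a placeholder name and the result structure is assumed.
from decentralml.list_task import list_task      # assumed import path
from decentralml.accept_task import accept_task  # hypothetical helper

tasks = list_task()              # see "Listing objects"
task_id = tasks[0]["task_id"]    # pick the task to work on (structure assumed)
accept_task(task_id=task_id)     # assigning the task downloads the annotation assets
```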
The data_annotator can then start the task by executing the script created by the model_creator (start_task.sh from the assets).
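Since start_task.sh is a shell script, one way to launch it from Python is via subprocess, as sketched below; the path to the downloaded assets is an assumption.

```python
# Illustrative only: run the model creator's start_task.sh from the
# downloaded assets folder (the path below is an assumption).
import subprocess

subprocess.run(
    ["sh", "start_task.sh"],
    cwd="substrate-client-decentralml/assets/data_annotator",
    check=True,
)
```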
The model_creator is responsible for creating the data annotation procedure. The outputs of the annotation must be saved in a separate folder, for example by creating an output folder inside the same assets folder.
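For instance, an outputs folder could be created alongside the other assets; the exact path below is only an assumption.

```python
# Illustrative: create an `outputs` folder inside the assets folder so the
# annotation procedure has a place to write its results (path is assumed).
from pathlib import Path

Path("substrate-client-decentralml/assets/data_annotator/outputs").mkdir(
    parents=True, exist_ok=True
)
```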
Once the data annotation task is completed, the data_annotator can send the results. The corresponding function accepts a parameter result_path, which must be set to the output folder of the annotation task (i.e. outputs from the assets folder).
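Submitting the results might look like the sketch below; only the result_path parameter is documented above, so the function name send_task_result, its import path, and the remaining arguments are assumptions.

```python
# Illustrative sketch: send the annotation outputs back for validation.
# `send_task_result` is a placeholder name; `result_path` points to the
# outputs folder produced by the annotation procedure.
from decentralml.send_task_result import send_task_result  # hypothetical import

task_id = 1  # the task accepted earlier (value is illustrative)
send_task_result(
    task_id=task_id,
    result_path="substrate-client-decentralml/assets/data_annotator/outputs",
)
```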
The model_creator can list the available results for each task using list_task_results (see Listing objects).
Once a result is available, the model_creator can start validating the results using validate_task_results. The validation of the results can be performed according to three policies:
AutoAccept: the results are automatically accepted.
ManualAccept: the model_creator manually accepts each task result.
CustomAccept: the model_creator can implement custom methods for automatically validating the results.
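A sketch of listing and validating results follows. The function names list_task_results and validate_task_results and the three policy names come from this guide, while the import paths, argument names, and returned structure are assumptions.

```python
# Illustrative sketch: list the submitted results for a task, then trigger
# validation with one of the three policies. Argument names are assumptions.
from decentralml.list_task_results import list_task_results          # assumed import path
from decentralml.validate_task_results import validate_task_results  # assumed import path

task_id = 1  # the task created earlier (value is illustrative)
results = list_task_results(task_id=task_id)   # see "Listing objects"
validate_task_results(task_id=task_id, validation_strategy="AutoAccept")
```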
Once the results are validated, the model_creator or the automatic validation procedure can either accept or reject them, using accept_task_results() or reject_task_results() respectively. Accepting the results issues the payment to the contributor.
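Accepting or rejecting an individual result might then look like the sketch below; the two function names come from this guide, while the import paths and argument names are assumptions.

```python
# Illustrative sketch: accept a validated result (which issues the payment
# to the data annotator) or reject it. Argument names are assumptions.
from decentralml.accept_task_results import accept_task_results  # assumed import path
from decentralml.reject_task_results import reject_task_results  # assumed import path

task_id = 1    # the task created earlier (value is illustrative)
result_id = 1  # a result returned by list_task_results (value is illustrative)

accept_task_results(task_id=task_id, result_id=result_id)    # accept and pay the contributor
# reject_task_results(task_id=task_id, result_id=result_id)  # or reject the result instead
```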