To feed a Machine Learning (ML) learner, we need data. Rather than using raw image pixels, it is much more powerful to use ImageJ measurements (called features in ML jargon) ...
Starting from the image of Fig. 1, we need to extract some measurements describing and characterizing the shapes.
Fig. 1: Image of circles, triangles, and squares
1. Measurements in ImageJ
In ImageJ, many different measurements are available in Analyze > Set Measurements as shown in Fig. 2.
Fig. 2: Dialog window with all the measurements available in ImageJ.
In this case, because we don't know in advance which ones best describe the shapes, I selected:
- Area
- Centroid (X- and Y-coordinates)
- Fit ellipse (major, minor axes + angle)
- Shape descriptors including:
  - Circularity
  - % of Area
  - Aspect Ratio
  - Roundness
  - Solidity
- Feret's diameter (max and min diameters + angle + center)
Note: Centroid is only useful for display...
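To give a feeling for why shape descriptors can help separate the three classes, here is a minimal sketch that evaluates the circularity formula documented for ImageJ, 4π × area / perimeter², on ideal shapes. The function name and the ideal-geometry inputs are assumptions made for this illustration; values measured on real, pixelated particles will deviate from these.

```python
import math

def circularity(area, perimeter):
    """Circularity as documented for ImageJ: 4*pi*area / perimeter**2."""
    return 4.0 * math.pi * area / perimeter ** 2

# Ideal (non-pixelated) shapes; the sizes are arbitrary because
# circularity does not depend on scale.
r = 10.0  # circle radius
a = 10.0  # side length of the square and of the equilateral triangle
ideal_shapes = {
    "circle": (math.pi * r ** 2, 2 * math.pi * r),
    "square": (a ** 2, 4 * a),
    "triangle": (math.sqrt(3) / 4 * a ** 2, 3 * a),
}

# Map each shape name to its circularity
circ = {name: circularity(area, perim)
        for name, (area, perim) in ideal_shapes.items()}

for name, value in circ.items():
    print(f"{name}: {value:.3f}")
```

For these ideal shapes the values are 1.000 (circle), 0.785 (square), and 0.605 (equilateral triangle), so a single descriptor already carries a lot of information about the class.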
2. Image Features in ML Terminology
In ML terminology, the data (as displayed in an ImageJ table or a spreadsheet) is composed of rows (termed observations, examples, or feature vectors in ML) and columns. Each column corresponds to a feature, so each cell holds the value of one feature for one observation.
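The row/column vocabulary can be sketched in a few lines of Python. The column names mimic typical ImageJ Results headings, and all the numbers are made up for illustration; none of them come from the actual dataset.

```python
# Hypothetical feature names and values, for illustration only
feature_names = ["Area", "Circ.", "AR", "Solidity"]

# One row per observation (feature vector), one column per feature
observations = [
    [314.0, 0.90, 1.00, 0.98],
    [100.0, 0.78, 1.01, 1.00],
    [43.0, 0.60, 1.15, 0.97],
]

def feature_column(rows, names, name):
    """Return all the values of a single feature (one column of the table)."""
    index = names.index(name)
    return [row[index] for row in rows]

# A single observation, with each value paired with its feature name
first_vector = dict(zip(feature_names, observations[0]))

# A single feature across all observations
circ_values = feature_column(observations, feature_names, "Circ.")

print(first_vector)
print(circ_values)
```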
3. Dataset = Training + Test + Validation sets
In an ML project, three steps are usually carried out, each with a different dataset. It is very important that you don't use the same data for all the steps (no overlap between the various datasets). That's why the dataset is split into three non-overlapping subsets.
3.1. Training step (60% of the dataset)
During the training step, you feed the learner with the features plus the targets (labels) of the feature vectors (60% of the dataset). The targets correspond to the correct/expected outputs. Here, this is the type of shape: triangle, circle, or square. Thus, for each observation, we need to set the shape's type by hand.
3.2. Cross-validation step (20% of the dataset)
In this step, you compare various models generated with several ML algorithms and/or try to find the parameters that yield the best model.
3.3. Test step (20% of the dataset)
Check the accuracy and quality of the predictions for the best model.
Note: In simple projects, the validation step is skipped and only the test step is done with 40% of the dataset.
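A 60/20/20 split can be sketched with the Python standard library alone. The function name, the fixed seed, and the integer stand-ins for the observations are assumptions made for this example; any shuffling strategy that keeps the three subsets disjoint would do.

```python
import random

def split_dataset(rows, train_pct=60, validation_pct=20, seed=0):
    """Shuffle rows, then cut them into non-overlapping
    training, validation, and test subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed: reproducible split
    n_train = len(shuffled) * train_pct // 100
    n_validation = len(shuffled) * validation_pct // 100
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # whatever remains
    return train, validation, test

# Integers standing in for 245 observations
rows = list(range(245))
train, validation, test = split_dataset(rows)
print(len(train), len(validation), len(test))  # 147 49 49
```

Shuffling before cutting matters: rows in a Results table usually follow the order in which particles were detected, so taking the first 60% as is could bias the training set.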
4. Final dataset: IJ Measurements + Vertices
The complete dataset (downloadable here) contains 245 graphical objects with their ImageJ measurements plus the targets (circle, square, and triangle).
Note: The Results table in ImageJ only accepts numeric values. Thus, the shape types were replaced by the number of vertices: 0 for a circle, 3 for a triangle, and 4 for a square. A column termed "Vertices" was added to the dataset.
Download this file in CSV (comma-separated values) format, then open it in an ImageJ Results window with File > Open...
Note: The training set was obtained from Fig. 1 by applying Process > Binary > Make Binary, then Process > Binary > Watershed and Analyze > Analyze Particles... The shape types were then assigned visually by appending a new column termed Vertices.
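Outside ImageJ, the same CSV can be separated into features and targets, with the Vertices code translated back into a shape name. The sketch below is only an illustration: the inline text stands in for the downloaded file, and its column names (other than Vertices) and values are made up.

```python
import csv
import io

# Hypothetical excerpt standing in for the downloaded CSV file
csv_text = """Area,Circ.,AR,Vertices
314,0.90,1.00,0
100,0.78,1.01,4
43,0.60,1.15,3
"""

# Encoding described in this post: number of vertices -> shape type
SHAPE_BY_VERTICES = {0: "circle", 3: "triangle", 4: "square"}

features = []  # one feature vector (as a dict) per observation
targets = []   # one shape name per observation
for record in csv.DictReader(io.StringIO(csv_text)):
    vertices = int(record.pop("Vertices"))  # remove the target from the features
    targets.append(SHAPE_BY_VERTICES[vertices])
    features.append({name: float(value) for name, value in record.items()})

print(targets)  # ['circle', 'square', 'triangle']
```

With a real file, `io.StringIO(csv_text)` would be replaced by a file object opened on the downloaded CSV.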
5. Other crazybiocomputing posts
- Machine Learning Glossary
- Machine Learning in ImageJ Series [Link]
- JavaScript/ECMAScript TOC [Link]