Here, we expand upon our prior segmentation work with a focus on characterizing highly strained FSP microstructures. FSP microstructures present unique challenges to quantitative analysis. Internal strain in FSP microstructures results in high intragranular contrast in SEM-BSE images, which complicates both traditional analysis and the hand-labeling of grain boundaries. To address impediments to strained image analysis, we examined the coupling of different modalities (BSE and SE) to EBSD measurements to establish high-quality labeled data on which to train semantic segmentation models to identify grain boundaries. We then used the ensemble of models trained on the best-performing modality to segment a series of BSE images of samples manufactured with different FSP parameters. We found that despite the modality of the training data being different from that of the new test data, the model provided accurate predictions. Our results showed that the physics-based processes of microscopy imaging were key to determining the 'goodness' of the training data, which was crucial for model performance.
Image segmentation can be used to accelerate grain size analysis over large areas based on SE/BSE imaging. It is well known that EBSD measurements are more time-consuming than SE/BSE imaging and that the data acquisition time is closely related to the selected step size. For example, in our training data, a map collected in 50 nm steps covering an area of approximately 450 µm² required approximately 17 min of EBSD data collection. A similar area can be imaged in a few seconds via SE/BSE, but the quality varies with the image resolution and pixel dwell time (i.e., with the time invested). The SE/BSE images in this work have a native resolution of 2560 × 2048 pixels, cover an approximate area of 490 µm² (which allowed for cropping and alignment relative to the EBSD images), and were collected over 160 s (2 min and 40 s), resulting in a potential time savings of 14-15 min per image. This work aims to understand the effect of SE and BSE image modes, as well as the effect of different acceleration voltages, on image segmentation and model accuracy. An additional motivation for this work is the relatively widespread accessibility of SE/BSE imaging compared with EBSD, as the former is available in virtually all SEM instruments at academic and research institutions.
The first step in developing a segmentation model for grain boundary detection is to obtain high-quality, properly labeled microscopy data on which training can be conducted in a controlled and reliable fashion. However, there are fundamental aspects to the generation of SE and BSE images that first need to be clarified. Although these concepts are well-known within the microscopy community, materials scientists and data scientists who are less familiar with microscopy may find these clarifications useful for future engagement with AI/ML for microscopy. First, we briefly examine the physical mechanisms behind image collection modes and acceleration voltages. Readers are encouraged to review other sources for a more detailed understanding of electron-matter interactions in SEM. A quick summary of image generation modes is provided as follows:
In the current study of FSP 316L stainless steel samples, image contrast variations in both SE and BSE modes due to atomic number can be disregarded since austenite is a solid solution that reflects the average composition of the steel. Contrast variations due to orientation and electron channeling become of prime importance, especially in fine-grained face-centered cubic (FCC) solid solutions produced via solid-phase processing during FSP. Figure 1 summarizes the variability of the microstructural features of FSP 316L austenitic stainless steel observed via BSE as a function of the acceleration voltage in steps of 2 keV. In addition, BSE and SE modes are compared at acceleration voltages of 10 and 20 keV, as well as against EBSD data obtained at 20 keV.
Increasing the acceleration voltage increases the contrast (i.e., signal-to-noise ratio) between microstructural features in the BSE images, especially at and above 6 keV. Interestingly, certain microstructural features appear or disappear as the acceleration voltage is increased. This effect is illustrated by the red rectangles in Fig. 1, which highlight an austenitic grain containing annealing twins that are only observable below 12 keV. Furthermore, the intragranular features within white squares, which are not associated with HAGBs or LAGBs, seem to fluctuate as the acceleration voltage is changed. These regions are associated with dense dislocation walls separating small domains inside grains, with misorientation angles smaller than 2°. SE images taken at 10 and 20 keV are less noisy and have milder contrast than their BSE counterparts. Nonetheless, these images are still sensitive to crystallographic contrast, effectively revealing grain and twin boundaries, while being less sensitive to dense dislocation walls. Therefore, SE imaging is an alternative that captures the features of interest in this work while limiting the contrast from small intragranular misorientations.
Next, our analysis of EBSD step size and imaging accelerating voltage raises an important point regarding the inherent variability of electron microscopy images as a function of imaging conditions. The variability implies that there is not a single EBSD, SE, or BSE image that defines the ultimate 'ground truth' microstructure of a sample. Rather, a compendium of multiple images represents the same microstructure seen by EBSD, which alters our perception of the ground truth of a sample's microstructure. Consequently, if a ground truth image from microscopy is required for training deep learning models, it must be accompanied by an adequate label or metadata describing the measurement conditions that were used to generate such an image.
First, to define a quality crystallographic ground truth and to understand the time consumption associated with each measurement, we explored different step sizes and areas of interest, as summarized in Fig. 2. Our main objective was to obtain EBSD maps that contain sharp grain boundaries, in a reasonable measurement time, and with minimum loss of information. As seen in Fig. 2A, time consumption increases both as the step size is reduced (higher pixel densities) and as the area of interest is increased. The effect of step size on the number of measured grains and on the accuracy of boundary identification, particularly for CSL Σ3 twin boundaries, is shown in Fig. 2B. A coarse step size yields fast results but at the cost of reduced twin boundary detection and grain count. More details on the quality of the reconstructed grain boundaries can be seen in the supplementary information (Fig. S1). For model training purposes, we selected a step size of 50 nm to maximize the quality of our crystallographic ground truth, requiring an elapsed time of 17 min per ~450 µm² area of interest. Grain boundaries were carefully reconstructed following the protocols described in the Methods section, aiming to obtain continuous grain boundary skeletons that fully enveloped every identified grain. Examples of discontinuous grain boundaries and successful post-processing are shown in the supplementary information (Fig. S2).
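The trade-off between step size and acquisition time can be illustrated with a back-of-the-envelope sketch. It assumes a square scan grid, a constant time per measured point (calibrated from the reported 17 min at 50 nm steps), and that the quoted area is in square micrometers; the function names are hypothetical, and real acquisitions include overheads that this simple scaling ignores.

```python
def scan_points(area_um2: float, step_nm: float) -> float:
    """Number of points on a square grid covering area_um2 at the given step."""
    step_um = step_nm / 1000.0
    return area_um2 / (step_um ** 2)


def scan_time_min(area_um2: float, step_nm: float, sec_per_point: float) -> float:
    """Acquisition time in minutes, assuming a constant time per point."""
    return scan_points(area_um2, step_nm) * sec_per_point / 60.0


# Calibration from the reported case: 17 min for ~450 um^2 at 50 nm steps.
REF_AREA_UM2 = 450.0   # assumption: the quoted area is in square micrometers
REF_STEP_NM = 50.0
REF_TIME_MIN = 17.0
sec_per_point = REF_TIME_MIN * 60.0 / scan_points(REF_AREA_UM2, REF_STEP_NM)

for step_nm in (50.0, 100.0, 200.0):
    n = scan_points(REF_AREA_UM2, step_nm)
    t = scan_time_min(REF_AREA_UM2, step_nm, sec_per_point)
    print(f"step {step_nm:5.0f} nm -> {n:9.0f} points, {t:5.2f} min")
```

Under these assumptions, doubling the step size cuts the number of points, and hence the time, by a factor of four, which is the same scaling that makes coarse maps fast but poor at resolving thin twin boundaries.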
Acknowledging the inherent variability of SE and BSE images, we opted to define our ground truth as the crystallography-based data generated via EBSD at a fixed 20 keV acceleration voltage and 50 nm step size. During processing of the EBSD data, we reconstructed the grain boundaries and deconvoluted this information into skeleton-like grain boundary maps.
Semantic segmentation requires labeled training data, meaning that the input image must have a corresponding segmentation map with a pixel-to-pixel match. To create labeled training data for microscopy images, previous studies have used hand-drawn segmentation maps. FSP causes the formation of numerous LAGBs and dense dislocation walls in the 316L austenitic stainless steel microstructure. Consequently, BSE images are of high contrast (orientation and electron channeling effects), compromising accurate manual identification of grain boundaries. Therefore, we performed sequential, correlated SEM (SE/BSE) and EBSD measurements to produce labeled training data.
To create the training data, the ground truth grain boundary map (model output) was obtained from the EBSD boundary reconstruction. LAGBs, HAGBs, and twins were grouped into a single grain boundary class. Sample images are shown in Fig. 3. The registration of the SEM images and grain boundary maps is complicated by stage tilting and trapezoidal distortion in the EBSD image relative to SE/BSE, requiring specialized post-processing procedures. Differences in tilt angles between SEM and EBSD lead to foreshortening of grains and varying interaction volumes, differences in magnification and working distance lead to varying image resolution, and differences in accelerating voltage and beam current lead to varying probe sizes. All of these effects require correction to obtain an accurate pixel-to-pixel match. We first registered the fiducial marks in the EBSD grain boundary map and SEM images, then manually adjusted the EBSD grain boundary map to obtain pixel-to-pixel correspondence. The addition of markers greatly improved the success of obtaining a pixel-to-pixel correspondence by maximizing the spatial overlap between observation areas (see supplementary information Fig. S3). This process resulted in four sets of training pairs (see the Methods section for more details).
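To illustrate how fiducial marks can constrain a trapezoidal correction, the following minimal sketch fits a projective transform (homography) to four matched fiducial positions and maps EBSD coordinates into the SEM image frame. This is not the registration procedure used in this work, which relied on manual adjustment; the coordinates and function names are hypothetical, and the transform is normalized so that its last coefficient equals 1.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate fiducials (e.g., three on one line)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_homography(src, dst):
    """Eight homography coefficients from four (x, y) -> (u, v) point pairs."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        rhs.append(v)
    return solve_linear(rows, rhs)


def apply_homography(h, point):
    """Map one (x, y) point with the fitted coefficients."""
    x, y = point
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Hypothetical fiducial positions (pixels): a square in the EBSD map that
# appears as a trapezoid in the SEM image.
ebsd_marks = [(0, 0), (100, 0), (100, 100), (0, 100)]
sem_marks = [(10, 5), (110, 5), (100, 95), (20, 95)]

h = fit_homography(ebsd_marks, sem_marks)
print(apply_homography(h, (50, 50)))  # where the EBSD map center lands in the SEM frame
```

Four non-collinear fiducials determine the eight coefficients exactly; with more marks, a least-squares fit would be the natural extension, followed by manual refinement where residual local distortions remain.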
Our SEM-EBSD registration approach is similar to that of Shen et al., who also used correlated SEM-EBSD measurements combined with manual adjustment of the EBSD map. Notably, while their segmentation models trained on this approach were able to accurately distinguish austenite and martensite phases in dual-phase steel, the models were not able to determine the exact locations of grain boundaries. This limitation was presumed to arise from the fine, indistinct, and "somewhat fuzzy" boundaries between phases.
Table 1 shows the average performance of the three individual UNet++ models for each modality and model pretraining/loss scheme. Specifically, the image modality label indicates the type of image (BSE or SE) used to train the UNet++ models and the accelerating voltage at which that image was collected. The F scores, HD95, and mean absolute error (MAE) in the mean equivalent circle diameter (ECD) for individual models trained on each modality are presented in supplementary information Tables S2‒S4.
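For readers unfamiliar with HD95, the sketch below computes a 95th-percentile Hausdorff distance between two sets of boundary pixel coordinates. Definitions of HD95 vary between implementations; here we assume the symmetric form (the larger of the two directed 95th percentiles) with a nearest-rank percentile, which may differ from the implementation used for Table 1. The coordinates are a toy example.

```python
import math


def directed_distances(a, b):
    """Distance from every point in a to its nearest point in b."""
    return [min(math.dist(p, q) for q in b) for p in a]


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def hd95(a, b):
    """Symmetric 95th-percentile Hausdorff distance (assumed definition)."""
    return max(percentile(directed_distances(a, b), 95),
               percentile(directed_distances(b, a), 95))


# Toy boundary: a 20-pixel vertical line, predicted one pixel to the right.
truth = [(r, 3) for r in range(20)]
pred = [(r, 4) for r in range(20)]
# The same prediction plus a single spurious pixel far from any boundary.
pred_outlier = pred + [(0, 13)]

print(hd95(truth, pred))                              # every pixel is off by one
print(hd95(truth, pred_outlier))                      # the single outlier is ignored
print(max(directed_distances(pred_outlier, truth)))   # classic Hausdorff sees it
```

The percentile makes the metric tolerant of a few isolated false positives, whereas the classic (maximum) Hausdorff distance is dominated by them.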
Comparing across image modalities, we see that models trained on the SE image taken at an accelerating voltage of 10 keV (SE 10) performed best across all models over all three metrics. Across each modality, models pretrained on MicroNet outperformed those pretrained on ImageNet. The benefit of the addition of topoloss to the loss function (denoted TopoDICE) is unclear, with performance improving and worsening across different metrics and different training images. It appears that TopoDICE enhances accuracy in ECD when model performance is better, but decreases accuracy if a certain performance threshold cannot be met with DICE alone. The topological loss rewards conformity in the number of continuous, enclosed areas in the ground truth and predicted grain boundary maps, without considering the actual location of the grain boundary pixels. Conversely, DICE rewards pixel-level overlap and does not consider continuity. Therefore, we hypothesize that if pixel-level overlap cannot be accurately learned, rewarding continuity only further decreases accuracy.
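The difference between overlap-based and continuity-based rewards can be seen in a toy example. The sketch below computes only the Dice coefficient on small binary boundary maps; it does not implement the topological loss, and the maps are illustrative rather than taken from our data.

```python
def dice(pred, truth):
    """Dice coefficient for two binary masks given as lists of rows of 0/1."""
    overlap = sum(p & t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    return 2.0 * overlap / total if total else 1.0


# Toy 5 x 7 map: one vertical boundary (column 3) separating two grains.
truth = [[1 if c == 3 else 0 for c in range(7)] for _ in range(5)]

# Same boundary with a one-pixel gap in the middle row (grains now merge).
gapped = [row[:] for row in truth]
gapped[2][3] = 0

# Complete, continuous boundary drawn one pixel to the right.
shifted = [[1 if c == 4 else 0 for c in range(7)] for _ in range(5)]

print(f"gapped : Dice = {dice(gapped, truth):.3f}")
print(f"shifted: Dice = {dice(shifted, truth):.3f}")
```

The gapped map scores a high Dice value even though the two grains are no longer separated, while the shifted map, which preserves the number of enclosed regions, scores zero. A region-counting term would rank the two predictions the other way around, which is consistent with the two losses pulling in different directions when pixel-level overlap is poorly learned.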
For the best-performing modality (SE 10), TopoDICE reduced the MAE in ECD from 0.68 to 0.57 µm while leaving the average F score unchanged at 0.62, although the average HD95 increased slightly from 26.3 to 28.7 pixels. Because our target task was to characterize grain structure, we weighted the MAE in ECD higher than HD95 and, therefore, consider our best set of models to be the MicroNet/TopoDICE models trained on the SE 10 keV image.
To understand why model performance was highest when trained on the SE images obtained at an acceleration voltage of 10 keV, we reviewed the concept of interaction volume of the sample during imaging and image contrast. First, in the case of backscattered electrons, Monte Carlo simulations (provided in supplementary information Fig. S6) were performed for an equivalent 316L stainless steel solid solution and a beam normal to the surface. The estimated maximum penetration depth of backscattered electrons was approximately 17 ± 4 nm, 58 ± 10 nm, 121 ± 18 nm, and 190 ± 29 nm for beam energies of 5 keV, 10 keV, 15 keV, and 20 keV, respectively. During EBSD, however, the sample was tilted to 70° relative to the horizon, which reduces the interaction volume to 50-100 nm. Based on this, an electron image that pairs with the EBSD map should be collected at a reduced acceleration voltage to reach a similar interaction volume. This is readily evident for BSE images but should also be considered for SE images, which are still mildly sensitive to crystallographic contrast. This is consistent with our observations in Fig. 1, where the best visual match between BSE/SE and EBSD information was observed for acceleration voltages below 12 keV, i.e., nearly half of the EBSD acceleration voltage of 20 keV.
Initially, the SE-EBSD pair may seem counterintuitive because of their differing scattering mechanisms. However, the reduced sensitivity of SE images to crystallographic contrast and electron channeling contrast, along with the shallower interaction volume relative to BSE, results in a more suitable image pair. One persisting limitation, even for SE-EBSD pairs, is related to the contrast variations caused by geometrically necessary dislocations. Although dense dislocation walls were excluded during the grain boundary reconstruction protocol, such regions are still present in the EBSD data and can be better highlighted via kernel average misorientation (KAM) analysis. Comparatively, regions of the microstructure inside the white rectangles in Fig. 1 show contrast variations associated mainly with misorientation build-ups around 2°. These regions can still be a source of false positive identifications by the segmentation models, especially if the SE image is acquired at a high contrast condition, and can lead to artificially fine grain size predictions.
Because our models were trained on images from a sample produced using a single set of FSP conditions, we were interested in investigating the ability of the models to accurately segment images from samples produced under different FSP conditions. Segmentation models for microscopy images are known to generalize poorly to out-of-distribution (OOD) images due to differences in a variety of imaging and material parameters. However, it is crucial for a segmentation model to perform accurately on OOD images not used during training if it is to be applicable across samples produced under different processing conditions. To assess the OOD performance of our models, we examined their performance in segmenting a set of 20 BSE images that differed from the training set in terms of both material processing conditions and imaging parameters. Specifically, the OOD images were collected from samples manufactured using FSP conditions different from those used to process the sample from which the training image data was obtained. Additionally, the OOD images were collected under different microscopy conditions, namely a different instrument, a different instrument operator, and a different modality. Table 2 gives the processing conditions, and Table 3 gives the imaging conditions. The only commonality between the OOD images and all training images is the material (316L stainless steel). The BSE training sets share the same imaging modality with the OOD set, and the BSE 20 keV training set also shares a common accelerating voltage.
Figure 4 shows a BSE image from the OOD set overlaid with segmentation maps for the MicroNet/TopoDICE model trained on the SE 10 keV image, along with the corresponding grain detections. This image demonstrates the poor grain boundary closure for individual sets, and the improvement gained with ensembling. The individual models tended to produce segmentation maps with gaps in grain boundaries, which leads to erroneous grain detection and artificially increases the measured grain diameters. Ensembling the predictions by summing segmentation maps from the three models trained on the same modality led to improved grain boundary closure and, thus, grain detection. Supplementary information Tables S5 and S6 give the mean number of grains and error in ECD for each modality training set and ensemble. In each case, ensembling recovers more grains, which improves the accuracy of grain diameter measurements. Going forward, we applied ensembling for each model training modality when predicting segmentation maps for the OOD images.
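The effect of ensembling on grain boundary closure can be sketched on a toy example. The code below sums three binary boundary maps, keeps every pixel predicted by at least one model (the vote threshold is an assumption), labels the enclosed regions with a simple 4-connected flood fill, and converts region areas to ECDs. The maps, pixel size, and function names are hypothetical and do not reproduce our post-processing pipeline.

```python
import math
from collections import deque


def ensemble(maps, min_votes=1):
    """Sum binary boundary maps and keep pixels with at least min_votes."""
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[1 if sum(m[r][c] for m in maps) >= min_votes else 0
             for c in range(cols)] for r in range(rows)]


def grain_areas(boundary):
    """Pixel areas of 4-connected non-boundary regions (candidate grains)."""
    rows, cols = len(boundary), len(boundary[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if boundary[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, area = deque([(r, c)]), 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not boundary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return areas


def mean_ecd(areas_px, pixel_size_um):
    """Mean equivalent circle diameter, ECD = 2 * sqrt(area / pi)."""
    ecds = [2.0 * math.sqrt(a / math.pi) * pixel_size_um for a in areas_px]
    return sum(ecds) / len(ecds)


def boundary_with_gap(gap_row, rows=5, cols=7, boundary_col=3):
    """Vertical boundary with one missing pixel, mimicking a single model."""
    m = [[1 if c == boundary_col else 0 for c in range(cols)] for _ in range(rows)]
    m[gap_row][boundary_col] = 0
    return m


PIXEL_UM = 0.1  # hypothetical pixel size
models = [boundary_with_gap(r) for r in (0, 2, 4)]  # gaps at different rows

single = grain_areas(models[0])
combined = grain_areas(ensemble(models))
print(len(single), mean_ecd(single, PIXEL_UM))
print(len(combined), mean_ecd(combined, PIXEL_UM))
```

Because each toy model misses the boundary at a different location, every individual map merges the two grains into one large region, whereas the summed map closes the boundary and recovers both grains with a smaller mean ECD.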
We did not have pixel-to-pixel alignment between BSE images and EBSD measurements of the OOD samples. Thus, we validated model performance through comparison of the mean ECD determined from the ensembled segmentation maps and EBSD measurements. It should be noted that a perfect match in ECD between the two modalities is not expected due to differences in interaction volume, as discussed previously, as well as measurement technique. For instance, BSE images show contrast between grains, subgrains, and regions surrounded by dense dislocation walls, but these cannot be accurately categorized individually. Conversely, EBSD can provide such differentiation based on misorientation analysis.
Humphries et al. discussed the reasons behind the mismatch among grain size measurements obtained by light optical microscopy, SEM imaging, and EBSD in weakly and strongly textured aluminum. Strongly textured microstructures containing high densities of LAGBs tend to show a smaller grain size when measured by imaging techniques, especially when the images are sensitive to crystallographic contrast. This occurs because all measurable boundaries contribute to grain size calculations via the line intercept methodology. In more randomized and recrystallized microstructures with low densities of LAGBs, optical, SEM, and EBSD-based calculations tend to agree.
Table 4 gives the overall MAE in ECDs obtained from the ensembled segmentation maps over the full OOD sample set. The MicroNet/TopoDICE model trained on the SE 10 keV image gives the lowest MAE of 0.34 µm, followed by the MicroNet/DICE model trained on the SE 10 keV image at 0.40 µm. The MAEs for the MicroNet models trained on BSE 10 keV and 20 keV images were extremely high due to the drastic underprediction of grain boundary pixels.
Based on the MAE, it appears that the ImageNet/DICE model provided more consistent, albeit less accurate, predictions across training image modalities. However, further examination revealed that the ImageNet/DICE model produced a narrow range of ECD values across the OOD samples. Figure 5A shows the individual predictions on OOD samples across models trained on the SE 10 keV images, and Fig. 5B compares ECD distributions for the models and EBSD. ECD distributions for the remaining training set modalities are given in supplementary information Table S7. EBSD-determined ECDs range from 1.47 to 4.68 µm, while ImageNet/DICE/SE 10 keV ECDs range from 3.07 to 4.40 µm, which lies within, but is much narrower than, the range of grain sizes obtained from the EBSD 'ground truth'. The MicroNet models more closely reproduce the expected ECD distribution, especially at lower grain sizes.
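The reason MAE alone can be misleading is illustrated by the sketch below, which reports both the MAE and the spread of predicted mean ECDs. All values are made-up illustrative numbers, not data from this study, and the variable names are hypothetical.

```python
def mae(pred, ref):
    """Mean absolute error between paired per-sample values."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def spread(values):
    """Range (max minus min) of a set of values."""
    return max(values) - min(values)


# Illustrative mean ECDs in micrometers (hypothetical, not measured values).
ebsd = [1.5, 2.0, 3.0, 4.0, 4.5]          # reference values
narrow = [3.2, 3.3, 3.5, 3.8, 4.0]        # model that barely responds to grain size
tracking = [1.9, 2.4, 3.3, 4.3, 4.9]      # model that follows the reference trend

for name, pred in (("narrow", narrow), ("tracking", tracking)):
    print(f"{name:9s} MAE = {mae(pred, ebsd):.2f} um, "
          f"spread = {spread(pred):.2f} um (reference {spread(ebsd):.2f} um)")
```

A model whose predictions cluster near the middle of the reference range can post a moderate MAE while failing to resolve fine-grained samples, so comparing the predicted and reference ranges alongside the MAE gives a fuller picture.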
Despite being trained on the SE 10 keV image, the models successfully transferred what they learned from SE images to BSE images that were taken using different microscopes, different imaging settings, and different operators, and that came from samples processed under different conditions with a wider range of mean grain sizes. From this observation, we can conclude that carefully considering the physical properties underlying the collection of the training data allows the generation of segmentation models that can accurately analyze images collected from different samples using different imaging modalities. Training involves learning pixel-to-pixel correlations between the input (SE) and output (EBSD) data, while prediction is validated by mean grain size rather than exact pixel overlap. BSE imaging indeed captures grain boundaries, though at a deeper interaction volume than SE or EBSD. We expect grains at the same location to have similar ECDs across the approximately 250 nm depth captured by BSE compared to the approximately 20 nm depth captured by EBSD.