M. Chiriboga, C.M. Green, D.A. Hastman, D. Mathur, Q. Wei, I.L. Medintz, S.A. Díaz, R. Veneziano
United States Naval Research Laboratory, United States
Keywords: artificial intelligence, machine learning, computer vision, image analysis, atomic force microscopy, DNA origami, DNA nanostructures, Python
Summary:
Identifying DNA structures within images is essential to rapid prototyping and quality control of self-assembled DNA origami scaffold systems. Atomic force microscopy (AFM) is one of the most cost-effective modalities for analyzing DNA origami nanostructures. Compared to other microscopy communities, however, the bio-AFM community has lacked open-source, freely available software for the automated identification and classification of DNA nanostructures. As a result, the high-throughput nature of AFM tends to generate datasets that cannot feasibly be analyzed manually or with the available automation tools. This inability to analyze an otherwise advantageous excess of data tends to preclude the calculation of valuable statistics. To bridge this gap, we postulate that YOLO, a modern object detection platform commonly used for facial recognition, can be repurposed to rapidly scour AFM images and identify correctly formed DNA nanostructures with high fidelity. To make this approach widely available, we use open-source software and provide a straightforward procedure for designing a tailored, intelligent identification platform that can easily be repurposed to fit arbitrary structural geometries beyond AFM images of DNA structures. We describe methods to acquire and generate the components needed to build this robust system from a relatively small source dataset. Beginning with DNA structure design, we detail AFM imaging, data annotation, data augmentation, model training, and inference. To demonstrate the adaptability of this system, we assembled two distinct DNA origami architectures (triangles and breadboards) for detection in raw AFM images. Using AFM images of each structure, we trained two separate single-class object identification models, one for each architecture.
By applying these models in sequence, we correctly identified and classified 3,470 structures from a total population of 3,617 in images that sometimes included a third DNA origami structure as well as other impurities. The analysis was completed in under 20 seconds and yielded an F1 score of 0.96.
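As a rough illustration of how single-class detectors might be applied in sequence and scored, the following Python sketch merges the boxes returned by each detector and computes an F1 score against hand-annotated boxes. It is not the authors' implementation. The names `detect_in_sequence` and `f1_from_matches`, the greedy matching by intersection over union (IoU), the 0.5 IoU threshold, and the toy detectors and boxes are all assumptions made for illustration; in practice each detector would wrap a trained YOLO model.

```python
from typing import Callable, Dict, List, Tuple

# Box as (x_min, y_min, x_max, y_max) in pixels; this layout is an assumption.
Box = Tuple[float, float, float, float]
# Hypothetical detector interface: takes an image, returns a list of boxes.
Detector = Callable[[object], List[Box]]
Labelled = Tuple[str, Box]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detect_in_sequence(image: object, detectors: Dict[str, Detector]) -> List[Labelled]:
    """Run each single-class detector in turn and tag its boxes with the class name."""
    labelled: List[Labelled] = []
    for label, detect in detectors.items():
        labelled.extend((label, box) for box in detect(image))
    return labelled


def f1_from_matches(predicted: List[Labelled], truth: List[Labelled],
                    iou_threshold: float = 0.5) -> float:
    """Greedily match predictions to annotations of the same class, then return F1."""
    unmatched = list(truth)
    tp = 0
    for label, box in predicted:
        for candidate in unmatched:
            if candidate[0] == label and iou(box, candidate[1]) >= iou_threshold:
                unmatched.remove(candidate)  # each annotation is matched at most once
                tp += 1
                break
    fp = len(predicted) - tp   # predictions with no matching annotation
    fn = len(unmatched)        # annotations that no prediction matched
    total = 2 * tp + fp + fn
    return 2 * tp / total if total else 0.0


# Toy stand-ins for trained models; boxes and annotations are made up.
def triangle_model(image: object) -> List[Box]:
    return [(0, 0, 10, 10)]


def breadboard_model(image: object) -> List[Box]:
    return [(20, 20, 40, 30), (50, 50, 60, 60)]


predicted = detect_in_sequence(
    None, {"triangle": triangle_model, "breadboard": breadboard_model}
)
truth = [("triangle", (1, 0, 10, 10)), ("breadboard", (20, 20, 40, 30))]
score = f1_from_matches(predicted, truth)
print(f"F1 = {score:.2f}")
```

Here F1 is computed as 2·TP / (2·TP + FP + FN), which is equivalent to the harmonic mean of precision and recall. The greedy matching is the simplest option; an optimal one-to-one assignment could give slightly different counts when boxes overlap heavily.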