LUS B-line quantification is a useful tool for diagnosis, prognosis, and monitoring of treatment response in patients with AHF [10,11,12,13]. The European Society of Cardiology expert consensus guidelines support the use of LUS in the management of AHF [14]. However, LUS image acquisition and interpretation are highly dependent on operator skill. Machine AI that automatically quantifies B-lines has the potential to decrease inter-operator variability and, if accurate, could allow novice learners to count B-lines and incorporate the findings into clinical decisions for patients with AHF.
Our results suggest that, after limited training, learners with limited or no prior LUS experience were able to generate high-quality images. Machine AI quantified B-lines on these images with fair correlation compared with an expert reviewer. These data suggest that further development of the AI algorithm is needed to achieve good correlation. If successful, this technology would have important clinical implications: a clinician, or even a non-clinician, could track patients with AHF over time to assess response to treatment, guide additional therapy, and determine when a patient is decongested and ready for hospital discharge [2].
Overall, machine AI tended to overcount B-lines compared with the expert in every lung zone except left zone 4, where median counts were equal. Interestingly, the lateral zones correlated more closely with expert reads than the anterior zones did. This was unexpected, because anterior zones are generally easier to acquire: lateral zones require the probe to be positioned obliquely to stay within the rib space, an orientation that can be challenging for a novice learner. In addition, lateral lung zone images are difficult to acquire in obese patients.
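The zone-by-zone comparison of median counts described above can be sketched as follows. This is a minimal illustration, not the study's analysis code: the zone labels ("R1", "L4"), the encoding of an AI reading of "≥ 5" as the integer 5, and the toy counts are all assumptions.

```python
from statistics import median

# Illustrative sketch only: zone labels ("R1", "L4", ...) and the toy counts
# below are hypothetical, not study data. AI readings of ">= 5" are assumed
# to be encoded as the integer 5 so that medians can be computed.

def compare_zone_medians(ai_counts, expert_counts):
    """Compare median AI and expert B-line counts zone by zone.

    Both arguments map a zone label to a list of per-image counts,
    paired by position (the i-th AI count and the i-th expert count
    come from the same image).
    Returns {zone: (ai_median, expert_median, direction)}.
    """
    summary = {}
    for zone, ai in ai_counts.items():
        expert = expert_counts[zone]
        if len(ai) != len(expert) or not ai:
            raise ValueError(f"zone {zone}: counts must be paired and non-empty")
        ai_med, ex_med = median(ai), median(expert)
        if ai_med > ex_med:
            direction = "AI higher"
        elif ai_med < ex_med:
            direction = "AI lower"
        else:
            direction = "equal"
        summary[zone] = (ai_med, ex_med, direction)
    return summary


# Toy example: two zones, three paired images each.
ai = {"R1": [2, 3, 5], "L4": [1, 2, 2]}
expert = {"R1": [1, 2, 4], "L4": [1, 2, 3]}
summary = compare_zone_medians(ai, expert)
for zone, (ai_med, ex_med, direction) in summary.items():
    print(zone, ai_med, ex_med, direction)
```

Comparing medians rather than means keeps the summary robust to the capped "≥ 5" readings, which would otherwise distort an average.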
The left anterior lung zones 1 and 2 had the lowest correlations, 0.40 and 0.32, respectively. These zones also had the fewest assessments, because the heart commonly sits in view and can obscure the pleural line. Images were included in the analysis if any portion of the pleural line was obtained. It is possible that the machine AI overcounted B-lines in these zones because of cardiac motion.
We found that after a 30-min training session, learners, both novices and those with limited LUS experience, were able to obtain high-quality images in the vast majority of patients. However, 28 (4%) images were inadequately acquired because the learner was unable to obtain the pleural line. In these instances, the learner commonly imaged below the diaphragm or over a rib. Bowel gas can appear similar to lung artifacts on US and could easily confuse a novice learner. Additionally, a rib and its acoustic shadow can resemble the pleural line, especially when viewed in a horizontal probe orientation. Although such errors might be expected in novice learners, the proportion of inadequate images was low, suggesting that brief training is sufficient. The importance of image quality is demonstrated by the increase in correlation between AI and expert counts as image quality improved: from 0.41 for lower-quality images to 0.58 for higher-quality images (rated 4 or 5).
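Stratifying the AI-versus-expert correlation by image quality, as described above, can be sketched in a few lines. This sketch assumes a Spearman rank correlation (this section does not name the statistic behind the reported 0.41 and 0.58), treats ratings below 4 as "lower quality", and uses toy records that are not study data.

```python
from math import sqrt, nan

# Sketch assuming Spearman rank correlation; the statistic behind the
# reported 0.41 / 0.58 values is not named in this section. Records and
# the quality threshold of 4 are illustrative.

def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns nan if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return nan
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_by_quality(records, threshold=4):
    """Correlate AI vs expert counts separately below and at/above a quality threshold.

    records: iterable of (ai_count, expert_count, quality_rating) tuples.
    """
    groups = {"lower": [], "higher": []}
    for ai, expert, quality in records:
        groups["higher" if quality >= threshold else "lower"].append((ai, expert))
    result = {}
    for name, pairs in groups.items():
        if len(pairs) < 2:
            result[name] = nan
            continue
        ai_vals = [a for a, _ in pairs]
        ex_vals = [e for _, e in pairs]
        result[name] = spearman(ai_vals, ex_vals)
    return result


# Toy data (not study data): (AI count, expert count, image-quality rating 1-5).
records = [
    (5, 1, 2), (0, 2, 3), (3, 3, 2), (1, 0, 3),  # lower quality
    (0, 0, 5), (2, 1, 4), (4, 3, 5), (5, 5, 4),  # higher quality
]
print(correlation_by_quality(records))
```

A rank-based coefficient is a natural choice here because B-line counts are ordinal and the AI output is capped, so many values tie; tied values share their mean rank.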
To our knowledge, this is the largest study assessing novice learners' ability to use machine AI to objectively quantify LUS B-lines. Prior literature has found that machine-assisted quantification of LUS artifacts generally performs well, but these studies have been small. Corradi et al. [15] evaluated 32 patients with suspected community-acquired pneumonia, comparing quantitative ultrasound with chest X-ray, with computed tomography as the gold standard. They found that quantitative ultrasound had high sensitivity, specificity, and diagnostic accuracy, outperforming both chest X-ray and visual ultrasonography for the diagnosis of community-acquired pneumonia. Although this study was small and evaluated pneumonia rather than B-line quantification, it demonstrated the ability of machine AI to accurately detect lung artifacts.
Brusasco et al. [8] studied 12 intensive care unit patients with acute respiratory distress, comparing an automated quantitative B-line scoring system with semi-quantitative measurements of extravascular lung water obtained by thermodilution. They found that computer-aided B-line quantification on LUS correlated strongly with extravascular lung water (R² = 0.57). This was a pilot study limited by its small sample size and single sonographer. Additionally, B-line analysis was performed in post-processing, limiting real-time application of the technology.
Corradi et al. [16] assessed computer-assisted LUS B-line quantification in 48 ventilated cardiac surgery patients, using pulmonary capillary wedge pressure or extravascular lung water measured by thermodilution as comparators. They found high correlations between quantitative LUS and pulmonary congestion. This study differs from ours in that it enrolled ventilated cardiac surgery patients, used different comparison standards, and had all images obtained by a single operator.
The data from these studies and our study suggest that use of AI software to identify clinically useful LUS artifacts—including B-line quantification—shows promise, but further development is needed before widespread use. Future studies should be aimed at further refining this technology and prospectively assessing a larger number of patients and novice learners in diverse clinical environments, with careful attention to the impact of image quality on algorithm performance.
There are several limitations to consider. Overall, this was a relatively small study, although it included a large number of learners. Previously published methods of LUS B-line assessment have used differing protocols and semi-quantitative methods [17]. In this study, we used an 8-zone protocol and compared an automated quantitative method with a semi-quantitative method applied by one expert. Our criterion standard was therefore expert review using a semi-quantitative method. Although this approach is used in current clinical practice and correlates highly with extravascular lung water (EVLW) [18], the true quantity of EVLW remains unknown because it was not measured directly. However, we found high correlation between experts.
The machine AI software can only report counts of 0–4 or ≥ 5 B-lines per lung zone, whereas semi-quantitative methods typically use a scale of 0 to 10 or 0 to 20. From a clinical standpoint, however, a count of ≥ 5 B-lines within a lung zone on the US machine indicates a large amount of extravascular lung water and thus clinically significant pulmonary edema. Finally, patients were scanned by multiple learners, but each set of LUS images was treated as independent in our analysis, ignoring potential within-subject correlation of B-line counts. Image acquisition technique (probe location on the chest wall, angle of insonation, timing within the respiratory cycle) has a substantial impact on image quality and thus on B-line counts. Given the varied experience of the learners and the fact that they scanned independently, we felt that between-learner variability in acquisition technique would, overall, minimize the impact of within-subject correlation of B-line counts.
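Because the software reports only 0–4 or ≥ 5 B-lines per zone, an expert's semi-quantitative count has to be collapsed onto that scale before the two can be compared category by category. A minimal sketch of that mapping follows; the ">=5" string encoding, the helper names, and the toy readings are hypothetical and not taken from the software or the study.

```python
# Sketch: collapse an expert's semi-quantitative per-zone count onto the
# 0-4 / ">=5" reporting scale described for the AI software, so the two can
# be compared category by category. The ">=5" string encoding and the toy
# readings below are hypothetical.

AI_CAP = 5  # counts at or above this value are reported only as ">=5"


def to_ai_scale(count, cap=AI_CAP):
    """Return count unchanged if below cap, else the ceiling category string."""
    if count < 0:
        raise ValueError("B-line count cannot be negative")
    return count if count < cap else f">={cap}"


def exact_agreement(ai_readings, expert_counts):
    """Fraction of zones where the AI category matches the capped expert count."""
    if len(ai_readings) != len(expert_counts) or not ai_readings:
        raise ValueError("readings must be paired and non-empty")
    matches = sum(
        1 for ai, expert in zip(ai_readings, expert_counts)
        if ai == to_ai_scale(expert)
    )
    return matches / len(ai_readings)


# Toy readings for five lung zones (not study data).
ai_readings = [0, 3, ">=5", ">=5", 2]
expert_counts = [0, 2, 7, 12, 2]
print([to_ai_scale(c) for c in expert_counts])
print(exact_agreement(ai_readings, expert_counts))
```

In the toy data, expert counts of 7 and 12 both collapse to ">=5", which illustrates the resolution lost above the cap; the text above argues this loss is clinically acceptable because any count of ≥ 5 already signals significant pulmonary edema.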