Acute Chest Pain CXR AI

Patients who arrive at the ED with acute chest pain (ACP) syndrome typically receive a series of tests that often come back negative, but a new MGB-led study suggests that CXR AI might make ACP triage more accurate and efficient.

The researchers trained three ACP triage models using data from 23k MGH patients to predict acute coronary syndrome, pulmonary embolism, aortic dissection, and all-cause mortality within 30 days. 

  • Model 1: Patient age and sex
  • Model 2: Patient age, sex, and troponin or D-dimer positivity
  • Model 3: CXR AI predictions plus Model 2
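
The paper doesn’t publish its modeling code, but the nested structure of the three models is easy to sketch. Here’s a minimal version, assuming a simple logistic regression and illustrative feature names (the "CXR AI risk score" stands in for the deep learning model’s output probability):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def fit_and_score(X_train, y_train, X_test, y_test):
    """Fit a simple classifier and report its AUC on held-out patients."""
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

# Feature columns (illustrative): 0 = age, 1 = sex,
# 2 = troponin/D-dimer positivity, 3 = CXR AI risk score
MODEL_COLUMNS = {
    "Model 1": [0, 1],
    "Model 2": [0, 1, 2],
    "Model 3": [0, 1, 2, 3],
}

# aucs = {name: fit_and_score(X_train[:, cols], y_train, X_test[:, cols], y_test)
#         for name, cols in MODEL_COLUMNS.items()}
```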

In internal testing with 5.7k MGH patients, Model 3 predicted which patients would experience any of the ACP outcomes far more accurately than Models 2 and 1 (AUCs: 0.85 vs. 0.76 vs. 0.62), while maintaining performance across patient demographic groups.

  • At a 99% sensitivity threshold, Model 3 would have allowed 14% of the patients to skip additional cardiovascular or pulmonary testing (vs. Model 2’s 2%).
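
That rule-out analysis boils down to picking the most conservative operating point that still preserves 99% sensitivity and counting the patients who fall below it. A minimal sketch, assuming risk holds the model’s predicted probability of any 30-day ACP outcome:

```python
import numpy as np
from sklearn.metrics import roc_curve

def rule_out_fraction(y_true, risk, target_sensitivity=0.99):
    """Return the most conservative threshold that keeps sensitivity >= target,
    plus the share of patients below it (candidates to skip further testing)."""
    fpr, tpr, thresholds = roc_curve(y_true, risk)
    ok = tpr >= target_sensitivity
    threshold = thresholds[ok][0]   # thresholds run high -> low, tpr runs low -> high
    return threshold, float(np.mean(np.asarray(risk) < threshold))
```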

In external validation with 22.8k Brigham and Women’s patients, poor AI generalizability caused Model 3’s performance to drop dramatically, while Models 2 and 1 maintained their performance (AUCs: 0.77 vs. 0.76 vs. 0.64). However, fine-tuning with BWH’s own images (sketched below) significantly improved the performance of the CXR AI model (from 0.67 to 0.74 AUC) and Model 3 (from 0.77 to 0.81 AUC).

  • At a 99% sensitivity threshold, the fine-tuned Model 3 would have allowed 8% of BWH patients to skip additional cardiovascular or pulmonary testing (vs. Model 2’s 2%).
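
The site-level fine-tuning described above is a standard transfer learning step. The authors’ exact recipe isn’t spelled out here, but a common approach freezes the pretrained CXR backbone and retrains only the final layer on local exams at a low learning rate. A hedged sketch (DenseNet-121 and local_loader are illustrative assumptions, not details from the paper):

```python
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.densenet121(weights="IMAGENET1K_V1")
model.classifier = nn.Linear(model.classifier.in_features, 1)   # single CXR risk output

for param in model.features.parameters():   # freeze the pretrained backbone
    param.requires_grad = False

optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()

# for images, labels in local_loader:        # local_loader: hypothetical BWH-style DataLoader
#     optimizer.zero_grad()
#     loss = loss_fn(model(images).squeeze(1), labels.float())
#     loss.backward()
#     optimizer.step()
```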

The Takeaway

Acute chest pain is among the most common reasons for ED visits, but it’s also a major driver of wasted ED time and resources. Considering that most ACP patients undergo CXR exams early in the triage process, this proof-of-concept study suggests that adding CXR AI could improve ACP diagnosis and significantly reduce downstream testing.

CXR AI’s Screening Generalizability Gap

A new European Radiology study detailed a commercial CXR AI tool’s challenges when used for screening patients with low disease prevalence, bringing more attention to the mismatch between how some AI tools are trained and how they’re applied in the real world.

The researchers used an unnamed commercial AI tool to detect abnormalities in 3k screening CXRs sourced from two healthcare centers (2.2% w/ clinically significant lesions), and had four radiology residents read the same CXRs with and without AI assistance, finding that the AI:

  • Produced a far lower AUROC than in its other studies (0.648 vs. 0.77–0.99)
  • Achieved 94.2% specificity, but just 35.3% sensitivity
  • Detected 12 of 41 pneumonia cases, 3 of 5 tuberculosis cases, and 9 of 22 tumors
  • Only “modestly” improved the residents’ AUROCs (0.571–0.688 vs. 0.534–0.676)
  • Added 2.96 to 10.27 seconds to the residents’ average CXR reading times

The researchers attributed the AI tool’s “poorer than expected” performance to differences between the data used in its initial training and validation (high disease prevalence) and the study’s clinical setting (high-volume, low-prevalence, screening).

  • More notably, the authors pointed to these results as evidence that many commercial AI products “may not directly translate to real-world practice,” urging providers facing this kind of training mismatch to retrain their AI or change their thresholds, and calling for more rigorous AI testing and trials.
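
To see why the prevalence mismatch stings, plug the study’s reported operating point into Bayes’ rule: at 2.2% prevalence, 35.3% sensitivity and 94.2% specificity mean most positive flags are false alarms, and the reassuring negative predictive value mostly reflects the low base rate.

```python
# Study-reported values: 2.2% prevalence, 35.3% sensitivity, 94.2% specificity
prevalence, sensitivity, specificity = 0.022, 0.353, 0.942

ppv = (sensitivity * prevalence) / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence))
npv = (specificity * (1 - prevalence)) / (
    specificity * (1 - prevalence) + (1 - sensitivity) * prevalence)

print(f"PPV ≈ {ppv:.1%}, NPV ≈ {npv:.1%}")   # PPV ≈ 12.0%, NPV ≈ 98.5%
```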

These results also inspired lively online discussions. Some commenters cited the study as proof of the problems caused by training AI with augmented datasets, while others contended that the AI tool’s AUROC still rivaled the residents and its “decent” specificity is promising for screening use.

The Takeaway

We cover plenty of studies about AI generalizability, but most have explored bias due to patient geography and demographics, rather than disease prevalence mismatches. Even if AI vendors and researchers are already aware of this issue, AI users and study authors might not be, placing more emphasis on how vendors position their AI products for different use cases (or how they train them).

Medical Imaging in 2022

For our final issue of 2022 we’re reflecting on some of the year’s biggest radiology storylines, including some trends that might have a major impact in 2023 and beyond.

“Post-COVID” – Radiology teams thankfully scanned and assessed far fewer COVID patients in 2022, but the pandemic was still partially responsible for most of the trends included in this recap.

Imaging Labor Crunch – Many organizations still didn’t have enough radiologists and technologists to keep up with their imaging volumes this year, driving up labor costs and making efficiency even more important.

Hospital Margin Crunch – There’s a very good chance that the hospitals you work for or sell to had a tough financial year in 2022, placing greater importance on initiatives/technologies that earn or save them money (and address their labor challenges).

AI Evolution – If a radiology outsider read a random Imaging Wire issue, they might think that radiologists already use AI every day. We know that isn’t true, but imaging AI’s 2022 progress suggests that we’re slowly heading in that direction.

New Mega Practice Paradigm – After years of massive national expansions, recent unfavorable shifts in surprise billing reimbursements, radiologist staffing (costs & shortages), and the lending environment seem to have caused large PE-backed radiology groups to pivot their 2022 strategies from practice growth to practice optimization.

The Patient Engagement Push – Radiology patient engagement gained momentum in 2022, as imaging teams and vendors worked to make imaging more accessible and understandable, more patient-centric imaging startups emerged, and radiology departments continued to get better at follow-up management.

The AI Shakeup – Everyone who has been predicting AI consolidation took a victory lap in 2022, which brought at least two strategic pivots (MaxQ AI & Kheiron) and the acquisitions of Aidence and Quantib (by RadNet), Nines (by Sirona), Arterys (by Tempus), MedoAI (by Exo), and Predible (by nference). This trend should continue in 2023, as VCs remain selective and larger AI players extend their lead over their smaller competitors.

Imaging Leaves the Hospital – Between the surge of hospital-at-home initiatives and payors’ efforts to move imaging exams to outpatient settings, imaging’s shift beyond hospital walls continued throughout 2022 and doesn’t seem to be slowing as we head into 2023.

The Mammography AI Generalizability Gap

The “radiologists with AI beat radiologists without AI” trend might have achieved mainstream status in Spring 2020, when the DM DREAM Challenge developed an ensemble of mammography AI solutions that allowed radiologists to outperform rads who weren’t using AI.

The DM DREAM Challenge had plenty of credibility. It was produced by a team of respected experts, combined eight top-performing AI models, and used massive training and validation datasets (144k & 166k exams) from geographically distant regions (Washington state, USA & Stockholm, Sweden).

However, a new external validation study highlighted one problem that many weren’t thinking about back then. Ethnic diversity can have a major impact on AI performance, and the majority of women in the two datasets were White.

The new study used an ensemble of 11 mammography AI models from the DREAM study (the Challenge Ensemble Model; CEM) to analyze 37k mammography exams from UCLA’s diverse screening program, finding that:

  • The CEM model’s UCLA performance declined from the previous Washington and Sweden validations (AUROCs: 0.85 vs. 0.90 & 0.92)
  • The CEM model improved when combined with UCLA radiologist assessments, but still fell short of the Sweden AI+rads validation (AUROCs: 0.935 vs. 0.942)
  • The CEM + radiologists model also achieved slightly lower sensitivity (0.813 vs. 0.826) and specificity (0.925 vs. 0.930) than UCLA rads without AI 
  • The CEM + radiologists method performed particularly poorly with Hispanic women and women with a history of breast cancer
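
The CEM’s exact weighting scheme isn’t reproduced here, but the general challenge-ensemble recipe can be sketched roughly: average the member models’ malignancy scores, then optionally combine the ensemble output with the radiologist’s assessment in a small second-stage model (the 0/1 recall flag below is an illustrative stand-in for the actual radiologist inputs):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ensemble_score(member_probs):
    """member_probs: (n_exams, n_models) array of per-model malignancy probabilities."""
    return np.mean(member_probs, axis=1)

def fit_cem_plus_rads(member_probs, rad_recall, y):
    """Second-stage model combining the ensemble score with a radiologist recall flag."""
    X = np.column_stack([ensemble_score(member_probs), rad_recall])
    return LogisticRegression(max_iter=1000).fit(X, y)
```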

The Takeaway

Although generalization challenges and the importance of data diversity are everyday AI topics in late 2022, this follow-up study highlights how big a challenge they can be (regardless of training size, ensemble approach, or validation track record), and underscores the need for local validation and fine-tuning before clinical adoption.

It also underscores how much we’ve learned in the last three years, as neither the 2020 DREAM study’s limitations statement nor critical follow-up editorials mentioned data diversity among the study’s potential challenges.

AI Crosses the Chasm

Despite plenty of challenges, Signify Research forecasts that the global imaging AI market will nearly quadruple by 2026, as AI “crosses the chasm” towards widespread adoption. Here’s how Signify sees that transition happening:

Market Growth – After generating global revenues of around $375M in 2020 and $400M in 2021, Signify expects the imaging AI market to maintain a massive 27.6% CAGR through 2026, when it will reach nearly $1.4B.
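
The 2026 figure is simple compounding on the 2021 base, and it checks out:

```python
base_2021 = 400e6                      # ~$400M global imaging AI revenue in 2021
cagr = 0.276
revenue_2026 = base_2021 * (1 + cagr) ** 5
print(f"${revenue_2026 / 1e9:.2f}B")   # ≈ $1.35B, i.e. "nearly $1.4B"
```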

Product-Led Growth – This growth will be partially driven by the availability of new and more-effective AI products, following:

  • An influx of new regulatory-approved solutions
  • Continued improvements to current products (e.g. adding triage to detection tools)
  • AI leaders expanding into new clinical segments
  • AI’s evolution from point solutions to comprehensive solutions/workflows
  • The continued adoption of AI platforms/marketplaces

The Big Four – Imaging AI’s top four clinical segments (breast, cardiology, neurology, pulmonology) represented 87% of the AI market in 2021, and those segments will continue to dominate through 2026. 

VC Support – VCs invested $3.47B in imaging AI startups between 2015 and 2021, and Signify expects them to remain a market growth driver, even as their funding continues to shift toward later-stage rounds.

Remaining Barriers – AI still faces plenty of barriers, including limited reimbursements, insufficient economic/ROI evidence, stricter regulatory standards (especially in EU), and uncertain future prioritization from healthcare providers and imaging IT vendors. 

The Takeaway

2022 has been a tumultuous year for AI, bringing a number of notable achievements (increased adoption, improving products, new reimbursements, more clinical evidence, big funding rounds) that sometimes seemed to be overshadowed by AI’s challenges (difficult funding climate, market consolidation, slower adoption than previously hoped).  

However, Signify’s latest research suggests that 2022’s ups-and-downs might prove to be part of AI’s path towards mainstream adoption. And based on the steeper growth Signify forecasts for 2025-2026, the imaging AI market’s growth rate and overall value should become far greater after it finally “crosses the chasm.”

Imaging AI’s Unseen Potential

Amid the dozens of imaging AI papers and presentations that came out over the last few weeks were three compelling new studies highlighting how much “unseen” information AI can extract from medical images, and the massive impact this information could have. 

Imaging-Led Population Health – An excellent presentation from Ayis Pyrros, MD placed radiology at the center of healthcare’s transition to value-based care and population health, highlighting the AI training opportunities that will come with more value-based care HCC codes and imaging AI’s untapped potential for early disease detection and management. Dr. Pyrros specifically emphasized chest X-ray’s potential given the exam’s ubiquity (26M Medicare CXRs in 2021), CXR AI’s ability to predict outcomes (e.g. mortality, comorbidities, hospital stays), and how opportunistic AI screening can/should support proactive care that benefits both patients and health systems.

  • Healthcare’s value-based overhaul has traditionally been seen as a threat to radiology’s fee-for-service foundations. Even if that might still be true from a business model perspective, Dr. Pyrros makes it quite clear that the shift to value-based care could make radiology even more important — and importance is always good for business.

AI Race Detection – The final peer-reviewed version of the landmark study showing that AI models can accurately predict patient race was officially published, further confirming that AI can detect patients’ self-reported race by analyzing medical image features. The new paper showed that AI very accurately detects patient race across modalities and anatomical regions (AUCs: CXRs 0.91 – 0.99, chest CT 0.89 – 0.96, mammography 0.81), without relying on proxies or imaging-related confounding features (BMI, disease distribution, and breast density all had ≤0.61 AUCs).

  • If imaging AI models intended for clinical tasks can identify patients’ races, they could be applying the same racial biomarkers to diagnosis, thus reproducing or exacerbating healthcare’s existing racial disparities. That’s an important takeaway whether you’re developing or adopting AI.

CXR Cost Predictions – The smart folks at the UCSF Center for Intelligent Imaging developed a series of CXR-based deep learning models that can predict patients’ future healthcare costs. Developed with 21,872 frontal CXRs from 19,524 patients, the best-performing models identified which patients would land in the top 50% of personal healthcare costs after one, three, and five years with reasonable accuracy (AUCs: 0.806, 0.771, 0.729).

  • Although predicting which patients will have higher costs could be useful on its own, these findings also suggest that similar CXR-based DL models could be used to flag patients at risk of deterioration, prompt proactive care, or support healthcare cost analysis and policy.
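
UCSF’s architecture details aren’t reproduced here, but a minimal stand-in treats "top-50% cost at one, three, and five years" as three binary labels predicted from a frontal CXR with a shared backbone (ResNet-50 is an illustrative assumption):

```python
import torch.nn as nn
import torchvision

backbone = torchvision.models.resnet50(weights="IMAGENET1K_V2")
backbone.fc = nn.Linear(backbone.fc.in_features, 3)   # one logit each for 1y / 3y / 5y top-50% cost

loss_fn = nn.BCEWithLogitsLoss()
# Grayscale CXRs are typically replicated to 3 channels to match the pretrained input.
# loss = loss_fn(backbone(cxr_batch), cost_labels.float())   # cxr_batch: (B, 3, H, W), cost_labels: (B, 3)
```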

The Case for Algorithmic Audits

A new Lancet Digital Health study could have become one of the many “AI rivals radiologists” papers that we see each week, but it instead served as an important lesson that traditional performance tests might not prove that AI models are actually safe for clinical use.

The Model – The team developed their proximal femoral fracture detection DL model using 45.7k frontal X-rays performed at Australia’s Royal Adelaide Hospital (w/ 4,861 fractures).

The Validation – They then tested it against a 4,577-exam internal set (w/ 640 fractures), 400 of which were also interpreted by five radiologists (w/ 200 fractures), and against an 81-image external validation set from Stanford.

The Results – All three tests produced results that a typical study might have viewed as evidence of high-performance: 

  • The model outperformed the five radiologists (0.994 vs. 0.969 AUCs)
  • It beat the best-performing radiologist’s sensitivity (95.5% vs. 94.5%) and specificity (99.5% vs. 97.5%)
  • It generalized well with the external Stanford data (0.980 AUC)

The Audit – Despite the strong results, a follow-up audit revealed that the model might make some predictions for the wrong reasons, suggesting that it is unsafe for clinical deployment:

  • One false negative X-ray included an extremely displaced fracture that human radiologists would catch
  • X-rays featuring abnormal bones or joints had a 50% false negative rate, far higher than the reader set’s overall false negative rate (2.5%)
  • Saliency maps showed that AI decisions were almost never based on the outer region of the femoral neck, even with images where that region was clinically relevant (but it still often made the right diagnosis)
  • The model scored a high AUC with the Stanford data, but showed a substantial model operating point shift
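
Two of those audit checks, subgroup false-negative rates and operating-point drift on external data, are straightforward to reproduce in principle. A hedged sketch (not the authors’ code):

```python
import numpy as np

def false_negative_rate(y_true, y_pred):
    """Fraction of true fractures the model missed."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    positives = y_true == 1
    return float(np.mean(y_pred[positives] == 0)) if positives.any() else float("nan")

def subgroup_fnr(y_true, scores, groups, threshold):
    """Compare miss rates across labeled subgroups (e.g. 'abnormal bones or joints')."""
    y_true, groups = np.asarray(y_true), np.asarray(groups)
    y_pred = (np.asarray(scores) >= threshold).astype(int)
    return {g: false_negative_rate(y_true[groups == g], y_pred[groups == g])
            for g in np.unique(groups)}

def flag_rate_shift(scores_internal, scores_external, threshold):
    """A large gap in flag rates at the same threshold is one sign of operating-point drift."""
    return (float(np.mean(np.asarray(scores_internal) >= threshold)),
            float(np.mean(np.asarray(scores_external) >= threshold)))
```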

The Case for Auditing – Although the study might have not started with this goal, it ended up becoming an argument for more sophisticated preclinical auditing. It even led to a separate paper outlining their algorithmic auditing process, which among other things suggested that AI users and developers should co-own audits.

The Takeaway

Auditing generally isn’t the most exciting topic in any field, but this study shows that it’s exceptionally important for imaging AI. It also suggests that audits might be necessary for achieving the most exciting parts of AI, like improving outcomes and efficiency, earning clinician trust, and increasing adoption.

Imaging AI’s Big 2021

Signify Research’s latest imaging AI VC funding report revealed an unexpected surge in 2021, along with major funding shifts that might explain why many of us didn’t see it coming. Here are some of Signify’s big takeaways.

AI’s Path to $3.47B – Imaging AI startups have raised $3.47B in venture funding since 2015, helped by a record-high $815M in 2021 after several years of falling investments (vs. 2020’s $592M, 2019’s $450M, 2018’s $790M).

Big Get Bigger – That $3.47B funding total came from over 200 companies and 290 deals, although the 25 highest-funded companies were responsible for 80% of all capital raised. VCs increased their focus on established AI companies in 2021, resulting in record-high late-stage funding (~$723.5M), record-low Pre-Seed/Seed funding (~$7M), and a major increase in average deal size (~$33M vs. ~$12M in 2020).

Made in China – If you’re surprised that 2021 was a record AI funding year, that’s probably because much of that funding went to Chinese companies (~$260M vs. the US’ ~$150M), continuing a recent trend (China’s share of AI VC funding was 45% in 2020 and 26% in 2019). We’re also seeing major funding go to top startups in South Korea and Australia, adding to APAC AI vendors’ funding leadership.

Health VC Context – Although imaging AI’s $815M 2021 funding total seems big for a category that’s figuring out its path towards full adoption, the amount VC firms are investing in other areas of healthcare makes it seem pretty reasonable. Our two previous Digital Health Wire issues featured seven digital health startup funding rounds with a total value of $267M (and that’s from just one week).

The Takeaway

Signify correctly points out that imaging AI funding remains strong despite a list of headwinds (COVID, regulatory hurdles, lacking reimbursements), while showing more signs of AI market maturation (larger funding rounds to fewer players) and suggesting that consolidation is on the way. Those factors will likely continue in 2022. However, more innovation is surely on the way too and quite a few regional AI powerhouses still haven’t expanded globally, suggesting that the next steps in AI’s evolution won’t be as straightforward as some might think.

Autonomous AI Milestone

Just as the debate over whether AI might replace radiologists is starting to fade away, Oxipit’s ChestLink solution became the first regulatory-approved imaging AI product intended to perform diagnoses without involving radiologists (*please see editor’s note below regarding Behold.ai). That’s a big and potentially controversial milestone in the evolution of imaging AI and it’s worth a deeper look.

About ChestLink – ChestLink autonomously identifies CXRs without abnormalities and produces final reports for each of these “normal” exams, automating 15% to 40% of reporting workflows.
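
In workflow terms, an autonomous normal-CXR product implies a routing gate along these lines (a rough sketch, not Oxipit’s implementation; the threshold is illustrative and would in practice be tuned to preserve very high sensitivity):

```python
def route_cxr(abnormality_probability: float, auto_report_threshold: float = 0.01) -> str:
    """Only exams the model calls clearly normal are auto-reported;
    everything else is queued for a radiologist."""
    if abnormality_probability < auto_report_threshold:
        return "auto-report: no abnormalities detected"
    return "queue for radiologist review"
```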

Automation Evidence – Oxipit has already piloted ChestLink in supervised settings for over a year, processing over 500k real-world CXRs with 99% sensitivity and no clinically relevant errors.

The Rollout – With its CE Class IIb Mark finalized, Oxipit is now planning to roll out ChestLink across Europe and begin “fully autonomous” operation by early 2023. Oxipit specifically mentioned primary care settings (many normal CXRs) and large-scale screening projects (high volumes, many normal scans) in its announcement, but ChestLink doesn’t appear limited to those use cases.

ChestLink’s ability to address radiologist shortages and reduce labor costs seems like a strong and unique advantage. However, radiology’s first regulatory-approved autonomous AI solution might face even stronger challenges:

  • ChestLink’s CE Mark doesn’t account for country-specific regulations around autonomous diagnostic reporting (e.g. the UK requires “appropriate reporting” with ionizing radiation-based exams)
  • Radiologist societies historically push back against anything that might undermine radiologists’ clinical roles, earning potential, and future career stability
  • Health systems’ evidence requirements for any autonomous AI tools would likely be extremely high, and they might expect similarly high economic ROI in order to justify the associated diagnostic or reputational risks
  • Even the comments in Oxipit’s LinkedIn announcement had a much more skeptical tone than we typically see with regulatory approval announcements

The Takeaway

Autonomous AI products like ChestLink could address some of radiology’s greatest problems (radiologist overwork, staffing shortages, volume growth, low access in developing countries), and their economic value proposition is far stronger than that of most other diagnostic AI products.

However, autonomous AI solutions could also face more obstacles than any other imaging AI products we’ve seen so far, suggesting that it would take a combination of excellent clinical performance and major changes in healthcare policies/philosophies in order for autonomous AI to reach mainstream adoption.

*Editor’s Note – April 21, 2022: Behold.ai insists that it is the first imaging AI company to receive regulatory approval for autonomous AI. Its product is used with radiologist involvement and local UK guidelines require that radiologists read exams that use ionizing radiation. All above analysis regarding the possibilities and challenges of autonomous AI applies to any autonomous AI vendor in the current AI environment, including both Oxipit and Behold.ai.

Complementary PE AI

A new European Radiology study out of France highlighted how Aidoc’s pulmonary embolism AI solution can serve as a valuable emergency radiology safety net, catching PE cases that otherwise might have been missed and increasing radiologists’ confidence. 

Even if that’s technically what PE AI products are supposed to do, studies using commercially available products and focusing on how AI complements radiologists (vs. comparing AI and rad accuracy) are still rare and worth a closer look.

The Diagnostic Study – A team from French teleradiology provider IMADIS analyzed AI and radiologist CTPA interpretations from patients with suspected PE (n = 1,202 patients), finding that:

  • Aidoc PE achieved higher sensitivity (0.926 vs. 0.900) and negative predictive value (0.986 vs. 0.981)
  • Radiologists achieved higher specificity (0.991 vs. 0.958), positive predictive value (0.950 vs. 0.804), and accuracy (0.977 vs. 0.953)
  • The AI tool flagged 219 suspicious PEs, with 176 true positives, including 19 cases that were missed by radiologists
  • The radiologists detected 180 suspicious PEs, with 171 true positives, including 14 cases that were missed by AI
  • Aidoc PE would have helped IMADIS catch 285 misdiagnosed PE cases in 2020 based on the above AI-only PE detection ratio (19 per 1,202 patients)  
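
Those positive predictive values follow directly from the flag counts above, which makes for a quick sanity check:

```python
ai_flags, ai_true_positives = 219, 176
rad_flags, rad_true_positives = 180, 171

ppv_ai = ai_true_positives / ai_flags        # ≈ 0.804, matching the reported AI PPV
ppv_rad = rad_true_positives / rad_flags     # = 0.950, matching the reported radiologist PPV
```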

The Radiologist Survey – Nine months after IMADIS implemented Aidoc PE, a survey of its radiologists (n = 79) and a comparison versus its pre-implementation PE CTPAs revealed that:

  • 72% of radiologists believed Aidoc PE improved their diagnostic confidence and comfort 
  • 52% of radiologists said the AI solution didn’t impact their interpretation times
  • 14% indicated that Aidoc PE reduced interpretation times
  • 34% of radiologists believed the AI tool added time to their workflow
  • The solution actually increased interpretation times by an average of 7.2% (+1:03 minutes) 

The Takeaway

Now that we’re getting better at not obsessing over AI replacing humans, this is a solid example of how AI can complement radiologists by helping them catch more PE cases and make more confident diagnoses. Some radiologists might be concerned with false positives and added interpretation times, but the authors noted that AI’s PE detection advantages (and the risks of missed PEs) outweigh these potential tradeoffs.
