JMIR Public Health Surveill. 2024 Aug;10:e53322.

Predicting Long COVID in the National COVID Cohort Collaborative Using Super Learner: Cohort Study.

PMID: 39146534

Abstract

BACKGROUND: Postacute sequelae of COVID-19 (PASC), also known as long COVID, is a broad grouping of long-term symptoms that follow acute COVID-19. These symptoms can occur across a range of biological systems, which makes it difficult to determine the risk factors for PASC and the causal etiology of this disorder. An understanding of the characteristics that predict future PASC is valuable because it can inform the identification of high-risk individuals and future preventive efforts. However, current knowledge of PASC risk factors is limited.

OBJECTIVE: Using a sample of 55,257 patients (at a ratio of 1 patient with PASC to 4 matched controls) from the National COVID Cohort Collaborative, as part of the National Institutes of Health Long COVID Computational Challenge, we sought to predict individual risk of PASC diagnosis from a curated set of clinically informed covariates. The National COVID Cohort Collaborative includes electronic health records for more than 22 million patients from 84 sites across the United States.

METHODS: We predicted individual PASC status, given covariate information, using Super Learner (an ensemble machine learning algorithm also known as stacking) to learn the combination of gradient boosting and random forest algorithms that maximizes the area under the receiver operating characteristic curve. We evaluated variable importance (Shapley values) at 3 levels: individual features, temporal windows, and clinical domains. We externally validated these findings using a holdout set of randomly selected study sites.
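The stacking and site-level validation steps described above can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the data are synthetic, the two score lists merely stand in for gradient boosting and random forest predictions, and the helper names (`auc`, `best_convex_weight`, `split_by_site`) are hypothetical. In a real Super Learner, the combination weights would be fitted on cross-validated (out-of-fold) predictions rather than on raw training scores.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum identity (ties get average ranks)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def best_convex_weight(labels, preds_a, preds_b, steps=100):
    """Grid-search w in [0, 1] so that w*a + (1 - w)*b maximizes AUC."""
    best_w, best_auc = 0.0, -1.0
    for s in range(steps + 1):
        w = s / steps
        blended = [w * a + (1 - w) * b for a, b in zip(preds_a, preds_b)]
        score = auc(labels, blended)
        if score > best_auc:
            best_w, best_auc = w, score
    return best_w, best_auc


def split_by_site(site_ids, holdout_fraction=0.2, seed=0):
    """Hold out whole sites so that no site contributes to both sets."""
    sites = sorted(set(site_ids))
    random.Random(seed).shuffle(sites)
    n_holdout = max(1, round(len(sites) * holdout_fraction))
    holdout_sites = set(sites[:n_holdout])
    train = [i for i, s in enumerate(site_ids) if s not in holdout_sites]
    holdout = [i for i, s in enumerate(site_ids) if s in holdout_sites]
    return train, holdout


def pick(values, indices):
    """Select the entries of values at the given indices."""
    return [values[i] for i in indices]


# Synthetic data (assumption): 1 case per 4 controls, 10 made-up sites.
rng = random.Random(42)
n = 500
labels = [1 if i % 5 == 0 else 0 for i in range(n)]
site_ids = [rng.randrange(10) for _ in range(n)]
preds_gbm = [0.5 * y + rng.random() for y in labels]  # stand-in for gradient boosting
preds_rf = [0.3 * y + rng.random() for y in labels]   # stand-in for random forest

train, holdout = split_by_site(site_ids)
w, train_auc = best_convex_weight(
    pick(labels, train), pick(preds_gbm, train), pick(preds_rf, train)
)
blend_holdout = [w * preds_gbm[i] + (1 - w) * preds_rf[i] for i in holdout]
holdout_auc = auc(pick(labels, holdout), blend_holdout)

print(f"weight on gradient boosting stand-in: {w:.2f}")
print(f"training AUC: {train_auc:.3f}, held-out-site AUC: {holdout_auc:.3f}")
```

Because the grid includes w = 0 and w = 1, the blended training AUC can never be lower than that of either base learner alone. Splitting by site rather than by patient keeps every held-out patient from a site the model has never seen, which is what makes the validation external.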

RESULTS: We were able to predict individual PASC diagnoses accurately (area under the curve 0.874). Among individual features, the length of the observation period, the number of health care interactions during acute COVID-19, and viral lower respiratory infection were the most predictive of subsequent PASC diagnosis. Temporally, baseline characteristics were more predictive of future PASC diagnosis than characteristics immediately before, during, or after acute COVID-19. Among clinical domains, health care use, demographics or anthropometry, and respiratory factors were the most predictive of PASC diagnosis.
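The importance rankings above rest on Shapley values, which average each predictor's marginal contribution over every order in which predictors could be added. The sketch below computes exact Shapley values for a toy example with three "players" named after clinical domains. The value function and all of its numbers are invented for illustration and are not estimates from the study; the abstract does not describe how the study defined or estimated its Shapley values, and the function name `shapley_values` is hypothetical.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}


# Made-up gains in predictive performance per domain (illustrative only).
BASE_GAIN = {"health_care_use": 0.20, "demographics": 0.10, "respiratory": 0.05}
INTERACTION = 0.04  # made-up extra gain when two domains are used together


def toy_value(coalition):
    """Toy value function: additive gains plus one pairwise interaction."""
    gain = sum(BASE_GAIN[p] for p in coalition)
    if {"health_care_use", "respiratory"} <= coalition:
        gain += INTERACTION
    return gain


phi = shapley_values(list(BASE_GAIN), toy_value)
for name, contribution in sorted(phi.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {contribution:.3f}")
```

In this toy setup the interaction gain is split equally between the two domains that produce it, and the three values sum to the value of using all domains together. The same function works whether the players are individual features, temporal windows, or clinical domains, although exact enumeration is only feasible for a handful of players.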

CONCLUSIONS: The methods outlined here provide an open-source, applied example of using Super Learner to predict PASC status from electronic health record data, which can be replicated across a variety of settings. Across individual predictors and clinical domains, we consistently found that factors related to health care use were the strongest predictors of PASC diagnosis. This indicates that any observational study using PASC diagnosis as a primary outcome must rigorously account for heterogeneous health care use. Our temporal findings support the hypothesis that clinicians may be able to accurately assess a patient's risk of PASC before acute COVID-19 diagnosis, which could improve early interventions and preventive care. Our findings also highlight the importance of respiratory characteristics in PASC risk assessment.

INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.1101/2023.07.27.23293272.