Retrospective O2 analysis

Introduction

There is concern about the accuracy of pulse oximetry. A growing body of evidence suggests that pulse oximetry readings are biased, particularly in Black patients. One challenge is that real-world data are difficult to obtain, which makes it hard to confirm this. One approach is to use the EMR to align pulse oximetry (SpO2) and arterial blood gas (ABG) measurements. This was the approach taken by Sjoding et al.

Sjoding

  • Sjoding et al. performed a retrospective analysis of O2 saturation data from the EMR.

  • They identified 10,789 arterial blood gas (ABG) measurements that were within 10 minutes of an SpO2 measurement and had a saturation between 89 and 96.

  • Then, they divided the data by race.

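  • This now well-known paper demonstrated that, for a given arterial oxygen saturation, pulse oximetry tended to overestimate saturation more often in Black patients than in White patients, so that low arterial saturation went undetected by SpO2 more frequently in Black patients.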

  • One critique of this paper is that 10 minutes may not be close enough. In a real-world setting, clinicians often treat patients quickly when the saturation is low (and an ABG is taken). An SpO2 reading from 10 minutes later may no longer reflect the patient's state at the time of the ABG.

Equiox

  • Equiox is a prospective clinical trial that took place at ZSFG, whose goal was to examine the interaction between skin color and pulse oximetry readings in critically ill patients.
  • In this trial, SpO2 was noted by clinical staff at the same time as an ABG was taken, effectively meaning there was no delay in measurement.

The objective of this study was to compare the time intervals between pulse oximetry (SpO2) values and arterial blood gas (ABG) collection times as recorded in electronic medical records (EMR) with the true intervals determined by direct observation by the research team. Additionally, we aimed to assess whether these differences in timing affected the bias in SpO2 measurements relative to SaO2 (arterial oxygen saturation).

We hypothesized that the time intervals between SpO2 measurements and ABG collection as recorded in the EMR would be systematically shorter than the true intervals observed by the research team, leading to an underestimation of the actual timing differences.

We also hypothesized that EMR-based time intervals would yield a significantly different calculated bias (SpO2 - SaO2) than directly observed time intervals, with the expectation that the EMR-based method underestimates the bias.

Methods

Data acquisition

Retrospective Dataset:

  • Inclusion criteria: All patients of any age admitted to any of the ICUs.
  • Exclusion criteria:
  • Dates: April 2023 to October 8, 2024. April 2023 is when SaO2 began to be recorded in the EMR; before that, only pO2 was recorded.

Equiox Dataset:

  • Inclusion criteria: All patients enrolled in the Equiox trial (e.g. patients admitted to the ICU)
  • Exclusion criteria:
    • Research-only samples (samples that were not clinically indicated), because they were not run on the hospital lab machines and are therefore not in the EMR.
    • Samples taken before the patient was physically in the ICU (e.g. samples taken in the ED). Although these patients were enrolled, the samples were not captured in the EMR data pull because it included only ICU patients.
  • Dates:

Data collection pathway

flowchart LR
    Col("A - ABG collected [SpO2 observation time]") 
    ColTime("B - ABG Collection Time Recorded in EMR") 
    ABG("C - ABG resulted time in EMR")
    ObsTime("D - Nearest EMR SpO2 observation time")
    
    Col --> ColTime --> ABG --> ObsTime

    %% Node styles
    style Col fill:#f9c2ff,stroke:#333,stroke-width:2px
    style ColTime fill:#ffb347,stroke:#333,stroke-width:2px
    style ABG fill:#77dd77,stroke:#333,stroke-width:2px
    style ObsTime fill:#779ecb,stroke:#333,stroke-width:2px

Figure 1: Data collection time points

This is the pathway of data collection. We have 4 main time points of data collection.

  1. The actual time of blood gas collection, directly observed by the research team; the Equiox SpO2 is recorded at this time (A in Figure 1)
  2. The time of blood gas collection recorded in the EMR (B)
  3. The time the ABG result is populated in the EMR (C)
  4. The time of the closest SpO2 recorded in the EMR (D)

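Note that the nearest EMR SpO2 observation time may actually fall before the ABG result time, because the nearest SpO2 is matched to the collection time in either direction. The arrows in Figure 1 therefore do not imply a strict chronological order for the last two time points.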

Data preparation

Merging: In the EMR dataset, the blood gas and SpO2 data were merged based on unique hospital encounter ID, which is unique to each patient admission. For each blood gas SaO2 value, we identified the closest SpO2 value from the EMR, either before or after the blood gas collection time.
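The nearest-value merge described above can be sketched with pandas. This is a minimal illustration under stated assumptions, not the study's actual code: the column names (`encounter_id`, `abg_collect_time`, `spo2_time`, `sao2`, `spo2`) and the helper `attach_nearest_spo2` are hypothetical, and `pandas.merge_asof` is one possible way to implement a "closest reading before or after" match.

```python
import pandas as pd

def attach_nearest_spo2(abg: pd.DataFrame, spo2: pd.DataFrame) -> pd.DataFrame:
    """For each ABG row, attach the closest SpO2 reading from the same encounter."""
    # merge_asof requires both frames to be sorted by their time keys.
    abg = abg.sort_values("abg_collect_time")
    spo2 = spo2.sort_values("spo2_time")
    merged = pd.merge_asof(
        abg,
        spo2,
        left_on="abg_collect_time",
        right_on="spo2_time",
        by="encounter_id",      # never match across encounters
        direction="nearest",    # closest reading, before or after collection
    )
    # Signed gap in minutes: positive means the SpO2 was charted after the ABG.
    merged["gap_min"] = (
        merged["spo2_time"] - merged["abg_collect_time"]
    ).dt.total_seconds() / 60
    return merged

# Made-up example data (hypothetical values, for illustration only).
abg = pd.DataFrame({
    "encounter_id": [1, 1, 2],
    "abg_collect_time": pd.to_datetime(
        ["2024-01-01 08:00", "2024-01-01 12:00", "2024-01-01 09:00"]
    ),
    "sao2": [95.0, 97.2, 91.0],
})
spo2 = pd.DataFrame({
    "encounter_id": [1, 1, 2],
    "spo2_time": pd.to_datetime(
        ["2024-01-01 07:56", "2024-01-01 12:03", "2024-01-01 09:30"]
    ),
    "spo2": [96, 98, 93],
})
paired = attach_nearest_spo2(abg, spo2)
```

Keeping the signed gap alongside the match makes it possible to report both relative and absolute time differences later without repeating the merge.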

The Equiox dataset did not contain encounter IDs, so data was merged with the EMR dataset based on MRN. In the EMR data, patients could be known by different MRNs in different encounters, while in Equiox each patient had only one MRN. We compared each MRN in the EMR data to the list of MRNs in the Equiox data, and used MRNs that were present in both datasets. For EMR patients that had no matching Equiox MRN (i.e. they were not enrolled in Equiox), we kept the original MRN from the EMR data.

For each patient, we aligned the observed ABG collection time with the nearest EMR ABG collection time. We reviewed all samples where the observed collection time and the collection time noted in the chart differed by 20 minutes or more. For these samples, we looked for EMR blood gas results for that patient whose pH, PaCO2, and SaO2 exactly matched the observed sample. Where such a match revealed an error in the observed collection time, the observed time was corrected.
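The review step above can be sketched as two small operations: flag samples whose observed and charted collection times differ by 20 minutes or more, then look for an EMR blood gas with identical lab values. The column names (`mrn`, `observed_time`, `nearest_emr_collect_time`, `collect_time`, `ph`, `paco2`, `sao2`) and both helper functions are hypothetical, and the example rows are invented. The actual correction of times was a manual review and is not automated here.

```python
import pandas as pd

REVIEW_THRESHOLD_MIN = 20          # threshold stated in the text
LAB_COLS = ["ph", "paco2", "sao2"]  # values that must match exactly

def flag_for_review(obs: pd.DataFrame) -> pd.DataFrame:
    """Keep samples whose observed and charted times differ by >= 20 minutes."""
    diff_min = (
        (obs["observed_time"] - obs["nearest_emr_collect_time"])
        .abs()
        .dt.total_seconds() / 60
    )
    return obs[diff_min >= REVIEW_THRESHOLD_MIN]

def find_lab_matches(flagged: pd.DataFrame, emr_abg: pd.DataFrame) -> pd.DataFrame:
    """Left-join flagged samples to EMR ABGs on patient and exact lab values.

    Exact equality assumes both sources store the values at the same
    reported precision. `_merge` is "both" when a match exists and
    "left_only" when none was found.
    """
    return flagged.merge(emr_abg, on=["mrn", *LAB_COLS], how="left", indicator=True)

# Invented example rows.
obs = pd.DataFrame({
    "mrn": ["A", "B", "C"],
    "observed_time": pd.to_datetime(
        ["2024-01-01 08:00", "2024-01-01 09:00", "2024-01-01 14:00"]
    ),
    "nearest_emr_collect_time": pd.to_datetime(
        ["2024-01-01 08:02", "2024-01-01 10:05", "2024-01-01 14:30"]
    ),
    "ph": [7.40, 7.31, 7.45],
    "paco2": [40.0, 52.0, 35.0],
    "sao2": [97.0, 91.5, 99.0],
})
emr_abg = pd.DataFrame({
    "mrn": ["B", "C"],
    "collect_time": pd.to_datetime(["2024-01-01 10:05", "2024-01-01 14:30"]),
    "ph": [7.31, 7.38],
    "paco2": [52.0, 41.0],
    "sao2": [91.5, 96.0],
})
reviewed = find_lab_matches(flag_for_review(obs), emr_abg)
```

In this toy example, patient B has an exact lab match (a candidate transcription error in the observed time), while patient C has none and would need manual chart review.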

Filtering: The resulting dataset was filtered to keep only samples where the nearest EMR SpO2 observation time was within 10 minutes of the EMR ABG collection time.
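A minimal sketch of this filter, assuming hypothetical column names `spo2_time` and `abg_collect_time` for the paired data. The window is treated as inclusive (a gap of exactly 10 minutes is kept), which is an assumption consistent with the 0.0-10.0 range reported in Table 2.

```python
import pandas as pd

MAX_GAP = pd.Timedelta(minutes=10)  # window stated in the text

def within_window(paired: pd.DataFrame, max_gap: pd.Timedelta = MAX_GAP) -> pd.DataFrame:
    """Keep pairs whose SpO2 time is within max_gap of the EMR ABG collection time."""
    gap = (paired["spo2_time"] - paired["abg_collect_time"]).abs()
    # Rows with no matched SpO2 (NaT) compare as False and are dropped.
    return paired[gap <= max_gap]

# Invented example: gaps of -4, +10 and +30 minutes.
paired = pd.DataFrame({
    "abg_collect_time": pd.to_datetime(["2024-01-01 08:00"] * 3),
    "spo2_time": pd.to_datetime(
        ["2024-01-01 07:56", "2024-01-01 08:10", "2024-01-01 08:30"]
    ),
    "sao2": [95.0, 97.2, 91.0],
})
kept = within_window(paired)
```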

Recoding race: In the EMR data, patients could be categorized as more than one race. If one of those races contained “Black or African American”, or “White”, the patient was categorized as Black or White, respectively (except if the race contained both “Black or African American” and “White”, in which case the patient was categorized as “Other”). Missing races were also categorized as “Other”.
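The recoding rule can be written as a small function. This is a sketch that assumes the EMR stores multiple races as a comma-separated string (as in the raw category counts); the function name `recode_race` is hypothetical. Mapping every record that is neither Black nor White to "Other" is an assumption, based on the three race categories reported in Table 1.

```python
def recode_race(raw):
    """Collapse an EMR race string into 'Black', 'White', or 'Other'."""
    # Missing values (None, NaN, empty string) are categorized as Other.
    if not isinstance(raw, str) or not raw.strip():
        return "Other"
    parts = {part.strip() for part in raw.split(",")}
    is_black = "Black or African American" in parts
    is_white = "White" in parts
    if is_black and is_white:
        return "Other"  # both Black and White recorded
    if is_black:
        return "Black"
    if is_white:
        return "White"
    return "Other"

examples = {
    "White,Other": recode_race("White,Other"),
    "Other,Black or African American": recode_race("Other,Black or African American"),
    "White,Black or African American": recode_race("White,Black or African American"),
    "Asian": recode_race("Asian"),
    "missing": recode_race(None),
}
```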

Analysis

Our variables of interest were SpO2 for both cohorts, SaO2, bias, and time intervals along the data collection pathway. Descriptive statistics were calculated for variables of interest.

For analysis of ABG and SpO2 collection time differences, descriptive statistics were reported using absolute values instead of relative values to avoid averaging together negative and positive values. Histograms reported relative values to identify whether there were trends in SpO2 values being aligned before or after sample collection.
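A toy example shows why absolute values were used for the summary statistics: signed differences on either side of zero cancel in the mean, understating the typical size of the gap. The values below are invented for illustration.

```python
import pandas as pd

# Signed time differences in minutes (negative = SpO2 charted before collection).
signed_min = pd.Series([-4.0, 4.0, -2.0, 2.0, 6.0])

signed_mean = signed_min.mean()          # 1.2: early and late values offset
absolute_mean = signed_min.abs().mean()  # 3.6: typical size of the gap
```

The signed mean remains useful for the histograms, where the direction of the offset is the quantity of interest.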

For the EMR cohort, we calculated the bias as the difference between the EMR SpO2 and the actual SaO2. For the Equiox cohort, we calculated the bias as the difference between the observed SpO2 and the actual SaO2.
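In both cohorts the bias is SpO2 minus SaO2, so a negative value means the pulse oximeter read lower than the arterial measurement. A minimal sketch, with hypothetical column names and invented values, of computing the bias and the summary statistics reported in the tables:

```python
import pandas as pd

def add_bias(df: pd.DataFrame, spo2_col: str, sao2_col: str = "sao2") -> pd.DataFrame:
    """Return a copy of df with bias = SpO2 - SaO2 (negative: SpO2 reads low)."""
    out = df.copy()
    out["bias"] = out[spo2_col] - out[sao2_col]
    return out

def summarize(values: pd.Series) -> dict:
    """Mean, SD, median and range, matching the layout of the summary tables."""
    return {
        "mean": values.mean(),
        "sd": values.std(),
        "median": values.median(),
        "min": values.min(),
        "max": values.max(),
    }

# Invented example rows; spo2_col would be the EMR SpO2 for the EMR cohort
# and the observed SpO2 for the Equiox cohort.
pairs = pd.DataFrame({"spo2": [98, 95, 100], "sao2": [97.0, 96.5, 99.0]})
with_bias = add_bias(pairs, spo2_col="spo2")
stats = summarize(with_bias["bias"])
```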

Race as recorded in the EMR, before recoding (count)
Other 2476
White 2307
Asian 2038
Black or African American 1571
White,Other 271
Decline to Answer 136
American Indian or Alaska Native 113
Asian,Other 110
Native Hawaiian or Other Pacific Islander 108
Other,Black or African American 107
Other,Decline to Answer 76
Other,White 63
Other,American Indian or Alaska Native 62
White,Other,Decline to Answer 60
Black or African American,Other 58
Native Hawaiian or Other Pacific Islander,Other,Black or African American 37
Decline to Answer,White,Other 31
Decline to Answer,Other 31
Decline to Answer,White 27
White,Black or African American 26
Other,Asian 26
White,American Indian or Alaska Native 23
Asian,White,Other 16
Black or African American,White 16
Asian,White 14
White,Black or African American,Decline to Answer 13
Other,White,Black or African American 12
Decline to Answer,Asian 12
Native Hawaiian or Other Pacific Islander,American Indian or Alaska Native 11
Other,American Indian or Alaska Native,White 9
Black or African American,Decline to Answer 9
Native Hawaiian or Other Pacific Islander,Asian 6
Asian,Decline to Answer 6
White,Decline to Answer 5
Other,Native Hawaiian or Other Pacific Islander 4
American Indian or Alaska Native,Native Hawaiian or Other Pacific Islander 4
Black or African American,Native Hawaiian or Other Pacific Islander 3
Native Hawaiian or Other Pacific Islander,Black or African American 3
Black or African American,Asian,Other 3
American Indian or Alaska Native,White 3
American Indian or Alaska Native,Other,Decline to Answer 2
White,Other,Asian 1
American Indian or Alaska Native,Black or African American,White 1
American Indian or Alaska Native,Other 1
Black or African American,American Indian or Alaska Native 1
Native Hawaiian or Other Pacific Islander,Other 1
Black or African American,American Indian or Alaska Native,Other 1
Asian,Black or African American 1
White,Asian 1

Results

Correction for time differences

There were 30 samples with a >= 20 minute difference between the observed collection time and the collection time noted in the chart.

20 samples had a perfect match for lab values. In 14 cases, we were able to identify and correct a typo in the observed capture time. The remaining 6 samples with perfect lab value matches were assumed to be true findings of a difference >= 20 minutes.

8 samples with a >= 20 minute difference had no corresponding EMR blood gas with matching pH, PaCO2 and SaO2 values. For 2 of these samples, manual chart review showed that the Equiox sample was taken before the patient was physically located in the ICU, and it was therefore not part of the EMR data pull. These samples were excluded. We assumed that the remaining 6 samples without matching lab values were erroneously classified as clinical samples when in fact they were research samples (and thus run on the research blood gas analyzer instead of in the hospital lab), and they were excluded.

The remaining 2 samples were excluded because they were before the EMR started recording SaO2.

Descriptive Statistics

The initial EMR data pull yielded 15,879 ABG measurements. We filtered these to keep only measurements with an EMR SpO2 observation within 10 minutes of the ABG collection time, yielding 9,921 ABG measurements.

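Table 1: Descriptive Statistics

| Characteristic | Number | Percent |
|---|---|---|
| Race: Other | 739 | 51.7% |
| Race: White | 401 | 28.1% |
| Race: Black | 289 | 20.2% |
| Sex: Male | 1006 | 70.4% |
| Sex: Female | 422 | 29.5% |
| Sex: Unknown | 1 | 0.1% |

| Measurement variable | Mean (SD) | Median (range) |
|---|---|---|
| SpO2 (EMR), % | 97.49 (3.52) | 98.0 (17.0, 100.0) |
| SaO2, % | 97.96 (2.85) | 98.7 (43.5, 100.0) |
| Age at Encounter, years | 58.2 (60.0) | 0 - 106 |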

ABG and SpO2 collection time

Table 2: Absolute time difference (minutes) between observed collection time, EMR recorded collection time, and EMR SpO2 observation time

| Variable | Mean (SD) | Median (range) |
|---|---|---|
| Observed collection to EMR collection (A to B) | 3.5 (4.0) | 3.0 (0.0-44.0) |
| EMR collection to EMR SpO2 (B to D) | 3.1 (3.2) | 2.0 (0.0-10.0) |
| Observed collection to EMR SpO2 (A to D) | 5.2 (4.8) | 4.0 (0.0-44.0) |

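On average, the absolute time between the observed sample collection and the collection time recorded in the EMR (Figure 1, A to B) was 3.5 (SD 4.0) minutes (Table 2). The absolute time between the EMR-recorded collection time and the nearest EMR SpO2 observation (B to D) was 3.1 (SD 3.2) minutes, and the absolute time between the observed collection time and the nearest EMR SpO2 observation (A to D) was 5.2 (SD 4.8) minutes. The corresponding relative (signed) differences, in which early and late SpO2 observations partly offset each other, averaged 0.1 (SD 4.5) minutes for B to D and 3.5 (SD 6.2) minutes for A to D (Figure 2).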

(a) Time from ABG collection to collection recorded in EMR
(b) Time from EMR collection to nearest EMR SpO2 time
(c) Time from observed collection to EMR SpO2 observation time
Figure 2: Histograms of time differences (relative)

Bias

Bias with direct observation vs EMR

(a) EMR-based bias distribution
(b) Equiox bias distribution
Figure 3: Bias distribution
Table 3: Mean bias for EMR and Equiox cohorts

| Variable | Mean (SD) | Median (range) |
|---|---|---|
| EMR-based bias | -0.5 (3.0) | 0.0 (-80.6, 53.5) |
| Equiox bias | -1.3 (2.6) | -1.0 (-21.9, 6.8) |

The overall mean bias in our retrospective EMR cohort was -0.5 (median 0.0), while the mean bias using the Equiox data was -1.3 (median -1.0) (Table 3, Figure 3). In both cases the mean bias was negative, indicating that SpO2 was on average lower than the actual SaO2. The Equiox bias distribution is shifted left relative to the EMR distribution, which contains more positive bias values. Relative to direct observation, the EMR-based method therefore underestimated the magnitude of the bias by about 0.8 percentage points (-0.5 vs -1.3).

This difference may be slightly larger at the lower saturations (Figure 4).

Figure 4: SaO2 distribution over SpO2, by data source

Bias by race

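The gap between the Equiox and EMR bias did not obviously differ by race. Although the mean bias itself varied by race within each cohort, the difference in mean bias between the two cohorts was similar across races, ranging from -0.82 to -0.99 (Table 4).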

Table 4: Bias by race for EMR and Equiox cohorts

| Race | Equiox mean (SD) | Equiox median (range) | EMR mean (SD) | EMR median (range) | Mean difference (Equiox - EMR) |
|---|---|---|---|---|---|
| Black | -0.81 (2.96) | -0.2 (-21.9, 3.7) | 0.0 (3.19) | 0.1 (-24.8, 53.5) | -0.82 |
| Other | -1.3 (2.3) | -1.2 (-9.1, 6.8) | -0.31 (2.66) | 0.0 (-30.0, 39.6) | -0.99 |
| White | -1.88 (2.66) | -1.7 (-19.3, 4.3) | -1.04 (3.3) | -0.7 (-80.6, 21.2) | -0.83 |
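The mean difference column in Table 4 is the Equiox mean bias minus the EMR mean bias for each race. As a quick check, the sketch below recomputes it from the rounded means shown in the table. The results can differ from the reported column by 0.01, which is expected if the reported values were computed from unrounded means (an assumption).

```python
# Rounded mean bias values copied from Table 4.
equiox_mean = {"Black": -0.81, "Other": -1.3, "White": -1.88}
emr_mean = {"Black": 0.0, "Other": -0.31, "White": -1.04}

# Mean difference = Equiox mean bias - EMR mean bias.
mean_diff = {
    race: round(equiox_mean[race] - emr_mean[race], 2)
    for race in equiox_mean
}
# mean_diff is {"Black": -0.81, "Other": -0.99, "White": -0.84}
```

All three differences fall between roughly -0.8 and -1.0, so the shift between the two methods is of similar size in each group.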

Discussion

Conclusions

There are three main observations from this study:

  1. There is frequently a difference between the time the ABG is collected and the collection time recorded in the EMR. As a result, even when EMR SpO2 observations are limited to those within 10 minutes of the recorded ABG collection time, the true time difference is likely longer because of the charting delay.

  2. There is a difference in the calculated bias between EMR-based methods and direct observation. In this study, the difference in bias was approximately 1 percentage point, which may be clinically important when assessing the accuracy of SpO2 measurements. The difference may be more pronounced at lower saturations, although this still needs formal statistical testing.

  3. The gap between the EMR-based and Equiox bias did not obviously differ by race. The difference in mean bias between the two cohorts was approximately the same across all races.

Strengths and Limitations

  • Strengths of this study include the use of bedside time measurements, which allowed us to control for differences introduced by time lag.

  • Although we gained higher temporal resolution, the bedside data contained transcription errors, which introduce a different kind of bias. We were able to identify and correct many of these errors, but some may remain undetected. Because of the thresholds we used, however, any remaining errors are unlikely to change the results.