Frankel’s health attention index – Gibran Hemani's lab book

Background

Frankel (1989) argued that part of the waiting-list problem was that professionals prioritised treatment for conditions that were more fashionable rather than more urgent or prevalent.

This is the dataset:

library(dplyr)
library(data.table)
library(ggplot2)
library(ggrepel)
library(knitr)
dat <- fread("frankel-index.csv")
kable(dat)

Diagnoses	Discharges and deaths	Index of interest (Papers/D&D × 1000)
Slow virus diseases of CNS	40	2000.0
Myasthenia Gravis	930	156.0
Crohn’s disease	6670	44.0
Carcinoma of the breast	41220	33.0
Rheumatoid arthritis	26060	27.0
Carcinoma of the bronchus	54440	20.0
Myocardial infarction	102720	10.0
Cerebrovascular disease	111250	7.7
Irritable bowel etc.*	19840	6.7
Cataract	54990	6.5
Hip replacement	37400	6.0
Haemorrhoids	20700	5.0
Inguinal hernia	64400	1.8
Tonsils and adenoids	76600	0.7
Varicose veins	47160	0.6

Work out the number of papers per disease by inverting the index (papers = index × D&D / 1000):

names(dat) <- c("disease", "dd", "index")
dat$papers <- dat$index * dat$dd / 1000

The paper points out that there is a highly disproportionate number of papers on slow virus diseases of the CNS. Presented this way, it does look like a major misalignment of resources and needs, but the index is a ratio of two small numbers (about 80 papers to 40 discharges and deaths).

ggplot(dat, aes(x = dd, y = index)) +
  geom_point() +
  geom_text_repel(aes(label = disease), size = 2)
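To see why a ratio of two small counts is fragile, here is a minimal sketch that recomputes the index after a small change in the denominator. The helper `index_of_interest()` and the perturbation of 10 extra discharges and deaths are illustrative assumptions, not part of Frankel's analysis; the counts themselves come from the table above.

```r
# Index of interest as defined in the table header: papers per 1000 D&D.
# index_of_interest() is a hypothetical helper name used only in this sketch.
index_of_interest <- function(papers, dd) papers / dd * 1000

# Slow virus diseases of CNS: 40 D&D and index 2000, so 80 papers
sv_papers <- 2000 * 40 / 1000
# Myocardial infarction: 102720 D&D and index 10, so 1027.2 papers
mi_papers <- 10 * 102720 / 1000

# Assumed perturbation: 10 extra discharges and deaths for each diagnosis
sv_shifted <- index_of_interest(sv_papers, 40 + 10)
mi_shifted <- index_of_interest(mi_papers, 102720 + 10)

print(c(slow_virus = sv_shifted, myocardial_infarction = mi_shifted))
```

Under this assumed shift the slow virus index falls from 2000 to 1600, while the myocardial infarction index stays within 0.001 of 10, which is why the extreme value is hard to interpret on its own.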

Instead, plot the number of papers against the number of deaths and discharges. This may better reflect the degree of misalignment between resources and health needs.

ggplot(dat, aes(x = dd, y = papers)) +
  geom_point() +
  geom_text_repel(aes(label = disease), size = 2) +
  geom_smooth(method = "lm")


The fraction of all papers in the table that were on slow virus diseases of the CNS was 0.012 (about 1.2%).
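That fraction can be reproduced from the table alone. The sketch below re-enters the table by hand instead of reading `frankel-index.csv`, so that it runs on its own; the column names `disease`, `dd` and `index` follow the renaming used earlier, and `share` is a name introduced here for illustration.

```r
# Table values re-entered by hand from the dataset shown above
dat <- data.frame(
  disease = c(
    "Slow virus diseases of CNS", "Myasthenia Gravis", "Crohn's disease",
    "Carcinoma of the breast", "Rheumatoid arthritis",
    "Carcinoma of the bronchus", "Myocardial infarction",
    "Cerebrovascular disease", "Irritable bowel etc.", "Cataract",
    "Hip replacement", "Haemorrhoids", "Inguinal hernia",
    "Tonsils and adenoids", "Varicose veins"
  ),
  dd = c(40, 930, 6670, 41220, 26060, 54440, 102720, 111250,
         19840, 54990, 37400, 20700, 64400, 76600, 47160),
  index = c(2000, 156, 44, 33, 27, 20, 10, 7.7, 6.7, 6.5, 6, 5, 1.8, 0.7, 0.6)
)

# Invert the index to recover the number of papers per diagnosis
dat$papers <- dat$index * dat$dd / 1000

# Share of all papers attributed to each diagnosis
share <- dat$papers / sum(dat$papers)
names(share) <- dat$disease

print(round(share["Slow virus diseases of CNS"], 3))  # 0.012
```

By comparison, slow virus diseases account for 40 of the 664420 discharges and deaths in the table, so their share of papers is still far above their share of burden, but in absolute terms it is a small slice of the literature.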

Inequality analysis

The Gini index measures how unequally a resource is allocated. A high Gini index means that a large fraction of the resource goes to a small number of categories. The concentration index examines the degree to which a second variable can explain the inequality captured by a Gini coefficient. A concentration index that is equal to the Gini index means that the second variable is perfectly aligned with the resource. A concentration index of 0 means that there is no alignment (it's basically random).
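As a rough illustration of these two quantities, the sketch below computes a concentration index with the common covariance formula, C = 2 cov(y, r) / mean(y), where r is the fractional rank of the ranking variable. The function `conc_index()` is a hypothetical helper written for this note. It ignores ties and weights, and it is not claimed to match the internals of the `rineq` package used in the analysis. The toy vectors are invented for illustration only.

```r
# Hypothetical helper: concentration index of `outcome` when units are
# ordered by `rank_by`, using the covariance formula (no ties, no weights).
conc_index <- function(outcome, rank_by) {
  n <- length(outcome)
  y <- outcome[order(rank_by)]
  r <- (seq_len(n) - 0.5) / n  # fractional rank, mean is exactly 0.5
  2 * mean((y - mean(y)) * (r - 0.5)) / mean(y)
}

# The Gini index is the special case where a variable is ranked by itself
gini_equal <- conc_index(c(1, 1, 1, 1), c(1, 2, 3, 4))   # equal shares
gini_single <- conc_index(c(0, 0, 0, 1), c(1, 2, 3, 4))  # one unit holds all

# Toy resource and a toy second variable with the opposite ordering
resource <- c(1, 2, 3, 4)
reversed <- c(4, 3, 2, 1)
ci_aligned <- conc_index(resource, resource)
ci_reversed <- conc_index(reversed, resource)

print(c(gini_equal = gini_equal, gini_single = gini_single,
        ci_aligned = ci_aligned, ci_reversed = ci_reversed))
```

Equal shares give an index of 0, and putting everything in one of four units gives 0.75, the maximum of (n - 1) / n for n = 4. When the second variable runs against the ranking, the concentration index becomes negative.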

The plot below shows that attention to diseases in the literature is unequal, and that this inequality matches, to some degree, the unequal distribution of disease burden.

library(rineq)

# Gini index of papers: rank diseases by papers and accumulate papers
out1 <- rineq::ci(
    ineqvar = dat$papers,
    outcome = dat$papers,
    method = "direct"
)
gini <- tibble(
    ci = out1$concentration_index,
    ci_se = sqrt(out1$variance),
    ci_lci = ci - 1.96 * ci_se,  # approximate 95% interval
    ci_uci = ci + 1.96 * ci_se
)

# Concentration index: rank diseases by papers but accumulate deaths and discharges
out2 <- rineq::ci(
    ineqvar = dat$papers,
    outcome = dat$dd,
    method = "direct"
)
ciout <- tibble(
    ci = out2$concentration_index,
    ci_se = sqrt(out2$variance),
    ci_lci = ci - 1.96 * ci_se,  # approximate 95% interval
    ci_uci = ci + 1.96 * ci_se
)

make_plot_dat <- function(x) {
  myOrder <- order(x$fractional_rank)
  xCoord <- x$fractional_rank[myOrder]
  y <- x$outcome[myOrder]
  cumdist <- cumsum(y) / sum(y)
  tibble(xCoord, cumdist)
}

plot_dat <- bind_rows(
    make_plot_dat(out1) %>% mutate(group=paste0("Number of papers (Gini index = ", round(out1$concentration_index, 2), ")")), 
    make_plot_dat(out2) %>% mutate(group=paste0("Deaths and discharges (concentration index = ", round(out2$concentration_index, 2), ")"))
)

ggplot(aes(x = xCoord, y = cumdist, group = group), data = plot_dat) +
  geom_line(aes(colour = group)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  theme_bw() +
  theme(legend.position = "inside", legend.position.inside=c(0.3,0.8)) +
  labs(
    x = "Fractional rank of number of papers",
    y = "Cumulative proportion of outcome",
    colour = "Outcome"
  )

However, the concentration index has a wide confidence interval that spans zero.

ciout %>% kable

ci	ci_se	ci_lci	ci_uci
0.1328638	0.1044544	-0.0718669	0.3375944
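The interval in the table follows from the estimate and its standard error. The sketch below repeats that arithmetic with the values printed above; it assumes a normal approximation, which is worth treating with caution when there are only 15 diagnoses. The variable names are introduced here for illustration.

```r
# Values copied from the table above
ci_est <- 0.1328638
ci_se <- 0.1044544

# Approximate 95% interval under a normal approximation (an assumption)
lower <- ci_est - 1.96 * ci_se
upper <- ci_est + 1.96 * ci_se

# Ratio of the estimate to its standard error
z <- ci_est / ci_se

print(round(c(lower = lower, upper = upper, z = z), 3))
```

The estimate is only about 1.27 standard errors from zero, so the data are compatible with anything from no alignment to moderate alignment between papers and disease burden.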

Summary

There is some concordance between the inequality in disease attention and the disease impact, but these estimates are imprecise.

sessionInfo()

R version 4.5.1 (2025-06-13)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.6.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rineq_0.3.0       knitr_1.50        ggrepel_0.9.6     ggplot2_3.5.2    
[5] data.table_1.17.8 dplyr_1.1.4      

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5        nlme_3.1-168       cli_3.6.5          rlang_1.1.6       
 [5] xfun_0.52          generics_0.1.4     jsonlite_2.0.0     labeling_0.4.3    
 [9] glue_1.8.0         htmltools_0.5.8.1  scales_1.4.0       rmarkdown_2.29    
[13] grid_4.5.1         evaluate_1.0.4     tibble_3.3.0       fastmap_1.2.0     
[17] yaml_2.3.10        lifecycle_1.0.4    compiler_4.5.1     RColorBrewer_1.1-3
[21] Rcpp_1.1.0         htmlwidgets_1.6.4  pkgconfig_2.0.3    mgcv_1.9-3        
[25] lattice_0.22-7     farver_2.1.2       digest_0.6.37      R6_2.6.1          
[29] tidyselect_1.2.1   splines_4.5.1      pillar_1.11.0      magrittr_2.0.3    
[33] Matrix_1.7-3       withr_3.0.2        tools_4.5.1        gtable_0.3.6