The Almost Matching Exactly Lab is a joint venture of the Departments of Computer Science and Statistics at Duke University in Durham, North Carolina. The lab conducts cutting-edge research and applies the latest techniques in statistical machine learning and database management. Its aim is to create the highest possible quality of treatment-control matches for datasets with categorical (discrete) or continuous covariates.

We have devised multiple algorithms for different types of datasets and published them in top research journals. The algorithms are implemented in Python and R, and tutorials are included. We believe this work will be of great use in the health sciences and social sciences because the matches it produces are highly interpretable.


What is matching for causal inference?

Researchers may want to replicate a randomized study environment in order to infer causal effects from observational data. Estimating causal effects in an observational setting then becomes a problem of representing the available data as if it had been collected from a randomized experiment, in which individuals are assigned to treatment independently of their potential outcomes. This goal can often be achieved by choosing well-matched samples of the original treated and control groups, so that the two groups have similar covariate distributions. This process is called matching.
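
To make the idea concrete, here is a minimal sketch of the simplest form of matching: exact matching on categorical covariates. It is an illustration only, not the lab's published algorithms or package API. The toy records, the field names (`age_band`, `smoker`, `treated`, `outcome`), and the helper functions are all hypothetical.

```python
from collections import defaultdict


def exact_match_groups(units, covariates):
    """Group units that agree on every listed covariate.

    Only groups containing at least one treated and one control unit
    are kept, since a comparison needs both.
    """
    groups = defaultdict(list)
    for unit in units:
        key = tuple(unit[c] for c in covariates)
        groups[key].append(unit)
    return {
        key: members
        for key, members in groups.items()
        if any(u["treated"] for u in members)
        and any(not u["treated"] for u in members)
    }


def group_effect(members):
    """Difference in mean outcome, treated minus control, within one group."""
    treated = [u["outcome"] for u in members if u["treated"]]
    control = [u["outcome"] for u in members if not u["treated"]]
    return sum(treated) / len(treated) - sum(control) / len(control)


# Hypothetical toy data, invented for this example.
units = [
    {"age_band": "30s", "smoker": 0, "treated": 1, "outcome": 5.0},
    {"age_band": "30s", "smoker": 0, "treated": 0, "outcome": 3.0},
    {"age_band": "30s", "smoker": 0, "treated": 0, "outcome": 4.0},
    {"age_band": "40s", "smoker": 1, "treated": 1, "outcome": 7.0},
    {"age_band": "40s", "smoker": 1, "treated": 0, "outcome": 6.0},
    # No control unit shares these covariates, so this unit stays unmatched.
    {"age_band": "50s", "smoker": 0, "treated": 1, "outcome": 9.0},
]

matched = exact_match_groups(units, ["age_band", "smoker"])
effects = {key: group_effect(members) for key, members in matched.items()}
```

Each key in `effects` is a covariate profile, so an estimate can be traced back to the specific units that produced it. The unmatched unit shows the limitation of purely exact matching: as the number of covariates grows, fewer units find an exact counterpart.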

What are the drawbacks of existing matching methods?

Current matching methods that involve machine learning rely less on human input and adapt to the geometry of the covariate space, which yields better performance. However, many ML models are black boxes that do not explain their predictions in a way that humans can understand. This lack of transparency and accountability in predictive models can have severe consequences for high-stakes decision-making. Earlier methods used a predetermined distance metric to find the units closest to a particular unit. Our algorithms, built for different types of data, instead learn the metric to use by giving more weight to the covariates that contribute more directly to the treatment effect.
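
The difference between a predetermined metric and a weighted one can be sketched as follows. This is an illustration under stated assumptions, not the lab's method: it uses a weighted Hamming distance as a stand-in, the weights are hard-coded hypothetical values rather than learned from data, and the unit records are invented.

```python
def weighted_hamming(a, b, weights):
    """Sum the weights of the covariates on which units a and b disagree."""
    return sum(w for name, w in weights.items() if a[name] != b[name])


def closest_control(unit, controls, weights):
    """Return the control unit with the smallest weighted distance to `unit`."""
    return min(controls, key=lambda c: weighted_hamming(unit, c, weights))


# Hypothetical units, invented for this example.
treated = {"age_band": "30s", "smoker": 1, "region": "north"}
controls = [
    # Differs from the treated unit on smoker only.
    {"id": "c1", "age_band": "30s", "smoker": 0, "region": "north"},
    # Differs from the treated unit on age_band and region.
    {"id": "c2", "age_band": "40s", "smoker": 1, "region": "south"},
]

# A predetermined metric: every covariate counts equally.
uniform = {"age_band": 1.0, "smoker": 1.0, "region": 1.0}

# Hypothetical weights standing in for a learned metric, assuming smoking
# status matters most for the outcome. A real method would estimate these.
learned = {"age_band": 0.5, "smoker": 3.0, "region": 0.2}

match_uniform = closest_control(treated, controls, uniform)["id"]
match_learned = closest_control(treated, controls, learned)["id"]
```

Under the uniform metric the nearest control is `c1`, which disagrees on a single covariate. Under the weighted metric it is `c2`, because a mismatch on the heavily weighted `smoker` covariate costs more than mismatches on the two lightly weighted ones.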

How do we handle these issues?

The Almost Matching Exactly Lab is based on the core principles of interpretability, scalability, and adaptability. Our algorithms compute interpretable, high-quality exact (or almost exact) matches for both continuous datasets and high-dimensional categorical datasets. Exact matching on covariates increases the interpretability and usefulness of causal analyses in several ways. It yields a granular causal analysis that can provide crucial information on who benefits most from treatment, where resources should be spent on future treatments, and why some individuals are treated while others are not. Interpretable models do not necessarily sacrifice accuracy, because they allow humans to troubleshoot more effectively. They can explain treatment effect estimates in a way that pure modeling methods (those that do not use matching) cannot. This also helps determine what additional data must be collected to control for confounding.