Python Package


This package requires Python 3 or higher. If you do not already have an up-to-date version of Python installed on your computer, you can download the latest version from the official Python website. The package also depends on Pandas, Scikit-learn, and Numpy. If your Python installation does not include these packages, you can install them with pip or conda. We recommend installing the Anaconda distribution, which comes with the Python libraries our software depends on pre-installed.
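One way to confirm that the dependencies are present is sketched below. It uses only the standard library; the helper name `missing_packages` is hypothetical, and `sklearn` is the import name of Scikit-learn.

```python
from importlib.util import find_spec

# Import names of the dependencies listed above (Scikit-learn imports as "sklearn").
REQUIRED = ["pandas", "sklearn", "numpy"]


def missing_packages(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```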

Installation

MALTS is currently available in Python, either as a download from the almost-matching-exactly GitHub or as an installation via PyPI (recommended). The package containing the MALTS algorithm can be installed with the following command:

pip install pymalts2

Input Data Format

To begin using the MALTS algorithm, first ensure that your dataset meets the necessary requirements. The data can be stored as a CSV/Excel file or as a Python Pandas DataFrame. MALTS was designed for continuous covariates, which can be stored as float, double, or integer data types. For categorical data, please consider using DAME or FLAME instead, unless the categories have a specific binning sequence. In addition to the covariate columns, your dataset should include a binary integer column that specifies whether a unit is treated (1) or control (0), and an integer or float column that specifies unit outcomes. There are no requirements on the covariate column names or on the ordering of the columns. Below is a sample dataset in the required format:
x_1 (numeric)   x_2 (numeric)   ...   x_m (numeric)   treated (binary)   outcome (numeric)
1.3276          2.0529          ...   4.7905          1                  4.5321
2.62            3.9932          ...   7.6513          0                  3.3348
...             ...             ...   ...             ...                ...
2.2973          6.9321          ...   1.5848          1                  6.9320
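The requirements above can be checked before running the algorithm. The following is a minimal sketch, not part of the package: the helper `check_malts_input` and the toy column names are hypothetical, and the toy values are randomly generated for illustration.

```python
import numpy as np
import pandas as pd


def check_malts_input(df, treatment="treated", outcome="outcome"):
    """Return a list of problems found; an empty list means the format looks fine."""
    problems = []
    for col in (treatment, outcome):
        if col not in df.columns:
            problems.append(f"missing column: {col}")
    if problems:
        return problems
    # The treatment indicator must be binary: 1 for treated, 0 for control.
    if not set(df[treatment].unique()) <= {0, 1}:
        problems.append("treatment column must contain only 0 and 1")
    # Covariates and outcome are expected to be numeric.
    non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    if non_numeric:
        problems.append(f"non-numeric columns: {non_numeric}")
    return problems


# Toy dataset in the required layout (made-up values).
rng = np.random.default_rng(0)
n = 6
toy = pd.DataFrame({
    "x_1": rng.normal(size=n),
    "x_2": rng.normal(size=n),
    "treated": [1, 0, 1, 0, 1, 0],
    "outcome": rng.normal(size=n),
})
print(check_malts_input(toy))  # prints [] because the toy data is well formed
```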

Usage

To run the algorithm, we first import the required libraries and load the dataset as a Pandas DataFrame.
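import pymalts2 as pymalts
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(0)  # fix the random seed for reproducibility
sns.set()  # apply seaborn's default plot theme
df = pd.read_csv("data.csv", index_col=0)  # the first CSV column is used as the index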
To set up the model, we call the malts_mf function from the pymalts module and specify which columns hold the treatment variable and the outcome variable. The resulting MG_matrix has one row per query unit and one column per matched unit. The weight in cell (i, j) is the number of times unit j is included in the matched group of unit i across the M folds.
m = pymalts.malts_mf(outcome='outcome', treatment='treated', data=df)
print(m.MG_matrix)
We can use this MG_matrix to visualize the weights of the matched group for a particular unit as a bar chart.
MG1 = m.MG_matrix.loc[1]
# keep the units matched to unit 1 more than once, sorted by weight
MG1[MG1>1].sort_values(ascending=False).plot(kind='bar', figsize=(20,5))
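To make the meaning of the weights concrete, the sketch below builds a small matrix by hand in the same row/column layout. It does not call pymalts, and the per-fold matched groups are made-up values for illustration only.

```python
import pandas as pd

# Hypothetical matched groups for query unit 0 from three folds:
# each list holds the indices of the units matched to unit 0 in that fold.
matched_per_fold = [[1, 2], [2, 3], [2, 4]]

units = range(5)
toy_mg = pd.DataFrame(0, index=units, columns=units)

# The weight in cell (0, j) counts how often unit j was matched to unit 0.
for fold in matched_per_fold:
    for j in fold:
        toy_mg.loc[0, j] += 1

row = toy_mg.loc[0]
# Show only the units that were matched at least once, heaviest first.
print(row[row > 0].sort_values(ascending=False))
```

Here unit 2 appears in all three folds, so it receives a weight of 3, while units 1, 3, and 4 each receive a weight of 1.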
To find the conditional average treatment effect (CATE) and the average treatment effect (ATE), we can use the CATE_df dataframe, which stores the CATE estimate for each unit. The mean of the avg.CATE column gives the ATE for the dataset. We can then plot the probability density of the CATE estimates and mark the ATE on the same graph.
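ATE = m.CATE_df['avg.CATE'].mean()
print(ATE)
fig = plt.figure(figsize=(10,5))
# fill=True replaces the deprecated shade=True (seaborn 0.11 and later)
sns.kdeplot(m.CATE_df['avg.CATE'], fill=True)
plt.axvline(ATE, c='black')
# raw string, so that \h is not treated as an escape sequence
plt.text(ATE-4, 0.04, r'$\hat{ATE}$', rotation=90)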
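The relationship between the unit-level CATE estimates and the ATE can be seen on a toy example. This sketch uses made-up numbers and a hand-built dataframe that only mimics the avg.CATE column; it does not call pymalts.

```python
import pandas as pd

# Hypothetical per-unit CATE estimates, using the same column name as CATE_df.
toy_cate = pd.DataFrame({"avg.CATE": [1.0, 2.0, 3.0, 6.0]})

# The ATE estimate is the mean of the unit-level CATE estimates.
toy_ate = toy_cate["avg.CATE"].mean()
print(toy_ate)  # 3.0
```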

MALTS - Parameters and Defaults

malts_mf(outcome, treatment, data, discrete=[], C=1, k_tr=15, k_est=50, estimator="linear", smooth_cate=True, reweight=False, n_splits=5, n_repeats=1, output_format="brief")

Required Parameters

data:
file, DataFrame, required
The data to be matched, preferably as a Python Pandas DataFrame.
outcome:
string, required
The name of the column containing the outcome variable, which must be numeric.
treatment:
string, required
The name of the column indicating whether each unit is treated (1) or control (0) in the matching procedure.

Optional Parameters

discrete:
list, default = [ ]
The list of columns that have been dummified (discrete).
C:
integer, default = 1
The regularization constant applied to the matrix in the objective function.
k_tr:
integer, default = 15
The size of the matched group in the training step.
k_est:
integer, default = 50
The size of the matched group in the estimation step.
estimator:
string, default = 'linear'
The method used to estimate the CATE value inside a matched group. The possible options are 'linear', 'mean' or 'RF', which use ridge regression, mean regression, and Random Forest regression, respectively.
smooth_cate:
boolean, default = True
Boolean to specify whether the CATE estimates should be smoothed by fitting a regression model to them.
reweight:
boolean, default = False
Boolean to specify whether the treatment and control groups should be reweighted according to their sample sizes in the training step.
n_splits:
integer, default = 5
The number of folds the data is split into for the n_splits-fold procedure.
n_repeats:
integer, default = 1
The number of times the whole procedure is repeated.
output_format:
string, default = 'brief'
The style in which the output CATE dataframe is displayed. Possible options are 'brief' and 'full'. If 'full' is chosen, the entire dataframe is displayed; if 'brief' is chosen, only the columns 'avg.CATE', 'std.CATE', 'outcome', and 'treatment' are displayed.
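As an illustration of how the optional parameters fit together, the sketch below collects a set of non-default values in a dictionary and passes them to malts_mf. The parameter names come from the signature above; the chosen values are arbitrary, and the wrapper `run_malts` is a hypothetical helper, not part of the package.

```python
# Keyword arguments taken from the signature above; the values are illustrative only.
malts_kwargs = dict(
    outcome="outcome",
    treatment="treated",
    discrete=[],
    C=5,
    k_tr=10,
    k_est=30,
    estimator="mean",
    smooth_cate=False,
    reweight=False,
    n_splits=3,
    n_repeats=2,
    output_format="full",
)


def run_malts(df, **kwargs):
    """Fit MALTS on `df`; pymalts2 is imported lazily so this snippet loads without it."""
    import pymalts2 as pymalts
    return pymalts.malts_mf(data=df, **kwargs)


# Usage, assuming `df` is a DataFrame in the format described earlier:
# m = run_malts(df, **malts_kwargs)
# print(m.CATE_df.head())
```

Keeping the settings in one dictionary makes it easier to record which configuration produced a given set of estimates.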

Examining a Matched Group Further

We can examine a particular matched group more closely using the lmplot and regplot functions in seaborn. For the units matched to a given unit, we can plot two covariates against each other and find best-fit lines for the treated units and for the control units. To do this, we first select the rows whose indices are matched to the unit of interest. For example, the code below plots X1 against X2 for the matched group of Unit-0.
MG0 = m.MG_matrix.loc[0]
matched_units_idx = MG0[MG0!=0].index  # units with a nonzero weight for Unit-0
matched_units = df.loc[matched_units_idx]
sns.lmplot(x='X1', y='X2', hue='treated', data=matched_units, palette="Set1")
# mark Unit-0 itself in black
plt.scatter(x=[df.loc[0,'X1']], y=[df.loc[0,'X2']], c='black', s=100)
plt.title('Matched Group for Unit-0')
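The best-fit lines for the treated and control units can also be computed numerically rather than read off the plot. The sketch below does this with numpy.polyfit on a small hand-made dataframe; the values are hypothetical, and in practice the `matched_units` dataframe from the code above would take the place of `toy_matched`.

```python
import numpy as np
import pandas as pd

# Hypothetical matched units (made-up values for illustration).
toy_matched = pd.DataFrame({
    "X1": [0.0, 1.0, 2.0, 0.0, 1.0, 2.0],
    "X2": [1.0, 3.0, 5.0, 0.0, 1.0, 2.0],
    "treated": [1, 1, 1, 0, 0, 0],
})

# Fit X2 = slope * X1 + intercept separately for treated and control units.
fits = {}
for label, group in toy_matched.groupby("treated"):
    slope, intercept = np.polyfit(group["X1"], group["X2"], deg=1)
    fits[label] = (slope, intercept)

for label, (slope, intercept) in fits.items():
    print(f"treated={label}: slope={slope:.2f}, intercept={intercept:.2f}")
```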

Plotting CATE against a Covariate

We can also plot the CATE for different units against the values for a particular covariate. The following code plots CATE against Covariate X1. We can also obtain a best-fit polynomial; in the code below, we use a degree 2 polynomial.
data_w_cate = pd.concat([df, m.CATE_df], axis=1)
data_w_cate = data_w_cate.drop(columns=['outcome','treated'])
sns.regplot(x='X1', y='avg.CATE', data=data_w_cate, scatter_kws={'alpha':0.5,'s':2}, line_kws={'color':'black'}, order=2)
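If the coefficients of the degree 2 polynomial are needed, and not only the plotted curve, a comparable fit can be obtained with numpy.polyfit. The sketch below uses made-up covariate and CATE values that lie exactly on a parabola; with real data, the X1 and avg.CATE columns of `data_w_cate` would be used instead.

```python
import numpy as np

# Hypothetical covariate values and CATE estimates lying exactly on a parabola.
x1 = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
cate = 0.5 * x1**2 - x1 + 3.0

# Coefficients are returned from the highest degree to the lowest.
a, b, c = np.polyfit(x1, cate, deg=2)

# Evaluate the fitted polynomial at the observed covariate values.
fitted = np.polyval([a, b, c], x1)
print(np.round([a, b, c], 3))
```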