This package requires Python 3 or higher.
If you do not already have an up-to-date version of Python installed, you can download the latest release from the official Python website.
The package also depends on Pandas, Scikit-learn, and NumPy. If your Python installation does not include these libraries, you can install them with a package manager such as pip or conda.
We recommend installing the Anaconda distribution, which ships with the Python libraries our software depends on.
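
The requirements above can be checked before installing MALTS. The following is a minimal sketch using only the standard library; the helper names `python_is_supported` and `missing_modules` are hypothetical and are not part of MALTS.

```python
import importlib.util
import sys

# Import names of the dependencies listed above (Scikit-learn imports as "sklearn").
REQUIRED_MODULES = ("pandas", "sklearn", "numpy")


def python_is_supported(version_info=sys.version_info):
    """Return True when the interpreter is Python 3 or newer."""
    return version_info[0] >= 3


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names of the modules that cannot be found on this system."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


print("Python 3 or higher:", python_is_supported())
print("Missing dependencies:", missing_modules() or "none")
```

Any name reported as missing can then be installed before continuing.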

MALTS is currently available in Python, either as a download from the almost-matching-exactly GitHub repository or as an installation from PyPI (recommended). To install the package containing the MALTS algorithm from PyPI, run:

`pip install pymalts2`

To begin using the MALTS algorithm, first ensure that your dataset meets the necessary requirements. The data can be stored as a CSV or Excel file, or as a **Python Pandas DataFrame**.
MALTS was designed for continuous covariates, which can be stored as *float*, *double*, or *integer* data types. For categorical covariates, please consider using **DAME** or **FLAME** instead, unless the categories have a specific binning sequence.
In addition to the covariate columns, your dataset should include a column of binary *integer* values specifying whether each unit is treated (1) or control (0), and a column of *integer* or *float* values specifying unit outcomes. There are no requirements for covariate column names or for the ordering of the columns.
Below is a sample dataset in the required format:

x_1 (numeric) | x_2 (numeric) | ... | x_m (numeric) | treated (binary) | outcome (numeric) |
---|---|---|---|---|---|
1.3276 | 2.0529 | ... | 4.7905 | 1 | 4.5321 |
2.62 | 3.9932 | ... | 7.6513 | 0 | 3.3348 |
... | ... | ... | ... | ... | ... |
2.2973 | 6.9321 | ... | 1.5848 | 1 | 6.9320 |
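
To make the format requirements concrete, here is a hedged sketch that builds a synthetic dataset in this layout and checks it. The function `check_malts_format` is a hypothetical helper written for this illustration, not part of the MALTS package, and the data are randomly generated.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_malts_format(df, treatment="treated", outcome="outcome"):
    """Return a list of problems found; an empty list means the checks pass."""
    problems = [f"missing column: {col}" for col in (treatment, outcome)
                if col not in df.columns]
    if problems:
        return problems
    # The treatment column must be binary: 1 for treated, 0 for control.
    if not set(df[treatment].unique()) <= {0, 1}:
        problems.append(f"{treatment} must contain only 0 and 1")
    if not is_numeric_dtype(df[outcome]):
        problems.append(f"{outcome} must be numeric")
    # Every remaining column is treated as a covariate and must be numeric.
    for col in df.columns.drop([treatment, outcome]):
        if not is_numeric_dtype(df[col]):
            problems.append(f"covariate {col} is not numeric")
    return problems


# Synthetic data for illustration only: three continuous covariates.
rng = np.random.default_rng(0)
n_units = 100
sample = pd.DataFrame(rng.normal(size=(n_units, 3)), columns=["x_1", "x_2", "x_3"])
sample["treated"] = rng.integers(0, 2, size=n_units)
sample["outcome"] = sample["x_1"] + 2.0 * sample["treated"] + rng.normal(size=n_units)

print(check_malts_format(sample))
```

The same check can be applied to a DataFrame loaded from a file before it is passed to MALTS.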

To run the algorithm, we first import the required libraries and load the dataset as a Pandas DataFrame.
To set up the model, we call `malts_mf` from the `pymalts` module and specify which columns hold the treatment variable and the outcome variable. The resulting `MG_matrix` is a matrix whose rows denote query units and whose columns denote matched units. The weight, that is, the value in cell (i, j), is the number of times unit j is included in the matched group of unit i across the M folds.
We can use `MG_matrix` to visualize the weights of the matched group for a particular unit as a bar chart.
To find the conditional average treatment effect (CATE) and the average treatment effect (ATE), we can output the `CATE_df` DataFrame, which stores the CATE estimates. The mean of the `avg.CATE` column gives the ATE for the dataset. We can plot the ATE together with the probability density function of the CATE.

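    import pymalts2 as pymalts
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    np.random.seed(0)  # make results reproducible
    sns.set()          # apply seaborn's default plot styling

    # Load the dataset, using the first column as the index
    df = pd.read_csv("data.csv", index_col=0)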

    # Fit MALTS, naming the outcome and treatment columns
    m = pymalts.malts_mf(outcome='outcome', treatment='treated', data=df)
    print(m.MG_matrix)

    # Weights of the units matched to unit 1, keeping those with weight above 1
    MG1 = m.MG_matrix.loc[1]
    MG1[MG1 > 1].sort_values(ascending=False).plot(kind='bar', figsize=(20, 5))

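    # The ATE is the mean of the per-unit CATE estimates
    ATE = m.CATE_df['avg.CATE'].mean()
    print(ATE)

    # Plot the density of the CATE estimates and mark the ATE
    fig = plt.figure(figsize=(10, 5))
    sns.kdeplot(m.CATE_df['avg.CATE'], fill=True)  # `fill` replaces the deprecated `shade`
    plt.axvline(ATE, c='black')
    plt.text(ATE - 4, 0.04, r'$\hat{ATE}$', rotation=90)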
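
To illustrate how the weights in the matched-group matrix arise, the sketch below counts how often each unit appears in another unit's matched group across folds. It is a simplified illustration of the counting described above, not MALTS's internal code; the matched groups in `fold_groups` are made up.

```python
import pandas as pd

# Hypothetical matched groups from two folds: {query unit: matched units}.
# These values are invented to show the bookkeeping, not produced by MALTS.
fold_groups = [
    {0: [0, 1, 2], 1: [1, 2], 2: [0, 2, 3], 3: [2, 3]},
    {0: [0, 2], 1: [1, 3], 2: [2, 3], 3: [1, 3]},
]

units = [0, 1, 2, 3]
mg_matrix = pd.DataFrame(0, index=units, columns=units)

# Add 1 to cell (i, j) every time unit j appears in the matched group of unit i.
for groups in fold_groups:
    for query, matched in groups.items():
        for j in matched:
            mg_matrix.loc[query, j] += 1

print(mg_matrix)

# Units matched to unit 0 in more than one fold, largest weight first.
row = mg_matrix.loc[0]
print(row[row > 1].sort_values(ascending=False))
```

Filtering a row by weight, as in the last two lines, is the same selection used before plotting the bar chart.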

Parameter | Description |
---|---|
`data`: file or DataFrame, required | The data to be matched. Preferably, the data should be a Python Pandas DataFrame. |
`outcome`: string, required | The name of the column containing the outcome variable, which must be numeric. |
`treatment`: string, required | The name of the column indicating whether each unit is treated or control for the matching procedure. |
`discrete`: list, default = [ ] | The list of columns that have been dummified (discrete). |
`C`: integer, default = 1 | The regularization constant used in the objective function with the matrix. |
`k_tr`: integer, default = 15 | The size of the matched group in the training step. |
`k_est`: integer, default = 50 | The size of the matched group in the estimation step. |
`estimator`: string, default = 'linear' | The method used to estimate the CATE value inside a matched group. The possible options are 'linear', 'mean', or 'RF', which use ridge regression, mean regression, and Random Forest regression, respectively. |
`smooth_cate`: boolean, default = True | Boolean specifying whether the CATE estimates should be smoothed by fitting a regression model. |
`reweight`: boolean, default = False | Boolean specifying whether the treatment and control groups should be reweighted according to their sample sizes in the training step. |
`n_splits`: integer, default = 5 | The number of folds the data is split into for the `n_splits`-fold procedure. |
`n_repeats`: integer, default = 1 | The number of times the whole procedure is repeated. |
`output_format`: string, default = 'brief' | The style in which the output CATE DataFrame is displayed. Possible options are 'brief' and 'full'. If 'full' is chosen, the entire DataFrame is displayed; if 'brief' is chosen, only the columns 'avg.CATE', 'std.CATE', 'outcome', and 'treatment' are displayed. |
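
As a hedged sketch of how non-default settings might be collected before fitting, the helper below gathers a few of the documented options and rejects obviously invalid values. `malts_options` and `fit_malts` are hypothetical names introduced for this example. Only `outcome`, `treatment`, `estimator`, `k_tr`, `k_est`, `n_splits`, and `n_repeats` are used, and passing them as keyword arguments to `malts_mf` is an assumption based on the listing above.

```python
ALLOWED_ESTIMATORS = {"linear", "mean", "RF"}


def malts_options(outcome, treatment, estimator="linear", k_tr=15, k_est=50,
                  n_splits=5, n_repeats=1):
    """Collect and sanity-check keyword arguments intended for malts_mf."""
    if estimator not in ALLOWED_ESTIMATORS:
        raise ValueError(f"estimator must be one of {sorted(ALLOWED_ESTIMATORS)}")
    counts = {"k_tr": k_tr, "k_est": k_est, "n_splits": n_splits, "n_repeats": n_repeats}
    for name, value in counts.items():
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer")
    return {"outcome": outcome, "treatment": treatment, "estimator": estimator, **counts}


def fit_malts(df, **options):
    """Fit MALTS with the given options (requires pymalts2 to be installed)."""
    import pymalts2 as pymalts  # imported lazily so malts_options works without it
    return pymalts.malts_mf(data=df, **options)


# Use the mean estimator with smaller matched groups in the estimation step.
options = malts_options("outcome", "treated", estimator="mean", k_est=30)
print(options)
```

With a suitable DataFrame `df`, the model would then be fitted with `fit_malts(df, **options)`.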

We can further examine a particular matched group using the `lmplot` and `regplot` functions in seaborn. For the units matched to a given unit, we can plot two covariates against each other and fit separate best-fit lines for the treated units and the control units.
To do this, we first retrieve the rows whose indices are matched to the unit of interest. For example, the following code plots X1 against X2 for the matched group of Unit-0.

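    # Select the units matched to unit 0 (non-zero weight)
    MG0 = m.MG_matrix.loc[0]
    matched_units_idx = MG0[MG0 != 0].index
    matched_units = df.loc[matched_units_idx]

    # Plot X1 against X2 with separate fits for treated and control units
    sns.lmplot(x='X1', y='X2', hue='treated', data=matched_units, palette="Set1")
    # Mark unit 0 itself in black
    plt.scatter(x=[df.loc[0, 'X1']], y=[df.loc[0, 'X2']], c='black', s=100)
    plt.title('Matched Group for Unit-0')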
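
The separate best-fit lines for treated and control units can also be computed numerically. The sketch below fits one least-squares line per treatment group with `numpy.polyfit`; `fit_lines_by_group` is a hypothetical helper and the matched group is made-up data, so this illustrates the idea rather than reproducing seaborn's output.

```python
import numpy as np
import pandas as pd


def fit_lines_by_group(matched_units, x="X1", y="X2", group="treated"):
    """Return {group value: (slope, intercept)} from a least-squares line per group."""
    lines = {}
    for value, rows in matched_units.groupby(group):
        slope, intercept = np.polyfit(rows[x], rows[y], deg=1)
        lines[value] = (slope, intercept)
    return lines


# Made-up matched group: control units lie on X2 = 1 + 2*X1, treated on X2 = 3 - X1.
x1 = np.array([0.0, 1.0, 2.0, 3.0])
toy_group = pd.DataFrame({
    "X1": np.concatenate([x1, x1]),
    "X2": np.concatenate([1 + 2 * x1, 3 - x1]),
    "treated": [0] * 4 + [1] * 4,
})

for value, (slope, intercept) in fit_lines_by_group(toy_group).items():
    print(f"treated={value}: X2 = {intercept:.2f} + {slope:.2f} * X1")
```

Comparing the two slopes gives a quick numerical summary of how the covariate relationship differs between treated and control units within the matched group.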

We can also plot the CATE for different units against the values of a particular covariate. The following code plots the CATE against covariate X1 and overlays a best-fit polynomial of degree 2.

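    # Join the covariates with the CATE estimates, dropping the outcome and treatment columns
    data_w_cate = pd.concat([df, m.CATE_df], axis=1)
    data_w_cate = data_w_cate.drop(columns=['outcome', 'treated'])

    # Scatter the CATE against X1 with a degree-2 polynomial fit
    sns.regplot(
        x='X1', y='avg.CATE', data=data_w_cate,
        scatter_kws={'alpha': 0.5, 's': 2},
        line_kws={'color': 'black'},
        order=2,
    )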
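
When `order` is greater than 1, seaborn's `regplot` estimates the polynomial with `numpy.polyfit`, so the fitted coefficients can be obtained directly. The following sketch uses made-up covariate values and CATE estimates that follow an exact quadratic; with real data, `x1` and `cate` would be the `X1` and `avg.CATE` columns.

```python
import numpy as np

# Made-up covariate values and CATE estimates that follow an exact quadratic.
x1 = np.linspace(-2.0, 2.0, 21)
cate = 1.0 + 0.5 * x1 + 2.0 * x1**2

# Least-squares fit of a degree-2 polynomial; coefficients come highest power first.
coeffs = np.polyfit(x1, cate, deg=2)
fitted = np.poly1d(coeffs)

print("coefficients (x^2, x, constant):", np.round(coeffs, 3))
print("fitted CATE at X1 = 1:", round(float(fitted(1.0)), 3))
```

The coefficients describe how the estimated treatment effect changes with the covariate, which the plot shows only visually.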