Python Package


This package requires Python 3 or higher. If you do not already have an up-to-date version of Python installed on your computer, you can download the latest version here. The package also depends on Pandas, Scikit-learn, and NumPy. If your Python installation does not include these packages, you can install them here. We recommend installing the Anaconda distribution, which comes with all of these libraries pre-installed.

Installation

DAME is currently available in Python for download from the almost-matching-exactly GitHub or for installation via PyPI (recommended). The package includes both the DAME and FLAME algorithms and can be installed with the following command:

pip install dame-flame

Input Data Format

To begin using the DAME algorithm, first ensure that your dataset meets the necessary requirements. The data can be stored as a CSV/Excel file or as a Python Pandas Data Frame. All covariates must be categorical and expressed as integer data types. If there are continuous covariates, consider regrouping them into a small number of categories. In addition to the covariate columns, your dataset should include a column of binary integers specifying whether each unit is treated (1) or control (0), and a column of integers or floats specifying each unit's outcome. There are no requirements for covariate column names or the ordering of the columns. Below is a sample dataset in the required format:
x_1 (integer)   x_2 (integer)   ...   x_m (integer)   treated (binary)   outcome (numeric)
1               2               ...   4               1                  4.5
2               3               ...   7               0                  3.33
...             ...             ...   ...             ...                ...
2               6               ...   1               1                  6
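
The regrouping step mentioned above can be sketched with Pandas. This is only an illustration under stated assumptions: the column names (age, region), the toy values, and the bin edges are all made up, and pd.cut is just one way to turn a continuous covariate into integer categories.

```python
import pandas as pd

# Toy data: "age" is continuous, "region" is already an integer category.
raw = pd.DataFrame({
    "age": [23.5, 37.0, 51.2, 64.8, 29.9, 45.1],
    "region": [1, 2, 2, 3, 1, 3],
    "treated": [1, 0, 1, 0, 0, 1],
    "outcome": [4.5, 3.33, 6.0, 2.1, 5.0, 3.9],
})

# Regroup the continuous covariate into three ordered bins coded 0, 1, 2.
# The bin edges are arbitrary and chosen only for illustration.
df = raw.copy()
df["age"] = pd.cut(raw["age"], bins=[0, 30, 50, 120], labels=False).astype(int)

# Covariates and treatment should now be integers; the outcome may stay float.
print(df.dtypes)
```

How many bins to use, and where to place the edges, is a modeling decision that depends on your data.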

Usage

To run the algorithm, use the DAME function as shown in line 5. Although the DAME function can operate on a file or a dataframe, the additional functions included in the package require a dataframe. We therefore recommend first converting your data to a Pandas dataframe, as shown in line 4:
import pandas as pd
import dame_flame

df = pd.read_csv("data.csv")
result = dame_flame.DAME_FLAME.DAME(input_data=df, treatment_column_name="treated", outcome_column_name="outcome")
print(result[0])
By default, result is a list with a single entry: a dataframe containing all of the units that were matched and the covariates that they were matched on. Covariates that a unit was not matched on are denoted with asterisks. The output may contain additional entries depending on the optional parameters detailed in the next section.
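
As a rough illustration of reading such a dataframe, the sketch below lists the covariates a given unit was matched on. The dataframe here is hand-written stand-in data, not actual package output, and the helper covariates_matched_on is hypothetical. It also assumes the asterisk is stored as the string "*", which the description above suggests but does not guarantee.

```python
import pandas as pd

# Hand-written stand-in for result[0]; not actual package output.
matched = pd.DataFrame(
    {"x_1": [1, 1, "*"], "x_2": [2, "*", 3], "x_3": ["*", 0, 0]},
    index=[0, 4, 7],
)

def covariates_matched_on(matched_df, unit_id):
    """Return the covariate names whose entry for this unit is not '*'."""
    row = matched_df.loc[unit_id]
    return [col for col, value in row.items() if value != "*"]

print(covariates_matched_on(matched, 4))
```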

To find the main matched group of a particular unit after running DAME, use the function mmg_of_unit as shown below:
mmg = dame_flame.DAME_FLAME.mmg_of_unit(return_df=result[0], unit_id=0, input_data=df)
print(mmg)
To find the estimated treatment effect on a particular unit, use the function te_of_unit as shown below:
te = dame_flame.DAME_FLAME.te_of_unit(return_df=result[0], unit_id=0, input_data=df, treatment_column_name="treated", outcome_column_name="outcome")
print(te)
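
To give a sense of what a treatment effect estimate within a matched group can look like, here is a minimal sketch. It assumes the estimate is the mean outcome of treated units minus the mean outcome of control units in the group, which is a common estimator in matching. This page does not state the exact formula te_of_unit uses, so treat this as an illustration rather than the package's implementation. The function group_treatment_effect and the data are hypothetical.

```python
import pandas as pd

# Hypothetical matched group: units that were matched together.
group = pd.DataFrame({
    "treated": [1, 1, 0, 0, 0],
    "outcome": [6.0, 5.0, 3.0, 4.0, 2.0],
})

def group_treatment_effect(df, treatment_col="treated", outcome_col="outcome"):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated_mean = df.loc[df[treatment_col] == 1, outcome_col].mean()
    control_mean = df.loc[df[treatment_col] == 0, outcome_col].mean()
    return treated_mean - control_mean

print(group_treatment_effect(group))
```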

DAME - Parameters and Defaults

DAME(input_data, treatment_column_name="treated", weight_array=False, outcome_column_name="outcome", adaptive_weights="ridge", alpha=0.1, holdout_data=False, repeats=True, verbose=2, want_pe=False, want_bf=False, early_stop_iterations=False, stop_unmatched_c=False, stop_unmatched_t=False, early_stop_un_c_frac=0.1, early_stop_un_t_frac=0.1, early_stop_pe=False, early_stop_bf=False, early_stop_pe_frac=0.01, early_stop_bf_frac=0.01, missing_indicator=numpy.nan, missing_holdout_replace=0, missing_data_replace=0, missing_holdout_imputations=10, missing_data_imputations=0)

Key Parameters

* denotes DAME-specific parameters
input_data:
file, Dataframe, required
The data to be matched.
treatment_column_name:
string, optional (default = 'treated')
The name of the column which specifies whether a unit is treated or control.
*weight_array:
array, optional (default = False)
Array of weights for all covariates in input_data. Only needed if adaptive_weights = False.
outcome_column_name:
string, optional (default = 'outcome')
The name of the column which specifies each unit outcome.
adaptive_weights:
string, optional (default = 'ridge')
The weight dropping method to be used.
If False, no weight dropping method is used.
If 'ridge', ridge regression is used.
If 'ridgeCV', ridge regression with cross-validation is used.
If 'decision tree', decision tree regression is used.
alpha:
float, optional (default = 0.1)
The alpha parameter for ridge regression. Ridge regression is run with scikit-learn, where alpha is the "regularization strength": larger values specify stronger regularization. Must be a positive float.
holdout_data:
file, Dataframe, optional (default = False)
The data used as the holdout training set. Only needed when adaptive_weights is not False. If False, 10% of the input data will be randomly selected for the training set.
repeats:
bool, optional (default = True)
Specifies whether units for which a main matched group has been found can be used again and placed in an auxiliary matched group.
verbose:
integer, optional (default = 2)
Controls how progress is displayed while the algorithm is running.
If 0, prints nothing.
If 1, prints iteration number.
If 2, prints the iteration number and number of units left to match on every 10th iteration.
If 3, prints iteration number and number of units left to match on every iteration.
want_pe:
bool, optional (default = False)
Specifies whether the output will include the predictive error of the covariate sets matched on in each iteration.
want_bf:
bool, optional (default = False)
Specifies whether the output will include the balancing factor of each iteration.
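
To illustrate what the alpha parameter described above controls, the sketch below solves ridge regression in closed form with NumPy and compares a small and a large alpha. This is not how the package computes weights (the description above says it uses scikit-learn). The helper ridge_coefficients, the synthetic data, and the no-intercept simplification are all assumptions made for illustration.

```python
import numpy as np

def ridge_coefficients(X, y, alpha):
    """Closed-form ridge solution without an intercept: (X'X + alpha*I)^-1 X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
# Synthetic categorical covariates coded as integers 0, 1, 2.
X = rng.integers(0, 3, size=(50, 4)).astype(float)
# Outcome driven mostly by the first covariate, plus a little noise.
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=50)

weak = ridge_coefficients(X, y, alpha=0.1)
strong = ridge_coefficients(X, y, alpha=100.0)

# Stronger regularization pulls the coefficient vector toward zero.
print(np.linalg.norm(weak), np.linalg.norm(strong))
```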

Early Stopping Parameters

early_stop_iterations:
integer, optional (default = False)
If provided, specifies the number of iterations after which to hard-stop the algorithm.
stop_unmatched_c:
bool, optional (default = False)
Specifies whether the algorithm stops when there are no control units remaining to match.
stop_unmatched_t:
bool, optional (default = False)
Specifies whether the algorithm stops when there are no treatment units remaining to match.
early_stop_un_c_frac:
float, optional (default = 0.1)
Minimum acceptable proportion of unmatched control units after which the algorithm will stop iterating. Must be between 0 and 1.
early_stop_un_t_frac:
float, optional (default = 0.1)
Minimum acceptable proportion of unmatched treatment units after which the algorithm will stop iterating. Must be between 0 and 1.
early_stop_pe:
bool, optional (default = False)
If True, then the algorithm will hard-stop once the covariate set chosen for matching has a predictive error greater than the threshold provided in early_stop_pe_frac.
early_stop_bf:
bool, optional (default = False)
If True, then the algorithm will hard-stop once the covariate set chosen for matching has a balancing factor lower than the threshold provided in early_stop_bf_frac.
early_stop_pe_frac:
float, optional (default = 0.01)
If early_stop_pe is True, then the algorithm will hard-stop once the covariate set chosen to match on has a predictive error greater than this value.
early_stop_bf_frac:
float, optional (default = 0.01)
If early_stop_bf is True, then the algorithm will hard-stop once the covariate set chosen to match on has a balancing factor lower than this value.
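
The way these thresholds could combine into a single stopping check is sketched below. This is one reading of the descriptions above, not the package's code: should_stop is a hypothetical helper, and it assumes the unmatched-fraction rules fire once the remaining unmatched fraction falls below the threshold. The stop_unmatched_c and stop_unmatched_t flags are left out for brevity.

```python
def should_stop(iteration, unmatched_c_frac, unmatched_t_frac, pe, bf,
                early_stop_iterations=False,
                early_stop_un_c_frac=0.1, early_stop_un_t_frac=0.1,
                early_stop_pe=False, early_stop_bf=False,
                early_stop_pe_frac=0.01, early_stop_bf_frac=0.01):
    """Return the name of the first stopping rule that fires, or None."""
    # Hard stop after a fixed number of iterations, if one was provided.
    if early_stop_iterations is not False and iteration >= early_stop_iterations:
        return "iterations"
    # Assumed reading: stop once few enough units remain unmatched.
    if unmatched_c_frac < early_stop_un_c_frac:
        return "unmatched_control"
    if unmatched_t_frac < early_stop_un_t_frac:
        return "unmatched_treated"
    # Predictive error too high for the chosen covariate set.
    if early_stop_pe and pe > early_stop_pe_frac:
        return "predictive_error"
    # Balancing factor too low for the chosen covariate set.
    if early_stop_bf and bf < early_stop_bf_frac:
        return "balancing_factor"
    return None

print(should_stop(3, 0.5, 0.5, pe=0.02, bf=0.5, early_stop_pe=True))
```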

Missing Data Parameters

missing_indicator:
string, integer, np.nan, optional (default = np.nan)
The indicator for missing data in the dataset. For example, if missing values are denoted with "NA", set this parameter equal to "NA".
missing_holdout_replace:
integer, optional (default = 0)
If 0, assume no missingness in holdout data.
If 1, drop all units with missingness from holdout dataset.
If 2, impute missing values using MICE on holdout dataset for the number of imputations specified by missing_holdout_imputations.
missing_data_replace:
integer, optional (default = 0)
If 0, assume no missingness in matching data.
If 1, drop all units with missingness from matching dataset.
If 2, do not match units on covariates that are missing.
If 3, impute missing values using MICE on matching dataset for the number of imputations specified by missing_data_imputations.
missing_holdout_imputations:
integer, optional (default = 10)
If missing_holdout_replace=2, specifies the number of imputations on the holdout set.
missing_data_imputations:
integer, optional (default = 1)
If missing_data_replace=3, specifies the number of imputations on the matching set.
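
The "drop all units with missingness" option described above can be approximated by hand in Pandas, which may help when inspecting data before matching. This sketch is not the package's internal code. The toy dataframe and the "NA" indicator are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x_1": [1, "NA", 2, 0],
    "x_2": [0, 1, 1, "NA"],
    "treated": [1, 0, 1, 0],
    "outcome": [4.5, 3.33, 6.0, 2.0],
})

missing_indicator = "NA"

# Convert the custom indicator to NaN, then drop any unit with a missing value.
cleaned = df.replace(missing_indicator, np.nan).dropna()

# Restore integer covariates, as the input format requires.
cleaned = cleaned.astype({"x_1": int, "x_2": int})

print(cleaned)
```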

Additional Functions - Parameters and Defaults

# returns main matched group of specified unit
mmg_of_unit(return_df, unit_id, input_data, output_style=1)
# returns treatment effect on specified unit
te_of_unit(return_df, unit_id, input_data, treatment_column_name, outcome_column_name)
# returns main matched group and treatment effect for specified unit and includes fancy version of above functions
mmg_and_te_of_unit(return_df, unit_id, input_data, treatment_column_name, outcome_column_name, return_vals=1)

Key Parameters

return_df:
Dataframe, required
The dataframe containing all of the matches, obtained from the first element in the output of the algorithm.
unit_id:
integer, required
The index identification for the unit of interest.
input_data:
Dataframe, required
Dataframe containing the original dataset which has already been matched using DAME.
treatment_column_name:
string, required
The name of the column in input_data which specifies whether a unit is treated or control.
outcome_column_name:
string, required
The name of the column in input_data which specifies each unit outcome.
output_style:
integer, optional (default = 1)
If 1, include only the covariates matched on in the output.
If 2, include all attributes in the output.
return_vals:
integer, optional (default = 1)
If 0, print the main matched group and treatment effect for the specified unit (fancy version).
Else, return the main matched group and treatment effect for the specified unit with no print statement (default version).