Fit logistic regression model#
based on different imputation methods
baseline: reference
model: any other selected imputation method
Parameters#
Default and set parameters for the notebook.
folder_data: str = '' # specify data directory if needed
fn_clinical_data = "data/ALD_study/processed/ald_metadata_cli.csv"
folder_experiment = "runs/appl_ald_data/plasma/proteinGroups"
model_key = 'VAE'
target = 'kleiner'
sample_id_col = 'Sample ID'
cutoff_target: int = 2 # => for binarization target >= cutoff_target
file_format = "csv"
out_folder = 'diff_analysis'
fn_qc_samples = '' # 'data/ALD_study/processed/qc_plasma_proteinGroups.pkl'
baseline = 'RSN' # default is RSN, as this was used in the original ALD study (Niu et al. 2022)
template_pred = 'pred_real_na_{}.csv' # fixed, do not change
# Parameters
cutoff_target = 0.5
folder_experiment = "runs/alzheimer_study"
target = "AD"
baseline = "PI"
model_key = "VAE"
out_folder = "diff_analysis"
fn_clinical_data = "runs/alzheimer_study/data/clinical_data.csv"
root - INFO Removed from global namespace: folder_data
root - INFO Removed from global namespace: fn_clinical_data
root - INFO Removed from global namespace: folder_experiment
root - INFO Removed from global namespace: model_key
root - INFO Removed from global namespace: target
root - INFO Removed from global namespace: sample_id_col
root - INFO Removed from global namespace: cutoff_target
root - INFO Removed from global namespace: file_format
root - INFO Removed from global namespace: out_folder
root - INFO Removed from global namespace: fn_qc_samples
root - INFO Removed from global namespace: baseline
root - INFO Removed from global namespace: template_pred
root - INFO Already set attribute: folder_experiment has value runs/alzheimer_study
root - INFO Already set attribute: out_folder has value diff_analysis
{'baseline': 'PI',
'cutoff_target': 0.5,
'data': PosixPath('runs/alzheimer_study/data'),
'file_format': 'csv',
'fn_clinical_data': 'runs/alzheimer_study/data/clinical_data.csv',
'fn_qc_samples': '',
'folder_data': '',
'folder_experiment': PosixPath('runs/alzheimer_study'),
'model_key': 'VAE',
'out_figures': PosixPath('runs/alzheimer_study/figures'),
'out_folder': PosixPath('runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE'),
'out_metrics': PosixPath('runs/alzheimer_study'),
'out_models': PosixPath('runs/alzheimer_study'),
'out_preds': PosixPath('runs/alzheimer_study/preds'),
'sample_id_col': 'Sample ID',
'target': 'AD',
'template_pred': 'pred_real_na_{}.csv'}
Load data#
Load target#
target = pd.read_csv(args.fn_clinical_data,
index_col=0,
usecols=[args.sample_id_col, args.target])
target = target.dropna()
target
| Sample ID | AD |
|---|---|
| Sample_000 | 0 |
| Sample_001 | 1 |
| Sample_002 | 1 |
| Sample_003 | 1 |
| Sample_004 | 1 |
| ... | ... |
| Sample_205 | 1 |
| Sample_206 | 0 |
| Sample_207 | 0 |
| Sample_208 | 0 |
| Sample_209 | 0 |
210 rows × 1 columns
MS proteomics or specified omics data#
Aggregated from data splits of the imputation workflow run before.
pimmslearn.io.datasplits - INFO Loaded 'train_X' from file: runs/alzheimer_study/data/train_X.csv
pimmslearn.io.datasplits - INFO Loaded 'val_y' from file: runs/alzheimer_study/data/val_y.csv
pimmslearn.io.datasplits - INFO Loaded 'test_y' from file: runs/alzheimer_study/data/test_y.csv
Sample ID protein groups
Sample_170 P22792 16.491
Sample_172 A0A0C4DH25 21.055
Sample_028 P14151;P14151-2 16.571
Sample_161 P02787 26.734
Sample_018 A6NC48;Q10588;Q10588-2 16.629
Name: intensity, dtype: float64
Get overlap between independent features and target
Select by ALD criteria#
Use parameters as specified in ALD study.
root - INFO Initally: N samples: 210, M feat: 1421
root - INFO Dropped features quantified in less than 126 samples.
root - INFO After feat selection: N samples: 210, M feat: 1213
root - INFO Min No. of Protein-Groups in single sample: 754
root - INFO Finally: N samples: 210, M feat: 1213
| Sample ID / protein groups | A0A024QZX5;A0A087X1N8;P35237 | A0A024R0T9;K7ER74;P02655 | A0A024R3W6;A0A024R412;O60462;O60462-2;O60462-3;O60462-4;O60462-5;Q7LBX6;X5D2Q8 | A0A024R644;A0A0A0MRU5;A0A1B0GWI2;O75503 | A0A075B6H9 | A0A075B6I0 | A0A075B6I1 | A0A075B6I6 | A0A075B6I9 | A0A075B6J9 | ... | Q9Y653;Q9Y653-2;Q9Y653-3 | Q9Y696 | Q9Y6C2 | Q9Y6N6 | Q9Y6N7;Q9Y6N7-2;Q9Y6N7-4 | Q9Y6R7 | Q9Y6X5 | Q9Y6Y8;Q9Y6Y8-2 | Q9Y6Y9 | S4R3U6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sample_000 | 15.912 | 16.852 | 15.570 | 16.481 | 20.246 | 16.764 | 17.584 | 16.988 | 20.054 | NaN | ... | 16.012 | 15.178 | NaN | 15.050 | 16.842 | 19.863 | NaN | 19.563 | 12.837 | 12.805 |
| Sample_001 | 15.936 | 16.874 | 15.519 | 16.387 | 19.941 | 18.786 | 17.144 | NaN | 19.067 | 16.188 | ... | 15.528 | 15.576 | NaN | 14.833 | 16.597 | 20.299 | 15.556 | 19.386 | 13.970 | 12.442 |
| Sample_002 | 16.111 | 14.523 | 15.935 | 16.416 | 19.251 | 16.832 | 15.671 | 17.012 | 18.569 | NaN | ... | 15.229 | 14.728 | 13.757 | 15.118 | 17.440 | 19.598 | 15.735 | 20.447 | 12.636 | 12.505 |
| Sample_003 | 16.107 | 17.032 | 15.802 | 16.979 | 19.628 | 17.852 | 18.877 | 14.182 | 18.985 | 13.438 | ... | 15.495 | 14.590 | 14.682 | 15.140 | 17.356 | 19.429 | NaN | 20.216 | 12.627 | 12.445 |
| Sample_004 | 15.603 | 15.331 | 15.375 | 16.679 | 20.450 | 18.682 | 17.081 | 14.140 | 19.686 | 14.495 | ... | 14.757 | 15.094 | 14.048 | 15.256 | 17.075 | 19.582 | 15.328 | 19.867 | 13.145 | 12.235 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Sample_205 | 15.682 | 16.886 | 14.910 | 16.482 | 17.705 | 17.039 | NaN | 16.413 | 19.102 | 16.064 | ... | 15.235 | 15.684 | 14.236 | 15.415 | 17.551 | 17.922 | 16.340 | 19.928 | 12.929 | 11.802 |
| Sample_206 | 15.798 | 17.554 | 15.600 | 15.938 | 18.154 | 18.152 | 16.503 | 16.860 | 18.538 | 15.288 | ... | 15.422 | 16.106 | NaN | 15.345 | 17.084 | 18.708 | 14.249 | 19.433 | NaN | NaN |
| Sample_207 | 15.739 | 16.877 | 15.469 | 16.898 | 18.636 | 17.950 | 16.321 | 16.401 | 18.849 | 17.580 | ... | 15.808 | 16.098 | 14.403 | 15.715 | 16.586 | 18.725 | 16.138 | 19.599 | 13.637 | 11.174 |
| Sample_208 | 15.477 | 16.779 | 14.995 | 16.132 | 14.908 | 17.530 | NaN | 16.119 | 18.368 | 15.202 | ... | 15.157 | 16.712 | NaN | 14.640 | 16.533 | 19.411 | 15.807 | 19.545 | 13.216 | NaN |
| Sample_209 | 15.727 | 17.261 | 15.175 | 16.235 | 17.893 | 17.744 | 16.371 | 15.780 | 18.806 | 16.532 | ... | 15.237 | 15.652 | 15.211 | 14.205 | 16.749 | 19.275 | 15.732 | 19.577 | 11.042 | 11.791 |
210 rows × 1213 columns
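The log above reports that features quantified in fewer than 126 samples were dropped, which corresponds to 60% of the 210 samples. A minimal sketch of such a prevalence filter on a wide samples-by-features table could look as follows. The function name `select_features_by_prevalence`, the parameter `min_frac` and the toy data are illustrative assumptions, not the library's API.

```python
import math

import numpy as np
import pandas as pd


def select_features_by_prevalence(df: pd.DataFrame, min_frac: float = 0.6) -> pd.DataFrame:
    """Keep features (columns) quantified in at least `min_frac` of the samples (rows).

    Hypothetical helper for illustration only.
    """
    min_samples = math.ceil(min_frac * len(df))
    n_quantified = df.notna().sum(axis=0)  # non-missing values per feature
    return df.loc[:, n_quantified >= min_samples]


# toy data: 5 samples, features with 0, 1 and 4 missing values
toy = pd.DataFrame(
    {
        "P1": [15.9, 16.1, 15.6, 15.7, 15.8],
        "P2": [16.8, np.nan, 14.5, 17.2, 17.0],
        "P3": [np.nan, np.nan, np.nan, 12.4, np.nan],
    },
    index=[f"Sample_{i:03d}" for i in range(5)],
)
selected = select_features_by_prevalence(toy, min_frac=0.6)
print(list(selected.columns))
```

In the toy data only `P3` falls below the threshold of three quantified samples and is removed.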
Number of complete cases which can be used:
Samples available both in proteomics data and for target: 210
Load imputations from specified model#
missing values pred. by VAE: runs/alzheimer_study/preds/pred_real_na_VAE.csv
Sample ID protein groups
Sample_077 P35968;P35968-2;P35968-3 15.665
Sample_050 P02100 19.109
Sample_119 Q9NZU1 12.266
Name: intensity, dtype: float64
Load imputations from baseline model#
Sample ID protein groups
Sample_000 A0A075B6J9 11.915
A0A075B6Q5 13.301
A0A075B6R2 11.133
A0A075B6S5 12.923
A0A087WSY4 14.332
...
Sample_209 Q9P1W8;Q9P1W8-2;Q9P1W8-4 11.945
Q9UI40;Q9UI40-2 12.911
Q9UIW2 13.315
Q9UMX0;Q9UMX0-2;Q9UMX0-4 11.697
Q9UP79 13.540
Name: intensity, Length: 46401, dtype: float64
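The imputations are stored in long format (one intensity per sample and protein group), while the observed data is a wide table. One way to combine them, sketched here with made-up values and assuming the predictions cover exactly the missing entries, is to unstack the predictions and use them to fill the gaps. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# observed intensities in wide format with missing values (toy values)
observed = pd.DataFrame(
    {"P1": [15.9, np.nan], "P2": [np.nan, 16.9]},
    index=pd.Index(["Sample_000", "Sample_001"], name="Sample ID"),
)
observed.columns.name = "protein groups"

# long-format predictions for the missing entries only (toy values)
pred = pd.Series(
    [12.0, 13.0],
    index=pd.MultiIndex.from_tuples(
        [("Sample_000", "P2"), ("Sample_001", "P1")],
        names=["Sample ID", "protein groups"],
    ),
    name="intensity",
)

pred_wide = pred.unstack()  # rows: samples, columns: protein groups
completed = observed.fillna(pred_wide)  # observed values are kept, gaps are filled
print(completed)
```

`DataFrame.fillna` aligns on both index and columns, so observed values are never overwritten.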
Modeling setup#
General approach:
- use one train-test split of the data
- select the best 10 features from the training data `X_train`, `y_train` before binarization of the target
- dichotomize (binarize) the data into two groups (0 and 1)
- evaluate the model on the test data `X_test`, `y_test`
Repeat the general approach for
- all original ALD data: all features used in the original ALD study
- all model data: all features available by using the self-supervised deep learning model
- newly available features only: the subset of features available from the self-supervised deep learning model which were newly retained using the new approach
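The general approach can be sketched with scikit-learn on synthetic data. This is an illustration under assumptions, not the notebook's implementation: `SelectKBest` with an ANOVA F-test stands in for whatever feature selection the workflow uses, the target is already binary, and the split size and random seeds are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# synthetic stand-in for the samples-by-features matrix and binary target
X, y = make_classification(
    n_samples=210, n_features=50, n_informative=5, random_state=0
)

# one stratified train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# select 10 features on the training data only, then fit a logistic regression
clf = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=10),  # stand-in feature selector (assumption)
    LogisticRegression(max_iter=1000),
)
clf.fit(X_train, y_train)

# evaluate on the held-out test data
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
n_selected = int(clf.named_steps["selectkbest"].get_support().sum())
print(f"selected features: {n_selected}, test ROC AUC: {auc:.3f}")
```

Wrapping the selection step in the pipeline ensures that features are chosen from the training data alone, so the test data does not leak into the selection.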
All data:
| Sample ID / protein groups | A0A024QZX5;A0A087X1N8;P35237 | A0A024R0T9;K7ER74;P02655 | A0A024R3W6;A0A024R412;O60462;O60462-2;O60462-3;O60462-4;O60462-5;Q7LBX6;X5D2Q8 | A0A024R644;A0A0A0MRU5;A0A1B0GWI2;O75503 | A0A075B6H7 | A0A075B6H9 | A0A075B6I0 | A0A075B6I1 | A0A075B6I6 | A0A075B6I9 | ... | Q9Y653;Q9Y653-2;Q9Y653-3 | Q9Y696 | Q9Y6C2 | Q9Y6N6 | Q9Y6N7;Q9Y6N7-2;Q9Y6N7-4 | Q9Y6R7 | Q9Y6X5 | Q9Y6Y8;Q9Y6Y8-2 | Q9Y6Y9 | S4R3U6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sample_000 | 15.912 | 16.852 | 15.570 | 16.481 | 17.301 | 20.246 | 16.764 | 17.584 | 16.988 | 20.054 | ... | 16.012 | 15.178 | 14.073 | 15.050 | 16.842 | 19.863 | 15.886 | 19.563 | 12.837 | 12.805 |
| Sample_001 | 15.936 | 16.874 | 15.519 | 16.387 | 13.796 | 19.941 | 18.786 | 17.144 | 16.834 | 19.067 | ... | 15.528 | 15.576 | 13.971 | 14.833 | 16.597 | 20.299 | 15.556 | 19.386 | 13.970 | 12.442 |
| Sample_002 | 16.111 | 14.523 | 15.935 | 16.416 | 18.175 | 19.251 | 16.832 | 15.671 | 17.012 | 18.569 | ... | 15.229 | 14.728 | 13.757 | 15.118 | 17.440 | 19.598 | 15.735 | 20.447 | 12.636 | 12.505 |
| Sample_003 | 16.107 | 17.032 | 15.802 | 16.979 | 15.963 | 19.628 | 17.852 | 18.877 | 14.182 | 18.985 | ... | 15.495 | 14.590 | 14.682 | 15.140 | 17.356 | 19.429 | 15.911 | 20.216 | 12.627 | 12.445 |
| Sample_004 | 15.603 | 15.331 | 15.375 | 16.679 | 15.473 | 20.450 | 18.682 | 17.081 | 14.140 | 19.686 | ... | 14.757 | 15.094 | 14.048 | 15.256 | 17.075 | 19.582 | 15.328 | 19.867 | 13.145 | 12.235 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Sample_205 | 15.682 | 16.886 | 14.910 | 16.482 | 15.765 | 17.705 | 17.039 | 16.255 | 16.413 | 19.102 | ... | 15.235 | 15.684 | 14.236 | 15.415 | 17.551 | 17.922 | 16.340 | 19.928 | 12.929 | 11.802 |
| Sample_206 | 15.798 | 17.554 | 15.600 | 15.938 | 15.594 | 18.154 | 18.152 | 16.503 | 16.860 | 18.538 | ... | 15.422 | 16.106 | 14.310 | 15.345 | 17.084 | 18.708 | 14.249 | 19.433 | 11.340 | 11.015 |
| Sample_207 | 15.739 | 16.877 | 15.469 | 16.898 | 14.865 | 18.636 | 17.950 | 16.321 | 16.401 | 18.849 | ... | 15.808 | 16.098 | 14.403 | 15.715 | 16.586 | 18.725 | 16.138 | 19.599 | 13.637 | 11.174 |
| Sample_208 | 15.477 | 16.779 | 14.995 | 16.132 | 14.356 | 14.908 | 17.530 | 16.982 | 16.119 | 18.368 | ... | 15.157 | 16.712 | 14.390 | 14.640 | 16.533 | 19.411 | 15.807 | 19.545 | 13.216 | 11.081 |
| Sample_209 | 15.727 | 17.261 | 15.175 | 16.235 | 14.809 | 17.893 | 17.744 | 16.371 | 15.780 | 18.806 | ... | 15.237 | 15.652 | 15.211 | 14.205 | 16.749 | 19.275 | 15.732 | 19.577 | 11.042 | 11.791 |
210 rows × 1421 columns
Subset of data by ALD criteria#
| Sample ID / protein groups | A0A024QZX5;A0A087X1N8;P35237 | A0A024R0T9;K7ER74;P02655 | A0A024R3W6;A0A024R412;O60462;O60462-2;O60462-3;O60462-4;O60462-5;Q7LBX6;X5D2Q8 | A0A024R644;A0A0A0MRU5;A0A1B0GWI2;O75503 | A0A075B6H9 | A0A075B6I0 | A0A075B6I1 | A0A075B6I6 | A0A075B6I9 | A0A075B6K4 | ... | O14793 | O95479;R4GMU1 | P01282;P01282-2 | P10619;P10619-2;X6R5C5;X6R8A1 | P21810 | Q14956;Q14956-2 | Q6ZMP0;Q6ZMP0-2 | Q9HBW1 | Q9NY15 | P17050 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sample_000 | 15.912 | 16.852 | 15.570 | 16.481 | 20.246 | 16.764 | 17.584 | 16.988 | 20.054 | 16.148 | ... | 12.796 | 12.072 | 12.224 | 13.002 | 13.754 | 13.997 | 12.142 | 13.610 | 11.158 | 14.105 |
| Sample_001 | 15.936 | 16.874 | 15.519 | 16.387 | 19.941 | 18.786 | 17.144 | 13.300 | 19.067 | 16.127 | ... | 12.937 | 13.228 | 13.141 | 12.546 | 12.103 | 12.394 | 13.008 | 13.295 | 13.054 | 12.728 |
| Sample_002 | 16.111 | 14.523 | 15.935 | 16.416 | 19.251 | 16.832 | 15.671 | 17.012 | 18.569 | 15.387 | ... | 12.996 | 11.668 | 13.147 | 12.657 | 12.306 | 12.865 | 11.113 | 12.154 | 13.995 | 14.586 |
| Sample_003 | 16.107 | 17.032 | 15.802 | 16.979 | 19.628 | 17.852 | 18.877 | 14.182 | 18.985 | 16.565 | ... | 13.793 | 12.744 | 13.246 | 12.475 | 12.893 | 12.075 | 11.584 | 13.531 | 12.526 | 14.343 |
| Sample_004 | 15.603 | 15.331 | 15.375 | 16.679 | 20.450 | 18.682 | 17.081 | 14.140 | 19.686 | 16.418 | ... | 13.331 | 12.979 | 13.325 | 13.773 | 12.813 | 14.709 | 13.847 | 13.119 | 12.045 | 13.734 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Sample_205 | 15.682 | 16.886 | 14.910 | 16.482 | 17.705 | 17.039 | 12.955 | 16.413 | 19.102 | 15.350 | ... | 14.269 | 14.064 | 16.826 | 18.182 | 15.225 | 15.044 | 14.192 | 16.605 | 14.995 | 14.257 |
| Sample_206 | 15.798 | 17.554 | 15.600 | 15.938 | 18.154 | 18.152 | 16.503 | 16.860 | 18.538 | 16.582 | ... | 14.273 | 17.700 | 16.802 | 20.202 | 15.280 | 15.086 | 13.978 | 18.086 | 15.557 | 14.171 |
| Sample_207 | 15.739 | 16.877 | 15.469 | 16.898 | 18.636 | 17.950 | 16.321 | 16.401 | 18.849 | 15.768 | ... | 14.473 | 16.882 | 16.917 | 20.105 | 15.690 | 15.135 | 13.138 | 17.066 | 15.706 | 15.690 |
| Sample_208 | 15.477 | 16.779 | 14.995 | 16.132 | 14.908 | 17.530 | 13.800 | 16.119 | 18.368 | 17.560 | ... | 15.234 | 17.175 | 16.521 | 18.859 | 15.305 | 15.161 | 13.006 | 17.917 | 15.396 | 14.371 |
| Sample_209 | 15.727 | 17.261 | 15.175 | 16.235 | 17.893 | 17.744 | 16.371 | 15.780 | 18.806 | 16.338 | ... | 14.556 | 16.656 | 16.954 | 18.493 | 15.823 | 14.626 | 13.385 | 17.767 | 15.687 | 13.573 |
210 rows × 1213 columns
Features which would not have been included using ALD criteria:
Index(['A0A075B6H7', 'A0A075B6Q5', 'A0A075B7B8', 'A0A087WSY4',
'A0A087WTT8;A0A0A0MQX5;O94779;O94779-2', 'A0A087WXB8;Q9Y274',
'A0A087WXE9;E9PQ70;Q6UXH9;Q6UXH9-2;Q6UXH9-3',
'A0A087X1Z2;C9JTV4;H0Y4Y4;Q8WYH2;Q96C19;Q9BUP0;Q9BUP0-2',
'A0A0A0MQS9;A0A0A0MTC7;Q16363;Q16363-2', 'A0A0A0MSN4;P12821;P12821-2',
...
'Q9NZ94;Q9NZ94-2;Q9NZ94-3', 'Q9NZU1', 'Q9P1W8;Q9P1W8-2;Q9P1W8-4',
'Q9UHI8', 'Q9UI40;Q9UI40-2',
'Q9UIB8;Q9UIB8-2;Q9UIB8-3;Q9UIB8-4;Q9UIB8-5;Q9UIB8-6',
'Q9UKZ4;Q9UKZ4-2', 'Q9UMX0;Q9UMX0-2;Q9UMX0-4', 'Q9Y281;Q9Y281-3',
'Q9Y490'],
dtype='object', name='protein groups', length=208)
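The 208 listed features match the difference between the 1421 features in the full table and the 1213 features retained by the ALD criteria. Such a set can be obtained with an index difference, sketched here on a handful of identifiers taken from the tables above. The variable names are illustrative.

```python
import pandas as pd

# features available after imputation with the model (toy subset)
all_features = pd.Index(
    ["A0A075B6H7", "A0A075B6H9", "Q9NZU1", "S4R3U6"], name="protein groups"
)
# features retained by the ALD selection criteria (toy subset)
ald_features = pd.Index(["A0A075B6H9", "S4R3U6"], name="protein groups")

# features only available when using all model data
new_features = all_features.difference(ald_features)
print(new_features.tolist())
```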
Binarize targets, but also keep groups for stratification
| binarized AD (rows) / original AD (columns) | 0 | 1 |
|---|---|---|
| False | 122 | 0 |
| True | 0 | 88 |
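The binarization follows the rule given in the parameters (`target >= cutoff_target`). A small sketch with toy values shows how the boolean target and a cross-tabulation against the original values can be produced. The name `target_binary` and the renamed row label are assumptions for illustration.

```python
import pandas as pd

cutoff_target = 0.5  # value set in the parameters above

# toy target values
target = pd.Series(
    [0, 1, 1, 0, 1],
    index=pd.Index([f"Sample_{i:03d}" for i in range(5)], name="Sample ID"),
    name="AD",
)

# binarize, keeping the original values for stratification
target_binary = (target >= cutoff_target).rename("AD >= cutoff")

# rows: binarized target, columns: original target values
table = pd.crosstab(target_binary, target)
print(table)
```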
Determine best number of parameters by cross validation procedure#
Using the subset of data selected by ALD criteria:
| n_features | fit_time (mean) | fit_time (std) | score_time (mean) | score_time (std) | test_precision (mean) | test_precision (std) | test_recall (mean) | test_recall (std) | test_f1 (mean) | test_f1 (std) | test_balanced_accuracy (mean) | test_balanced_accuracy (std) | test_roc_auc (mean) | test_roc_auc (std) | test_average_precision (mean) | test_average_precision (std) | n_observations (mean) | n_observations (std) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.004 | 0.002 | 0.041 | 0.018 | 0.766 | 0.340 | 0.120 | 0.083 | 0.201 | 0.127 | 0.552 | 0.041 | 0.854 | 0.063 | 0.824 | 0.087 | 210.000 | 0.000 |
| 2 | 0.003 | 0.001 | 0.040 | 0.011 | 0.609 | 0.128 | 0.465 | 0.118 | 0.517 | 0.100 | 0.619 | 0.067 | 0.699 | 0.083 | 0.648 | 0.098 | 210.000 | 0.000 |
| 3 | 0.003 | 0.001 | 0.035 | 0.008 | 0.803 | 0.078 | 0.748 | 0.094 | 0.770 | 0.069 | 0.806 | 0.054 | 0.919 | 0.040 | 0.906 | 0.042 | 210.000 | 0.000 |
| 4 | 0.004 | 0.002 | 0.038 | 0.017 | 0.799 | 0.074 | 0.758 | 0.093 | 0.774 | 0.066 | 0.808 | 0.053 | 0.918 | 0.040 | 0.905 | 0.042 | 210.000 | 0.000 |
| 5 | 0.004 | 0.001 | 0.040 | 0.011 | 0.807 | 0.083 | 0.816 | 0.102 | 0.807 | 0.071 | 0.835 | 0.059 | 0.923 | 0.040 | 0.911 | 0.043 | 210.000 | 0.000 |
| 6 | 0.002 | 0.001 | 0.026 | 0.008 | 0.813 | 0.080 | 0.813 | 0.104 | 0.808 | 0.070 | 0.836 | 0.059 | 0.922 | 0.042 | 0.911 | 0.044 | 210.000 | 0.000 |
| 7 | 0.004 | 0.001 | 0.038 | 0.007 | 0.817 | 0.080 | 0.815 | 0.102 | 0.812 | 0.070 | 0.839 | 0.059 | 0.922 | 0.042 | 0.911 | 0.043 | 210.000 | 0.000 |
| 8 | 0.004 | 0.002 | 0.047 | 0.015 | 0.815 | 0.083 | 0.825 | 0.094 | 0.815 | 0.066 | 0.842 | 0.056 | 0.920 | 0.041 | 0.911 | 0.041 | 210.000 | 0.000 |
| 9 | 0.003 | 0.001 | 0.027 | 0.007 | 0.816 | 0.082 | 0.826 | 0.093 | 0.816 | 0.061 | 0.842 | 0.052 | 0.919 | 0.041 | 0.909 | 0.042 | 210.000 | 0.000 |
| 10 | 0.004 | 0.002 | 0.042 | 0.012 | 0.813 | 0.072 | 0.828 | 0.096 | 0.817 | 0.065 | 0.843 | 0.056 | 0.925 | 0.042 | 0.915 | 0.042 | 210.000 | 0.000 |
| 11 | 0.002 | 0.000 | 0.020 | 0.000 | 0.815 | 0.075 | 0.824 | 0.100 | 0.816 | 0.069 | 0.843 | 0.059 | 0.924 | 0.044 | 0.914 | 0.044 | 210.000 | 0.000 |
| 12 | 0.005 | 0.002 | 0.047 | 0.017 | 0.817 | 0.066 | 0.836 | 0.091 | 0.823 | 0.061 | 0.849 | 0.052 | 0.924 | 0.044 | 0.911 | 0.047 | 210.000 | 0.000 |
| 13 | 0.004 | 0.002 | 0.047 | 0.015 | 0.831 | 0.072 | 0.816 | 0.085 | 0.820 | 0.058 | 0.846 | 0.048 | 0.925 | 0.040 | 0.912 | 0.042 | 210.000 | 0.000 |
| 14 | 0.004 | 0.002 | 0.037 | 0.011 | 0.832 | 0.070 | 0.820 | 0.084 | 0.823 | 0.059 | 0.848 | 0.050 | 0.923 | 0.041 | 0.910 | 0.044 | 210.000 | 0.000 |
| 15 | 0.004 | 0.001 | 0.040 | 0.011 | 0.831 | 0.073 | 0.815 | 0.081 | 0.820 | 0.058 | 0.846 | 0.049 | 0.920 | 0.041 | 0.906 | 0.045 | 210.000 | 0.000 |
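A table of this shape (mean and standard deviation of each metric per number of features) can be produced by cross-validating one model per candidate feature count. The sketch below uses synthetic data and makes several assumptions: `SelectKBest` stands in for the workflow's feature selection, and the repeated stratified 5-fold scheme, the seeds and the range of feature counts are chosen only for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=210, n_features=30, n_informative=5, random_state=0
)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)
scoring = ["precision", "recall", "f1", "balanced_accuracy",
           "roc_auc", "average_precision"]

rows = {}
for n_features in range(1, 6):
    model = make_pipeline(
        StandardScaler(),
        SelectKBest(f_classif, k=n_features),  # stand-in selector (assumption)
        LogisticRegression(max_iter=1000),
    )
    scores = pd.DataFrame(cross_validate(model, X, y, cv=cv, scoring=scoring))
    # collapse the per-fold scores into (metric, statistic) pairs
    rows[n_features] = scores.agg(["mean", "std"]).unstack()

cv_results = pd.DataFrame(rows).T
cv_results.index.name = "n_features"

# pick the feature count with the highest mean ROC AUC
best_n = cv_results[("test_roc_auc", "mean")].idxmax()
print(cv_results.round(3))
print("best number of features:", best_n)
```

Selecting by mean ROC AUC is one possible criterion; another metric from the table could be used in the same way.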
Using all data:
| n_features | fit_time (mean) | fit_time (std) | score_time (mean) | score_time (std) | test_precision (mean) | test_precision (std) | test_recall (mean) | test_recall (std) | test_f1 (mean) | test_f1 (std) | test_balanced_accuracy (mean) | test_balanced_accuracy (std) | test_roc_auc (mean) | test_roc_auc (std) | test_average_precision (mean) | test_average_precision (std) | n_observations (mean) | n_observations (std) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.003 | 0.001 | 0.037 | 0.011 | 0.010 | 0.071 | 0.001 | 0.008 | 0.002 | 0.015 | 0.497 | 0.008 | 0.855 | 0.064 | 0.825 | 0.088 | 210.000 | 0.000 |
| 2 | 0.003 | 0.001 | 0.031 | 0.007 | 0.674 | 0.120 | 0.481 | 0.133 | 0.545 | 0.095 | 0.648 | 0.056 | 0.707 | 0.082 | 0.624 | 0.096 | 210.000 | 0.000 |
| 3 | 0.003 | 0.002 | 0.035 | 0.019 | 0.717 | 0.094 | 0.671 | 0.112 | 0.686 | 0.080 | 0.736 | 0.062 | 0.850 | 0.052 | 0.814 | 0.061 | 210.000 | 0.000 |
| 4 | 0.002 | 0.000 | 0.022 | 0.003 | 0.704 | 0.097 | 0.666 | 0.114 | 0.679 | 0.086 | 0.729 | 0.068 | 0.845 | 0.053 | 0.808 | 0.063 | 210.000 | 0.000 |
| 5 | 0.003 | 0.001 | 0.026 | 0.009 | 0.704 | 0.084 | 0.663 | 0.106 | 0.678 | 0.078 | 0.728 | 0.061 | 0.843 | 0.054 | 0.801 | 0.067 | 210.000 | 0.000 |
| 6 | 0.003 | 0.001 | 0.030 | 0.008 | 0.704 | 0.089 | 0.669 | 0.111 | 0.680 | 0.079 | 0.730 | 0.063 | 0.844 | 0.054 | 0.804 | 0.066 | 210.000 | 0.000 |
| 7 | 0.004 | 0.002 | 0.035 | 0.014 | 0.761 | 0.096 | 0.735 | 0.125 | 0.741 | 0.085 | 0.781 | 0.068 | 0.877 | 0.053 | 0.845 | 0.067 | 210.000 | 0.000 |
| 8 | 0.002 | 0.000 | 0.023 | 0.006 | 0.757 | 0.102 | 0.724 | 0.129 | 0.732 | 0.089 | 0.774 | 0.070 | 0.875 | 0.054 | 0.842 | 0.067 | 210.000 | 0.000 |
| 9 | 0.003 | 0.001 | 0.025 | 0.007 | 0.752 | 0.089 | 0.729 | 0.108 | 0.734 | 0.072 | 0.774 | 0.057 | 0.874 | 0.053 | 0.844 | 0.065 | 210.000 | 0.000 |
| 10 | 0.002 | 0.001 | 0.023 | 0.008 | 0.803 | 0.093 | 0.808 | 0.113 | 0.800 | 0.079 | 0.829 | 0.066 | 0.909 | 0.045 | 0.886 | 0.053 | 210.000 | 0.000 |
| 11 | 0.002 | 0.000 | 0.021 | 0.002 | 0.795 | 0.096 | 0.808 | 0.112 | 0.796 | 0.079 | 0.826 | 0.066 | 0.909 | 0.045 | 0.886 | 0.054 | 210.000 | 0.000 |
| 12 | 0.002 | 0.001 | 0.022 | 0.009 | 0.798 | 0.090 | 0.789 | 0.103 | 0.787 | 0.068 | 0.819 | 0.056 | 0.916 | 0.043 | 0.896 | 0.049 | 210.000 | 0.000 |
| 13 | 0.002 | 0.001 | 0.023 | 0.005 | 0.802 | 0.086 | 0.789 | 0.103 | 0.790 | 0.069 | 0.821 | 0.057 | 0.916 | 0.042 | 0.897 | 0.050 | 210.000 | 0.000 |
| 14 | 0.002 | 0.001 | 0.024 | 0.009 | 0.803 | 0.087 | 0.796 | 0.098 | 0.795 | 0.066 | 0.825 | 0.056 | 0.919 | 0.043 | 0.899 | 0.051 | 210.000 | 0.000 |
| 15 | 0.002 | 0.001 | 0.024 | 0.005 | 0.798 | 0.088 | 0.795 | 0.091 | 0.792 | 0.064 | 0.822 | 0.056 | 0.918 | 0.043 | 0.895 | 0.053 | 210.000 | 0.000 |
Using only new features:
| n_features | fit_time mean | fit_time std | score_time mean | score_time std | test_precision mean | test_precision std | test_recall mean | test_recall std | test_f1 mean | test_f1 std | test_balanced_accuracy mean | test_balanced_accuracy std | test_roc_auc mean | test_roc_auc std | test_average_precision mean | test_average_precision std | n_observations mean | n_observations std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.002 | 0.000 | 0.023 | 0.005 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.500 | 0.000 | 0.739 | 0.068 | 0.690 | 0.084 | 210.000 | 0.000 |
| 2 | 0.002 | 0.000 | 0.021 | 0.002 | 0.422 | 0.386 | 0.082 | 0.086 | 0.128 | 0.124 | 0.519 | 0.030 | 0.590 | 0.086 | 0.568 | 0.076 | 210.000 | 0.000 |
| 3 | 0.002 | 0.001 | 0.022 | 0.004 | 0.524 | 0.308 | 0.130 | 0.083 | 0.198 | 0.112 | 0.521 | 0.042 | 0.560 | 0.082 | 0.531 | 0.082 | 210.000 | 0.000 |
| 4 | 0.003 | 0.002 | 0.034 | 0.015 | 0.511 | 0.253 | 0.145 | 0.077 | 0.217 | 0.101 | 0.518 | 0.042 | 0.537 | 0.079 | 0.510 | 0.074 | 210.000 | 0.000 |
| 5 | 0.003 | 0.001 | 0.029 | 0.008 | 0.558 | 0.239 | 0.186 | 0.074 | 0.268 | 0.094 | 0.531 | 0.047 | 0.519 | 0.080 | 0.510 | 0.072 | 210.000 | 0.000 |
| 6 | 0.003 | 0.001 | 0.036 | 0.014 | 0.642 | 0.098 | 0.557 | 0.115 | 0.591 | 0.094 | 0.664 | 0.072 | 0.748 | 0.063 | 0.684 | 0.072 | 210.000 | 0.000 |
| 7 | 0.002 | 0.001 | 0.022 | 0.004 | 0.643 | 0.078 | 0.639 | 0.116 | 0.636 | 0.085 | 0.690 | 0.066 | 0.788 | 0.064 | 0.734 | 0.078 | 210.000 | 0.000 |
| 8 | 0.002 | 0.000 | 0.022 | 0.005 | 0.635 | 0.079 | 0.622 | 0.110 | 0.623 | 0.080 | 0.680 | 0.064 | 0.788 | 0.065 | 0.733 | 0.083 | 210.000 | 0.000 |
| 9 | 0.002 | 0.000 | 0.020 | 0.000 | 0.639 | 0.078 | 0.613 | 0.106 | 0.621 | 0.078 | 0.679 | 0.062 | 0.786 | 0.063 | 0.731 | 0.079 | 210.000 | 0.000 |
| 10 | 0.002 | 0.000 | 0.020 | 0.001 | 0.659 | 0.087 | 0.632 | 0.109 | 0.641 | 0.086 | 0.696 | 0.068 | 0.794 | 0.059 | 0.751 | 0.074 | 210.000 | 0.000 |
| 11 | 0.002 | 0.000 | 0.020 | 0.001 | 0.650 | 0.091 | 0.625 | 0.105 | 0.632 | 0.084 | 0.688 | 0.068 | 0.787 | 0.063 | 0.743 | 0.077 | 210.000 | 0.000 |
| 12 | 0.002 | 0.000 | 0.020 | 0.000 | 0.659 | 0.092 | 0.631 | 0.111 | 0.638 | 0.083 | 0.693 | 0.064 | 0.789 | 0.064 | 0.740 | 0.079 | 210.000 | 0.000 |
| 13 | 0.002 | 0.000 | 0.020 | 0.000 | 0.656 | 0.085 | 0.631 | 0.112 | 0.636 | 0.080 | 0.692 | 0.060 | 0.786 | 0.064 | 0.738 | 0.079 | 210.000 | 0.000 |
| 14 | 0.002 | 0.000 | 0.020 | 0.000 | 0.650 | 0.089 | 0.625 | 0.108 | 0.631 | 0.080 | 0.688 | 0.060 | 0.782 | 0.065 | 0.734 | 0.080 | 210.000 | 0.000 |
| 15 | 0.002 | 0.000 | 0.020 | 0.000 | 0.656 | 0.089 | 0.630 | 0.121 | 0.637 | 0.091 | 0.694 | 0.068 | 0.788 | 0.065 | 0.737 | 0.080 | 210.000 | 0.000 |
Best number of features by subset of the data:#
| | ald | all | new |
|---|---|---|---|
| fit_time | 12 | 7 | 6 |
| score_time | 12 | 1 | 6 |
| test_precision | 14 | 14 | 10 |
| test_recall | 12 | 11 | 7 |
| test_f1 | 12 | 10 | 10 |
| test_balanced_accuracy | 12 | 10 | 10 |
| test_roc_auc | 10 | 14 | 10 |
| test_average_precision | 10 | 14 | 10 |
| n_observations | 1 | 1 | 1 |
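The table above reports, per metric, the number of features with the highest mean cross-validated score. A minimal sketch of that lookup, assuming the selection is a plain arg-max over the `mean` columns: the values are copied from the "new" feature table above, and the names `cv_new` and `best_n` are hypothetical, not taken from the notebook.

```python
import pandas as pd

# Mean CV scores for the "new" feature subset, copied from the table above
# (two metrics only). `cv_new` is a hypothetical name.
cv_new = pd.DataFrame(
    {
        "test_recall": [0.000, 0.082, 0.130, 0.145, 0.186, 0.557, 0.639, 0.622,
                        0.613, 0.632, 0.625, 0.631, 0.631, 0.625, 0.630],
        "test_roc_auc": [0.739, 0.590, 0.560, 0.537, 0.519, 0.748, 0.788, 0.788,
                         0.786, 0.794, 0.787, 0.789, 0.786, 0.782, 0.788],
    },
    index=pd.RangeIndex(1, 16, name="n_features"),
)

# idxmax returns, per column, the index label (n_features) of the largest value;
# ties resolve to the first occurrence.
best_n = cv_new.idxmax()
print(best_n)
```

If the notebook applies the same arg-max to every column, the entries for fit_time, score_time and n_observations carry no selection meaning: the largest fit time is the slowest run, and a constant n_observations resolves to the first row.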
Train, test split#
Show number of cases in train and test data
| AD | train | test |
|---|---|---|
| False | 98 | 24 |
| True | 70 | 18 |
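The class counts above also fix the share of positive cases in each split, which is the floor for precision on that split. A small sketch that rebuilds the count table from two boolean label vectors and derives that share: `y_train` and `y_test` are hypothetical stand-ins filled with the counts shown above, not the notebook's variables.

```python
import pandas as pd

# Hypothetical label vectors reproducing the counts in the table above.
y_train = pd.Series([False] * 98 + [True] * 70, name="AD")
y_test = pd.Series([False] * 24 + [True] * 18, name="AD")

# One column of class counts per split, rows ordered False, True.
counts = pd.DataFrame(
    {"train": y_train.value_counts(), "test": y_test.value_counts()}
).sort_index()

# The mean of a boolean vector is the fraction of positive cases.
prevalence = pd.Series({"train": y_train.mean(), "test": y_test.mean()})

print(counts)
print(prevalence.round(3))
```

The test prevalence, 18 / 42 ≈ 0.429, is the precision of a classifier that labels every test sample positive.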
Results#
- run_model returns dataclasses with the further needed results
- add mrmr selection of data (select best number of features to use instead of fixing it)
Save results for final model on entire data, new features and ALD study criteria selected data.
ROC-AUC on test split#
pimmslearn.plotting - INFO Saved Figures to runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/auc_roc_curve.pdf
Data used to plot ROC:
| | ALD study all fpr | ALD study all tpr | VAE all fpr | VAE all tpr | VAE new fpr | VAE new tpr |
|---|---|---|---|---|---|---|
| 0 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 1 | 0.000 | 0.056 | 0.000 | 0.056 | 0.042 | 0.000 |
| 2 | 0.000 | 0.389 | 0.000 | 0.556 | 0.083 | 0.000 |
| 3 | 0.042 | 0.389 | 0.083 | 0.556 | 0.083 | 0.222 |
| 4 | 0.042 | 0.444 | 0.083 | 0.667 | 0.208 | 0.222 |
| 5 | 0.083 | 0.444 | 0.208 | 0.667 | 0.208 | 0.333 |
| 6 | 0.083 | 0.778 | 0.208 | 0.722 | 0.250 | 0.333 |
| 7 | 0.167 | 0.778 | 0.250 | 0.722 | 0.250 | 0.444 |
| 8 | 0.167 | 0.833 | 0.250 | 0.778 | 0.333 | 0.444 |
| 9 | 0.542 | 0.833 | 0.333 | 0.778 | 0.333 | 0.500 |
| 10 | 0.542 | 0.889 | 0.333 | 0.833 | 0.417 | 0.500 |
| 11 | 0.583 | 0.889 | 0.417 | 0.833 | 0.417 | 0.722 |
| 12 | 0.583 | 1.000 | 0.417 | 0.944 | 0.458 | 0.722 |
| 13 | 1.000 | 1.000 | 0.500 | 0.944 | 0.458 | 0.778 |
| 14 | NaN | NaN | 0.500 | 1.000 | 0.500 | 0.778 |
| 15 | NaN | NaN | 1.000 | 1.000 | 0.500 | 0.833 |
| 16 | NaN | NaN | NaN | NaN | 0.625 | 0.833 |
| 17 | NaN | NaN | NaN | NaN | 0.625 | 0.889 |
| 18 | NaN | NaN | NaN | NaN | 0.708 | 0.889 |
| 19 | NaN | NaN | NaN | NaN | 0.708 | 1.000 |
| 20 | NaN | NaN | NaN | NaN | 1.000 | 1.000 |
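Each row of the table above is one point of a ROC curve. The step sizes match the test split: fpr moves in multiples of 1/24 ≈ 0.042 and tpr in multiples of 1/18 ≈ 0.056. Curves with fewer points are padded with NaN. The sketch below is a didactic pure-Python illustration of how such points arise from labels and scores; it is not the plotting code used by the notebook, and the toy data, `roc_points` and `auc` are hypothetical.

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr) lists, one point per distinct score threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Walk through samples from the highest to the lowest score.
    ranked = sorted(zip(y_score, y_true), reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample sharing this score (ties).
        last_of_tie = i == len(ranked) - 1 or ranked[i + 1][0] != score
        if last_of_tie:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr


# Toy data: 3 positives, 3 negatives.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

fpr, tpr = roc_points(y_true, y_score)

# Area under the curve by the trapezoidal rule.
auc = sum(
    (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2 for i in range(1, len(fpr))
)
print(list(zip(fpr, tpr)), round(auc, 3))
```

In the toy data, 8 of the 9 positive/negative pairs are ranked correctly, so the area equals 8/9.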
Features selected for final models#
| rank | ALD study all | VAE all | VAE new |
|---|---|---|---|
| 0 | P10636-2;P10636-6 | P10636-2;P10636-6 | Q14894 |
| 1 | Q8NCL4 | P22676 | P51688 |
| 2 | J3KNE3;P68402 | Q0P6D2 | A0A087WXB8;Q9Y274 |
| 3 | Q02818 | Q14894 | P31321 |
| 4 | P61981 | P63104 | F8WBF9;Q5TH30;Q9UGV2;Q9UGV2-2;Q9UGV2-3 |
| 5 | P04075 | Q9Y2T3;Q9Y2T3-3 | A0A075B7B8 |
| 6 | P14174 | P51688 | Q96GD0 |
| 7 | Q9Y2T3;Q9Y2T3-3 | P00492 | A0A0C4DGV4;E9PLX3;O43504;R4GMU8 |
| 8 | P00338;P00338-3 | P61981 | Q9NUQ9 |
| 9 | C9JF17;P05090 | P04075 | A0A1W2PQ94;B4DS77;B4DS77-2;B4DS77-3 |
| 10 | None | P14174 | None |
| 11 | None | P00338;P00338-3 | None |
| 12 | None | C9JF17;P05090 | None |
| 13 | None | A0A0C4DGY8;D6RA00;Q9UHY7 | None |
Precision-Recall plot on test data#
pimmslearn.plotting - INFO Saved Figures to runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/prec_recall_curve.pdf
Data used to plot PRC:
| | ALD study all precision | ALD study all tpr | VAE all precision | VAE all tpr | VAE new precision | VAE new tpr |
|---|---|---|---|---|---|---|
| 0 | 0.429 | 1.000 | 0.429 | 1.000 | 0.429 | 1.000 |
| 1 | 0.439 | 1.000 | 0.439 | 1.000 | 0.439 | 1.000 |
| 2 | 0.450 | 1.000 | 0.450 | 1.000 | 0.450 | 1.000 |
| 3 | 0.462 | 1.000 | 0.462 | 1.000 | 0.462 | 1.000 |
| 4 | 0.474 | 1.000 | 0.474 | 1.000 | 0.474 | 1.000 |
| 5 | 0.486 | 1.000 | 0.486 | 1.000 | 0.486 | 1.000 |
| 6 | 0.500 | 1.000 | 0.500 | 1.000 | 0.500 | 1.000 |
| 7 | 0.514 | 1.000 | 0.514 | 1.000 | 0.514 | 1.000 |
| 8 | 0.529 | 1.000 | 0.529 | 1.000 | 0.500 | 0.944 |
| 9 | 0.545 | 1.000 | 0.545 | 1.000 | 0.485 | 0.889 |
| 10 | 0.562 | 1.000 | 0.562 | 1.000 | 0.500 | 0.889 |
| 11 | 0.548 | 0.944 | 0.581 | 1.000 | 0.516 | 0.889 |
| 12 | 0.533 | 0.889 | 0.600 | 1.000 | 0.500 | 0.833 |
| 13 | 0.552 | 0.889 | 0.586 | 0.944 | 0.517 | 0.833 |
| 14 | 0.536 | 0.833 | 0.607 | 0.944 | 0.536 | 0.833 |
| 15 | 0.556 | 0.833 | 0.630 | 0.944 | 0.556 | 0.833 |
| 16 | 0.577 | 0.833 | 0.615 | 0.889 | 0.538 | 0.778 |
| 17 | 0.600 | 0.833 | 0.600 | 0.833 | 0.560 | 0.778 |
| 18 | 0.625 | 0.833 | 0.625 | 0.833 | 0.542 | 0.722 |
| 19 | 0.652 | 0.833 | 0.652 | 0.833 | 0.565 | 0.722 |
| 20 | 0.682 | 0.833 | 0.636 | 0.778 | 0.545 | 0.667 |
| 21 | 0.714 | 0.833 | 0.667 | 0.778 | 0.524 | 0.611 |
| 22 | 0.750 | 0.833 | 0.700 | 0.778 | 0.500 | 0.556 |
| 23 | 0.789 | 0.833 | 0.684 | 0.722 | 0.474 | 0.500 |
| 24 | 0.778 | 0.778 | 0.722 | 0.722 | 0.500 | 0.500 |
| 25 | 0.824 | 0.778 | 0.706 | 0.667 | 0.529 | 0.500 |
| 26 | 0.875 | 0.778 | 0.750 | 0.667 | 0.500 | 0.444 |
| 27 | 0.867 | 0.722 | 0.800 | 0.667 | 0.533 | 0.444 |
| 28 | 0.857 | 0.667 | 0.857 | 0.667 | 0.571 | 0.444 |
| 29 | 0.846 | 0.611 | 0.846 | 0.611 | 0.538 | 0.389 |
| 30 | 0.833 | 0.556 | 0.833 | 0.556 | 0.500 | 0.333 |
| 31 | 0.818 | 0.500 | 0.909 | 0.556 | 0.545 | 0.333 |
| 32 | 0.800 | 0.444 | 1.000 | 0.556 | 0.500 | 0.278 |
| 33 | 0.889 | 0.444 | 1.000 | 0.500 | 0.444 | 0.222 |
| 34 | 0.875 | 0.389 | 1.000 | 0.444 | 0.500 | 0.222 |
| 35 | 1.000 | 0.389 | 1.000 | 0.389 | 0.571 | 0.222 |
| 36 | 1.000 | 0.333 | 1.000 | 0.333 | 0.667 | 0.222 |
| 37 | 1.000 | 0.278 | 1.000 | 0.278 | 0.600 | 0.167 |
| 38 | 1.000 | 0.222 | 1.000 | 0.222 | 0.500 | 0.111 |
| 39 | 1.000 | 0.167 | 1.000 | 0.167 | 0.333 | 0.056 |
| 40 | 1.000 | 0.111 | 1.000 | 0.111 | 0.000 | 0.000 |
| 41 | 1.000 | 0.056 | 1.000 | 0.056 | 0.000 | 0.000 |
| 42 | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.000 |
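In the table above, the `tpr` column is the recall. The first row corresponds to the lowest threshold, where every test sample is predicted positive, so precision equals the test prevalence (18 / 42 ≈ 0.429). The last row is the conventional end point (precision 1, recall 0). The following pure-Python sketch illustrates that ordering on toy data; it is an illustrative reimplementation, and `pr_points` and the data are hypothetical.

```python
def pr_points(y_true, y_score):
    """Return (precision, recall) lists for increasing score thresholds."""
    n_pos = sum(y_true)
    precision, recall = [], []
    for threshold in sorted(set(y_score)):
        # Labels of all samples predicted positive at this threshold.
        predicted = [label for score, label in zip(y_score, y_true) if score >= threshold]
        tp = sum(predicted)
        precision.append(tp / len(predicted))
        recall.append(tp / n_pos)
    # Conventional end point: no positive predictions.
    precision.append(1.0)
    recall.append(0.0)
    return precision, recall


# Toy data: 3 positives, 3 negatives.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

precision, recall = pr_points(y_true, y_score)
for p, r in zip(precision, recall):
    print(f"precision={p:.3f} recall={r:.3f}")
```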
Train data plots#
pimmslearn.plotting - INFO Saved Figures to runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/prec_recall_curve_train.pdf
pimmslearn.plotting - INFO Saved Figures to runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/auc_roc_curve_train.pdf
Output files:
{'results_VAE all.pkl': PosixPath('runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/results_VAE all.pkl'),
'results_VAE new.pkl': PosixPath('runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/results_VAE new.pkl'),
'results_ALD study all.pkl': PosixPath('runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/results_ALD study all.pkl'),
'auc_roc_curve.pdf': PosixPath('runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/auc_roc_curve.pdf'),
'mrmr_feat_by_model.xlsx': PosixPath('runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/mrmr_feat_by_model.xlsx'),
'prec_recall_curve.pdf': PosixPath('runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/prec_recall_curve.pdf'),
'prec_recall_curve_train.pdf': PosixPath('runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/prec_recall_curve_train.pdf'),
'auc_roc_curve_train.pdf': PosixPath('runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/auc_roc_curve_train.pdf')}