Fit logistic regression model#
based on different imputation methods
baseline: reference
model: any other selected imputation method
Parameters#
Default and set parameters for the notebook.
folder_data: str = '' # specify data directory if needed
fn_clinical_data = "data/ALD_study/processed/ald_metadata_cli.csv"
folder_experiment = "runs/appl_ald_data/plasma/proteinGroups"
model_key = 'VAE'
target = 'kleiner'
sample_id_col = 'Sample ID'
cutoff_target: int = 2 # => for binarization target >= cutoff_target
file_format = "csv"
out_folder = 'diff_analysis'
fn_qc_samples = '' # 'data/ALD_study/processed/qc_plasma_proteinGroups.pkl'
baseline = 'RSN' # default is RSN, as this was used in the original ALD study (Niu et al. 2022)
template_pred = 'pred_real_na_{}.csv' # fixed, do not change
# Parameters
cutoff_target = 0.5
folder_experiment = "runs/alzheimer_study"
target = "AD"
baseline = "PI"
model_key = "VAE"
out_folder = "diff_analysis"
fn_clinical_data = "runs/alzheimer_study/data/clinical_data.csv"
root - INFO Removed from global namespace: folder_data
root - INFO Removed from global namespace: fn_clinical_data
root - INFO Removed from global namespace: folder_experiment
root - INFO Removed from global namespace: model_key
root - INFO Removed from global namespace: target
root - INFO Removed from global namespace: sample_id_col
root - INFO Removed from global namespace: cutoff_target
root - INFO Removed from global namespace: file_format
root - INFO Removed from global namespace: out_folder
root - INFO Removed from global namespace: fn_qc_samples
root - INFO Removed from global namespace: baseline
root - INFO Removed from global namespace: template_pred
root - INFO Already set attribute: folder_experiment has value runs/alzheimer_study
root - INFO Already set attribute: out_folder has value diff_analysis
{'baseline': 'PI',
'cutoff_target': 0.5,
'data': PosixPath('runs/alzheimer_study/data'),
'file_format': 'csv',
'fn_clinical_data': 'runs/alzheimer_study/data/clinical_data.csv',
'fn_qc_samples': '',
'folder_data': '',
'folder_experiment': PosixPath('runs/alzheimer_study'),
'model_key': 'VAE',
'out_figures': PosixPath('runs/alzheimer_study/figures'),
'out_folder': PosixPath('runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE'),
'out_metrics': PosixPath('runs/alzheimer_study'),
'out_models': PosixPath('runs/alzheimer_study'),
'out_preds': PosixPath('runs/alzheimer_study/preds'),
'sample_id_col': 'Sample ID',
'target': 'AD',
'template_pred': 'pred_real_na_{}.csv'}
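The resolved `out_folder` printed above is consistent with nesting the target and the comparison name under the experiment folder. The following is a minimal sketch of how such a path could be composed with `pathlib`. The composition rule is an assumption inferred from the printed value, not taken from the notebook's code, and the variable `resolved` is hypothetical.

```python
from pathlib import Path

# Values copied from the injected parameters above
folder_experiment = Path("runs/alzheimer_study")
out_folder = "diff_analysis"
target = "AD"
baseline = "PI"
model_key = "VAE"

# Assumed layout: <experiment>/<out_folder>/<target>/<baseline>_vs_<model_key>
resolved = folder_experiment / out_folder / target / f"{baseline}_vs_{model_key}"
print(resolved.as_posix())
```

Keeping the comparison name in the path means that results for different baseline and model pairs do not overwrite each other.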
Load data#
Load target#
target = pd.read_csv(args.fn_clinical_data,
                     index_col=0,
                     usecols=[args.sample_id_col, args.target])
target = target.dropna()
target
| Sample ID | AD |
|---|---|
| Sample_000 | 0 |
| Sample_001 | 1 |
| Sample_002 | 1 |
| Sample_003 | 1 |
| Sample_004 | 1 |
| ... | ... |
| Sample_205 | 1 |
| Sample_206 | 0 |
| Sample_207 | 0 |
| Sample_208 | 0 |
| Sample_209 | 0 |
210 rows × 1 columns
MS proteomics or specified omics data#
Aggregated from data splits of the imputation workflow run before.
pimmslearn.io.datasplits - INFO Loaded 'train_X' from file: runs/alzheimer_study/data/train_X.csv
pimmslearn.io.datasplits - INFO Loaded 'val_y' from file: runs/alzheimer_study/data/val_y.csv
pimmslearn.io.datasplits - INFO Loaded 'test_y' from file: runs/alzheimer_study/data/test_y.csv
Sample ID protein groups
Sample_088 P00352 12.754
Sample_153 C9J712;P35080 14.273
Sample_194 Q9HCU0 16.407
Sample_074 Q8WY21;Q8WY21-2;Q8WY21-3;Q8WY21-4 15.936
Sample_161 P04275 14.484
Name: intensity, dtype: float64
Get overlap between independent features and target
Select by ALD criteria#
Use parameters as specified in ALD study.
root - INFO Initally: N samples: 210, M feat: 1421
root - INFO Dropped features quantified in less than 126 samples.
root - INFO After feat selection: N samples: 210, M feat: 1213
root - INFO Min No. of Protein-Groups in single sample: 754
root - INFO Finally: N samples: 210, M feat: 1213
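The logged threshold of 126 samples equals 60% of the 210 samples, so the selection appears to keep only features quantified in at least that share of samples. A minimal sketch of such a prevalence filter with pandas follows. The function name `select_features_by_prevalence`, the `min_frac` parameter and the toy data are illustrative assumptions, not the notebook's own implementation.

```python
import numpy as np
import pandas as pd


def select_features_by_prevalence(df: pd.DataFrame, min_frac: float = 0.6) -> pd.DataFrame:
    """Keep columns that are quantified (non-NaN) in at least min_frac of the samples."""
    min_samples = int(np.ceil(len(df) * min_frac))
    keep = df.notna().sum(axis=0) >= min_samples
    return df.loc[:, keep]


# Toy data: 10 samples, 3 hypothetical protein groups
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(15, 1, size=(10, 3)), columns=["P1", "P2", "P3"])
toy.loc[:4, "P3"] = np.nan  # label slice is inclusive: 5 of 10 values missing

filtered = select_features_by_prevalence(toy)
print(filtered.columns.tolist())
```

In the toy data `P3` is observed in only 5 of 10 samples, below the required 6, so it is dropped.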
| Sample ID / protein groups | A0A024QZX5;A0A087X1N8;P35237 | A0A024R0T9;K7ER74;P02655 | A0A024R3W6;A0A024R412;O60462;O60462-2;O60462-3;O60462-4;O60462-5;Q7LBX6;X5D2Q8 | A0A024R644;A0A0A0MRU5;A0A1B0GWI2;O75503 | A0A075B6H9 | A0A075B6I0 | A0A075B6I1 | A0A075B6I6 | A0A075B6I9 | A0A075B6J9 | ... | Q9Y653;Q9Y653-2;Q9Y653-3 | Q9Y696 | Q9Y6C2 | Q9Y6N6 | Q9Y6N7;Q9Y6N7-2;Q9Y6N7-4 | Q9Y6R7 | Q9Y6X5 | Q9Y6Y8;Q9Y6Y8-2 | Q9Y6Y9 | S4R3U6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sample_000 | 15.912 | 16.852 | 15.570 | 16.481 | 20.246 | 16.764 | 17.584 | 16.988 | 20.054 | NaN | ... | 16.012 | 15.178 | NaN | 15.050 | 16.842 | 19.863 | NaN | 19.563 | 12.837 | 12.805 |
| Sample_001 | 15.936 | 16.874 | 15.519 | 16.387 | 19.941 | 18.786 | 17.144 | NaN | 19.067 | 16.188 | ... | 15.528 | 15.576 | NaN | 14.833 | 16.597 | 20.299 | 15.556 | 19.386 | 13.970 | 12.442 |
| Sample_002 | 16.111 | 14.523 | 15.935 | 16.416 | 19.251 | 16.832 | 15.671 | 17.012 | 18.569 | NaN | ... | 15.229 | 14.728 | 13.757 | 15.118 | 17.440 | 19.598 | 15.735 | 20.447 | 12.636 | 12.505 |
| Sample_003 | 16.107 | 17.032 | 15.802 | 16.979 | 19.628 | 17.852 | 18.877 | 14.182 | 18.985 | 13.438 | ... | 15.495 | 14.590 | 14.682 | 15.140 | 17.356 | 19.429 | NaN | 20.216 | 12.627 | 12.445 |
| Sample_004 | 15.603 | 15.331 | 15.375 | 16.679 | 20.450 | 18.682 | 17.081 | 14.140 | 19.686 | 14.495 | ... | 14.757 | 15.094 | 14.048 | 15.256 | 17.075 | 19.582 | 15.328 | 19.867 | 13.145 | 12.235 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Sample_205 | 15.682 | 16.886 | 14.910 | 16.482 | 17.705 | 17.039 | NaN | 16.413 | 19.102 | 16.064 | ... | 15.235 | 15.684 | 14.236 | 15.415 | 17.551 | 17.922 | 16.340 | 19.928 | 12.929 | 11.802 |
| Sample_206 | 15.798 | 17.554 | 15.600 | 15.938 | 18.154 | 18.152 | 16.503 | 16.860 | 18.538 | 15.288 | ... | 15.422 | 16.106 | NaN | 15.345 | 17.084 | 18.708 | 14.249 | 19.433 | NaN | NaN |
| Sample_207 | 15.739 | 16.877 | 15.469 | 16.898 | 18.636 | 17.950 | 16.321 | 16.401 | 18.849 | 17.580 | ... | 15.808 | 16.098 | 14.403 | 15.715 | 16.586 | 18.725 | 16.138 | 19.599 | 13.637 | 11.174 |
| Sample_208 | 15.477 | 16.779 | 14.995 | 16.132 | 14.908 | 17.530 | NaN | 16.119 | 18.368 | 15.202 | ... | 15.157 | 16.712 | NaN | 14.640 | 16.533 | 19.411 | 15.807 | 19.545 | 13.216 | NaN |
| Sample_209 | 15.727 | 17.261 | 15.175 | 16.235 | 17.893 | 17.744 | 16.371 | 15.780 | 18.806 | 16.532 | ... | 15.237 | 15.652 | 15.211 | 14.205 | 16.749 | 19.275 | 15.732 | 19.577 | 11.042 | 11.791 |
210 rows × 1213 columns
Number of complete cases which can be used:
Samples available both in proteomics data and for target: 210
Load imputations from specified model#
missing values pred. by VAE: runs/alzheimer_study/preds/pred_real_na_VAE.csv
Sample ID protein groups
Sample_042 B8ZZL8;P61604 14.378
Sample_147 Q15768 15.976
Sample_095 B7ZKR5;P13497;P13497-2;P13497-3;P13497-4;P13497-5;P13497-6;Q3MIM8 16.267
Name: intensity, dtype: float64
Load imputations from baseline model#
Sample ID protein groups
Sample_000 A0A075B6J9 13.412
A0A075B6Q5 13.967
A0A075B6R2 12.053
A0A075B6S5 13.419
A0A087WSY4 14.256
...
Sample_209 Q9P1W8;Q9P1W8-2;Q9P1W8-4 12.540
Q9UI40;Q9UI40-2 11.810
Q9UIW2 12.704
Q9UMX0;Q9UMX0-2;Q9UMX0-4 11.626
Q9UP79 12.553
Name: intensity, Length: 46401, dtype: float64
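The imputations are stored in long format, indexed by sample and protein group, while the observed data is a wide table with `NaN` for missing values. A minimal sketch of how the two could be combined with pandas is shown below. The toy values and the names `observed`, `pred` and `completed` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Wide table of observed intensities with missing values (toy data)
observed = pd.DataFrame(
    {"P1": [15.9, np.nan], "P2": [np.nan, 16.9]},
    index=pd.Index(["Sample_000", "Sample_001"], name="Sample ID"),
)
observed.columns.name = "protein groups"

# Imputed values in long format, shaped like the Series shown above
pred = pd.Series(
    [13.4, 12.5],
    index=pd.MultiIndex.from_tuples(
        [("Sample_000", "P2"), ("Sample_001", "P1")],
        names=["Sample ID", "protein groups"],
    ),
    name="intensity",
)

# Pivot the predictions to wide format and fill only the missing cells;
# observed values are left untouched.
completed = observed.fillna(pred.unstack())
print(completed)
```

Because `fillna` aligns on both index and columns, only cells that are missing in `observed` take a predicted value.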
Modeling setup#
General approach:
- use one train/test split of the data
- select the best 10 features from the training data (X_train, y_train) before binarization of the target
- dichotomize (binarize) data into two groups (zero and one)
- evaluate the model on the test data (X_test, y_test)

Repeat the general approach for
1. all original ALD data: all features used in the original ALD study
2. all model data: all features available by using the self-supervised deep learning model
3. newly available features only: the subset of features available from the self-supervised deep learning model which were newly retained using the new approach
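The general approach can be sketched with scikit-learn as follows. This is a simplified illustration on synthetic data: the univariate `SelectKBest` selector stands in for whatever feature selection the notebook actually uses, the target is already binary, and the split size and random seeds are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the proteomics matrix (210 samples, 50 features)
X, y = make_classification(n_samples=210, n_features=50, n_informative=8,
                           random_state=0)

# One stratified train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Select the 10 best features on the training data only, then fit the model
clf = make_pipeline(StandardScaler(),
                    SelectKBest(f_classif, k=10),
                    LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# Evaluate on the held-out test data
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"test ROC AUC: {auc:.3f}")
```

Putting the selector inside the pipeline ensures that the test data never influences which features are chosen.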
All data:
| Sample ID / protein groups | A0A024QZX5;A0A087X1N8;P35237 | A0A024R0T9;K7ER74;P02655 | A0A024R3W6;A0A024R412;O60462;O60462-2;O60462-3;O60462-4;O60462-5;Q7LBX6;X5D2Q8 | A0A024R644;A0A0A0MRU5;A0A1B0GWI2;O75503 | A0A075B6H7 | A0A075B6H9 | A0A075B6I0 | A0A075B6I1 | A0A075B6I6 | A0A075B6I9 | ... | Q9Y653;Q9Y653-2;Q9Y653-3 | Q9Y696 | Q9Y6C2 | Q9Y6N6 | Q9Y6N7;Q9Y6N7-2;Q9Y6N7-4 | Q9Y6R7 | Q9Y6X5 | Q9Y6Y8;Q9Y6Y8-2 | Q9Y6Y9 | S4R3U6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sample_000 | 15.912 | 16.852 | 15.570 | 16.481 | 17.301 | 20.246 | 16.764 | 17.584 | 16.988 | 20.054 | ... | 16.012 | 15.178 | 14.230 | 15.050 | 16.842 | 19.863 | 15.901 | 19.563 | 12.837 | 12.805 |
| Sample_001 | 15.936 | 16.874 | 15.519 | 16.387 | 13.796 | 19.941 | 18.786 | 17.144 | 16.804 | 19.067 | ... | 15.528 | 15.576 | 14.213 | 14.833 | 16.597 | 20.299 | 15.556 | 19.386 | 13.970 | 12.442 |
| Sample_002 | 16.111 | 14.523 | 15.935 | 16.416 | 18.175 | 19.251 | 16.832 | 15.671 | 17.012 | 18.569 | ... | 15.229 | 14.728 | 13.757 | 15.118 | 17.440 | 19.598 | 15.735 | 20.447 | 12.636 | 12.505 |
| Sample_003 | 16.107 | 17.032 | 15.802 | 16.979 | 15.963 | 19.628 | 17.852 | 18.877 | 14.182 | 18.985 | ... | 15.495 | 14.590 | 14.682 | 15.140 | 17.356 | 19.429 | 15.929 | 20.216 | 12.627 | 12.445 |
| Sample_004 | 15.603 | 15.331 | 15.375 | 16.679 | 15.473 | 20.450 | 18.682 | 17.081 | 14.140 | 19.686 | ... | 14.757 | 15.094 | 14.048 | 15.256 | 17.075 | 19.582 | 15.328 | 19.867 | 13.145 | 12.235 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Sample_205 | 15.682 | 16.886 | 14.910 | 16.482 | 15.475 | 17.705 | 17.039 | 15.823 | 16.413 | 19.102 | ... | 15.235 | 15.684 | 14.236 | 15.415 | 17.551 | 17.922 | 16.340 | 19.928 | 12.929 | 11.802 |
| Sample_206 | 15.798 | 17.554 | 15.600 | 15.938 | 15.550 | 18.154 | 18.152 | 16.503 | 16.860 | 18.538 | ... | 15.422 | 16.106 | 14.435 | 15.345 | 17.084 | 18.708 | 14.249 | 19.433 | 11.297 | 11.077 |
| Sample_207 | 15.739 | 16.877 | 15.469 | 16.898 | 15.130 | 18.636 | 17.950 | 16.321 | 16.401 | 18.849 | ... | 15.808 | 16.098 | 14.403 | 15.715 | 16.586 | 18.725 | 16.138 | 19.599 | 13.637 | 11.174 |
| Sample_208 | 15.477 | 16.779 | 14.995 | 16.132 | 14.350 | 14.908 | 17.530 | 17.442 | 16.119 | 18.368 | ... | 15.157 | 16.712 | 14.415 | 14.640 | 16.533 | 19.411 | 15.807 | 19.545 | 13.216 | 11.124 |
| Sample_209 | 15.727 | 17.261 | 15.175 | 16.235 | 14.782 | 17.893 | 17.744 | 16.371 | 15.780 | 18.806 | ... | 15.237 | 15.652 | 15.211 | 14.205 | 16.749 | 19.275 | 15.732 | 19.577 | 11.042 | 11.791 |
210 rows × 1421 columns
Subset of data by ALD criteria#
| Sample ID / protein groups | A0A024QZX5;A0A087X1N8;P35237 | A0A024R0T9;K7ER74;P02655 | A0A024R3W6;A0A024R412;O60462;O60462-2;O60462-3;O60462-4;O60462-5;Q7LBX6;X5D2Q8 | A0A024R644;A0A0A0MRU5;A0A1B0GWI2;O75503 | A0A075B6H9 | A0A075B6I0 | A0A075B6I1 | A0A075B6I6 | A0A075B6I9 | A0A075B6K4 | ... | O14793 | O95479;R4GMU1 | P01282;P01282-2 | P10619;P10619-2;X6R5C5;X6R8A1 | P21810 | Q14956;Q14956-2 | Q6ZMP0;Q6ZMP0-2 | Q9HBW1 | Q9NY15 | P17050 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sample_000 | 15.912 | 16.852 | 15.570 | 16.481 | 20.246 | 16.764 | 17.584 | 16.988 | 20.054 | 16.148 | ... | 12.449 | 13.051 | 13.220 | 12.867 | 13.424 | 13.300 | 13.211 | 14.187 | 13.768 | 11.609 |
| Sample_001 | 15.936 | 16.874 | 15.519 | 16.387 | 19.941 | 18.786 | 17.144 | 13.309 | 19.067 | 16.127 | ... | 12.691 | 12.670 | 13.043 | 12.063 | 12.694 | 12.414 | 12.431 | 11.873 | 13.834 | 12.740 |
| Sample_002 | 16.111 | 14.523 | 15.935 | 16.416 | 19.251 | 16.832 | 15.671 | 17.012 | 18.569 | 15.387 | ... | 14.611 | 11.361 | 14.329 | 12.444 | 13.242 | 13.526 | 12.610 | 12.567 | 12.622 | 12.551 |
| Sample_003 | 16.107 | 17.032 | 15.802 | 16.979 | 19.628 | 17.852 | 18.877 | 14.182 | 18.985 | 16.565 | ... | 12.778 | 12.245 | 14.173 | 12.847 | 12.345 | 13.331 | 13.855 | 13.287 | 11.289 | 13.354 |
| Sample_004 | 15.603 | 15.331 | 15.375 | 16.679 | 20.450 | 18.682 | 17.081 | 14.140 | 19.686 | 16.418 | ... | 13.803 | 14.433 | 12.759 | 13.537 | 13.614 | 12.396 | 14.186 | 14.139 | 11.627 | 11.973 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Sample_205 | 15.682 | 16.886 | 14.910 | 16.482 | 17.705 | 17.039 | 13.688 | 16.413 | 19.102 | 15.350 | ... | 14.269 | 14.064 | 16.826 | 18.182 | 15.225 | 15.044 | 14.192 | 16.605 | 14.995 | 14.257 |
| Sample_206 | 15.798 | 17.554 | 15.600 | 15.938 | 18.154 | 18.152 | 16.503 | 16.860 | 18.538 | 16.582 | ... | 14.273 | 17.700 | 16.802 | 20.202 | 15.280 | 15.086 | 13.978 | 18.086 | 15.557 | 14.171 |
| Sample_207 | 15.739 | 16.877 | 15.469 | 16.898 | 18.636 | 17.950 | 16.321 | 16.401 | 18.849 | 15.768 | ... | 14.473 | 16.882 | 16.917 | 20.105 | 15.690 | 15.135 | 13.138 | 17.066 | 15.706 | 15.690 |
| Sample_208 | 15.477 | 16.779 | 14.995 | 16.132 | 14.908 | 17.530 | 13.150 | 16.119 | 18.368 | 17.560 | ... | 15.234 | 17.175 | 16.521 | 18.859 | 15.305 | 15.161 | 13.006 | 17.917 | 15.396 | 14.371 |
| Sample_209 | 15.727 | 17.261 | 15.175 | 16.235 | 17.893 | 17.744 | 16.371 | 15.780 | 18.806 | 16.338 | ... | 14.556 | 16.656 | 16.954 | 18.493 | 15.823 | 14.626 | 13.385 | 17.767 | 15.687 | 13.573 |
210 rows × 1213 columns
Features which would not have been included using ALD criteria:
Index(['A0A075B6H7', 'A0A075B6Q5', 'A0A075B7B8', 'A0A087WSY4',
'A0A087WTT8;A0A0A0MQX5;O94779;O94779-2', 'A0A087WXB8;Q9Y274',
'A0A087WXE9;E9PQ70;Q6UXH9;Q6UXH9-2;Q6UXH9-3',
'A0A087X1Z2;C9JTV4;H0Y4Y4;Q8WYH2;Q96C19;Q9BUP0;Q9BUP0-2',
'A0A0A0MQS9;A0A0A0MTC7;Q16363;Q16363-2', 'A0A0A0MSN4;P12821;P12821-2',
...
'Q9NZ94;Q9NZ94-2;Q9NZ94-3', 'Q9NZU1', 'Q9P1W8;Q9P1W8-2;Q9P1W8-4',
'Q9UHI8', 'Q9UI40;Q9UI40-2',
'Q9UIB8;Q9UIB8-2;Q9UIB8-3;Q9UIB8-4;Q9UIB8-5;Q9UIB8-6',
'Q9UKZ4;Q9UKZ4-2', 'Q9UMX0;Q9UMX0-2;Q9UMX0-4', 'Q9Y281;Q9Y281-3',
'Q9Y490'],
dtype='object', name='protein groups', length=208)
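The 208 features listed here match the difference between the 1421 columns of the full table and the 1213 columns retained by the ALD criteria. A minimal sketch of such a set difference on pandas indices, using short hypothetical feature names:

```python
import pandas as pd

# Hypothetical feature names standing in for the protein group identifiers
all_features = pd.Index(["P1", "P2", "P3", "P4"], name="protein groups")
ald_features = pd.Index(["P1", "P3"], name="protein groups")

# Features available after imputation but excluded by the ALD criteria
new_features = all_features.difference(ald_features)
print(new_features)
```

`Index.difference` returns the result sorted by default, which matches the alphabetical order of the identifiers in the output above.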
Binarize targets, but also keep groups for stratification
| AD >= cutoff_target (rows) vs. AD (columns) | 0 | 1 |
|---|---|---|
| False | 122 | 0 |
| True | 0 | 88 |
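A cross-tabulation like the one above can be produced by comparing the target against the cutoff and tabulating the result against the original values. This is a minimal sketch on toy data; the variable names `y` and `table` are assumptions.

```python
import pandas as pd

cutoff_target = 0.5  # same value as the injected parameter above

# Toy target values standing in for the AD column
target = pd.Series([0, 1, 1, 0, 1], name="AD")

# Binarize: True where the target reaches the cutoff
y = (target >= cutoff_target).rename("AD >= cutoff_target")

# Rows: binarized target, columns: original groups (kept for stratification)
table = pd.crosstab(y, target)
print(table)
```

With a 0/1 target and a cutoff of 0.5 the binarized labels coincide with the original groups, which is why the off-diagonal cells are zero.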
Determine best number of parameters by cross validation procedure#
using subset of data by ALD criteria:
| n_features | fit_time mean | fit_time std | score_time mean | score_time std | test_precision mean | test_precision std | test_recall mean | test_recall std | test_f1 mean | test_f1 std | test_balanced_accuracy mean | test_balanced_accuracy std | test_roc_auc mean | test_roc_auc std | test_average_precision mean | test_average_precision std | n_observations mean | n_observations std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.003 | 0.001 | 0.036 | 0.006 | 0.899 | 0.158 | 0.169 | 0.089 | 0.274 | 0.124 | 0.576 | 0.043 | 0.856 | 0.060 | 0.823 | 0.086 | 210.000 | 0.000 |
| 2 | 0.003 | 0.000 | 0.037 | 0.001 | 0.629 | 0.134 | 0.431 | 0.141 | 0.497 | 0.115 | 0.618 | 0.071 | 0.693 | 0.083 | 0.633 | 0.095 | 210.000 | 0.000 |
| 3 | 0.003 | 0.000 | 0.037 | 0.000 | 0.663 | 0.098 | 0.611 | 0.122 | 0.631 | 0.094 | 0.691 | 0.072 | 0.789 | 0.070 | 0.736 | 0.099 | 210.000 | 0.000 |
| 4 | 0.003 | 0.000 | 0.036 | 0.000 | 0.662 | 0.097 | 0.620 | 0.126 | 0.635 | 0.096 | 0.694 | 0.073 | 0.780 | 0.073 | 0.725 | 0.101 | 210.000 | 0.000 |
| 5 | 0.003 | 0.000 | 0.037 | 0.000 | 0.754 | 0.102 | 0.716 | 0.099 | 0.728 | 0.077 | 0.768 | 0.065 | 0.857 | 0.057 | 0.836 | 0.063 | 210.000 | 0.000 |
| 6 | 0.003 | 0.000 | 0.037 | 0.001 | 0.810 | 0.078 | 0.829 | 0.095 | 0.815 | 0.062 | 0.842 | 0.054 | 0.908 | 0.046 | 0.889 | 0.053 | 210.000 | 0.000 |
| 7 | 0.003 | 0.000 | 0.036 | 0.000 | 0.810 | 0.079 | 0.827 | 0.100 | 0.814 | 0.066 | 0.841 | 0.057 | 0.906 | 0.048 | 0.887 | 0.055 | 210.000 | 0.000 |
| 8 | 0.004 | 0.001 | 0.039 | 0.010 | 0.825 | 0.087 | 0.825 | 0.098 | 0.820 | 0.065 | 0.846 | 0.055 | 0.909 | 0.047 | 0.882 | 0.063 | 210.000 | 0.000 |
| 9 | 0.004 | 0.000 | 0.037 | 0.000 | 0.817 | 0.083 | 0.807 | 0.104 | 0.806 | 0.066 | 0.835 | 0.055 | 0.908 | 0.048 | 0.885 | 0.059 | 210.000 | 0.000 |
| 10 | 0.003 | 0.001 | 0.032 | 0.012 | 0.843 | 0.088 | 0.824 | 0.105 | 0.828 | 0.072 | 0.853 | 0.061 | 0.919 | 0.049 | 0.906 | 0.056 | 210.000 | 0.000 |
| 11 | 0.004 | 0.000 | 0.035 | 0.002 | 0.835 | 0.087 | 0.817 | 0.108 | 0.821 | 0.075 | 0.848 | 0.064 | 0.920 | 0.049 | 0.907 | 0.056 | 210.000 | 0.000 |
| 12 | 0.003 | 0.000 | 0.034 | 0.003 | 0.829 | 0.085 | 0.830 | 0.098 | 0.825 | 0.073 | 0.851 | 0.063 | 0.920 | 0.050 | 0.909 | 0.055 | 210.000 | 0.000 |
| 13 | 0.004 | 0.002 | 0.037 | 0.018 | 0.831 | 0.089 | 0.828 | 0.099 | 0.825 | 0.073 | 0.850 | 0.063 | 0.919 | 0.050 | 0.908 | 0.055 | 210.000 | 0.000 |
| 14 | 0.005 | 0.002 | 0.047 | 0.020 | 0.821 | 0.086 | 0.825 | 0.092 | 0.819 | 0.066 | 0.845 | 0.057 | 0.918 | 0.049 | 0.908 | 0.053 | 210.000 | 0.000 |
| 15 | 0.004 | 0.001 | 0.038 | 0.006 | 0.828 | 0.089 | 0.825 | 0.092 | 0.822 | 0.069 | 0.848 | 0.059 | 0.919 | 0.049 | 0.911 | 0.051 | 210.000 | 0.000 |
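A table of this shape, with mean and standard deviation per metric for each number of features, could be produced with scikit-learn's `cross_validate`. The sketch below is an illustration under assumptions: it uses synthetic data, `SelectKBest` as a stand-in feature selector, and a repeated stratified 5-fold scheme that is not confirmed by the notebook output.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=120, n_features=30, n_informative=6,
                           random_state=0)
scoring = ["precision", "recall", "f1", "balanced_accuracy",
           "roc_auc", "average_precision"]
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)

rows = {}
for n_features in range(1, 6):
    model = make_pipeline(StandardScaler(),
                          SelectKBest(f_classif, k=n_features),
                          LogisticRegression(max_iter=1000))
    # One row per fold with fit_time, score_time and test_<metric> columns
    res = pd.DataFrame(cross_validate(model, X, y, cv=cv, scoring=scoring))
    # Collapse folds to (metric, statistic) pairs
    rows[n_features] = res.agg(["mean", "std"]).unstack()

summary = pd.DataFrame(rows).T
summary.index.name = "n_features"
print(summary.round(3))
```

Comparing the mean scores across rows, together with their standard deviations, indicates from which number of features onward additional features stop improving the model.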
Using all data:
| n_features | fit_time mean | fit_time std | score_time mean | score_time std | test_precision mean | test_precision std | test_recall mean | test_recall std | test_f1 mean | test_f1 std | test_balanced_accuracy mean | test_balanced_accuracy std | test_roc_auc mean | test_roc_auc std | test_average_precision mean | test_average_precision std | n_observations mean | n_observations std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.004 | 0.003 | 0.046 | 0.016 | 0.013 | 0.094 | 0.002 | 0.017 | 0.004 | 0.028 | 0.497 | 0.010 | 0.854 | 0.065 | 0.821 | 0.089 | 210.000 | 0.000 |
| 2 | 0.004 | 0.001 | 0.046 | 0.015 | 0.697 | 0.092 | 0.506 | 0.113 | 0.578 | 0.090 | 0.671 | 0.053 | 0.711 | 0.072 | 0.698 | 0.075 | 210.000 | 0.000 |
| 3 | 0.004 | 0.002 | 0.046 | 0.015 | 0.690 | 0.092 | 0.515 | 0.129 | 0.580 | 0.097 | 0.671 | 0.058 | 0.735 | 0.071 | 0.686 | 0.084 | 210.000 | 0.000 |
| 4 | 0.006 | 0.002 | 0.063 | 0.019 | 0.763 | 0.113 | 0.586 | 0.113 | 0.655 | 0.088 | 0.723 | 0.061 | 0.781 | 0.067 | 0.755 | 0.083 | 210.000 | 0.000 |
| 5 | 0.006 | 0.002 | 0.062 | 0.020 | 0.714 | 0.086 | 0.626 | 0.109 | 0.661 | 0.079 | 0.720 | 0.057 | 0.789 | 0.066 | 0.750 | 0.086 | 210.000 | 0.000 |
| 6 | 0.005 | 0.003 | 0.053 | 0.024 | 0.700 | 0.085 | 0.620 | 0.109 | 0.652 | 0.077 | 0.712 | 0.057 | 0.784 | 0.066 | 0.744 | 0.086 | 210.000 | 0.000 |
| 7 | 0.006 | 0.003 | 0.064 | 0.025 | 0.708 | 0.096 | 0.625 | 0.123 | 0.658 | 0.090 | 0.717 | 0.068 | 0.792 | 0.073 | 0.749 | 0.094 | 210.000 | 0.000 |
| 8 | 0.005 | 0.002 | 0.057 | 0.023 | 0.798 | 0.090 | 0.751 | 0.098 | 0.768 | 0.066 | 0.803 | 0.054 | 0.880 | 0.057 | 0.856 | 0.068 | 210.000 | 0.000 |
| 9 | 0.005 | 0.002 | 0.054 | 0.020 | 0.799 | 0.094 | 0.755 | 0.102 | 0.771 | 0.073 | 0.805 | 0.059 | 0.880 | 0.058 | 0.854 | 0.073 | 210.000 | 0.000 |
| 10 | 0.005 | 0.002 | 0.052 | 0.019 | 0.807 | 0.097 | 0.778 | 0.118 | 0.785 | 0.080 | 0.818 | 0.066 | 0.908 | 0.049 | 0.873 | 0.068 | 210.000 | 0.000 |
| 11 | 0.007 | 0.003 | 0.069 | 0.019 | 0.811 | 0.098 | 0.791 | 0.127 | 0.794 | 0.088 | 0.826 | 0.073 | 0.911 | 0.048 | 0.878 | 0.066 | 210.000 | 0.000 |
| 12 | 0.006 | 0.003 | 0.062 | 0.025 | 0.823 | 0.102 | 0.788 | 0.123 | 0.799 | 0.089 | 0.830 | 0.073 | 0.913 | 0.048 | 0.883 | 0.064 | 210.000 | 0.000 |
| 13 | 0.006 | 0.002 | 0.062 | 0.019 | 0.826 | 0.090 | 0.794 | 0.117 | 0.804 | 0.082 | 0.834 | 0.068 | 0.914 | 0.048 | 0.887 | 0.061 | 210.000 | 0.000 |
| 14 | 0.007 | 0.004 | 0.066 | 0.030 | 0.825 | 0.089 | 0.799 | 0.118 | 0.806 | 0.081 | 0.836 | 0.067 | 0.914 | 0.047 | 0.886 | 0.062 | 210.000 | 0.000 |
| 15 | 0.005 | 0.002 | 0.045 | 0.016 | 0.822 | 0.094 | 0.792 | 0.113 | 0.800 | 0.079 | 0.831 | 0.065 | 0.913 | 0.047 | 0.889 | 0.058 | 210.000 | 0.000 |
Using only new features:
| n_features | fit_time mean | fit_time std | score_time mean | score_time std | test_precision mean | test_precision std | test_recall mean | test_recall std | test_f1 mean | test_f1 std | test_balanced_accuracy mean | test_balanced_accuracy std | test_roc_auc mean | test_roc_auc std | test_average_precision mean | test_average_precision std | n_observations mean | n_observations std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.004 | 0.002 | 0.046 | 0.020 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.500 | 0.000 | 0.748 | 0.066 | 0.688 | 0.084 | 210.000 | 0.000 |
| 2 | 0.005 | 0.003 | 0.056 | 0.022 | 0.681 | 0.147 | 0.469 | 0.104 | 0.550 | 0.108 | 0.652 | 0.072 | 0.718 | 0.074 | 0.689 | 0.083 | 210.000 | 0.000 |
| 3 | 0.004 | 0.001 | 0.039 | 0.010 | 0.669 | 0.143 | 0.459 | 0.102 | 0.539 | 0.106 | 0.645 | 0.070 | 0.713 | 0.072 | 0.684 | 0.082 | 210.000 | 0.000 |
| 4 | 0.005 | 0.003 | 0.053 | 0.022 | 0.634 | 0.117 | 0.513 | 0.115 | 0.561 | 0.100 | 0.647 | 0.072 | 0.757 | 0.070 | 0.702 | 0.088 | 210.000 | 0.000 |
| 5 | 0.005 | 0.002 | 0.052 | 0.020 | 0.634 | 0.098 | 0.547 | 0.113 | 0.581 | 0.086 | 0.656 | 0.063 | 0.785 | 0.066 | 0.720 | 0.087 | 210.000 | 0.000 |
| 6 | 0.004 | 0.002 | 0.043 | 0.015 | 0.627 | 0.086 | 0.533 | 0.104 | 0.571 | 0.080 | 0.650 | 0.058 | 0.787 | 0.065 | 0.733 | 0.079 | 210.000 | 0.000 |
| 7 | 0.005 | 0.002 | 0.050 | 0.018 | 0.681 | 0.094 | 0.623 | 0.118 | 0.644 | 0.089 | 0.704 | 0.067 | 0.790 | 0.062 | 0.734 | 0.077 | 210.000 | 0.000 |
| 8 | 0.005 | 0.002 | 0.045 | 0.016 | 0.684 | 0.095 | 0.614 | 0.120 | 0.639 | 0.089 | 0.701 | 0.066 | 0.788 | 0.064 | 0.731 | 0.077 | 210.000 | 0.000 |
| 9 | 0.005 | 0.003 | 0.058 | 0.025 | 0.682 | 0.092 | 0.608 | 0.114 | 0.636 | 0.084 | 0.698 | 0.062 | 0.784 | 0.064 | 0.726 | 0.078 | 210.000 | 0.000 |
| 10 | 0.004 | 0.000 | 0.038 | 0.005 | 0.679 | 0.095 | 0.605 | 0.107 | 0.633 | 0.080 | 0.695 | 0.061 | 0.779 | 0.065 | 0.721 | 0.080 | 210.000 | 0.000 |
| 11 | 0.008 | 0.003 | 0.078 | 0.021 | 0.665 | 0.103 | 0.594 | 0.115 | 0.620 | 0.090 | 0.684 | 0.067 | 0.775 | 0.068 | 0.716 | 0.080 | 210.000 | 0.000 |
| 12 | 0.004 | 0.001 | 0.043 | 0.012 | 0.666 | 0.104 | 0.591 | 0.114 | 0.620 | 0.093 | 0.685 | 0.069 | 0.772 | 0.068 | 0.711 | 0.080 | 210.000 | 0.000 |
| 13 | 0.005 | 0.002 | 0.050 | 0.019 | 0.657 | 0.094 | 0.584 | 0.097 | 0.612 | 0.076 | 0.677 | 0.062 | 0.779 | 0.064 | 0.726 | 0.079 | 210.000 | 0.000 |
| 14 | 0.004 | 0.001 | 0.040 | 0.009 | 0.660 | 0.086 | 0.588 | 0.098 | 0.616 | 0.075 | 0.681 | 0.059 | 0.785 | 0.061 | 0.729 | 0.076 | 210.000 | 0.000 |
| 15 | 0.006 | 0.003 | 0.062 | 0.023 | 0.646 | 0.085 | 0.606 | 0.107 | 0.620 | 0.081 | 0.680 | 0.063 | 0.798 | 0.062 | 0.742 | 0.076 | 210.000 | 0.000 |
Best number of features by subset of the data:#
| | ald | all | new |
|---|---|---|---|
| fit_time | 14 | 11 | 11 |
| score_time | 14 | 11 | 11 |
| test_precision | 1 | 13 | 8 |
| test_recall | 12 | 14 | 7 |
| test_f1 | 10 | 14 | 7 |
| test_balanced_accuracy | 10 | 14 | 7 |
| test_roc_auc | 11 | 13 | 15 |
| test_average_precision | 15 | 15 | 15 |
| n_observations | 1 | 1 | 1 |
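The entries in this table are consistent with picking, for each metric, the number of features with the highest mean cross-validation score (for example 15 for test_roc_auc on the new features, where the mean is 0.798). The following is a minimal sketch of such a selection in plain Python, not the notebook's own code; the names `cv_means` and `best_n_features` are illustrative assumptions, and the values are copied from the first three rows of the "new features" table.

```python
# Mean cross-validation scores per n_features (metric -> {n_features: mean}),
# copied from the first three rows of the "new features" table.
cv_means = {
    "test_f1": {1: 0.000, 2: 0.550, 3: 0.539},
    "test_roc_auc": {1: 0.748, 2: 0.718, 3: 0.713},
    "n_observations": {1: 210.0, 2: 210.0, 3: 210.0},
}


def best_n_features(means_by_n):
    """Return the n_features with the highest mean; ties go to the smallest n."""
    return max(sorted(means_by_n), key=lambda n: means_by_n[n])


best = {metric: best_n_features(scores) for metric, scores in cv_means.items()}
print(best)
```

If the table was indeed built this way, only the test_* rows are useful for model selection: an argmax over fit_time and score_time picks the slowest setting, and the constant n_observations always resolves to the first entry.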
Train, test split#
Show number of cases in train and test data
| | train | test |
|---|---|---|
| False | 98 | 24 |
| True | 70 | 18 |
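The counts above put 24 of 122 negatives and 18 of 88 positives into the test set, roughly 20% of each class, which is what a stratified 80/20 split would produce. The call used by the notebook is not shown, so the sketch below is only an illustration of the idea in standard-library Python; `stratified_split` is a hypothetical helper. In practice scikit-learn's `train_test_split` with the `stratify` argument serves the same purpose.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so that each class keeps roughly the same share in test."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


# Class totals taken from the table: 98 + 24 negatives, 70 + 18 positives.
labels = [False] * 122 + [True] * 88
train_idx, test_idx = stratified_split(labels)

# Count the cases per class in each split.
counts = {
    split: {cls: sum(labels[i] == cls for i in idx) for cls in (False, True)}
    for split, idx in (("train", train_idx), ("test", test_idx))
}
print(counts)
```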
Results#
run_model returns dataclasses with the results needed further on
add mrmr selection of data (select the best number of features to use instead of fixing it)
Save results for final model on entire data, new features and ALD study criteria selected data.
ROC-AUC on test split#
pimmslearn.plotting - INFO Saved Figures to runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/auc_roc_curve.pdf
Data used to plot ROC:
| | ALD study all fpr | ALD study all tpr | VAE all fpr | VAE all tpr | VAE new fpr | VAE new tpr |
|---|---|---|---|---|---|---|
| 0 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 1 | 0.000 | 0.056 | 0.000 | 0.056 | 0.000 | 0.056 |
| 2 | 0.000 | 0.611 | 0.000 | 0.556 | 0.042 | 0.056 |
| 3 | 0.042 | 0.611 | 0.083 | 0.556 | 0.042 | 0.167 |
| 4 | 0.042 | 0.833 | 0.083 | 0.611 | 0.083 | 0.167 |
| 5 | 0.167 | 0.833 | 0.250 | 0.611 | 0.083 | 0.222 |
| 6 | 0.167 | 0.889 | 0.250 | 0.722 | 0.125 | 0.222 |
| 7 | 0.542 | 0.889 | 0.417 | 0.722 | 0.125 | 0.278 |
| 8 | 0.542 | 0.944 | 0.417 | 0.833 | 0.167 | 0.278 |
| 9 | 0.583 | 0.944 | 0.458 | 0.833 | 0.167 | 0.389 |
| 10 | 0.583 | 1.000 | 0.458 | 0.889 | 0.250 | 0.389 |
| 11 | 1.000 | 1.000 | 0.583 | 0.889 | 0.250 | 0.500 |
| 12 | NaN | NaN | 0.583 | 0.944 | 0.292 | 0.500 |
| 13 | NaN | NaN | 0.667 | 0.944 | 0.292 | 0.556 |
| 14 | NaN | NaN | 0.667 | 1.000 | 0.375 | 0.556 |
| 15 | NaN | NaN | 1.000 | 1.000 | 0.375 | 0.722 |
| 16 | NaN | NaN | NaN | NaN | 0.458 | 0.722 |
| 17 | NaN | NaN | NaN | NaN | 0.458 | 0.778 |
| 18 | NaN | NaN | NaN | NaN | 0.500 | 0.778 |
| 19 | NaN | NaN | NaN | NaN | 0.500 | 0.833 |
| 20 | NaN | NaN | NaN | NaN | 0.583 | 0.833 |
| 21 | NaN | NaN | NaN | NaN | 0.583 | 0.889 |
| 22 | NaN | NaN | NaN | NaN | 0.667 | 0.889 |
| 23 | NaN | NaN | NaN | NaN | 0.667 | 0.944 |
| 24 | NaN | NaN | NaN | NaN | 0.750 | 0.944 |
| 25 | NaN | NaN | NaN | NaN | 0.750 | 1.000 |
| 26 | NaN | NaN | NaN | NaN | 1.000 | 1.000 |
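With 24 negatives and 18 positives in the test split, fpr moves in steps of 1/24 ≈ 0.042 and tpr in steps of 1/18 ≈ 0.056, which matches the values above. The NaN rows arise because the three curves have different numbers of points and are shown side by side. As an illustration of where such points come from, here is a minimal standard-library sketch on toy data; `roc_points` and `auc` are hypothetical helpers, not the notebook's code. scikit-learn's `roc_curve` computes the same kind of curve and, by default, drops some intermediate points.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) pairs obtained by lowering the decision threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += not label
        # Emit a point only after the last sample sharing this score.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the piecewise-linear curve (trapezoidal rule)."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy labels (1 = positive) and predicted scores; not taken from the study.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
print(points)
print(round(auc(points), 3))
```

Each positive moves the curve up by 1/n_pos and each negative moves it right by 1/n_neg, which produces the staircase pattern seen in the table.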
Features selected for final models#
| rank | ALD study all | VAE all | VAE new |
|---|---|---|---|
| 0 | P10636-2;P10636-6 | P10636-2;P10636-6 | Q14894 |
| 1 | K7ER15;Q9H0R4;Q9H0R4-2 | Q8NBI6 | P04040 |
| 2 | P02741 | Q16674;W4VSR3 | P42262;P42262-2;P42262-3 |
| 3 | P61981 | P61981 | P51688 |
| 4 | P04075 | Q9Y2T3;Q9Y2T3-3 | P31321 |
| 5 | P14174 | P15151-2 | F8WBF9;Q5TH30;Q9UGV2;Q9UGV2-2;Q9UGV2-3 |
| 6 | Q9Y2T3;Q9Y2T3-3 | P04075 | A0A0C4DGV4;E9PLX3;O43504;R4GMU8 |
| 7 | P08294 | P14174 | Q96GD0 |
| 8 | P00338;P00338-3 | Q14894 | Q9NUQ9 |
| 9 | P14618 | P63104 | A0A075B7B8 |
| 10 | Q6EMK4 | P00492 | O95297;O95297-2;O95297-3;O95297-4;Q9UEL6 |
| 11 | None | P00338;P00338-3 | E9PK25;G3V1A4;P23528 |
| 12 | None | Q6EMK4 | F2Z2C8;Q9BVH7 |
| 13 | None | None | P01704 |
| 14 | None | None | O95497 |
Precision-Recall plot on test data#
pimmslearn.plotting - INFO Saved Figures to runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/prec_recall_curve.pdf
Data used to plot PRC:
| | ALD study all precision | ALD study all tpr | VAE all precision | VAE all tpr | VAE new precision | VAE new tpr |
|---|---|---|---|---|---|---|
| 0 | 0.429 | 1.000 | 0.429 | 1.000 | 0.429 | 1.000 |
| 1 | 0.439 | 1.000 | 0.439 | 1.000 | 0.439 | 1.000 |
| 2 | 0.450 | 1.000 | 0.450 | 1.000 | 0.450 | 1.000 |
| 3 | 0.462 | 1.000 | 0.462 | 1.000 | 0.462 | 1.000 |
| 4 | 0.474 | 1.000 | 0.474 | 1.000 | 0.474 | 1.000 |
| 5 | 0.486 | 1.000 | 0.486 | 1.000 | 0.486 | 1.000 |
| 6 | 0.500 | 1.000 | 0.500 | 1.000 | 0.500 | 1.000 |
| 7 | 0.514 | 1.000 | 0.514 | 1.000 | 0.486 | 0.944 |
| 8 | 0.529 | 1.000 | 0.529 | 1.000 | 0.500 | 0.944 |
| 9 | 0.545 | 1.000 | 0.515 | 0.944 | 0.515 | 0.944 |
| 10 | 0.562 | 1.000 | 0.531 | 0.944 | 0.500 | 0.889 |
| 11 | 0.548 | 0.944 | 0.548 | 0.944 | 0.516 | 0.889 |
| 12 | 0.567 | 0.944 | 0.533 | 0.889 | 0.533 | 0.889 |
| 13 | 0.552 | 0.889 | 0.552 | 0.889 | 0.517 | 0.833 |
| 14 | 0.571 | 0.889 | 0.571 | 0.889 | 0.536 | 0.833 |
| 15 | 0.593 | 0.889 | 0.593 | 0.889 | 0.556 | 0.833 |
| 16 | 0.615 | 0.889 | 0.577 | 0.833 | 0.538 | 0.778 |
| 17 | 0.640 | 0.889 | 0.600 | 0.833 | 0.560 | 0.778 |
| 18 | 0.667 | 0.889 | 0.583 | 0.778 | 0.542 | 0.722 |
| 19 | 0.696 | 0.889 | 0.565 | 0.722 | 0.565 | 0.722 |
| 20 | 0.727 | 0.889 | 0.591 | 0.722 | 0.591 | 0.722 |
| 21 | 0.762 | 0.889 | 0.619 | 0.722 | 0.571 | 0.667 |
| 22 | 0.800 | 0.889 | 0.650 | 0.722 | 0.550 | 0.611 |
| 23 | 0.789 | 0.833 | 0.684 | 0.722 | 0.526 | 0.556 |
| 24 | 0.833 | 0.833 | 0.667 | 0.667 | 0.556 | 0.556 |
| 25 | 0.882 | 0.833 | 0.647 | 0.611 | 0.588 | 0.556 |
| 26 | 0.938 | 0.833 | 0.688 | 0.611 | 0.562 | 0.500 |
| 27 | 0.933 | 0.778 | 0.733 | 0.611 | 0.600 | 0.500 |
| 28 | 0.929 | 0.722 | 0.786 | 0.611 | 0.571 | 0.444 |
| 29 | 0.923 | 0.667 | 0.846 | 0.611 | 0.538 | 0.389 |
| 30 | 0.917 | 0.611 | 0.833 | 0.556 | 0.583 | 0.389 |
| 31 | 1.000 | 0.611 | 0.909 | 0.556 | 0.636 | 0.389 |
| 32 | 1.000 | 0.556 | 1.000 | 0.556 | 0.600 | 0.333 |
| 33 | 1.000 | 0.500 | 1.000 | 0.500 | 0.556 | 0.278 |
| 34 | 1.000 | 0.444 | 1.000 | 0.444 | 0.625 | 0.278 |
| 35 | 1.000 | 0.389 | 1.000 | 0.389 | 0.571 | 0.222 |
| 36 | 1.000 | 0.333 | 1.000 | 0.333 | 0.667 | 0.222 |
| 37 | 1.000 | 0.278 | 1.000 | 0.278 | 0.600 | 0.167 |
| 38 | 1.000 | 0.222 | 1.000 | 0.222 | 0.750 | 0.167 |
| 39 | 1.000 | 0.167 | 1.000 | 0.167 | 0.667 | 0.111 |
| 40 | 1.000 | 0.111 | 1.000 | 0.111 | 0.500 | 0.056 |
| 41 | 1.000 | 0.056 | 1.000 | 0.056 | 1.000 | 0.056 |
| 42 | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.000 |
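The first row of each curve has precision 0.429 = 18/42, the share of positives in the test split, because at the lowest threshold every sample is predicted positive. The tpr column is the recall. The sketch below shows how such a table can be derived from ranked scores. It is a standard-library illustration on toy data that assumes all scores are distinct, and `pr_points` is a hypothetical helper rather than the notebook's code. scikit-learn's `precision_recall_curve` is the usual tool for this.

```python
def pr_points(y_true, scores):
    """Return (precision, recall) pairs, from the lowest threshold to the highest.

    Assumes all scores are distinct.
    """
    n_pos = sum(y_true)
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = []
    tp = 0
    for k, (_, label) in enumerate(ranked, start=1):
        tp += label
        # The k highest-scoring samples are predicted positive.
        points.append((tp / k, tp / n_pos))
    points.reverse()           # start with everything predicted positive
    points.append((1.0, 0.0))  # conventional end point: no positive predictions
    return points


# Toy labels (1 = positive) and predicted scores; not taken from the study.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = pr_points(y_true, scores)
print(points)
```

The number of rows is the number of samples plus the final (1.0, 0.0) point, in line with the 43 rows shown for the 42 test samples.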
Train data plots#
pimmslearn.plotting - INFO Saved Figures to runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/prec_recall_curve_train.pdf
pimmslearn.plotting - INFO Saved Figures to runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/auc_roc_curve_train.pdf
Output files:
{'results_VAE all.pkl': PosixPath('runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/results_VAE all.pkl'),
'results_VAE new.pkl': PosixPath('runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/results_VAE new.pkl'),
'results_ALD study all.pkl': PosixPath('runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/results_ALD study all.pkl'),
'auc_roc_curve.pdf': PosixPath('runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/auc_roc_curve.pdf'),
'mrmr_feat_by_model.xlsx': PosixPath('runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/mrmr_feat_by_model.xlsx'),
'prec_recall_curve.pdf': PosixPath('runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/prec_recall_curve.pdf'),
'prec_recall_curve_train.pdf': PosixPath('runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/prec_recall_curve_train.pdf'),
'auc_roc_curve_train.pdf': PosixPath('runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/auc_roc_curve_train.pdf')}