Compare predictions between model and RSN#
See differences in imputation for diverging cases.
Dumps the top 5.
Parameters#
folder_experiment = 'runs/appl_ald_data/plasma/proteinGroups'
fn_clinical_data = "data/ALD_study/processed/ald_metadata_cli.csv"
make_plots = True # create histograms and swarmplots of diverging results
model_key = 'VAE'
sample_id_col = 'Sample ID'
target = 'kleiner'
cutoff_target: int = 2 # => for binarization target >= cutoff_target
out_folder = 'diff_analysis'
file_format = 'csv'
baseline = 'RSN' # default is RSN, but could be any other trained model
template_pred = 'pred_real_na_{}.csv' # fixed, do not change
ref_method_score = None # filepath to reference method score
# Parameters
cutoff_target = 0.5
make_plots = False
ref_method_score = None
folder_experiment = "runs/alzheimer_study"
target = "AD"
baseline = "PI"
out_folder = "diff_analysis"
fn_clinical_data = "runs/alzheimer_study/data/clinical_data.csv"
root - INFO Removed from global namespace: folder_experiment
root - INFO Removed from global namespace: fn_clinical_data
root - INFO Removed from global namespace: make_plots
root - INFO Removed from global namespace: model_key
root - INFO Removed from global namespace: sample_id_col
root - INFO Removed from global namespace: target
root - INFO Removed from global namespace: cutoff_target
root - INFO Removed from global namespace: out_folder
root - INFO Removed from global namespace: file_format
root - INFO Removed from global namespace: baseline
root - INFO Removed from global namespace: template_pred
root - INFO Removed from global namespace: ref_method_score
root - INFO Already set attribute: folder_experiment has value runs/alzheimer_study
root - INFO Already set attribute: out_folder has value diff_analysis
{'baseline': 'PI',
'cutoff_target': 0.5,
'data': PosixPath('runs/alzheimer_study/data'),
'file_format': 'csv',
'fn_clinical_data': 'runs/alzheimer_study/data/clinical_data.csv',
'folder_experiment': PosixPath('runs/alzheimer_study'),
'folder_scores': PosixPath('runs/alzheimer_study/diff_analysis/AD/scores'),
'make_plots': False,
'model_key': 'VAE',
'out_figures': PosixPath('runs/alzheimer_study/figures'),
'out_folder': PosixPath('runs/alzheimer_study/diff_analysis/AD'),
'out_metrics': PosixPath('runs/alzheimer_study'),
'out_models': PosixPath('runs/alzheimer_study'),
'out_preds': PosixPath('runs/alzheimer_study/preds'),
'ref_method_score': None,
'sample_id_col': 'Sample ID',
'target': 'AD',
'template_pred': 'pred_real_na_{}.csv'}
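The printed configuration suggests that several output paths are derived from `folder_experiment`, `out_folder` and `target`. A minimal sketch of such a derivation, assuming plain `pathlib` joins (the variable `out_folder_name` is introduced here for illustration and is not part of the notebook):

```python
from pathlib import Path

# Parameter values taken from the configuration shown above.
folder_experiment = Path("runs/alzheimer_study")
out_folder_name = "diff_analysis"  # hypothetical name for the raw `out_folder` parameter
target = "AD"

# Assumed derivation: results are grouped per target inside the experiment folder.
out_folder = folder_experiment / out_folder_name / target
folder_scores = out_folder / "scores"
out_preds = folder_experiment / "preds"

print(out_folder.as_posix())
print(folder_scores.as_posix())
print(out_preds.as_posix())
```

Grouping by target keeps analyses for different clinical targets of the same experiment from overwriting each other.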
Write outputs to Excel
root - INFO Writing to excel file: runs/alzheimer_study/diff_analysis/AD/diff_analysis_compare_DA.xlsx
Load scores#
List dump of scores:
[PosixPath('runs/alzheimer_study/diff_analysis/AD/scores/diff_analysis_scores_VAE.pkl'),
PosixPath('runs/alzheimer_study/diff_analysis/AD/scores/diff_analysis_scores_RF.pkl'),
PosixPath('runs/alzheimer_study/diff_analysis/AD/scores/diff_analysis_scores_DAE.pkl'),
PosixPath('runs/alzheimer_study/diff_analysis/AD/scores/diff_analysis_scores_QRILC.pkl'),
PosixPath('runs/alzheimer_study/diff_analysis/AD/scores/diff_analysis_scores_TRKNN.pkl'),
PosixPath('runs/alzheimer_study/diff_analysis/AD/scores/diff_analysis_scores_PI.pkl'),
PosixPath('runs/alzheimer_study/diff_analysis/AD/scores/diff_analysis_scores_Median.pkl'),
PosixPath('runs/alzheimer_study/diff_analysis/AD/scores/diff_analysis_scores_CF.pkl'),
PosixPath('runs/alzheimer_study/diff_analysis/AD/scores/diff_analysis_scores_None.pkl')]
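The listing above follows the file name pattern `diff_analysis_scores_<model>.pkl`. As a rough sketch, such dumps could be collected with a glob and keyed by the model name parsed from the file stem. The temporary folder and the empty placeholder files below are stand-ins, not the real dumps:

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for the scores folder, filled with empty placeholder files.
folder_scores = Path(tempfile.mkdtemp()) / "scores"
folder_scores.mkdir(parents=True)
for key in ["VAE", "RF", "PI"]:
    (folder_scores / f"diff_analysis_scores_{key}.pkl").touch()

prefix = "diff_analysis_scores_"
# Map model key -> dump path; the key is the part of the file stem after the prefix.
dumps = {
    path.stem[len(prefix):]: path
    for path in sorted(folder_scores.glob(f"{prefix}*.pkl"))
}
print(list(dumps))
```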
Load scores from dumps:
Column levels are model and var, shown below as `model: var`.

| protein groups | Source | VAE: SS | VAE: DF | VAE: F | VAE: p-unc | VAE: np2 | VAE: -Log10 pvalue | VAE: qvalue | VAE: rejected | RF: SS | RF: DF | ... | CF: qvalue | CF: rejected | None: SS | None: DF | None: F | None: p-unc | None: np2 | None: -Log10 pvalue | None: qvalue | None: rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A0A024QZX5;A0A087X1N8;P35237 | AD | 1.029 | 1 | 7.334 | 0.007 | 0.037 | 2.132 | 0.021 | True | 0.989 | 1 | ... | 0.019 | True | 0.834 | 1.000 | 6.088 | 0.015 | 0.033 | 1.837 | 0.043 | True |
| | age | 0.011 | 1 | 0.080 | 0.777 | 0.000 | 0.109 | 0.851 | False | 0.003 | 1 | ... | 0.937 | False | 0.002 | 1.000 | 0.015 | 0.903 | 0.000 | 0.044 | 0.943 | False |
| | Kiel | 0.318 | 1 | 2.267 | 0.134 | 0.012 | 0.873 | 0.229 | False | 0.210 | 1 | ... | 0.308 | False | 0.145 | 1.000 | 1.061 | 0.304 | 0.006 | 0.517 | 0.461 | False |
| | Magdeburg | 0.534 | 1 | 3.806 | 0.053 | 0.020 | 1.280 | 0.108 | False | 0.389 | 1 | ... | 0.139 | False | 0.273 | 1.000 | 1.996 | 0.159 | 0.011 | 0.797 | 0.286 | False |
| | Sweden | 1.814 | 1 | 12.934 | 0.000 | 0.063 | 3.386 | 0.002 | True | 1.494 | 1 | ... | 0.002 | True | 1.209 | 1.000 | 8.827 | 0.003 | 0.047 | 2.472 | 0.013 | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| S4R3U6 | AD | 1.769 | 1 | 3.573 | 0.060 | 0.018 | 1.220 | 0.121 | False | 1.173 | 1 | ... | 0.190 | False | 0.095 | 1.000 | 0.151 | 0.698 | 0.001 | 0.156 | 0.803 | False |
| | age | 0.624 | 1 | 1.260 | 0.263 | 0.007 | 0.580 | 0.389 | False | 0.686 | 1 | ... | 0.265 | False | 1.370 | 1.000 | 2.171 | 0.143 | 0.018 | 0.844 | 0.265 | False |
| | Kiel | 2.754 | 1 | 5.562 | 0.019 | 0.028 | 1.713 | 0.047 | True | 2.153 | 1 | ... | 0.142 | False | 1.396 | 1.000 | 2.213 | 0.139 | 0.018 | 0.856 | 0.259 | False |
| | Magdeburg | 2.388 | 1 | 4.821 | 0.029 | 0.025 | 1.533 | 0.066 | False | 1.711 | 1 | ... | 0.217 | False | 0.556 | 1.000 | 0.882 | 0.350 | 0.007 | 0.456 | 0.507 | False |
| | Sweden | 19.067 | 1 | 38.503 | 0.000 | 0.168 | 8.479 | 0.000 | True | 14.171 | 1 | ... | 0.000 | True | 8.519 | 1.000 | 13.502 | 0.000 | 0.101 | 3.447 | 0.002 | True |
7105 rows × 72 columns
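The combined table has one block of columns per model. A minimal sketch of how per-model tables could be joined into such a layout, assuming each dump holds a DataFrame indexed by protein group and Source. The toy frames below reuse a few values from the output and only two of the statistics; they are not the real dumps:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("A0A024QZX5;A0A087X1N8;P35237", "AD"), ("S4R3U6", "AD")],
    names=["protein groups", "Source"],
)
# Toy per-model score tables (assumed structure, values copied from the output above).
scores_by_model = {
    "VAE": pd.DataFrame({"p-unc": [0.007, 0.060], "qvalue": [0.021, 0.121]}, index=index),
    "None": pd.DataFrame({"p-unc": [0.015, 0.698], "qvalue": [0.043, 0.803]}, index=index),
}

# Concatenate side by side; the dict keys become the outer column level.
scores = pd.concat(scores_by_model, axis=1)
scores.columns = scores.columns.set_names(["model", "var"])
print(scores)
```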
If a reference dump is provided, add it to the scores.
Load frequencies of observed features#
| protein groups | data: frequency |
|---|---|
| A0A024QZX5;A0A087X1N8;P35237 | 186 |
| A0A024R0T9;K7ER74;P02655 | 195 |
| A0A024R3W6;A0A024R412;O60462;O60462-2;O60462-3;O60462-4;O60462-5;Q7LBX6;X5D2Q8 | 174 |
| A0A024R644;A0A0A0MRU5;A0A1B0GWI2;O75503 | 196 |
| A0A075B6H7 | 91 |
| ... | ... |
| Q9Y6R7 | 197 |
| Q9Y6X5 | 173 |
| Q9Y6Y8;Q9Y6Y8-2 | 197 |
| Q9Y6Y9 | 119 |
| S4R3U6 | 126 |
1421 rows × 1 columns
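The notebook loads these frequencies from a file. For orientation, a frequency of this kind can be obtained by counting the non-missing intensities per protein group. The sketch below assumes a samples-by-features matrix with `NaN` for missing values; the data and the names `P1`, `P2` are made up:

```python
import numpy as np
import pandas as pd

# Toy intensity matrix (samples x protein groups); NaN marks a missing measurement.
data = pd.DataFrame(
    {"P1": [20.1, np.nan, 19.8], "P2": [np.nan, np.nan, 25.3]},
    index=pd.Index(["S1", "S2", "S3"], name="Sample ID"),
)

# Number of samples in which each protein group was observed.
freq = data.notna().sum().rename("frequency")
freq.index.name = "protein groups"
print(freq)
```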
Assemble qvalues#
| protein groups | Source | frequency | VAE: qvalue | RF: qvalue | DAE: qvalue | QRILC: qvalue | TRKNN: qvalue | PI: qvalue | Median: qvalue | CF: qvalue | None: qvalue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A0A024QZX5;A0A087X1N8;P35237 | AD | 186 | 0.021 | 0.021 | 0.014 | 0.094 | 0.023 | 0.502 | 0.039 | 0.019 | 0.043 |
| A0A024R0T9;K7ER74;P02655 | AD | 195 | 0.072 | 0.069 | 0.069 | 0.071 | 0.071 | 0.109 | 0.087 | 0.076 | 0.092 |
| A0A024R3W6;A0A024R412;O60462;O60462-2;O60462-3;O60462-4;O60462-5;Q7LBX6;X5D2Q8 | AD | 174 | 0.403 | 0.502 | 0.468 | 0.460 | 0.394 | 0.161 | 0.832 | 0.467 | 0.586 |
| A0A024R644;A0A0A0MRU5;A0A1B0GWI2;O75503 | AD | 196 | 0.383 | 0.394 | 0.375 | 0.446 | 0.396 | 0.724 | 0.418 | 0.385 | 0.404 |
| A0A075B6H7 | AD | 91 | 0.012 | 0.009 | 0.027 | 0.277 | 0.048 | 0.227 | 0.124 | 0.004 | 0.027 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Q9Y6R7 | AD | 197 | 0.283 | 0.292 | 0.282 | 0.301 | 0.289 | 0.318 | 0.315 | 0.286 | 0.307 |
| Q9Y6X5 | AD | 173 | 0.361 | 0.289 | 0.341 | 0.107 | 0.205 | 0.086 | 0.455 | 0.261 | 0.501 |
| Q9Y6Y8;Q9Y6Y8-2 | AD | 197 | 0.157 | 0.162 | 0.156 | 0.171 | 0.160 | 0.182 | 0.178 | 0.159 | 0.174 |
| Q9Y6Y9 | AD | 119 | 0.936 | 0.550 | 0.874 | 0.797 | 0.472 | 0.572 | 0.667 | 0.600 | 0.651 |
| S4R3U6 | AD | 126 | 0.121 | 0.205 | 0.049 | 0.654 | 0.080 | 0.538 | 0.829 | 0.190 | 0.803 |
1421 rows × 9 columns
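Assembling one statistic across all models amounts to selecting a single entry of the inner column level. A small sketch using `DataFrame.xs`, assuming the column levels are named `model` and `var` as in the loaded scores; the toy values are copied from the tables in this notebook:

```python
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["VAE", "PI"], ["p-unc", "qvalue"]], names=["model", "var"]
)
scores = pd.DataFrame(
    [[0.007, 0.021, 0.336, 0.502], [0.060, 0.121, 0.373, 0.538]],
    index=pd.Index(["A0A024QZX5;A0A087X1N8;P35237", "S4R3U6"], name="protein groups"),
    columns=columns,
)

# Keep only the q-value column of every model; drop_level=False retains both levels.
qvalues = scores.xs("qvalue", axis=1, level="var", drop_level=False)
print(qvalues)
```

Swapping `"qvalue"` for `"p-unc"` or `"rejected"` gives the other assembled tables in the same way.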
Assemble pvalues#
| protein groups | Source | frequency | VAE: p-unc | RF: p-unc | DAE: p-unc | QRILC: p-unc | TRKNN: p-unc | PI: p-unc | Median: p-unc | CF: p-unc | None: p-unc |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A0A024QZX5;A0A087X1N8;P35237 | AD | 186 | 0.007 | 0.007 | 0.005 | 0.039 | 0.008 | 0.336 | 0.012 | 0.006 | 0.015 |
| A0A024R0T9;K7ER74;P02655 | AD | 195 | 0.032 | 0.029 | 0.030 | 0.028 | 0.031 | 0.043 | 0.033 | 0.034 | 0.037 |
| A0A024R3W6;A0A024R412;O60462;O60462-2;O60462-3;O60462-4;O60462-5;Q7LBX6;X5D2Q8 | AD | 174 | 0.277 | 0.363 | 0.339 | 0.311 | 0.264 | 0.072 | 0.736 | 0.331 | 0.432 |
| A0A024R644;A0A0A0MRU5;A0A1B0GWI2;O75503 | AD | 196 | 0.258 | 0.259 | 0.252 | 0.297 | 0.266 | 0.586 | 0.259 | 0.257 | 0.254 |
| A0A075B6H7 | AD | 91 | 0.004 | 0.002 | 0.010 | 0.158 | 0.020 | 0.111 | 0.053 | 0.001 | 0.008 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Q9Y6R7 | AD | 197 | 0.175 | 0.175 | 0.175 | 0.175 | 0.175 | 0.175 | 0.175 | 0.175 | 0.175 |
| Q9Y6X5 | AD | 173 | 0.239 | 0.173 | 0.223 | 0.046 | 0.113 | 0.032 | 0.291 | 0.156 | 0.344 |
| Q9Y6Y8;Q9Y6Y8-2 | AD | 197 | 0.083 | 0.083 | 0.083 | 0.083 | 0.083 | 0.083 | 0.083 | 0.083 | 0.083 |
| Q9Y6Y9 | AD | 119 | 0.896 | 0.412 | 0.812 | 0.697 | 0.334 | 0.409 | 0.520 | 0.473 | 0.505 |
| S4R3U6 | AD | 126 | 0.060 | 0.112 | 0.021 | 0.519 | 0.036 | 0.373 | 0.730 | 0.104 | 0.698 |
1421 rows × 9 columns
Assemble rejected features#
| | VAE | RF | DAE | QRILC | TRKNN | PI | Median | CF | None |
|---|---|---|---|---|---|---|---|---|---|
| False | 935 | 965 | 933 | 992 | 936 | 1,025 | 1,069 | 958 | 1,054 |
| True | 486 | 456 | 488 | 429 | 485 | 396 | 352 | 463 | 367 |
Tabulate rejected decisions by method:#
| | VAE | RF | DAE | QRILC | TRKNN | PI | Median | CF | None |
|---|---|---|---|---|---|---|---|---|---|
| False | 935 | 965 | 933 | 992 | 936 | 1,025 | 1,069 | 958 | 1,054 |
| True | 486 | 456 | 488 | 429 | 485 | 396 | 352 | 463 | 367 |
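Counts like these can be produced by tabulating the boolean `rejected` column of each method. The following is a sketch on made-up decisions for two methods, not the notebook's own code:

```python
import pandas as pd

# Toy rejection decisions (one row per feature, one column per method).
rejected = pd.DataFrame(
    {"VAE": [True, False, True, False], "PI": [False, False, True, False]}
)

# Count False/True per method; fillna covers methods that never (or always) reject.
counts = rejected.apply(pd.Series.value_counts).fillna(0).astype(int)
print(counts)
```

Each column sums to the number of tested features, which matches the table above (for example 935 + 486 = 1,421 for VAE).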
Tabulate rejected decisions by method for newly included features (if available)#
| | VAE | RF | DAE | QRILC | TRKNN | PI | Median | CF | None |
|---|---|---|---|---|---|---|---|---|---|
Tabulate rejected decisions by method for all features#
root - INFO Written to sheet 'equality_rejected_all' in excel file.
| protein groups | Source | frequency | VAE: rejected | RF: rejected | DAE: rejected | QRILC: rejected | TRKNN: rejected | PI: rejected | Median: rejected | CF: rejected | None: rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A0A024QZX5;A0A087X1N8;P35237 | AD | 186 | True | True | True | False | True | False | True | True | True |
| A0A024R0T9;K7ER74;P02655 | AD | 195 | False | False | False | False | False | False | False | False | False |
| A0A024R3W6;A0A024R412;O60462;O60462-2;O60462-3;O60462-4;O60462-5;Q7LBX6;X5D2Q8 | AD | 174 | False | False | False | False | False | False | False | False | False |
| A0A024R644;A0A0A0MRU5;A0A1B0GWI2;O75503 | AD | 196 | False | False | False | False | False | False | False | False | False |
| A0A075B6H7 | AD | 91 | True | True | True | False | True | False | False | True | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Q9Y6R7 | AD | 197 | False | False | False | False | False | False | False | False | False |
| Q9Y6X5 | AD | 173 | False | False | False | False | False | False | False | False | False |
| Q9Y6Y8;Q9Y6Y8-2 | AD | 197 | False | False | False | False | False | False | False | False | False |
| Q9Y6Y9 | AD | 119 | False | False | False | False | False | False | False | False | False |
| S4R3U6 | AD | 126 | False | False | True | False | False | False | False | False | False |
1421 rows × 9 columns
Tabulate the number of features with the same decision for all methods (True)
versus those whose decision varies depending on the method (False)
True 1,093
False 328
Name: count, dtype: int64
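One way to flag features on which all methods agree is to count the distinct decisions per row. A minimal sketch, using three features and three methods whose decisions are copied from the table above:

```python
import pandas as pd

rejected = pd.DataFrame(
    {
        "VAE": [True, False, False],
        "DAE": [True, False, True],
        "PI": [False, False, False],
    },
    index=["A0A024QZX5;A0A087X1N8;P35237", "Q9Y6R7", "S4R3U6"],
)

# True where every method reaches the same decision for a feature.
same_decision = rejected.nunique(axis=1) == 1
counts = same_decision.value_counts()

# Features whose decision depends on the imputation method.
diverging = rejected.loc[~same_decision]
print(counts)
print(diverging)
```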
List frequency of features with varying decisions
| protein groups | Source | frequency |
|---|---|---|
| A0A024QZX5;A0A087X1N8;P35237 | AD | 186 |
| A0A075B6H7 | AD | 91 |
| A0A075B6H9 | AD | 189 |
| A0A075B6I0 | AD | 194 |
| A0A075B6J9 | AD | 156 |
| ... | ... | ... |
| Q9UPU3 | AD | 163 |
| Q9UQ52 | AD | 188 |
| Q9Y281;Q9Y281-3 | AD | 51 |
| Q9Y6C2 | AD | 119 |
| S4R3U6 | AD | 126 |
328 rows × 1 columns
Take only those with different decisions
No new features or no new ones (with diverging decisions.)
Plots for inspecting imputations (for diverging decisions)#
root - WARNING Not plots requested.
/home/runner/work/pimms/pimms/project/.snakemake/conda/924ec7e362d761ecf0807b9074d79999_/lib/python3.12/site-packages/IPython/core/interactiveshell.py:3707: UserWarning: To exit: use 'exit', 'quit', or Ctrl-D.
warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
An exception has occurred, use %tb to see the full traceback.
SystemExit: 0
Load target#
Measurements#
Plot all new protein groups (pgs) that are significant at least once and have not already been dumped.
RSN predictions are based on the mean and standard deviation of all samples (N=455), as in the original study.
The VAE is also trained on all samples (self-supervised). One could also reduce the selected data to only the samples with a valid target marker, but this was not done in the original study, which considered several different target markers.
RSN: shifted per sample, not per feature!
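To make "shifted per sample" concrete, here is a rough sketch of a random-shifted-normal imputation applied row by row. It is not the notebook's implementation: the function name, the `shift=1.8` and `scale=0.3` values (commonly used defaults for this kind of imputation) and the choice to take mean and standard deviation over each sample's observed features are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def impute_shifted_normal(data, shift=1.8, scale=0.3, seed=0):
    """Fill NaNs per sample (row) with draws from a down-shifted, narrowed normal."""
    rng = np.random.default_rng(seed)
    mean = data.mean(axis=1)  # per-sample mean over observed features (assumption)
    std = data.std(axis=1)    # per-sample standard deviation (assumption)
    imputed = data.copy()
    for sample in data.index:
        missing = data.loc[sample].isna()
        cols = missing.index[missing]
        if len(cols):
            draws = rng.normal(
                loc=mean[sample] - shift * std[sample],
                scale=scale * std[sample],
                size=len(cols),
            )
            imputed.loc[sample, cols] = draws
    return imputed


# Toy log-intensities: rows are samples, columns are features.
data = pd.DataFrame(
    [[20.0, 22.0, np.nan, 24.0], [np.nan, 21.0, 23.0, np.nan], [19.0, 20.0, 21.0, 22.0]],
    index=["S1", "S2", "S3"],
    columns=["P1", "P2", "P3", "P4"],
)
imputed = impute_shifted_normal(data)
print(imputed.round(2))
```

A per-feature variant would compute the statistics with `axis=0` instead, so each protein group would get its own shifted distribution.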
Load all prediction files and reshape
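The file name for each model follows from `template_pred`. As a sketch, a prediction file in long format could be read and reshaped to a samples-by-features table as below. The column names `protein groups` and `intensity` and the in-memory CSV content are assumptions, since the real files are not shown here:

```python
import io

import pandas as pd

template_pred = "pred_real_na_{}.csv"
model_key = "VAE"
fname = template_pred.format(model_key)  # file name expected for this model

# Hypothetical long-format content of such a file.
csv_content = io.StringIO(
    "Sample ID,protein groups,intensity\n"
    "S1,P1,20.5\n"
    "S1,P2,18.2\n"
    "S2,P1,21.0\n"
)
pred_long = pd.read_csv(csv_content, index_col=["Sample ID", "protein groups"])["intensity"]

# Reshape: one row per sample, one column per protein group.
pred_wide = pred_long.unstack("protein groups")
print(fname)
print(pred_wide)
```

Entries that are absent in the long format (here `P2` for `S2`) show up as `NaN` in the wide table.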
Once imputed, reduce to target samples only (samples with a target score)
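A small sketch of this reduction, combined with the binarization rule from the parameters (`target >= cutoff_target`). The clinical table and predictions are toy data; only the target name `AD` and the cutoff `0.5` come from the configuration above:

```python
import numpy as np
import pandas as pd

cutoff_target = 0.5
samples = pd.Index(["S1", "S2", "S3"], name="Sample ID")

# Toy clinical data: S3 has no target score.
clinic = pd.DataFrame({"AD": [1.0, 0.0, np.nan]}, index=samples)
# Toy imputed intensities for the same samples.
pred = pd.DataFrame({"P1": [20.5, 21.0, 19.7]}, index=samples)

# Keep only samples with a target score, then binarize at the cutoff.
target = clinic["AD"].dropna()
target_binary = target >= cutoff_target
pred_target = pred.loc[target.index]

print(target_binary)
print(pred_target)
```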
Compare with target annotation#
Saved files: