Compare outcomes from differential analysis based on different imputation methods#
Load scores based on 10_1_ald_diff_analysis.
Parameters#
Default and set parameters for the notebook.
folder_experiment = 'runs/appl_ald_data/plasma/proteinGroups'
target = 'kleiner'
model_key = 'VAE'
baseline = 'RSN'
out_folder = 'diff_analysis'
selected_statistics = ['p-unc', '-Log10 pvalue', 'qvalue', 'rejected']
disease_ontology = 5082 # code from https://disease-ontology.org/
# split diseases notebook? Query gene names for proteins in file from uniprot?
annotaitons_gene_col = 'PG.Genes'
# Parameters
disease_ontology = 10652
folder_experiment = "runs/alzheimer_study"
target = "AD"
baseline = "PI"
model_key = "QRILC"
out_folder = "diff_analysis"
annotaitons_gene_col = "None"
Add set parameters to configuration
root - INFO Removed from global namespace: folder_experiment
root - INFO Removed from global namespace: target
root - INFO Removed from global namespace: model_key
root - INFO Removed from global namespace: baseline
root - INFO Removed from global namespace: out_folder
root - INFO Removed from global namespace: selected_statistics
root - INFO Removed from global namespace: disease_ontology
root - INFO Removed from global namespace: annotaitons_gene_col
root - INFO Already set attribute: folder_experiment has value runs/alzheimer_study
root - INFO Already set attribute: out_folder has value diff_analysis
{'annotaitons_gene_col': 'None',
'baseline': 'PI',
'data': PosixPath('runs/alzheimer_study/data'),
'disease_ontology': 10652,
'folder_experiment': PosixPath('runs/alzheimer_study'),
'freq_features_observed': PosixPath('runs/alzheimer_study/freq_features_observed.csv'),
'model_key': 'QRILC',
'out_figures': PosixPath('runs/alzheimer_study/figures'),
'out_folder': PosixPath('runs/alzheimer_study/diff_analysis/AD/PI_vs_QRILC'),
'out_metrics': PosixPath('runs/alzheimer_study'),
'out_models': PosixPath('runs/alzheimer_study'),
'out_preds': PosixPath('runs/alzheimer_study/preds'),
'scores_folder': PosixPath('runs/alzheimer_study/diff_analysis/AD/scores'),
'selected_statistics': ['p-unc', '-Log10 pvalue', 'qvalue', 'rejected'],
'target': 'AD'}
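The resolved paths in the configuration above appear to follow a fixed pattern built from the parameters. A minimal sketch of that pattern, assuming the folder layout can be read off the printed values (the names `comparison_folder` and `scores_folder` are illustrative and not taken from the notebook):

```python
from pathlib import PurePosixPath

# Parameter values as injected for this run
folder_experiment = PurePosixPath("runs/alzheimer_study")
out_folder = "diff_analysis"
target = "AD"
baseline = "PI"
model_key = "QRILC"

# Assumed layout: <experiment>/<out_folder>/<target>/<baseline>_vs_<model_key>
comparison_folder = folder_experiment / out_folder / target / f"{baseline}_vs_{model_key}"

# Assumed layout: one scores folder per target, used by every comparison
scores_folder = folder_experiment / out_folder / target / "scores"

print(comparison_folder)
print(scores_folder)
```

With this layout every baseline/model pair gets its own output folder, while the scores are stored once per target.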
Excel file for exports#
files_out = dict()
writer_args = dict(float_format='%.3f')
fname = args.out_folder / 'diff_analysis_compare_methods.xlsx'
files_out[fname.name] = fname
writer = pd.ExcelWriter(fname)
logger.info("Writing to excel file: %s", fname)
root - INFO Writing to excel file: runs/alzheimer_study/diff_analysis/AD/PI_vs_QRILC/diff_analysis_compare_methods.xlsx
Load scores#
Load baseline model scores#
Show all statistics here; only the selected statistics are used later.
model: PI

| protein groups | Source | SS | DF | F | p-unc | np2 | -Log10 pvalue | qvalue | rejected |
|---|---|---|---|---|---|---|---|---|---|
| A0A024QZX5;A0A087X1N8;P35237 | AD | 0.833 | 1 | 1.390 | 0.240 | 0.007 | 0.620 | 0.397 | False |
| A0A024QZX5;A0A087X1N8;P35237 | age | 0.147 | 1 | 0.246 | 0.620 | 0.001 | 0.207 | 0.750 | False |
| A0A024QZX5;A0A087X1N8;P35237 | Kiel | 2.439 | 1 | 4.072 | 0.045 | 0.021 | 1.347 | 0.112 | False |
| A0A024QZX5;A0A087X1N8;P35237 | Magdeburg | 4.762 | 1 | 7.949 | 0.005 | 0.040 | 2.274 | 0.020 | True |
| A0A024QZX5;A0A087X1N8;P35237 | Sweden | 8.268 | 1 | 13.800 | 0.000 | 0.067 | 3.574 | 0.002 | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| S4R3U6 | AD | 0.531 | 1 | 0.525 | 0.469 | 0.003 | 0.328 | 0.623 | False |
| S4R3U6 | age | 1.792 | 1 | 1.774 | 0.185 | 0.009 | 0.734 | 0.328 | False |
| S4R3U6 | Kiel | 0.002 | 1 | 0.002 | 0.961 | 0.000 | 0.017 | 0.976 | False |
| S4R3U6 | Magdeburg | 2.675 | 1 | 2.648 | 0.105 | 0.014 | 0.978 | 0.218 | False |
| S4R3U6 | Sweden | 15.376 | 1 | 15.221 | 0.000 | 0.074 | 3.878 | 0.001 | True |
7105 rows × 8 columns
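The `-Log10 pvalue` column is the negative decadic logarithm of `p-unc` (for example, p = 0.240 gives 0.620). How `qvalue` and `rejected` are computed is not shown here. The sketch below assumes a Benjamini-Hochberg correction and a threshold of 0.05, both of which are assumptions, and uses toy p-values:

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (assumed FDR method)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # scale each sorted p-value by n / rank
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, starting from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q


p_unc = np.array([0.240, 0.620, 0.045, 0.005, 0.0003])  # toy values
neg_log10_p = -np.log10(p_unc)
qvalue = benjamini_hochberg(p_unc)
rejected = qvalue <= 0.05  # assumed significance threshold

print(neg_log10_p.round(3))
print(qvalue.round(4))
print(rejected)
```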
Load selected comparison model scores#
model: QRILC

| protein groups | Source | SS | DF | F | p-unc | np2 | -Log10 pvalue | qvalue | rejected |
|---|---|---|---|---|---|---|---|---|---|
| A0A024QZX5;A0A087X1N8;P35237 | AD | 0.741 | 1 | 4.819 | 0.029 | 0.025 | 1.532 | 0.075 | False |
| A0A024QZX5;A0A087X1N8;P35237 | age | 0.010 | 1 | 0.062 | 0.803 | 0.000 | 0.095 | 0.871 | False |
| A0A024QZX5;A0A087X1N8;P35237 | Kiel | 0.394 | 1 | 2.560 | 0.111 | 0.013 | 0.954 | 0.214 | False |
| A0A024QZX5;A0A087X1N8;P35237 | Magdeburg | 0.877 | 1 | 5.702 | 0.018 | 0.029 | 1.747 | 0.050 | True |
| A0A024QZX5;A0A087X1N8;P35237 | Sweden | 2.352 | 1 | 15.296 | 0.000 | 0.074 | 3.894 | 0.001 | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| S4R3U6 | AD | 7.824 | 1 | 3.739 | 0.055 | 0.019 | 1.262 | 0.122 | False |
| S4R3U6 | age | 0.011 | 1 | 0.005 | 0.941 | 0.000 | 0.026 | 0.965 | False |
| S4R3U6 | Kiel | 6.913 | 1 | 3.303 | 0.071 | 0.017 | 1.150 | 0.150 | False |
| S4R3U6 | Magdeburg | 21.543 | 1 | 10.294 | 0.002 | 0.051 | 2.805 | 0.006 | True |
| S4R3U6 | Sweden | 0.048 | 1 | 0.023 | 0.879 | 0.000 | 0.056 | 0.926 | False |
7105 rows × 8 columns
Combined scores#
Show only the selected statistics for comparison.

| protein groups | Source | PI / p-unc | PI / -Log10 pvalue | PI / qvalue | PI / rejected | QRILC / p-unc | QRILC / -Log10 pvalue | QRILC / qvalue | QRILC / rejected |
|---|---|---|---|---|---|---|---|---|---|
| A0A024QZX5;A0A087X1N8;P35237 | AD | 0.240 | 0.620 | 0.397 | False | 0.029 | 1.532 | 0.075 | False |
| A0A024QZX5;A0A087X1N8;P35237 | Kiel | 0.045 | 1.347 | 0.112 | False | 0.111 | 0.954 | 0.214 | False |
| A0A024QZX5;A0A087X1N8;P35237 | Magdeburg | 0.005 | 2.274 | 0.020 | True | 0.018 | 1.747 | 0.050 | True |
| A0A024QZX5;A0A087X1N8;P35237 | Sweden | 0.000 | 3.574 | 0.002 | True | 0.000 | 3.894 | 0.001 | True |
| A0A024QZX5;A0A087X1N8;P35237 | age | 0.620 | 0.207 | 0.750 | False | 0.803 | 0.095 | 0.871 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| S4R3U6 | AD | 0.469 | 0.328 | 0.623 | False | 0.055 | 1.262 | 0.122 | False |
| S4R3U6 | Kiel | 0.961 | 0.017 | 0.976 | False | 0.071 | 1.150 | 0.150 | False |
| S4R3U6 | Magdeburg | 0.105 | 0.978 | 0.218 | False | 0.002 | 2.805 | 0.006 | True |
| S4R3U6 | Sweden | 0.000 | 3.878 | 0.001 | True | 0.879 | 0.056 | 0.926 | False |
| S4R3U6 | age | 0.185 | 0.734 | 0.328 | False | 0.941 | 0.026 | 0.965 | False |
7105 rows × 8 columns
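The combined table places both models side by side under a two-level column index (`model`, `var`) and keeps only the selected statistics. A minimal sketch of one way to build such a table with pandas, using toy data and a hypothetical helper `toy_scores` (the notebook's own loading code is not shown here):

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["protA", "protB"], ["AD", "age"]], names=["protein groups", "Source"]
)


def toy_scores(model, pvalues):
    """Build a toy score table for one model (hypothetical helper)."""
    columns = pd.MultiIndex.from_product(
        [[model], ["F", "p-unc", "rejected"]], names=["model", "var"]
    )
    # toy decision on the raw p-value, only to fill the column
    data = [[1.0, p, p < 0.05] for p in pvalues]
    return pd.DataFrame(data, index=index, columns=columns)


scores_baseline = toy_scores("PI", [0.240, 0.620, 0.001, 0.300])
scores_model = toy_scores("QRILC", [0.029, 0.803, 0.002, 0.400])

# Join both models column-wise on the shared row index
scores = pd.concat([scores_baseline, scores_model], axis=1)

# Keep only the selected statistics for every model
selected_statistics = ["p-unc", "rejected"]
scores_selected = scores.loc[:, pd.IndexSlice[:, selected_statistics]]
print(scores_selected)
```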
Models in comparison (name mapping)
{'PI': 'PI', 'QRILC': 'QRILC'}
Describe scores#
| model / var | PI / p-unc | PI / -Log10 pvalue | PI / qvalue | QRILC / p-unc | QRILC / -Log10 pvalue | QRILC / qvalue |
|---|---|---|---|---|---|---|
| count | 7,105.000 | 7,105.000 | 7,105.000 | 7,105.000 | 7,105.000 | 7,105.000 |
| mean | 0.261 | 2.476 | 0.338 | 0.244 | 2.748 | 0.310 |
| std | 0.303 | 5.328 | 0.331 | 0.296 | 5.185 | 0.323 |
| min | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 25% | 0.004 | 0.335 | 0.016 | 0.002 | 0.362 | 0.008 |
| 50% | 0.121 | 0.916 | 0.243 | 0.092 | 1.035 | 0.185 |
| 75% | 0.463 | 2.411 | 0.617 | 0.434 | 2.710 | 0.579 |
| max | 0.999 | 146.241 | 0.999 | 0.999 | 82.541 | 0.999 |
One-to-one comparison by feature:#
/tmp/ipykernel_88988/3761369923.py:2: FutureWarning: Starting with pandas version 3.0 all arguments of to_excel except for the argument 'excel_writer' will be keyword-only.
scores.to_excel(writer, 'scores', **writer_args)
| protein groups | Source | PI / p-unc | PI / -Log10 pvalue | PI / qvalue | PI / rejected | QRILC / p-unc | QRILC / -Log10 pvalue | QRILC / qvalue | QRILC / rejected |
|---|---|---|---|---|---|---|---|---|---|
| A0A024QZX5;A0A087X1N8;P35237 | AD | 0.240 | 0.620 | 0.397 | False | 0.029 | 1.532 | 0.075 | False |
| A0A024R0T9;K7ER74;P02655 | AD | 0.059 | 1.228 | 0.139 | False | 0.035 | 1.459 | 0.085 | False |
| A0A024R3W6;A0A024R412;O60462;O60462-2;O60462-3;O60462-4;O60462-5;Q7LBX6;X5D2Q8 | AD | 0.040 | 1.393 | 0.103 | False | 0.305 | 0.516 | 0.454 | False |
| A0A024R644;A0A0A0MRU5;A0A1B0GWI2;O75503 | AD | 0.420 | 0.377 | 0.580 | False | 0.303 | 0.518 | 0.453 | False |
| A0A075B6H7 | AD | 0.027 | 1.567 | 0.075 | False | 0.165 | 0.783 | 0.288 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Q9Y6R7 | AD | 0.175 | 0.756 | 0.316 | False | 0.175 | 0.756 | 0.302 | False |
| Q9Y6X5 | AD | 0.070 | 1.155 | 0.159 | False | 0.101 | 0.994 | 0.199 | False |
| Q9Y6Y8;Q9Y6Y8-2 | AD | 0.083 | 1.079 | 0.182 | False | 0.083 | 1.079 | 0.171 | False |
| Q9Y6Y9 | AD | 0.348 | 0.459 | 0.512 | False | 0.832 | 0.080 | 0.891 | False |
| S4R3U6 | AD | 0.469 | 0.328 | 0.623 | False | 0.055 | 1.262 | 0.122 | False |
1421 rows × 8 columns
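The table above keeps only the rows whose `Source` is the target (`AD`), which reduces the 7105 rows (5 sources for each of 1421 protein groups) to 1421. A sketch of such a selection on a toy frame, assuming the same two-level row index:

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["protA", "protB", "protC"], ["AD", "age", "Kiel"]],
    names=["protein groups", "Source"],
)
scores = pd.DataFrame(
    {"qvalue": [0.40, 0.75, 0.11, 0.02, 0.30, 0.90, 0.62, 0.33, 0.98]},
    index=index,
)

target = "AD"
# Keep every protein group, but only the rows for the target variable
scores_target = scores.loc[pd.IndexSlice[:, target], :]
print(scores_target)
```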
And the descriptive statistics of the numeric values:
| model / var | PI / p-unc | PI / -Log10 pvalue | PI / qvalue | QRILC / p-unc | QRILC / -Log10 pvalue | QRILC / qvalue |
|---|---|---|---|---|---|---|
| count | 1,421.000 | 1,421.000 | 1,421.000 | 1,421.000 | 1,421.000 | 1,421.000 |
| mean | 0.252 | 1.411 | 0.334 | 0.248 | 1.495 | 0.320 |
| std | 0.291 | 1.654 | 0.316 | 0.291 | 1.785 | 0.314 |
| min | 0.000 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 |
| 25% | 0.013 | 0.368 | 0.041 | 0.009 | 0.365 | 0.028 |
| 50% | 0.121 | 0.917 | 0.242 | 0.105 | 0.978 | 0.205 |
| 75% | 0.428 | 1.899 | 0.588 | 0.431 | 2.046 | 0.576 |
| max | 0.999 | 25.104 | 0.999 | 0.999 | 25.180 | 0.999 |
And the boolean decision values:
| model / var | PI / rejected | QRILC / rejected |
|---|---|---|
| count | 1421 | 1421 |
| unique | 2 | 2 |
| top | False | False |
| freq | 1031 | 999 |
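For boolean columns, `describe` reports the most frequent value (`top`) and its count (`freq`). With `top` being False, the table implies 1421 - 1031 = 390 rejected hypotheses for PI and 1421 - 999 = 422 for QRILC. The same numbers can be obtained directly by summing the boolean column, as in this toy sketch:

```python
import pandas as pd

# Toy decisions for five features (hypothetical values)
rejected = pd.Series([False, True, False, False, True], name="rejected")

summary = rejected.describe()  # count, unique, top, freq for boolean data
n_rejected = int(rejected.sum())  # True counts as 1
n_not_rejected = int((~rejected).sum())

print(summary)
print(n_rejected, n_not_rejected)
```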
Load frequencies of observed features#
| protein groups | frequency |
|---|---|
| A0A024QZX5;A0A087X1N8;P35237 | 186 |
| A0A024R0T9;K7ER74;P02655 | 195 |
| A0A024R3W6;A0A024R412;O60462;O60462-2;O60462-3;O60462-4;O60462-5;Q7LBX6;X5D2Q8 | 174 |
| A0A024R644;A0A0A0MRU5;A0A1B0GWI2;O75503 | 196 |
| A0A075B6H7 | 91 |
| ... | ... |
| Q9Y6R7 | 197 |
| Q9Y6X5 | 173 |
| Q9Y6Y8;Q9Y6Y8-2 | 197 |
| Q9Y6Y9 | 119 |
| S4R3U6 | 126 |
1421 rows × 1 columns
Plot qvalues of both models with annotated decisions#
Prepare data for plotting (qvalues)
| protein groups | PI | QRILC | frequency | Differential Analysis Comparison |
|---|---|---|---|---|
| A0A024QZX5;A0A087X1N8;P35237 | 0.397 | 0.075 | 186 | PI (no) - QRILC (no) |
| A0A024R0T9;K7ER74;P02655 | 0.139 | 0.085 | 195 | PI (no) - QRILC (no) |
| A0A024R3W6;A0A024R412;O60462;O60462-2;O60462-3;O60462-4;O60462-5;Q7LBX6;X5D2Q8 | 0.103 | 0.454 | 174 | PI (no) - QRILC (no) |
| A0A024R644;A0A0A0MRU5;A0A1B0GWI2;O75503 | 0.580 | 0.453 | 196 | PI (no) - QRILC (no) |
| A0A075B6H7 | 0.075 | 0.288 | 91 | PI (no) - QRILC (no) |
| ... | ... | ... | ... | ... |
| Q9Y6R7 | 0.316 | 0.302 | 197 | PI (no) - QRILC (no) |
| Q9Y6X5 | 0.159 | 0.199 | 173 | PI (no) - QRILC (no) |
| Q9Y6Y8;Q9Y6Y8-2 | 0.182 | 0.171 | 197 | PI (no) - QRILC (no) |
| Q9Y6Y9 | 0.512 | 0.891 | 119 | PI (no) - QRILC (no) |
| S4R3U6 | 0.623 | 0.122 | 126 | PI (no) - QRILC (no) |
1421 rows × 4 columns
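The `Differential Analysis Comparison` column combines the decisions of both models into one label per feature. A minimal sketch of how such labels could be built from the boolean `rejected` decisions (toy data; the function name `annotate` is illustrative):

```python
import pandas as pd

# Toy decisions per model (hypothetical values)
rejected = pd.DataFrame(
    {"PI": [False, False, True, True], "QRILC": [False, True, False, True]},
    index=pd.Index(["protA", "protB", "protC", "protD"], name="protein groups"),
)


def annotate(row):
    """Return a label such as 'PI (no) - QRILC (yes)' for one feature."""
    return " - ".join(
        f"{model} ({'yes' if is_rejected else 'no'})"
        for model, is_rejected in row.items()
    )


labels = rejected.apply(annotate, axis=1)
labels.name = "Differential Analysis Comparison"
print(labels)
```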
List of features with the highest difference in qvalues
| protein groups | PI | QRILC | frequency | Differential Analysis Comparison | diff_qvalue |
|---|---|---|---|---|---|
| J3KSJ8;Q9UD71;Q9UD71-2 | 0.844 | 0.004 | 51 | PI (no) - QRILC (yes) | 0.840 |
| E7EN89;E9PP67;E9PQ25;F2Z2Y8;Q9H0E2;Q9H0E2-2 | 0.821 | 0.016 | 86 | PI (no) - QRILC (yes) | 0.806 |
| P43004;P43004-2;P43004-3 | 0.558 | 0.030 | 89 | PI (no) - QRILC (yes) | 0.528 |
| A0A1W2PQ94;B4DS77;B4DS77-2;B4DS77-3 | 0.499 | 0.015 | 69 | PI (no) - QRILC (yes) | 0.484 |
| F6SYF8;Q9UBP4 | 0.426 | 0.006 | 196 | PI (no) - QRILC (yes) | 0.420 |
| ... | ... | ... | ... | ... | ... |
| P04080 | 0.058 | 0.040 | 143 | PI (no) - QRILC (yes) | 0.018 |
| Q8IUK8 | 0.058 | 0.045 | 191 | PI (no) - QRILC (yes) | 0.013 |
| K7ERI9;P02654 | 0.042 | 0.051 | 196 | PI (yes) - QRILC (no) | 0.009 |
| P00740;P00740-2 | 0.053 | 0.048 | 197 | PI (no) - QRILC (yes) | 0.004 |
| K7ERG9;P00746 | 0.052 | 0.048 | 197 | PI (no) - QRILC (yes) | 0.004 |
100 rows × 5 columns
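Judging from the displayed rows, the list contains only features on which the two models disagree, ordered by the absolute difference of their q-values. A sketch under that assumption, using rounded q-values copied from the tables above (so the last digit of a difference may deviate from the displayed `diff_qvalue`):

```python
import pandas as pd

to_plot = pd.DataFrame(
    {
        "PI": [0.844, 0.042, 0.397, 0.053],
        "QRILC": [0.004, 0.051, 0.075, 0.048],
        "Differential Analysis Comparison": [
            "PI (no) - QRILC (yes)",
            "PI (yes) - QRILC (no)",
            "PI (no) - QRILC (no)",
            "PI (no) - QRILC (yes)",
        ],
    },
    index=pd.Index(
        [
            "J3KSJ8;Q9UD71;Q9UD71-2",
            "K7ERI9;P02654",
            "A0A024QZX5;A0A087X1N8;P35237",
            "P00740;P00740-2",
        ],
        name="protein groups",
    ),
)

# Labels where both models reach the same decision
agreeing = ["PI (no) - QRILC (no)", "PI (yes) - QRILC (yes)"]
mask_different = ~to_plot["Differential Analysis Comparison"].isin(agreeing)

# Absolute q-value difference, largest first, for disagreeing features only
to_plot["diff_qvalue"] = (to_plot["PI"] - to_plot["QRILC"]).abs()
different = to_plot.loc[mask_different].sort_values("diff_qvalue", ascending=False)
print(different)
```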
Differences plotted with created annotations#
pimmslearn.plotting - INFO Saved Figures to runs/alzheimer_study/diff_analysis/AD/PI_vs_QRILC/diff_analysis_comparision_1_QRILC
Also show how often each feature was measured (“observed”) by the size of the circle.
pimmslearn.plotting - INFO Saved Figures to runs/alzheimer_study/diff_analysis/AD/PI_vs_QRILC/diff_analysis_comparision_2_QRILC
Only features contained in model#
This block exists because of a specific part of the ALD analysis in the paper.
root - INFO No features only in new comparision model.
DISEASES DB lookup#
Query diseases database for gene associations with specified disease ontology id.
pimmslearn.databases.diseases - WARNING There are more associations available
| None | ENSP | score |
|---|---|---|
| APP | ENSP00000284981 | 5.000 |
| PSEN2 | ENSP00000355747 | 5.000 |
| PSEN1 | ENSP00000326366 | 5.000 |
| APOE | ENSP00000252486 | 5.000 |
| TREM2 | ENSP00000362205 | 4.825 |
| ... | ... | ... |
| PTTG1 | ENSP00000377536 | 0.682 |
| ISL2 | ENSP00000290759 | 0.682 |
| hsa-miR-4433b-3p | hsa-miR-4433b-3p | 0.682 |
| NEURL1B | ENSP00000358815 | 0.681 |
| SLC26A4 | ENSP00000494017 | 0.681 |
10000 rows × 2 columns
only by model#
Only by model which were significant#
Only significant by the baseline (RSN by default, PI in this run)#
mask = (scores_common[(str(args.baseline), 'rejected')] & mask_different)
mask.sum()
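The snippet above relies on `scores_common` and `mask_different`, which are defined in cells not shown here. A self-contained toy version, assuming `mask_different` marks features where the two models reach different decisions:

```python
import pandas as pd

baseline, model_key = "PI", "QRILC"
columns = pd.MultiIndex.from_product(
    [[baseline, model_key], ["qvalue", "rejected"]], names=["model", "var"]
)
# Toy scores (hypothetical values)
scores_common = pd.DataFrame(
    [
        [0.020, True, 0.300, False],
        [0.844, False, 0.004, True],
        [0.002, True, 0.001, True],
        [0.042, True, 0.051, False],
    ],
    index=pd.Index(["protA", "protB", "protC", "protD"], name="protein groups"),
    columns=columns,
)

# Assumed definition: the two models disagree on the decision
mask_different = (
    scores_common[(baseline, "rejected")] != scores_common[(model_key, "rejected")]
)

# Significant for the baseline, but not for the comparison model
mask = scores_common[(baseline, "rejected")] & mask_different
n_only_baseline = int(mask.sum())
print(n_only_baseline)
```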