Compare outcomes from differential analysis based on different imputation methods#

Load the scores produced by the 10_1_ald_diff_analysis notebook.

Parameters#

Default and set parameters for the notebook.

folder_experiment = 'runs/appl_ald_data/plasma/proteinGroups'

target = 'kleiner'
model_key = 'VAE'
baseline = 'RSN'
out_folder = 'diff_analysis'
selected_statistics = ['p-unc', '-Log10 pvalue', 'qvalue', 'rejected']

disease_ontology = 5082  # code from https://disease-ontology.org/
# split diseases notebook? Query gene names for proteins in file from uniprot?
annotaitons_gene_col = 'PG.Genes'

# Parameters
disease_ontology = 10652
folder_experiment = "runs/alzheimer_study"
target = "AD"
baseline = "PI"
model_key = "VAE"
out_folder = "diff_analysis"
annotaitons_gene_col = "None"

Add set parameters to configuration

root - INFO     Removed from global namespace: folder_experiment
root - INFO     Removed from global namespace: target
root - INFO     Removed from global namespace: model_key
root - INFO     Removed from global namespace: baseline
root - INFO     Removed from global namespace: out_folder
root - INFO     Removed from global namespace: selected_statistics
root - INFO     Removed from global namespace: disease_ontology
root - INFO     Removed from global namespace: annotaitons_gene_col
root - INFO     Already set attribute: folder_experiment has value runs/alzheimer_study
root - INFO     Already set attribute: out_folder has value diff_analysis

{'annotaitons_gene_col': 'None',
 'baseline': 'PI',
 'data': PosixPath('runs/alzheimer_study/data'),
 'disease_ontology': 10652,
 'folder_experiment': PosixPath('runs/alzheimer_study'),
 'freq_features_observed': PosixPath('runs/alzheimer_study/freq_features_observed.csv'),
 'model_key': 'VAE',
 'out_figures': PosixPath('runs/alzheimer_study/figures'),
 'out_folder': PosixPath('runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE'),
 'out_metrics': PosixPath('runs/alzheimer_study'),
 'out_models': PosixPath('runs/alzheimer_study'),
 'out_preds': PosixPath('runs/alzheimer_study/preds'),
 'scores_folder': PosixPath('runs/alzheimer_study/diff_analysis/AD/scores'),
 'selected_statistics': ['p-unc', '-Log10 pvalue', 'qvalue', 'rejected'],
 'target': 'AD'}
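The resolved paths in the configuration above suggest that `out_folder` and `scores_folder` are composed from the experiment folder, the output folder name, the target, and a `<baseline>_vs_<model_key>` suffix. A minimal sketch of that composition with `pathlib`, assuming this layout (the project's own configuration helper may build the paths differently; `out_folder_name` is a name introduced here for illustration):

```python
from pathlib import Path

# Values taken from the parameter cell above.
folder_experiment = Path("runs/alzheimer_study")
out_folder_name = "diff_analysis"  # hypothetical variable name
target = "AD"
baseline = "PI"
model_key = "VAE"

# Assumed layout: <experiment>/<out_folder>/<target>/<baseline>_vs_<model_key>
out_folder = folder_experiment / out_folder_name / target / f"{baseline}_vs_{model_key}"
# Assumed layout: <experiment>/<out_folder>/<target>/scores
scores_folder = folder_experiment / out_folder_name / target / "scores"

print(out_folder)
print(scores_folder)
```

Keeping the comparison-specific outputs under `<baseline>_vs_<model_key>` lets several model comparisons share one `scores` folder per target.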

Excel file for exports#

files_out = dict()
writer_args = dict(float_format='%.3f')

fname = args.out_folder / 'diff_analysis_compare_methods.xlsx'
files_out[fname.name] = fname
writer = pd.ExcelWriter(fname)
logger.info("Writing to excel file: %s", fname)

root - INFO     Writing to excel file: runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/diff_analysis_compare_methods.xlsx

Load scores#

Load baseline model scores#

Show all statistics, later use selected statistics

	model	PI
	var	SS	DF	F	p-unc	np2	-Log10 pvalue	qvalue	rejected
protein groups	Source
A0A024QZX5;A0A087X1N8;P35237	AD	0.833	1	1.390	0.240	0.007	0.620	0.397	False
	age	0.147	1	0.246	0.620	0.001	0.207	0.750	False
	Kiel	2.439	1	4.072	0.045	0.021	1.347	0.112	False
	Magdeburg	4.762	1	7.949	0.005	0.040	2.274	0.020	True
	Sweden	8.268	1	13.800	0.000	0.067	3.574	0.002	True
...	...	...	...	...	...	...	...	...	...
S4R3U6	AD	0.531	1	0.525	0.469	0.003	0.328	0.623	False
	age	1.792	1	1.774	0.185	0.009	0.734	0.328	False
	Kiel	0.002	1	0.002	0.961	0.000	0.017	0.976	False
	Magdeburg	2.675	1	2.648	0.105	0.014	0.978	0.218	False
	Sweden	15.376	1	15.221	0.000	0.074	3.878	0.001	True

7105 rows × 8 columns
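The derived columns in the table above can be related to each other with a short sketch. It recomputes `-Log10 pvalue` from `p-unc` and derives `rejected` from `qvalue`. The threshold `alpha = 0.05` is an assumption that is consistent with the displayed decisions but is not stated in this notebook, and the inputs are the rounded values shown above, so results match only approximately.

```python
import numpy as np
import pandas as pd

# Two rows copied from the PI table above (rounded to three decimals there).
rows = pd.DataFrame(
    {"p-unc": [0.240, 0.005], "qvalue": [0.397, 0.020]},
    index=pd.Index(["AD", "Magdeburg"], name="Source"),
)

# '-Log10 pvalue' is the negative base-10 logarithm of the uncorrected p-value.
rows["-Log10 pvalue"] = -np.log10(rows["p-unc"])

# Assumption: a hypothesis is rejected if its q-value is at most alpha.
alpha = 0.05
rows["rejected"] = rows["qvalue"] <= alpha

print(rows.round(3))
```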

Load selected comparison model scores#

	model	VAE
	var	SS	DF	F	p-unc	np2	-Log10 pvalue	qvalue	rejected
protein groups	Source
A0A024QZX5;A0A087X1N8;P35237	AD	1.023	1	7.487	0.007	0.038	2.167	0.019	True
	age	0.008	1	0.056	0.814	0.000	0.089	0.877	False
	Kiel	0.270	1	1.978	0.161	0.010	0.793	0.265	False
	Magdeburg	0.463	1	3.388	0.067	0.017	1.172	0.131	False
	Sweden	1.711	1	12.519	0.001	0.062	3.296	0.002	True
...	...	...	...	...	...	...	...	...	...
S4R3U6	AD	1.735	1	3.566	0.060	0.018	1.218	0.120	False
	age	0.707	1	1.453	0.230	0.008	0.639	0.350	False
	Kiel	2.264	1	4.653	0.032	0.024	1.491	0.072	False
	Magdeburg	1.990	1	4.090	0.045	0.021	1.351	0.094	False
	Sweden	17.386	1	35.724	0.000	0.158	7.960	0.000	True

7105 rows × 8 columns

Combined scores#

Show only the selected statistics for comparison.

	model	PI				VAE
	var	p-unc	-Log10 pvalue	qvalue	rejected	p-unc	-Log10 pvalue	qvalue	rejected
protein groups	Source
A0A024QZX5;A0A087X1N8;P35237	AD	0.240	0.620	0.397	False	0.007	2.167	0.019	True
	Kiel	0.045	1.347	0.112	False	0.161	0.793	0.265	False
	Magdeburg	0.005	2.274	0.020	True	0.067	1.172	0.131	False
	Sweden	0.000	3.574	0.002	True	0.001	3.296	0.002	True
	age	0.620	0.207	0.750	False	0.814	0.089	0.877	False
...	...	...	...	...	...	...	...	...	...
S4R3U6	AD	0.469	0.328	0.623	False	0.060	1.218	0.120	False
	Kiel	0.961	0.017	0.976	False	0.032	1.491	0.072	False
	Magdeburg	0.105	0.978	0.218	False	0.045	1.351	0.094	False
	Sweden	0.000	3.878	0.001	True	0.000	7.960	0.000	True
	age	0.185	0.734	0.328	False	0.230	0.639	0.350	False

7105 rows × 8 columns
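One way to obtain such a combined table is to concatenate the per-model score tables along the columns and then keep only the selected statistics. The following is a sketch on two rows copied from the tables above, not necessarily the notebook's actual code; `make_scores` is a helper introduced here for illustration.

```python
import pandas as pd

selected_statistics = ["p-unc", "-Log10 pvalue", "qvalue", "rejected"]
index = pd.MultiIndex.from_tuples(
    [("A0A024QZX5;A0A087X1N8;P35237", "AD"), ("S4R3U6", "AD")],
    names=["protein groups", "Source"],
)


def make_scores(model, values):
    """Build a small score table with (model, var) column levels (hypothetical helper)."""
    columns = pd.MultiIndex.from_product(
        [[model], ["F"] + selected_statistics], names=["model", "var"]
    )
    return pd.DataFrame(values, index=index, columns=columns)


scores_pi = make_scores("PI", [[1.390, 0.240, 0.620, 0.397, False],
                               [0.525, 0.469, 0.328, 0.623, False]])
scores_vae = make_scores("VAE", [[7.487, 0.007, 2.167, 0.019, True],
                                 [3.566, 0.060, 1.218, 0.120, False]])

# Align both tables on the (protein groups, Source) row index.
scores = pd.concat([scores_pi, scores_vae], axis=1)

# Keep only the selected statistics of each model (drops 'F' here).
keep = scores.columns.get_level_values("var").isin(selected_statistics)
scores = scores.loc[:, keep]
print(scores)
```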

Models in comparison (name mapping)

{'PI': 'PI', 'VAE': 'VAE'}

Describe scores#

model	PI			VAE
var	p-unc	-Log10 pvalue	qvalue	p-unc	-Log10 pvalue	qvalue
count	7,105.000	7,105.000	7,105.000	7,105.000	7,105.000	7,105.000
mean	0.261	2.476	0.338	0.223	3.319	0.276
std	0.303	5.328	0.331	0.293	6.233	0.319
min	0.000	0.000	0.000	0.000	0.000	0.000
25%	0.004	0.335	0.016	0.000	0.412	0.002
50%	0.121	0.916	0.243	0.059	1.228	0.118
75%	0.463	2.411	0.617	0.387	3.339	0.516
max	0.999	146.241	0.999	1.000	86.727	1.000

One-to-one comparison by feature:#

/tmp/ipykernel_88765/3761369923.py:2: FutureWarning: Starting with pandas version 3.0 all arguments of to_excel except for the argument 'excel_writer' will be keyword-only.
  scores.to_excel(writer, 'scores', **writer_args)

	model	PI				VAE
	var	p-unc	-Log10 pvalue	qvalue	rejected	p-unc	-Log10 pvalue	qvalue	rejected
protein groups	Source
A0A024QZX5;A0A087X1N8;P35237	AD	0.240	0.620	0.397	False	0.007	2.167	0.019	True
A0A024R0T9;K7ER74;P02655	AD	0.059	1.228	0.139	False	0.032	1.497	0.071	False
A0A024R3W6;A0A024R412;O60462;O60462-2;O60462-3;O60462-4;O60462-5;Q7LBX6;X5D2Q8	AD	0.040	1.393	0.103	False	0.266	0.574	0.392	False
A0A024R644;A0A0A0MRU5;A0A1B0GWI2;O75503	AD	0.420	0.377	0.580	False	0.249	0.604	0.372	False
A0A075B6H7	AD	0.027	1.567	0.075	False	0.005	2.341	0.014	True
...	...	...	...	...	...	...	...	...	...
Q9Y6R7	AD	0.175	0.756	0.316	False	0.175	0.756	0.283	False
Q9Y6X5	AD	0.070	1.155	0.159	False	0.265	0.577	0.390	False
Q9Y6Y8;Q9Y6Y8-2	AD	0.083	1.079	0.182	False	0.083	1.079	0.156	False
Q9Y6Y9	AD	0.348	0.459	0.512	False	0.403	0.395	0.531	False
S4R3U6	AD	0.469	0.328	0.623	False	0.060	1.218	0.120	False

1421 rows × 8 columns
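The table above keeps one row per protein group, namely the row whose `Source` equals the target (`AD`). With five sources per protein group, 7105 rows reduce to 1421. A sketch of such a selection on a toy index, assuming the same index level names (the value column is a placeholder, only the index structure matters):

```python
import pandas as pd

target = "AD"
index = pd.MultiIndex.from_product(
    [["Q9Y6Y9", "S4R3U6"], ["AD", "Kiel", "Magdeburg", "Sweden", "age"]],
    names=["protein groups", "Source"],
)
# Placeholder values, not taken from the analysis.
scores = pd.DataFrame({"qvalue": [0.5] * len(index)}, index=index)

# Select the target rows; drop_level=False keeps 'Source' in the index,
# as in the table shown above.
scores_target = scores.xs(target, level="Source", drop_level=False)
print(scores_target)
```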

And the descriptive statistics of the numeric values:

model	PI			VAE
var	p-unc	-Log10 pvalue	qvalue	p-unc	-Log10 pvalue	qvalue
count	1,421.000	1,421.000	1,421.000	1,421.000	1,421.000	1,421.000
mean	0.252	1.411	0.334	0.238	1.591	0.297
std	0.291	1.654	0.316	0.292	1.837	0.315
min	0.000	0.001	0.000	0.000	0.001	0.000
25%	0.013	0.368	0.041	0.007	0.387	0.020
50%	0.121	0.917	0.242	0.089	1.049	0.165
75%	0.428	1.899	0.588	0.410	2.154	0.538
max	0.999	25.104	0.999	0.997	21.089	0.998

And the boolean decision values:

model	PI	VAE
var	rejected	rejected
count	1421	1421
unique	2	2
top	False	False
freq	1031	939

Load frequencies of observed features#

	data
	frequency
protein groups
A0A024QZX5;A0A087X1N8;P35237	186
A0A024R0T9;K7ER74;P02655	195
A0A024R3W6;A0A024R412;O60462;O60462-2;O60462-3;O60462-4;O60462-5;Q7LBX6;X5D2Q8	174
A0A024R644;A0A0A0MRU5;A0A1B0GWI2;O75503	196
A0A075B6H7	91
...	...
Q9Y6R7	197
Q9Y6X5	173
Q9Y6Y8;Q9Y6Y8-2	197
Q9Y6Y9	119
S4R3U6	126

1421 rows × 1 columns

Compare shared features#

	PI				VAE				data
	p-unc	-Log10 pvalue	qvalue	rejected	p-unc	-Log10 pvalue	qvalue	rejected	frequency
protein groups
A0A024QZX5;A0A087X1N8;P35237	0.240	0.620	0.397	False	0.007	2.167	0.019	True	186
A0A024R0T9;K7ER74;P02655	0.059	1.228	0.139	False	0.032	1.497	0.071	False	195
A0A024R3W6;A0A024R412;O60462;O60462-2;O60462-3;O60462-4;O60462-5;Q7LBX6;X5D2Q8	0.040	1.393	0.103	False	0.266	0.574	0.392	False	174
A0A024R644;A0A0A0MRU5;A0A1B0GWI2;O75503	0.420	0.377	0.580	False	0.249	0.604	0.372	False	196
A0A075B6H7	0.027	1.567	0.075	False	0.005	2.341	0.014	True	91
...	...	...	...	...	...	...	...	...	...
Q9Y6R7	0.175	0.756	0.316	False	0.175	0.756	0.283	False	197
Q9Y6X5	0.070	1.155	0.159	False	0.265	0.577	0.390	False	173
Q9Y6Y8;Q9Y6Y8-2	0.083	1.079	0.182	False	0.083	1.079	0.156	False	197
Q9Y6Y9	0.348	0.459	0.512	False	0.403	0.395	0.531	False	119
S4R3U6	0.469	0.328	0.623	False	0.060	1.218	0.120	False	126

1421 rows × 9 columns
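The table above places the observation frequency next to the scores of both models. A sketch of such a join on the shared `protein groups` index, using two rows copied from above and assuming two-level columns on both sides:

```python
import pandas as pd

features = pd.Index(
    ["A0A024QZX5;A0A087X1N8;P35237", "A0A075B6H7"], name="protein groups"
)
scores_common = pd.DataFrame(
    [[0.397, False, 0.019, True],
     [0.075, False, 0.014, True]],
    index=features,
    columns=pd.MultiIndex.from_product([["PI", "VAE"], ["qvalue", "rejected"]]),
)
freq_feat = pd.DataFrame(
    [[186], [91]],
    index=features,
    columns=pd.MultiIndex.from_tuples([("data", "frequency")]),
)

# Left join: keep every scored feature and attach its frequency.
scores_common = scores_common.join(freq_feat, how="left")
print(scores_common)
```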

Annotate decisions in Confusion Table style:#

Differential Analysis Comparison
PI (no)  - VAE (no)    880
PI (yes) - VAE (yes)   331
PI (no)  - VAE (yes)   151
PI (yes) - VAE (no)     59
Name: count, dtype: int64
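The four counts above add up to the 1421 shared features (880 + 331 + 151 + 59). A sketch of how such labels could be derived from the two `rejected` columns; the `annotate` helper and the last feature name are hypothetical and only illustrate the labelling:

```python
import pandas as pd

baseline, model_key = "PI", "VAE"

# Decisions for three features shown in this notebook plus one
# hypothetical feature that both models reject.
rejected = pd.DataFrame(
    {baseline: [False, False, True, True],
     model_key: [True, False, False, True]},
    index=pd.Index(
        ["A0A075B6H7", "Q9Y6R7", "Q9UKB5", "feature_both (hypothetical)"],
        name="protein groups",
    ),
)


def annotate(row):
    """Return a label such as 'PI (no) - VAE (yes)' for one feature."""
    left = "yes" if row[baseline] else "no"
    right = "yes" if row[model_key] else "no"
    return f"{baseline} ({left}) - {model_key} ({right})"


annotations = rejected.apply(annotate, axis=1).rename(
    "Differential Analysis Comparison"
)
counts = annotations.value_counts()
print(counts)
```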

List different decisions between models#

/tmp/ipykernel_88765/1417621106.py:6: FutureWarning: Starting with pandas version 3.0 all arguments of to_excel except for the argument 'excel_writer' will be keyword-only.
  _to_write.to_excel(writer, 'differences', **writer_args)
root - INFO     Writen to Excel file under sheet 'differences'.

	PI				VAE				data
	p-unc	-Log10 pvalue	qvalue	rejected	p-unc	-Log10 pvalue	qvalue	rejected	frequency
protein groups
A0A024QZX5;A0A087X1N8;P35237	0.240	0.620	0.397	False	0.007	2.167	0.019	True	186
A0A075B6H7	0.027	1.567	0.075	False	0.005	2.341	0.014	True	91
A0A075B6H9	0.487	0.312	0.638	False	0.020	1.710	0.047	True	189
A0A075B6I0	0.023	1.641	0.066	False	0.001	3.179	0.003	True	194
A0A075B6J9	0.028	1.552	0.077	False	0.018	1.738	0.045	True	156
...	...	...	...	...	...	...	...	...	...
Q9UKB5	0.014	1.861	0.044	True	0.129	0.890	0.223	False	148
Q9UNW1	0.016	1.803	0.049	True	0.957	0.019	0.972	False	171
Q9UP79	0.316	0.501	0.480	False	0.000	4.324	0.000	True	135
Q9UQ52	0.034	1.473	0.089	False	0.001	3.260	0.002	True	188
Q9Y6C2	0.828	0.082	0.896	False	0.009	2.067	0.023	True	119

210 rows × 9 columns
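The 210 rows above correspond to the features on which the two models disagree (151 + 59). A sketch of such a mask on three rows copied from this notebook; the name `mask_different` follows its later use in the notebook, the rest is an assumption:

```python
import pandas as pd

scores_common = pd.DataFrame(
    [[0.397, False, 0.019, True],
     [0.139, False, 0.071, False],
     [0.044, True, 0.223, False]],
    index=pd.Index(
        ["A0A024QZX5;A0A087X1N8;P35237", "A0A024R0T9;K7ER74;P02655", "Q9UKB5"],
        name="protein groups",
    ),
    columns=pd.MultiIndex.from_product([["PI", "VAE"], ["qvalue", "rejected"]]),
)

# Cast to bool explicitly, since mixed rows can yield object columns.
rejected_pi = scores_common[("PI", "rejected")].astype(bool)
rejected_vae = scores_common[("VAE", "rejected")].astype(bool)

# True where exactly one of the two models rejects the null hypothesis.
mask_different = rejected_pi != rejected_vae
different = scores_common.loc[mask_different]

# Features significant only with the comparison model (here: VAE).
only_model = rejected_vae & mask_different
print(different)
print(only_model.sum())
```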

Plot qvalues of both models with annotated decisions#

Prepare data for plotting (qvalues)

	PI	VAE	frequency	Differential Analysis Comparison
protein groups
A0A024QZX5;A0A087X1N8;P35237	0.397	0.019	186	PI (no) - VAE (yes)
A0A024R0T9;K7ER74;P02655	0.139	0.071	195	PI (no) - VAE (no)
A0A024R3W6;A0A024R412;O60462;O60462-2;O60462-3;O60462-4;O60462-5;Q7LBX6;X5D2Q8	0.103	0.392	174	PI (no) - VAE (no)
A0A024R644;A0A0A0MRU5;A0A1B0GWI2;O75503	0.580	0.372	196	PI (no) - VAE (no)
A0A075B6H7	0.075	0.014	91	PI (no) - VAE (yes)
...	...	...	...	...
Q9Y6R7	0.316	0.283	197	PI (no) - VAE (no)
Q9Y6X5	0.159	0.390	173	PI (no) - VAE (no)
Q9Y6Y8;Q9Y6Y8-2	0.182	0.156	197	PI (no) - VAE (no)
Q9Y6Y9	0.512	0.531	119	PI (no) - VAE (no)
S4R3U6	0.623	0.120	126	PI (no) - VAE (no)

1421 rows × 4 columns

List of features with the highest difference in qvalues

	PI	VAE	frequency	Differential Analysis Comparison	diff_qvalue
protein groups
P17302	0.942	0.000	135	PI (no) - VAE (yes)	0.942
D6RF35	0.976	0.035	57	PI (no) - VAE (yes)	0.941
P22692;P22692-2	0.988	0.050	170	PI (no) - VAE (yes)	0.938
A0A087WU43;A0A087WX17;A0A087WXI5;P12830;P12830-2	0.931	0.000	134	PI (no) - VAE (yes)	0.930
P52758	0.000	0.927	119	PI (yes) - VAE (no)	0.927
...	...	...	...	...	...
F5GY80;F5H7G1;P07358	0.057	0.046	197	PI (no) - VAE (yes)	0.011
K7ERI9;P02654	0.042	0.053	196	PI (yes) - VAE (no)	0.011
Q9NX62	0.056	0.045	197	PI (no) - VAE (yes)	0.011
P00740;P00740-2	0.053	0.043	197	PI (no) - VAE (yes)	0.010
K7ERG9;P00746	0.052	0.042	197	PI (no) - VAE (yes)	0.010

210 rows × 5 columns
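The `diff_qvalue` column above matches the absolute difference between the two q-values, sorted in descending order. A sketch on three rows copied from the table (displayed values are rounded, so the differences are approximate):

```python
import pandas as pd

qvalues = pd.DataFrame(
    {"PI": [0.057, 0.942, 0.000], "VAE": [0.046, 0.000, 0.927]},
    index=pd.Index(
        ["F5GY80;F5H7G1;P07358", "P17302", "P52758"], name="protein groups"
    ),
)

# Absolute difference, so the direction of the disagreement does not matter.
qvalues["diff_qvalue"] = (qvalues["PI"] - qvalues["VAE"]).abs()

# Largest disagreement first.
qvalues = qvalues.sort_values("diff_qvalue", ascending=False)
print(qvalues)
```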

Differences plotted with created annotations#

pimmslearn.plotting - INFO     Saved Figures to runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/diff_analysis_comparision_1_VAE

../../../_images/136952196901cfcbc2bea08ea03f051c42f8d2917a26dee0a12b82c81d482756.png

The second plot also shows how often each feature was measured (“observed”), encoded by the size of the circle.

pimmslearn.plotting - INFO     Saved Figures to runs/alzheimer_study/diff_analysis/AD/PI_vs_VAE/diff_analysis_comparision_2_VAE

../../../_images/7f192129f15c5bd80d9f93f02e8f9b4b18317644a5c307422cfb8d521e367fb7.png

Only features contained in model#

This block exists because of a specific part of the ALD analysis in the paper.

root - INFO     No features only in new comparision model.

DISEASES DB lookup#

Query diseases database for gene associations with specified disease ontology id.

pimmslearn.databases.diseases - WARNING  There are more associations available

	ENSP	score
None
APP	ENSP00000284981	5.000
PSEN2	ENSP00000355747	5.000
PSEN1	ENSP00000326366	5.000
APOE	ENSP00000252486	5.000
TREM2	ENSP00000362205	4.825
...	...	...
PTTG1	ENSP00000377536	0.682
ISL2	ENSP00000290759	0.682
hsa-miR-4433b-3p	hsa-miR-4433b-3p	0.682
NEURL1B	ENSP00000358815	0.681
SLC26A4	ENSP00000494017	0.681

10000 rows × 2 columns

Shared features#

ToDo: new script -> DISEASES DB lookup

root - INFO     No gene annotation in scores index:  ['protein groups', 'Source'] Exiting.
/home/runner/work/pimms/pimms/project/.snakemake/conda/43fbe714d68d8fe6f9b0c93f5652adb3_/lib/python3.12/site-packages/IPython/core/interactiveshell.py:3756: UserWarning: To exit: use 'exit', 'quit', or Ctrl-D.
  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)

An exception has occurred, use %tb to see the full traceback.

SystemExit: 0

Only by model#

Only by model which were significant#

Shared features which are only significant by model#

mask = (scores_common[(str(args.model_key), 'rejected')] & mask_different)
mask.sum()

Only significant by baseline (PI in this run)#

mask = (scores_common[(str(args.baseline), 'rejected')] & mask_different)
mask.sum()
