PAPY

Power Analysis (Python) tool, developed by Dr. Goncalo Correia and Dr. Jianliang Gao, Imperial College London, 2016.

simple usage: python pa.py TutorialData.csv 8

“TutorialData.csv” is the input test data set; replace it with the name of your own data set.

“8” selects the first 8 variables. A range can be given instead, e.g., 8-16.

full usage: python pa.py TutorialData.csv 2-9 0:100:500 0.05:0.05:0.7 20 0 4

“0:100:500” (default value) is the range of sample sizes, from 0 to 500 (not inclusive) in steps of 100.

“0.05:0.05:0.7” (default value) is the range of effect sizes, from 0.05 to 0.7 (not inclusive) in steps of 0.05.

“20” is the number of repeats (an integer). The default value is 10.

“0” (default value) runs classification only. Use 1 to run regression only, or 2 to run both.

“4” is the number of CPU cores to use (an integer). By default, all available cores are used.
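
The range arguments above follow a start:step:stop pattern with the stop value excluded. The sketch below shows one way to expand such a string into a list of values. The helper name parse_range is hypothetical and is not part of pa.py; it only illustrates the documented semantics and may differ from how the tool parses its arguments.

```python
import math


def parse_range(spec):
    """Expand a 'start:step:stop' string into a list of values (stop excluded).

    Hypothetical helper for illustration; not taken from pa.py.
    """
    start, step, stop = (float(part) for part in spec.split(":"))
    if step <= 0:
        raise ValueError("step must be positive")
    # Small tolerance so floating-point noise does not add an extra point.
    count = max(0, math.ceil((stop - start) / step - 1e-9))
    # Round to hide accumulated floating-point error in the printed values.
    return [round(start + i * step, 10) for i in range(count)]


sample_sizes = parse_range("0:100:500")
effect_sizes = parse_range("0.05:0.05:0.7")
print(sample_sizes)  # [0.0, 100.0, 200.0, 300.0, 400.0]
print(len(effect_sizes), effect_sizes[0], effect_sizes[-1])
```

With the default arguments this gives 5 sample sizes (0 to 400) and 13 effect sizes (0.05 to 0.65).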

pa.iSurfacePlot(output, svfilename, variable, metric, correction, samplsizes, sizeeff, nreps)

Plots an interactive 3D surface plot of the mean over all variables.

Parameters:
  • output (array) – array to plot; a 2D numpy array.
  • svfilename (String) – filename for saving the corresponding plot.
  • variable (int) – variable index number. When plotting the mean of all variables this dimension has size 1, so pass 1.
  • metric (int) – index number of the metric (confusion matrix entries, ‘TP’, ‘FP’, etc.). When plotting the mean of all variables this dimension has size 1, so pass 1.
  • correction (int) – index number of the correction method (no correction, Bonferroni, Benjamini-Hochberg, Benjamini-Yekutieli). When plotting the mean of all variables this dimension has size 1, so pass 1.
  • samplsizes (array) – sample size matrix, numpy array 1 x n.
  • sizeeff (array) – effect size matrix, numpy array 1 x n.
  • nreps (int) – number of repeats.
Returns:

pa.iSlicesPlot(X, Y, Error_y, svfilename, plot_title, x_caption, y_caption, trace_label, trace_num)

Plots slices taken from the surface plots, as interactive plots with error bars.

Parameters:
  • X (array) – x-axis values, either sample sizes or effect sizes. Use a 1 x n numpy array.
  • Y (array) – mean proportion of variables that reach power > threshold.
  • Error_y (array) – standard deviation, across the repeats, of the proportion of variables that reach power > threshold.
  • svfilename (String) – filename for saving plots in the running folder.
  • plot_title (String) – title of the plot.
  • x_caption (String) – caption for the x-axis.
  • y_caption (String) – caption for the y-axis.
  • trace_label (String) – text shown for each trace.
  • trace_num (array) – values used in the trace labels shown as the mouse cursor moves along the plot, to identify each line.
Returns:
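
To make the meaning of Y and Error_y concrete, the sketch below computes, for one combination of sample size and effect size, the mean and standard deviation across repeats of the proportion of variables whose power exceeds a threshold. The function name, the nested-list layout, the example threshold of 0.8 and the use of the sample standard deviation are all assumptions made for illustration; pa.py may organise and compute these values differently.

```python
import statistics


def summarise_repeats(power_by_repeat, threshold=0.8):
    """Return (mean, standard deviation) of the per-repeat proportion of
    variables with power above `threshold`.

    power_by_repeat[r][v] is the power of variable v in repeat r
    (hypothetical layout, not the one used by pa.py).
    """
    proportions = [
        sum(1 for power in repeat if power > threshold) / len(repeat)
        for repeat in power_by_repeat
    ]
    mean = statistics.mean(proportions)
    # The sample standard deviation needs at least two repeats.
    spread = statistics.stdev(proportions) if len(proportions) > 1 else 0.0
    return mean, spread


powers = [
    [0.90, 0.50, 0.85, 0.70],  # repeat 1: 2 of 4 variables above 0.8
    [0.95, 0.81, 0.60, 0.90],  # repeat 2: 3 of 4 variables above 0.8
]
y_value, error_y_value = summarise_repeats(powers)
print(y_value, error_y_value)
```

One such pair per x-axis value would give the Y and Error_y arrays for a single trace.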

pa.iSurfacePlotTPR(output, svfilename, correction, samplsizes, sizeeff, nreps)

Plots an interactive 3D surface plot of the mean proportion of variables with power > 0.8 across all repeats.

Parameters:
  • output (array) – array to plot; a 2D numpy array.
  • svfilename (String) – filename for saving the corresponding plot.
  • correction (int) – index number of the correction method (no correction, Bonferroni, Benjamini-Hochberg, Benjamini-Yekutieli). When plotting the mean of all variables this dimension has size 1, so pass 1.
  • samplsizes (array) – sample size matrix, numpy array 1 x n.
  • sizeeff (array) – effect size matrix, numpy array 1 x n.
  • nreps (int) – number of repeats.
Returns:

pa.simulateLogNormal(data, covType, nSamples)

Uses the log(10) of the input matrix to generate a normally distributed multivariate matrix.

Parameters:
  • data (array) – input data matrix.
  • covType (String) – whether to estimate a covariance matrix or to extract/construct a diagonal array from the given variances.
  • nSamples (int) – number of simulated samples.
Returns:

Simulated data and correlation matrix

Return type:

array, array
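
As a rough illustration of the idea, the sketch below handles only the diagonal-covariance case using the Python standard library: it log-transforms each variable, estimates a mean and standard deviation on the log scale, draws independent normal values and transforms them back. The function name, the base-10 logarithm, the back-transformation and the requirement of strictly positive input are assumptions. The real function also returns a correlation matrix and can estimate a full covariance matrix, neither of which is reproduced here.

```python
import math
import random
import statistics


def simulate_lognormal_diag(data, n_samples, seed=None):
    """Simulate rows from independent log-normal variables fitted to `data`.

    `data` is a list of rows with strictly positive values and at least two
    rows. Hypothetical sketch; not the implementation used by pa.py.
    """
    rng = random.Random(seed)
    columns = list(zip(*data))
    # Work on the log10 scale, where each variable is treated as normal.
    log_columns = [[math.log10(value) for value in column] for column in columns]
    means = [statistics.mean(column) for column in log_columns]
    stdevs = [statistics.stdev(column) for column in log_columns]
    simulated = []
    for _ in range(n_samples):
        # Draw each variable independently, then return to the original scale.
        row = [10 ** rng.gauss(mu, sigma) for mu, sigma in zip(means, stdevs)]
        simulated.append(row)
    return simulated


observed = [[1.0, 10.0], [2.0, 20.0], [4.0, 15.0], [3.0, 12.0]]
simulated_rows = simulate_lognormal_diag(observed, n_samples=5, seed=0)
print(len(simulated_rows), len(simulated_rows[0]))
```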

pa.PCalc_Continuous(data, EffectSizes, SampSizes, SignThreshold, nSimSamp, nRepeat, cores)

Calculates the confusion matrix, with and without correction methods, for regression.

Parameters:
  • data (array) – test matrix; a numpy array.
  • EffectSizes (array) – effect size matrix, numpy array 1 x n.
  • SampSizes (array) – sample size matrix, numpy array 1 x n.
  • SignThreshold (float) – significance threshold. This tool uses 0.05.
  • nSimSamp (int) – number of simulated samples. This tool uses 5000.
  • nRepeat (int) – number of repeats used when calculating the confusion matrix from simulated data. The default is 10.
  • cores (int) – number of CPU cores to use for parallel computation.
Returns:

pa.f_multiproc_cont(sampSizes, signThreshold, effectSizes, numVars, nRepeats, nSampSizes, nEffSizes, Samples_seg, correlationMat_seg, cols, cores, currCore)

Launches the parallel computation for PCalc_Continuous.

Parameters:
  • sampSizes (array) – sample size matrix, numpy array 1 x n.
  • signThreshold (float) – significance threshold. This tool uses 0.05.
  • effectSizes (array) – effect size matrix, numpy array 1 x n.
  • numVars (int) – number of variables handled by the current CPU core.
  • nRepeats (int) – number of repeats used when calculating the confusion matrix from simulated data. The default is 10.
  • nSampSizes (int) – length of the sample size matrix.
  • nEffSizes (int) – length of the effect size matrix.
  • Samples_seg (array) – segment of the simulated sample data; a numpy array.
  • correlationMat_seg (array) – segment of the correlation matrix of the simulated sample data; a numpy array.
  • cols (int) – number of variables to use in the calculation.
  • cores (int) – number of CPU cores to use.
  • currCore (int) – the CPU core currently in use.
Returns:
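
The parameters numVars, Samples_seg and currCore suggest that each worker receives its own segment of the variables. How pa.py partitions the data is not stated here, so the sketch below only shows one common scheme: contiguous column ranges whose sizes differ by at most one. The function name split_columns is hypothetical.

```python
def split_columns(n_vars, cores):
    """Return one (start, stop) column range per core.

    Ranges are contiguous and their sizes differ by at most one.
    Hypothetical helper; not taken from pa.py.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    base, extra = divmod(n_vars, cores)
    segments = []
    start = 0
    for core in range(cores):
        # The first `extra` cores take one additional column each.
        size = base + (1 if core < extra else 0)
        segments.append((start, start + size))
        start += size
    return segments


print(split_columns(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Under this scheme, the worker with index currCore would process the columns in its own (start, stop) range.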

pa.PCalc_2Group(data, EffectSizes, SampSizes, SignThreshold, nSimSamp, nRepeat, cores)

Calculates the confusion matrix, with and without correction methods, for classification.

Parameters:
  • data (array) – test matrix; a numpy array.
  • EffectSizes (array) – effect size matrix, numpy array 1 x n.
  • SampSizes (array) – sample size matrix, numpy array 1 x n.
  • SignThreshold (float) – significance threshold. This tool uses 0.05.
  • nSimSamp (int) – number of simulated samples. This tool uses 5000.
  • nRepeat (int) – number of repeats used when calculating the confusion matrix from simulated data. The default is 10.
  • cores (int) – number of CPU cores to use for parallel computation.
Returns:

pa.f_multiproc(sampSizes, signThreshold, effectSizes, numVars, nRepeats, nSampSizes, nEffSizes, Samples_seg, correlationMat_seg, cols, cores, currCore)

Launches the parallel computation for PCalc_2Group.

Parameters:
  • sampSizes (array) – sample size matrix, numpy array 1 x n.
  • signThreshold (float) – significance threshold. This tool uses 0.05.
  • effectSizes (array) – effect size matrix, numpy array 1 x n.
  • numVars (int) – number of variables handled by the current CPU core.
  • nRepeats (int) – number of repeats used when calculating the confusion matrix from simulated data. The default is 10.
  • nSampSizes (int) – length of the sample size matrix.
  • nEffSizes (int) – length of the effect size matrix.
  • Samples_seg (array) – segment of the simulated sample data; a numpy array.
  • correlationMat_seg (array) – segment of the correlation matrix of the simulated sample data; a numpy array.
  • cols (int) – number of variables to use in the calculation.
  • cores (int) – number of CPU cores to use.
  • currCore (int) – the CPU core currently in use.
Returns:

pa.randperm(totalLen, subLen)

Generates a random permutation and picks a sub-array of the specified size.

Parameters:
  • totalLen (int) – Total length.
  • subLen (int) – Sub length.
Returns:
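
A minimal sketch of the described behaviour: shuffle the indices 0 to totalLen - 1 and keep the first subLen of them. The function name, the zero-based indexing and the optional seed argument are assumptions for illustration, and the actual return format of pa.randperm is not documented here.

```python
import random


def randperm_sketch(total_len, sub_len, seed=None):
    """Return `sub_len` distinct indices drawn from a random permutation
    of range(total_len). Hypothetical sketch; not taken from pa.py.
    """
    if not 0 <= sub_len <= total_len:
        raise ValueError("sub_len must be between 0 and total_len")
    rng = random.Random(seed)
    indices = list(range(total_len))
    rng.shuffle(indices)  # random permutation of all indices
    return indices[:sub_len]  # keep only the requested number


picked = randperm_sketch(10, 4, seed=1)
print(picked)
```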

pa.expecting()

Detects the number of outputs, i.e. returns how many values the caller is expecting. Written by Sami Hangaslammi.

Returns:

pa.fdr_bh(*args)

Controls the false discovery rate using the Benjamini-Hochberg correction.

Parameters: args
Returns:
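
For reference, the standard Benjamini-Hochberg step-up procedure can be sketched as follows: sort the m p-values, find the largest rank k with p(k) <= (k / m) * q, and reject the k hypotheses with the smallest p-values. The function name, its arguments and its boolean return value are assumptions for illustration; the arguments accepted by pa.fdr_bh and the values it returns are not documented here.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the hypothesis is rejected
    at false discovery rate level `q`. Illustrative sketch only.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k / m * q.
    largest = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * q:
            largest = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for index in order[:largest]:
        rejected[index] = True
    return rejected


print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6]))
```

Because the procedure is step-up, a p-value that misses its own threshold is still rejected when a larger p-value meets its threshold.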

pa.calcConfMatrixUniv(p, corrVector, signThreshold, corrThresh)

Calculates the confusion matrix.

Parameters:
  • p (array) – matrix of p-values; a numpy array.
  • corrVector (array) – correlation vector.
  • signThreshold (float) – significance threshold.
  • corrThresh (float) – power level.
Returns:
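
As a generic illustration of a confusion matrix for univariate tests, the sketch below counts true and false positives and negatives by comparing each p-value against the significance threshold. It takes the ground truth as an explicit list of booleans, which is an assumption: how pa.py derives the truly affected variables from corrVector and corrThresh is not described here and is not reproduced. The function and argument names are hypothetical.

```python
def confusion_counts(p_values, truly_affected, sign_threshold=0.05):
    """Count TP, FP, TN and FN for a set of univariate tests.

    `truly_affected[i]` says whether variable i really carries an effect
    (hypothetical input; pa.py derives its ground truth differently).
    """
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for p_value, affected in zip(p_values, truly_affected):
        significant = p_value < sign_threshold
        if significant and affected:
            counts["TP"] += 1
        elif significant and not affected:
            counts["FP"] += 1
        elif not significant and affected:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


result = confusion_counts([0.01, 0.20, 0.03, 0.50], [True, True, False, False])
print(result)
```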