API

Details about Psycho's functions.

Report

Models

Psycho.report — Method.

report(model::StatsModels.DataFrameRegressionModel{<:GLM.LinearModel};
    CI::Number=95,
    std_coefs::Bool=true,
    effect_size="cohen1988")

Describe a linear model.

Arguments

model: A LinearModel.
CI: Confidence interval level.
std_coefs: Interpret effect sizes of standardized, rather than raw, coefs.
effect_size: Can be "cohen1988", "sawilowsky2009", or a custom set of Rules (see cohen_d). Set to nothing to omit the interpretation.

Examples

using GLM, DataFrames

model = lm(@formula(y ~ Var1), DataFrame(y=[0, 1, 2, 3], Var1=[2, 3, 3.5, 4]))
report(model)

report(model::StatsModels.DataFrameRegressionModel{<:GLM.GeneralizedLinearModel}; CI=95)

Core

Psycho.datagrid — Method.

datagrid(df::DataFrames.DataFrame; cols=:all, n::Int=10, kwargs...)

Create a reference grid of data.

Arguments

df: A DataFrame (can be a StatsModel).
cols: The target columns. The rest will be maintained "fixed".
n: For numeric target columns, the desired number of grid values (controls the spacing).
fix_num: How to fix the numeric variables. Can be a function or a number.
fix_fac: How to fix the factors. Should be a String indicating an existing level.

Examples

df = simulate_data_correlation([[0.2, 0.5], [0.4, 0.2]])

grid = datagrid(df, n=3)
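
To illustrate what a reference grid is, here is a minimal sketch in plain Julia. It is not Psycho's implementation: the helper name reference_grid is hypothetical, the data are stored in a NamedTuple of vectors rather than a DataFrame, and every column is treated as a target (the equivalent of cols=:all), so nothing is held fixed.

```julia
# Hypothetical helper, not part of Psycho: build a reference grid from
# named columns stored in a NamedTuple of vectors.
function reference_grid(columns::NamedTuple; n::Int=10)
    grid_axes = map(columns) do col
        if eltype(col) <: Real
            # Numeric target: n evenly spaced values from the minimum to the maximum.
            collect(range(minimum(col), maximum(col); length=n))
        else
            # Factor-like target: keep each distinct level once.
            unique(col)
        end
    end
    # Cartesian product of all axes, one NamedTuple per grid row.
    combos = vec(collect(Iterators.product(values(grid_axes)...)))
    return [NamedTuple{keys(columns)}(row) for row in combos]
end

grid = reference_grid((x=[0.0, 0.5, 2.0], g=["a", "b", "a"]); n=3)
length(grid)  # 3 values of x times 2 levels of g
```

With n=3 the numeric column x is reduced to three evenly spaced values, and each of them is crossed with the two levels of g.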

Psycho.standardize — Function.

standardize(X; robust::Bool=false)

Standardize (center and scale, i.e., Z-score) X so that the variables are expressed in units of standard deviation (i.e., mean = 0, SD = 1).

Arguments

X: Array or DataFrame.
robust::Bool: If true, the standardization is based on the median and MAD instead of the mean and SD. Defaults to false (mean and SD).

Note

Ideas / help required:

Deal with missing values (See #4)

Examples

standardize([1, 2, 3])

# output

3-element Array{Float64,1}:
 -1.0
  0.0
  1.0
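
The computation behind this output can be sketched with the Statistics standard library alone. The function name zscore_sketch is hypothetical, and the 1.4826 scaling of the MAD in the robust branch is an assumption made for illustration (it makes the MAD comparable to the SD under normality); Psycho's own choice may differ.

```julia
using Statistics

# Hypothetical sketch, not Psycho's source.
function zscore_sketch(x::AbstractVector{<:Real}; robust::Bool=false)
    if robust
        center = median(x)
        # Median absolute deviation, scaled by 1.4826 (an assumption here)
        # so that it estimates the SD for normally distributed data.
        scale = 1.4826 * median(abs.(x .- center))
    else
        center = mean(x)
        scale = std(x)  # sample SD (n - 1 denominator)
    end
    return (x .- center) ./ scale
end

z = zscore_sketch([1, 2, 3])                        # [-1.0, 0.0, 1.0]
z_robust = zscore_sketch([1, 2, 3]; robust=true)
```

The robust variant is less sensitive to outliers because neither the median nor the MAD is pulled by a single extreme value.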

Psycho.perfectNormal — Function.

perfectNormal(n::Int, mean::Number=0, sd::Number=1)

Generate an almost-perfect normal distribution of size n.

Arguments

n::Int: Length of the vector.
mean::Number: Mean of the vector.
sd::Number: SD of the vector.

Examples

perfectNormal(10, 0, 1)

Psycho.r2_tjur — Method.

r2_tjur(model::StatsModels.DataFrameRegressionModel{<:GLM.GeneralizedLinearModel})

Compute Tjur's (2009) D (R²).

The coefficient of discrimination (D), also referred to as Tjur's R² (Tjur, 2009), is asymptotically equivalent to the classical version of R² for linear models.

Arguments

model: A GeneralizedLinearModel.

Examples

using GLM, DataFrames

model = glm(@formula(y ~ Var1), DataFrame(y=[0, 0, 1, 1], Var1=[1, 2, 2, 4]), GLM.Binomial())
r2_tjur(model)

# output
0.5
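
Following Tjur (2009), D is the difference between the mean fitted probability among the observed successes and the mean fitted probability among the observed failures. The sketch below shows that arithmetic on hand-written vectors; the function name tjur_d is hypothetical and the probabilities are illustrative values, not the output of a fitted model.

```julia
using Statistics

# Hypothetical helper: Tjur's D from a binary outcome and fitted probabilities.
function tjur_d(y::AbstractVector, fitted::AbstractVector{<:Real})
    # Mean fitted probability where y == 1 minus mean fitted probability where y == 0.
    return mean(fitted[y .== 1]) - mean(fitted[y .== 0])
end

y = [0, 0, 1, 1]
fitted = [0.2, 0.4, 0.6, 0.8]  # illustrative probabilities only
d = tjur_d(y, fitted)
```

A model that separates the two outcome groups perfectly would give D = 1, and a model whose fitted probabilities do not differ between groups would give D = 0.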

References

Tjur, T. (2009). Coefficients of determination in logistic regression models—A new proposal: The coefficient of discrimination. The American Statistician, 63(4), 366-372.

Interpret

Psycho.Rules — Type.

Rules(breakpoints::AbstractVector, labels::AbstractVector, iflower::Bool=true)

Create a container for interpretation rules of thumb. See interpret(x::Real, rules::Rules).

Arguments

breakpoints: Vector of value break points (edges defining categories).
labels: Labels associated with each category. Must contain one label more than breakpoints.
iflower: If true, a label is returned when the value is lower than the corresponding breakpoint. If false, the comparison is reversed.

Examples

Rules([0.05], ["significant", "not significant"], true)

# output

Rules{Float64}([0.05], ["significant", "not significant"], true)

Psycho.interpret — Method.

interpret(x::Real, rules::Rules)

Interpret a value based on a set of rules of thumb.

Arguments

x: The value to interpret.
rules: A Rules object.

Examples

p_rules = Rules([0.05], ["significant", "not significant"], true)
interpret(0.04, p_rules)

# output

"significant"
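
The lookup that maps a value to a label can be sketched as follows. RulesSketch and interpret_sketch are hypothetical stand-ins, not Psycho's types; the handling of values exactly equal to a breakpoint, and the behavior when iflower is false, are assumptions. The effect-size breakpoints and labels are illustrative only.

```julia
# Hypothetical stand-in for a rules container; fields mirror the documented constructor.
struct RulesSketch
    breakpoints::Vector{Float64}  # assumed sorted in increasing order
    labels::Vector{String}        # one more label than breakpoints
    iflower::Bool
end

function interpret_sketch(x::Real, rules::RulesSketch)
    # Count the breakpoints that x has passed; the label index is that count plus one.
    # With iflower = true, a value equal to a breakpoint moves to the next label.
    passed = rules.iflower ? count(b -> x >= b, rules.breakpoints) :
                             count(b -> x > b, rules.breakpoints)
    return rules.labels[passed + 1]
end

p_rules = RulesSketch([0.05], ["significant", "not significant"], true)
effect_rules = RulesSketch([0.2, 0.5, 0.8], ["very small", "small", "medium", "large"], true)

interpret_sketch(0.04, p_rules)      # "significant"
interpret_sketch(0.6, effect_rules)  # "medium"
```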

Psycho.interpret_p — Method.

interpret_p(p::Number; alpha::Number=0.05)

Interpret the p value based on the alpha level.

A p-value is the probability of obtaining an effect at least as extreme as the one in your sample data, assuming that the null hypothesis is true. The significance level, also denoted as alpha or α, is the probability of rejecting the null hypothesis when it is true (i.e., claiming "there is an effect" when there is not). For example, a significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference. The traditional default for alpha is .05. However, a strong wave of criticism suggests either justifying your alpha (Lakens et al., 2018) or lowering the threshold (for instance to .005; Benjamin et al., 2018).

Arguments

p: The p value.
alpha: Significance threshold.

Examples

interpret_p(0.04)

# output

"significant"

Psycho.format_p — Method.

format_p(p::Number; stars::Bool=false)

Format the p value according to APA standards.

Arguments

p: The p value.
stars: Add stars (*) when significant.

Examples

format_p(0.04, stars=true)

# output

"p < .05*"

Simulate

Data

Psycho.simulate_data_correlation — Function.

simulate_data_correlation(coefs; n::Int=100, noise::Number=0.0, groupnames=:random)

Generate a DataFrame of correlated variables.

Multiple Variables / Groups

If coefs is a vector (e.g., [0.1, 0.2]), the DataFrame will contain length(coefs) variables (Var1, Var2, ...). Although uncorrelated with each other, they are correlated with the outcome (y) by the specified coefs.
If coefs is a vector of vectors (e.g., [[0.1], [0.2]]), it will create length(coefs) groups, i.e., stacked DataFrames in which the correlation between the variables and the outcome varies between groups. The group names can be specified with groupnames.

Arguments

coefs: Correlation coefficients. Can be a number, a vector of numbers or a vector of vectors.
n::Int: Number of observations.
noise::Number: The SD of the random gaussian noise.
groupnames::Vector: Vector of group names (defaults to :random).
kwargs...: Arguments to pass to other functions.

Note

Ideas / help required:

Different group sizes (See #9)
Bug in some cases (e.g., simulate_data_correlation([0.2, 0.9, 0.5])) related to failure in Cholesky factorization (See #11)

Examples

simulate_data_correlation(0.2)
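
To show how a variable can be given a target correlation with an outcome, here is a minimal single-predictor sketch. It is not Psycho's method: the notes above mention a Cholesky factorization, whereas this sketch swaps in a simpler residualization step. The function name correlated_pair and the rng keyword are hypothetical.

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical sketch: one predictor whose sample correlation with y equals r.
function correlated_pair(r::Real; n::Int=100, rng=Random.default_rng())
    center_scale(v) = (v .- mean(v)) ./ std(v)
    x = center_scale(randn(rng, n))
    e = randn(rng, n)
    e = e .- mean(e)
    # Remove the part of e that is linearly related to x, then rescale to SD = 1.
    e_perp = center_scale(e .- (dot(e, x) / dot(x, x)) .* x)
    # Mix x and the orthogonal noise so that var(y) = 1 and cov(x, y) = r.
    y = r .* x .+ sqrt(1 - r^2) .* e_perp
    return (Var1 = x, y = y)
end

d = correlated_pair(0.2; n=100, rng=MersenneTwister(1))
cor(d.Var1, d.y)  # equals 0.2 up to floating-point error
```

Because the noise term is made exactly orthogonal to the predictor, the sample correlation matches r in every draw rather than only on average.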

Psycho.simulate_data_logistic — Function.

simulate_data_logistic(coefs; n::Int=100, noise::Number=0.0, groupnames=:random)

Generate a DataFrame of variables related to a binary dependent variable by the specified regression coefs.

Warning

This function, adapted from this thread, doesn't work as expected (See #27).

Multiple Variables / Groups

If coefs is a vector (e.g., [0.1, 0.2]), the DataFrame will contain length(coefs) variables (Var1, Var2, ...). Although uncorrelated with each other, they are correlated with the outcome (y) by the specified coefs.
If coefs is a vector of vectors (e.g., [[0.1], [0.2]]), it will create length(coefs) groups, i.e., stacked DataFrames in which the correlation between the variables and the outcome varies between groups. The group names can be specified with groupnames.

Arguments

coefs: Regression coefficients. Can be a number, a vector of numbers or a vector of vectors.
n::Int: Number of observations.
noise::Number: The SD of the random gaussian noise.
groupnames::Vector: Vector of group names (defaults to :random).
kwargs...: Arguments to pass to other functions.

Note

Ideas / help required:

Different group sizes (See #9)

Examples

simulate_data_logistic(0.2)

SDT

Psycho.sdt_indices — Method.

sdt_indices(hit::Int, fa::Int, miss::Int, cr::Int; adjusted::Bool=true)

Compute Signal Detection Theory (SDT) indices (d', beta, c, A', B''...).

Signal detection theory (SDT) is used when psychologists want to measure the way we make decisions under conditions of uncertainty. SDT assumes that the decision maker is not a passive receiver of information, but an active decision-maker who makes difficult perceptual judgments under conditions of uncertainty. In tasks where stimuli were either present or absent, and the observer categorized each trial as having the stimulus present or absent, the trials can be sorted into one of four categories: Hit, Miss, Correct Rejection and False Alarm.

Arguments

hit: Number of hits.
fa: Number of false alarms.
miss: Number of misses.
cr: Number of correct rejections.
adjusted::Bool: Use Hautus (1995) adjustments for extreme values.

Indices

Returns a Dict containing the following indices:

dprime (d'): Sensitivity (pronounced "dee-prime"). Reflects the distance between the two distributions, noise and signal+noise, and corresponds to the Z value of the hit rate minus that of the false-alarm rate.
beta: Likelihood ratio decision criterion. The value for beta is the ratio of the normal density functions at the criterion of the Z values used in the computation of d'. This reflects an observer's bias to say 'yes' or 'no' with the unbiased observer having a value around 1.0. As the bias to say 'yes' increases (liberal), resulting in a higher hit-rate and false-alarm-rate, beta approaches 0.0. As the bias to say 'no' increases (conservative), resulting in a lower hit-rate and false-alarm rate, beta increases over 1.0 on an open-ended scale.
c: Another index of response bias. It is the number of standard deviations from the midpoint between these two distributions, i.e., a measure on a continuum from "conservative" to "liberal".
c_relative (c'): Scaled criterion location (c) relative to performance (d'). Indeed, with easier discrimination tasks a more extreme criterion (as measured by c) would be needed to yield the same amount of bias.
Xc: Decision criterion, given by the negative standardized false alarm rate (DeCarlo, 1998).
aprime (A'): Non-parametric estimate of discriminability. An A' near 1.0 indicates good discriminability, while a value near 0.5 means chance performance.
bpp (B''): Also referred to as B''D (pronounced "b prime prime d"). Non-parametric estimate of bias. A B'' equal to 0.0 indicates no bias, positive numbers represent conservative bias (i.e., a tendency to answer 'no'), negative numbers represent liberal bias (i.e., a tendency to answer 'yes'). The maximum absolute value is 1.0.
pr and br: Indices based on the Two-High Threshold Model (Feenan & Snodgrass, 1990). Pr is the discrimination measure (also sometimes called the corrected recognition score). Br is the bias measure; values greater than 0.5 indicate a liberal bias, values less than 0.5 indicate a conservative bias.
dprime_glm and Xc_glm: Indices as estimated by a Bernoulli probit GLM.

Note that for d' and beta, adjustments for extreme values are made by default, following the recommendations of Hautus (1995).

Note

Ideas / help required:

Compute new indices (See #17)

Examples

sdt_indices(6, 7, 8, 9)  # hit, fa, miss, cr are positional arguments
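
The non-parametric and two-high-threshold indices listed above need only arithmetic on the hit and false-alarm rates, so they can be sketched without any dependency. This is not Psycho's implementation: the function name sdt_nonparametric is hypothetical, the formulas follow the standard definitions given in Stanislaw and Todorov (1999) for A', the B''D form for bias, and Pr = H - F with Br = F / (1 - Pr). Applying the log-linear correction to these indices is an assumption, so it is off by default here. The parametric indices (d', beta, c) additionally require the inverse normal CDF and are left out.

```julia
# Hypothetical sketch of the indices that do not need the inverse normal CDF.
function sdt_nonparametric(hit::Int, fa::Int, miss::Int, cr::Int; adjusted::Bool=false)
    # Optional log-linear correction: add 0.5 to the "yes" counts and 1 to the
    # number of trials, which keeps the rates away from 0 and 1.
    k = adjusted ? 0.5 : 0.0
    h = (hit + k) / (hit + miss + 2 * k)  # hit rate
    f = (fa + k) / (fa + cr + 2 * k)      # false-alarm rate
    # A': the formula depends on whether the hit rate exceeds the false-alarm rate.
    aprime = h >= f ? 0.5 + (h - f) * (1 + h - f) / (4 * h * (1 - f)) :
                      0.5 - (f - h) * (1 + f - h) / (4 * f * (1 - h))
    # B''D: 0 means no bias, positive is conservative, negative is liberal.
    bpp = ((1 - h) * (1 - f) - h * f) / ((1 - h) * (1 - f) + h * f)
    pr = h - f         # discrimination
    br = f / (1 - pr)  # bias; above 0.5 is liberal
    return Dict("aprime" => aprime, "bpp" => bpp, "pr" => pr, "br" => br)
end

indices = sdt_nonparametric(8, 2, 2, 8)  # hit, fa, miss, cr
```

With uncorrected rates of exactly 0 or 1 some denominators become zero and the result is NaN or Inf, which is the situation the correction is meant to avoid.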

References

Macmillan, N. A., & Creelman, C. D. (2005). Detection theory: A user's guide (2nd ed.). Hove, England: Psychology Press.
Stanislaw, H., & Todorov, N. (1999). Calculation of signal detection theory measures. Behavior research methods, instruments, & computers, 31(1), 137-149.
https://memory.psych.mun.ca/models/recognition/index.shtml

Misc

Psycho.simulate_coefs_correlation — Function.

simulate_coefs_correlation(coefs_mean::Number=0.1; coefs_sd::Number=0.1, n::Int=10)

Generate a vector of random correlation coefficients from a normal distribution.

Arguments

coefs_mean::Number: Mean of the normal distribution from which to get the coefs.
coefs_sd::Number: SD of the normal distribution.
n::Int: Number of coefficients.

Examples

simulate_coefs_correlation(0.5)

Psycho.simulate_groupnames — Method.

simulate_groupnames(n::Int; nchar::Int=2)

Create a vector of n random group names, each containing nchar characters.

Arguments

n::Int: Number of group names.
nchar::Int: Number of random characters in the name.

Note

Ideas / help required:

Can be enhanced to make it more procedural and less random (See #8)
Implement different types of simulations (e.g., "A, B... AA, AB...") (See #8)

Examples

simulate_groupnames(10)