API
Details about Psycho's functions.
Report
Data
Psycho.report - Method.
report(df::DataFrames.DataFrame; kwargs...)
Describe the variables in a DataFrame.
Arguments
- df: DataFrame.
- missing_percentage::Bool: Show missings by percentage (default) or number.
- levels_percentage::Bool: Show factor levels by percentage (default) or number.
- median::Bool: Show mean and sd (default) or median and mad.
- dispersion::Bool: Show dispersion (sd or mad).
- range::Bool: Show range.
- n_strings::Int: Number of different string elements to show.
Examples
report(simulate_data_correlation([[0.1], [0.2]]))
Models
Psycho.report - Method.
report(model::StatsModels.DataFrameRegressionModel{<:GLM.LinearModel}; CI::Number=95, std_coefs::Bool=true, effect_size="cohen1988")
Describe a linear model.
Arguments
- model: A LinearModel.
- CI: Confidence interval level.
- std_coefs: Interpret effect sizes of standardized, rather than raw, coefs.
- effect_size: Can be 'cohen1988', 'sawilowsky2009', or a custom set of Rules (see cohen_d). Set to nothing to omit interpretation.
Examples
using GLM, DataFrames
model = lm(@formula(y ~ Var1), DataFrame(y=[0, 1, 2, 3], Var1=[2, 3, 3.5, 4]))
report(model)
report(model::StatsModels.DataFrameRegressionModel{<:GLM.GeneralizedLinearModel}; CI=95)
Core
Psycho.datagrid - Method.
datagrid(df::DataFrames.DataFrame; cols=:all, n::Int=10, kwargs...)
Create a reference grid of data.
Arguments
- df: A DataFrame (can be a StatsModel).
- cols: The target columns. The rest will be maintained "fixed".
- n: For numeric target columns, the desired length (controls the spacing).
- fix_num: How to fix the numeric variables. Can be a function or a number.
- fix_fac: How to fix the factors. Should be a String indicating an existing level.
Examples
df = simulate_data_correlation([[0.2, 0.5], [0.4, 0.2]])
grid = datagrid(df, n=3)
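The idea of spreading target columns while holding the others "fixed" can be sketched in plain Julia. This is only an illustration under stated assumptions, not Psycho's implementation: grid_sketch is a hypothetical helper that works on a NamedTuple of numeric vectors, handles a single target column, and uses the mean as the fixing function.

```julia
using Statistics

# Hypothetical helper (not part of Psycho): build a reference grid from a
# NamedTuple of numeric vectors. The target column is spread over n evenly
# spaced values between its minimum and maximum; every other column is held
# fixed at fix(column), the mean by assumption.
function grid_sketch(data::NamedTuple, target::Symbol; n::Int=10, fix=mean)
    cols = map(keys(data)) do name
        v = data[name]
        name == target ? collect(range(minimum(v), maximum(v); length=n)) : fill(fix(v), n)
    end
    return NamedTuple{keys(data)}(cols)
end

data = (Var1 = [1.0, 2.0, 3.0, 4.0], Var2 = [10.0, 20.0, 30.0, 40.0])
grid = grid_sketch(data, :Var1; n=3)
# grid.Var1 is [1.0, 2.5, 4.0]; grid.Var2 is [25.0, 25.0, 25.0]
```

Passing another function (for example median) or a constant-returning closure as fix mirrors the role that fix_num plays in the arguments above.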
Psycho.standardize - Function.
standardize(X; robust::Bool=false)
Standardize (scale and reduce, Z-score) X so that the variables are expressed in terms of standard deviation (i.e., mean = 0, SD = 1).
Arguments
- X: Array or DataFrame.
- robust::Bool: If true, the standardization will be based on median and mad instead of mean and sd (default).
Ideas / help required:
- Deal with missing values (See #4)
Examples
standardize([1, 2, 3])
# output
3-element Array{Float64,1}:
-1.0
0.0
1.0
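For reference, the two kinds of standardization described above can be sketched with the Statistics standard library. The helper names below are hypothetical, and the scaling constant 1.4826 for the MAD is an assumption (it makes the MAD comparable to the SD under normality); Psycho's own robust option may differ.

```julia
using Statistics

# Hypothetical helpers, not Psycho's implementation.

# Classical Z-score: center on the mean, scale by the sample SD.
zscore_sketch(x) = (x .- mean(x)) ./ std(x)

# Robust variant: center on the median, scale by the MAD.
# Assumption: the MAD is multiplied by 1.4826 (normal-consistency constant).
function robust_zscore_sketch(x)
    med = median(x)
    mad = 1.4826 * median(abs.(x .- med))
    return (x .- med) ./ mad
end

zscore_sketch([1, 2, 3])         # [-1.0, 0.0, 1.0]
robust_zscore_sketch([1, 2, 3])  # centered on 2, scaled by 1.4826
```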
Psycho.perfectNormal - Function.
perfectNormal(n::Int, mean::Number=0, sd::Number=1)
Generate an almost-perfect normal distribution of size n.
Arguments
- n::Int: Length of the vector.
- mean::Number: Mean of the vector.
- sd::Number: SD of the vector.
Examples
perfectNormal(10, 0, 1)
Psycho.r2_tjur - Method.
r2_tjur(model::StatsModels.DataFrameRegressionModel{<:GLM.GeneralizedLinearModel})
Compute Tjur's (2009) D (R²).
The coefficient of discrimination (D), also referred to as Tjur's R² (Tjur, 2009), is asymptotically equivalent to the classical version of R² for linear models.
Arguments
- model: A GeneralizedLinearModel.
Examples
using GLM, DataFrames
model = glm(@formula(y ~ Var1), DataFrame(y=[0, 0, 1, 1], Var1=[1, 2, 2, 4]), GLM.Binomial())
r2_tjur(model)
# output
0.5
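Tjur's D is defined as the mean fitted probability among observations with outcome 1 minus the mean fitted probability among observations with outcome 0. A minimal sketch of that definition follows; tjur_d_sketch is a hypothetical helper that takes the outcomes and fitted probabilities directly, and the probabilities are made-up values rather than the fit of a real model.

```julia
using Statistics

# Hypothetical helper (not Psycho's r2_tjur): compute Tjur's D from observed
# binary outcomes y (0 or 1) and fitted probabilities p.
function tjur_d_sketch(y, p)
    return mean(p[y .== 1]) - mean(p[y .== 0])
end

y = [0, 0, 1, 1]
p = [0.125, 0.375, 0.625, 0.875]  # illustrative fitted probabilities
tjur_d_sketch(y, p)               # 0.75 - 0.25 = 0.5
```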
References
- Tjur, T. (2009). Coefficients of determination in logistic regression models—A new proposal: The coefficient of discrimination. The American Statistician, 63(4), 366-372.
Interpret
Psycho.Rules - Type.
Rules(breakpoints::AbstractVector, labels::AbstractVector, iflower::Bool=true)
Create a container for interpretation rules of thumb. See interpret(x::Real, rules::Rules).
Arguments
- breakpoints: Vector of value break points (edges defining categories).
- labels: Labels associated with each category. Must contain one label more than breakpoints.
- iflower: If true, each label will be given if the value is lower than its breakpoint. The contrary if false.
Examples
Rules([0.05], ["significant", "not significant"], true)
# output
Rules{Float64}([0.05], ["significant", "not significant"], true)
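How such a set of breakpoints and labels maps a value to a category can be sketched as follows. RulesSketch and interpret_sketch are hypothetical stand-ins, not Psycho's types; the sketch covers only the iflower = true case, assumes breakpoints sorted in ascending order, and uses illustrative labels.

```julia
# Hypothetical stand-in for a Rules object with iflower = true: breakpoints
# are sorted in ascending order and there is one more label than breakpoints.
struct RulesSketch
    breakpoints::Vector{Float64}
    labels::Vector{String}
end

# Return the label of the first breakpoint that x falls below; values at or
# above the last breakpoint get the final label.
function interpret_sketch(x::Real, rules::RulesSketch)
    for (breakpoint, label) in zip(rules.breakpoints, rules.labels)
        x < breakpoint && return label
    end
    return rules.labels[end]
end

p_rules = RulesSketch([0.05], ["significant", "not significant"])
size_rules = RulesSketch([0.2, 0.5, 0.8], ["very small", "small", "medium", "large"])

interpret_sketch(0.04, p_rules)    # "significant"
interpret_sketch(0.6, size_rules)  # "medium"
```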
Psycho.interpret - Method.
interpret(x::Real, rules::Rules)
Interpret a value based on a set of rules of thumb.
Arguments
- x: The value to interpret.
- rules: A Rules object.
Examples
p_rules = Rules([0.05], ["significant", "not significant"], true)
interpret(0.04, p_rules)
# output
"significant"
Psycho.interpret_p - Method.
interpret_p(p::Number; alpha::Number=0.05)
Interpret the p value based on the alpha level.
A p-value is the probability of obtaining an effect at least as extreme as the one in your sample data, assuming the truth of the null hypothesis. The significance level, also denoted as alpha or α, is the probability of rejecting the null hypothesis when it is true (i.e., claiming "there is an effect" when there is not). For example, a significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference. The traditional default for alpha is .05. However, a strong wave of criticism suggests either justifying your alpha (Lakens et al., 2018) or lowering the threshold (for instance to .005; Benjamin et al., 2018).
Arguments
- p: The p value.
- alpha: Significance threshold.
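The effect of the chosen threshold can be illustrated with a one-line sketch. interpret_p_sketch is a hypothetical helper, and the strict comparison p < alpha is an assumption about how the boundary case is handled.

```julia
# Hypothetical helper (not Psycho's interpret_p). Assumption: a p value is
# called significant only when it is strictly below alpha.
interpret_p_sketch(p::Real; alpha::Real=0.05) =
    p < alpha ? "significant" : "not significant"

interpret_p_sketch(0.04)               # "significant" at the traditional .05
interpret_p_sketch(0.04; alpha=0.005)  # "not significant" at the stricter .005
```

The same p value therefore changes category when the threshold is lowered, which is why the choice of alpha should be made before looking at the data.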
Examples
interpret_p(0.04)
# output
"significant"
Psycho.format_p - Method.
format_p(p::Number; stars::Bool=false)
Format the p value according to APA standards.
Arguments
- p: The p value.
- stars: Add stars (*) when significant.
Examples
format_p(0.04, stars=true)
# output
"p < .05*"
Simulate
Data
Psycho.simulate_data_correlation - Function.
simulate_data_correlation(coefs; n::Int=100, noise::Number=0.0, groupnames=:random)
Generate a DataFrame of correlated variables.
Multiple Variables / Groups
- If coefs is a vector (e.g., [0.1, 0.2]), the DataFrame will contain length(coefs) variables (Var1, Var2, ...). Although uncorrelated between them, they are correlated to the outcome (y) by the specified coefs.
- If coefs is a vector of vectors (e.g., [[0.1], [0.2]]), it will create length(coefs) groups, i.e., stacked DataFrames with a correlation between the variables and the outcome varying between groups. It is possible to specify the groupnames.
Arguments
- coefs: Correlation coefficients. Can be a number, a vector of numbers or a vector of vectors.
- n::Int: Number of observations.
- noise::Number: The SD of the random Gaussian noise.
- groupnames::Vector: Vector of group names (default to :random).
- kwargs...: Arguments to pass to other functions.
Examples
simulate_data_correlation(0.2)
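One standard way to obtain a variable with a chosen sample correlation to an outcome is sketched below. It is not necessarily how Psycho generates its data: correlated_pair_sketch and zs are hypothetical helpers, only a single predictor is handled, and no extra noise is added.

```julia
using Random, Statistics

# Hypothetical helpers, not Psycho's implementation.
zs(v) = (v .- mean(v)) ./ std(v)  # Z-score with the sample SD

# Build x so that its sample correlation with y equals r (up to rounding).
function correlated_pair_sketch(r::Real; n::Int=100, rng=Random.default_rng())
    y = zs(randn(rng, n))
    e = randn(rng, n)
    # Remove from e the part that is linearly related to y, then rescale, so
    # that e has unit variance and zero sample covariance with y.
    e = zs(e .- (cov(e, y) / var(y)) .* y)
    x = r .* y .+ sqrt(1 - r^2) .* e
    return x, y
end

x, y = correlated_pair_sketch(0.2; n=100, rng=MersenneTwister(1))
cor(x, y)  # approximately 0.2, up to floating-point error
```

Because the residual term is orthogonal to y in the sample, the correlation matches r for any n rather than only on average.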
Psycho.simulate_data_logistic - Function.
simulate_data_logistic(coefs; n::Int=100, noise::Number=0.0, groupnames=:random)
Generate a DataFrame of variables related to a binary dependent variable by the specified regression coefs.
Multiple Variables / Groups
- If coefs is a vector (e.g., [0.1, 0.2]), the DataFrame will contain length(coefs) variables (Var1, Var2, ...). Although uncorrelated between them, they are correlated to the outcome (y) by the specified coefs.
- If coefs is a vector of vectors (e.g., [[0.1], [0.2]]), it will create length(coefs) groups, i.e., stacked DataFrames with a correlation between the variables and the outcome varying between groups. It is possible to specify the groupnames.
Arguments
- coefs: Regression coefficients. Can be a number, a vector of numbers or a vector of vectors.
- n::Int: Number of observations.
- noise::Number: The SD of the random Gaussian noise.
- groupnames::Vector: Vector of group names (default to :random).
- kwargs...: Arguments to pass to other functions.
Ideas / help required:
- Different group sizes (See #9)
Examples
simulate_data_logistic(0.2)
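The link between a regression coefficient and a binary outcome can be sketched with the logistic function. This is a generic illustration under stated assumptions, not Psycho's generator: logistic_data_sketch is a hypothetical helper with a single standard-normal predictor, no intercept, and Bernoulli sampling of the outcome.

```julia
using Random

logistic(z) = 1 / (1 + exp(-z))

# Hypothetical helper (not Psycho's implementation): draw a predictor x and a
# binary outcome y whose log-odds equal coef * x.
function logistic_data_sketch(coef::Real; n::Int=100, rng=Random.default_rng())
    x = randn(rng, n)
    p = logistic.(coef .* x)     # probability that y == 1 for each row
    y = Int.(rand(rng, n) .< p)  # Bernoulli draw per observation
    return x, y
end

x, y = logistic_data_sketch(0.2; n=100, rng=MersenneTwister(1))
```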
SDT
Psycho.sdt_indices - Method.
sdt_indices(hit::Int, fa::Int, miss::Int, cr::Int; adjusted::Bool=true)
Compute Signal Detection Theory (SDT) indices (d', beta, c, A', B''...).
Signal detection theory (SDT) is used when psychologists want to measure the way we make decisions under conditions of uncertainty. SDT assumes that the decision maker is not a passive receiver of information, but an active decision-maker who makes difficult perceptual judgments under conditions of uncertainty. In tasks where stimuli were either present or absent, and the observer categorized each trial as having the stimulus present or absent, the trials can be sorted into one of four categories: Hit, Miss, Correct Rejection and False Alarm.
Arguments
- hit: Number of hits.
- fa: Number of false alarms.
- miss: Number of misses.
- cr: Number of correct rejections.
- adjusted::Bool: Use Hautus (1995) adjustments for extreme values.
Indices
Returns a Dict containing the following indices:
- dprime (d'): Sensitivity (pronounced "dee-prime"). Reflects the distance between the two distributions (noise, and signal+noise) and corresponds to the Z value of the hit rate minus that of the false-alarm rate.
- beta: Likelihood ratio decision criterion. The value for beta is the ratio of the normal density functions at the criterion of the Z values used in the computation of d'. This reflects an observer's bias to say 'yes' or 'no' with the unbiased observer having a value around 1.0. As the bias to say 'yes' increases (liberal), resulting in a higher hit-rate and false-alarm-rate, beta approaches 0.0. As the bias to say 'no' increases (conservative), resulting in a lower hit-rate and false-alarm rate, beta increases over 1.0 on an open-ended scale.
- c: Another index of response bias. The number of standard deviations from the midpoint between these two distributions, i.e., a measure on a continuum from "conservative" to "liberal".
- c_relative (c'): Scaled criterion location (c) relative to performance (d'). Indeed, with easier discrimination tasks a more extreme criterion (as measured by c) would be needed to yield the same amount of bias.
- Xc: Decision criterion, given by the negative standardized false alarm rate (DeCarlo, 1998).
- aprime (A'): Non-parametric estimate of discriminability. An A' near 1.0 indicates good discriminability, while a value near 0.5 means chance performance.
- bpp (B''): Also referred to as B''D (pronounced "b prime prime d"). Non-parametric estimate of bias. A B'' equal to 0.0 indicates no bias, positive numbers represent conservative bias (i.e., a tendency to answer 'no'), negative numbers represent liberal bias (i.e., a tendency to answer 'yes'). The maximum absolute value is 1.0.
- pr and br: Indices based on the Two-High Threshold Model (Feenan & Snodgrass, 1990). Pr is the discrimination measure (also sometimes called the corrected recognition score). Br is the bias measure; values greater than 0.5 indicate a liberal bias, values less than 0.5 indicate a conservative bias.
- dprime_glm and Xc_glm: Indices as estimated by a Bernoulli probit GLM.
Note that for d' and beta, adjustments for extreme values are made by default, following the recommendations of Hautus (1995).
Ideas / help required:
- Compute new indices (See #17)
Examples
sdt_indices(6, 7, 8, 9)  # positional order: hit, fa, miss, cr
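The non-parametric and threshold-model indices listed above need only arithmetic on the hit and false-alarm rates, so they can be sketched without any dependency. sdt_nonparametric_sketch is a hypothetical helper, not Psycho's implementation: it uses raw rates with no Hautus adjustment (so rates of exactly 0 or 1 can give NaN), and it omits d', beta and c, which require the normal quantile function.

```julia
# Hypothetical helper (not Psycho's sdt_indices): non-parametric SDT indices
# from raw counts, without any correction for extreme rates.
function sdt_nonparametric_sketch(hit::Int, fa::Int, miss::Int, cr::Int)
    h = hit / (hit + miss)  # hit rate
    f = fa / (fa + cr)      # false-alarm rate

    # A': discriminability, formula as given by Stanislaw & Todorov (1999)
    aprime = if h >= f
        0.5 + (h - f) * (1 + h - f) / (4 * h * (1 - f))
    else
        0.5 - (f - h) * (1 + f - h) / (4 * f * (1 - h))
    end

    # B''D: bias, 0 means no bias, positive means conservative
    bpp = ((1 - h) * (1 - f) - h * f) / ((1 - h) * (1 - f) + h * f)

    # Two-high threshold model: discrimination (Pr) and bias (Br)
    pr = h - f
    br = f / (1 - pr)

    return (aprime=aprime, bpp=bpp, pr=pr, br=br)
end

# 8 hits, 2 false alarms, 2 misses, 8 correct rejections
res = sdt_nonparametric_sketch(8, 2, 2, 8)
# res.aprime ≈ 0.875, res.bpp ≈ 0.0, res.pr ≈ 0.6, res.br ≈ 0.5
```

With symmetric hit and correct-rejection rates the observer is unbiased, which shows up as B''D near 0 and Br near 0.5.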
References
- Macmillan, N. A., & Creelman, C. D. (2005). Detection theory: A user's guide (2nd ed.). Hove, England: Psychology Press
- Stanislaw, H., & Todorov, N. (1999). Calculation of signal detection theory measures. Behavior research methods, instruments, & computers, 31(1), 137-149.
- https://memory.psych.mun.ca/models/recognition/index.shtml
Misc
Psycho.simulate_coefs_correlation - Function.
simulate_coefs_correlation(coefs_mean::Number=0.1; coefs_sd::Number=0.1, n::Int=10)
Generate a vector of random correlation coefficients from a normal distribution.
Arguments
- coefs_mean::Number: Mean of the normal distribution from which to get the coefs.
- coefs_sd::Number: SD of the normal distribution.
- n::Int: Number of coefficients.
Examples
simulate_coefs_correlation(0.5)
Psycho.simulate_groupnames - Method.
simulate_groupnames(n::Int; nchar::Int=2)
Create a vector of n random group names, each containing nchar characters.
Arguments
- n::Int: Number of group names.
- nchar::Int: Number of random characters in the name.
Examples
simulate_groupnames(10)