ANOVA |
A statistical test used to compare the means of three or more groups. |
Bar chart |
A graph showing the frequencies of different categories, with the horizontal axis representing the categories and the vertical axis representing the frequencies. |
Beta distribution |
A continuous distribution of probabilities that is defined on the interval between 0 and 1, and is often used in Bayesian statistics. |
Between-subjects design |
A research design in which each subject is assigned to only one condition, so that comparisons are made between different groups of subjects. |
Binomial distribution |
A distribution of probabilities for a discrete variable that counts the number of successes in a fixed number of independent trials, each of which has only two possible outcomes, such as heads or tails. |
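As an illustration, the probability of exactly k successes in n independent two-outcome trials can be computed directly. This is a minimal sketch; the function name `binomial_pmf` and the coin-toss numbers are illustrative assumptions, not part of the glossary.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p (hypothetical helper)."""
    if not 0 <= k <= n:
        return 0.0
    # Number of ways to place k successes, times the probability of each way.
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability of exactly 2 heads in 4 tosses of a fair coin.
print(binomial_pmf(2, 4, 0.5))  # 0.375
```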
Box plot |
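A graph showing the distribution of a set of data, with a box representing the middle 50% of the values and whiskers extending to the minimum and maximum values (or, in a common variant, to the most extreme values within 1.5 times the interquartile range of the box, with points beyond that plotted individually as outliers). |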
Central limit theorem |
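The statistical principle that states that the distribution of sample means will be approximately normal when the sample size is sufficiently large, regardless of the shape of the distribution of the population from which the samples are drawn, provided that the population has a finite variance. |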
Chi-square test |
A statistical test used to determine whether two categorical variables are related. |
Cluster analysis |
A statistical technique used to group data into clusters or groups based on similarity. |
Cluster sampling |
A sampling method in which the population is divided into groups or clusters, a random sample of clusters is selected, and all or some of the members of the selected clusters are included in the sample. |
Confidence interval |
A range of values that is likely to contain the true value of a population parameter, with a certain level of confidence. |
Confirmatory factor analysis |
A statistical technique used to test whether a hypothesized factor structure, specified in advance, fits a set of observed variables. |
Continuous variable |
A variable that can take on any value within a given range. |
Correlation |
A statistical relationship between two variables, measured by the strength and direction of the linear relationship between them. |
Cox proportional hazards model |
A statistical model used to estimate the risk of an event occurring over time, taking into account the effects of multiple covariates. |
Discrete variable |
A variable that can only take on specific, distinct values. |
Discriminant analysis |
A statistical technique used to classify observations into different groups based on their characteristics. |
Event history analysis |
A statistical technique used to analyze data on the timing and occurrence of events, such as transitions between different states or stages. |
Exponential distribution |
A continuous distribution of probabilities that represents the time between events occurring at a constant rate. |
F-distribution |
A continuous distribution of probabilities that is used in hypothesis testing to compare the variances of two or more groups. |
Factor analysis |
A statistical technique used to identify the underlying structure or patterns in a set of correlated variables. |
Factorial ANOVA |
A statistical test used to analyze the effects of two or more factors on a response variable, assuming that the data are normally distributed and the variances are equal. |
Factorial design |
A research design in which multiple treatment conditions are combined in a single study, allowing for the analysis of main effects and interactions. |
Fixed effect |
A variable in a statistical model that is considered to be a fixed part of the model, and is not allowed to vary across different levels or groups. |
Frequency distribution |
A tabular summary of the data showing the number of occurrences of each unique value or range of values. |
Friedman test |
A nonparametric statistical test used to compare three or more related groups on the basis of ranks, when the data are not normally distributed or the variances are not equal, and the subjects are measured under multiple conditions or at multiple time points. |
Generalizability |
The extent to which the results of a study can be generalized to a larger population. |
Generalized estimating equations |
A statistical technique used to estimate the parameters of a statistical model when the data are correlated or unbalanced. |
Generalized linear mixed model |
A statistical model that extends the generalized linear model to allow for both fixed and random effects. |
Generalized linear model |
A statistical model that extends the linear regression model to allow for non-normal distributions of the response variable. |
Heteroscedasticity |
A violation of the assumption of homogeneity of variance, where the variance of the errors is not constant across groups or across levels of an independent variable. |
Histogram |
A graph showing the frequency distribution of a set of data, with the horizontal axis representing the values and the vertical axis representing the frequencies. |
Hyperparameter |
A parameter that is set before training a machine learning model, influencing the model's behavior and performance. |
Hyperparameter search |
The process of finding the optimal hyperparameter values for a machine learning model through methods like grid search, random search, or Bayesian optimization. |
Hyperparameter tuning |
The process of finding the best hyperparameter values for a machine learning model, often done using techniques like grid search or random search. |
Hypothesis testing |
A statistical procedure used to evaluate the validity of a hypothesis or claim about a population, by comparing the observed data to what would be expected under the null hypothesis. |
Imbalanced class handling |
Techniques used to deal with imbalanced class distributions in classification tasks, such as class weighting or resampling. |
Interquartile range |
The difference between the upper and lower quartiles of a set of data. |
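The calculation can be sketched with Python's standard library. The helper name `interquartile_range` and the sample data are illustrative assumptions, and the "inclusive" quantile method is only one of several conventions for locating quartiles, so other tools may report slightly different values.

```python
from statistics import quantiles


def interquartile_range(data):
    """Upper quartile minus lower quartile (hypothetical helper)."""
    # quantiles() with n=4 returns the three cut points Q1, Q2 (median), Q3.
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1


print(interquartile_range([1, 2, 3, 4, 5, 6, 7, 8]))  # 3.5
```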
Interval scale |
A scale of measurement in which the categories have a numerical order and the intervals between the categories are equal, but there is no true zero point. |
Item response theory |
A statistical theory used to model the relationship between an individual's ability and their performance on a test or assessment. |
Kruskal-Wallis test |
A nonparametric statistical test used to compare the medians of three or more independent groups, when the data are not normally distributed or the variances are not equal. |
Kurtosis |
A measure of the heaviness of the tails of a distribution relative to a normal distribution, often loosely described as its peakedness or flatness. |
Latent class analysis |
A statistical technique used to identify unobserved or latent classes or groups within a population based on observed characteristics. |
Latent growth curve model |
A statistical model that estimates individual differences in the rate and level of change over time. |
Latent semantic analysis |
A statistical technique used to analyze the relationships between words and documents in a text corpus. |
Line chart |
A graph showing the trend or pattern in a set of data over time, with the horizontal axis representing the time and the vertical axis representing the values. |
Logistic regression |
A statistical analysis used to predict the probability of a binary outcome, such as success or failure. |
Longitudinal data analysis |
A statistical technique used to analyze data that are collected at multiple time points from the same subjects. |
MANOVA |
A statistical test used to compare the means of two or more groups on multiple dependent variables, assuming that the data are normally distributed and the variances are equal. |
Maximum likelihood estimation |
A statistical technique used to estimate the parameters of a statistical model by choosing the parameter values that maximize the likelihood of the observed data. |
McNemar test |
A statistical test used to compare the proportions of two groups on a dichotomous outcome, when the data are paired or matched. |
Mean |
The average of a set of numbers, calculated by adding all the numbers together and dividing by the number of items in the set. |
Mean Absolute Error (MAE) |
A loss function used in regression tasks, calculated as the average absolute difference between predicted and actual values. |
Mean Squared Error (MSE) |
A common loss function used in regression tasks, calculated as the average squared difference between predicted and actual values. |
Mean Squared Logarithmic Error (MSLE) |
A loss function used in regression tasks, calculated as the average squared logarithmic difference between predicted and actual values. |
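The three regression loss functions defined above (MAE, MSE, and MSLE) can be sketched in a few lines. The function names and sample values are illustrative assumptions. For MSLE, this sketch assumes the common convention of taking log(1 + x) so that zero values are allowed; the glossary itself does not specify the exact form.

```python
from math import log1p


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_log_error(y_true, y_pred):
    """Average squared difference of log(1 + value); inputs must be >= 0.
    The log(1 + x) form is an assumed convention."""
    return sum((log1p(t) - log1p(p)) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


actual = [3, 5, 2]
predicted = [2, 5, 4]
print(mean_absolute_error(actual, predicted))  # 1.0
print(mean_squared_error(actual, predicted))
print(mean_squared_log_error(actual, predicted))
```

Because MSE squares each error, it penalizes large misses more heavily than MAE, while MSLE responds to relative rather than absolute differences.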
Median |
The middle value in a set of numbers, where half the values are higher and half are lower. |
Meta-analysis |
A statistical technique used to synthesize and combine the results of multiple studies, in order to estimate the overall effect size and statistical significance of a research question. |
Mixed ANOVA |
A statistical test used to analyze data with a mixed design, which combines at least one between-subjects factor (different subjects in each group) with at least one within-subjects factor (the same subjects measured under multiple conditions or at multiple time points). |
Mixed effects model |
A statistical model that includes both fixed and random effects, allowing for the analysis of both within- and between-group variations. |
Mode |
The most frequently occurring value in a set of numbers. |
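The mean, median, and mode as defined in this glossary can be compared on a single data set using Python's standard `statistics` module. The sample values are an illustrative assumption.

```python
from statistics import mean, median, mode

values = [2, 3, 3, 5, 7]

print(mean(values))    # 4  (sum of 20 divided by 5 items)
print(median(values))  # 3  (middle value of the sorted data)
print(mode(values))    # 3  (most frequently occurring value)
```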
Multilevel modeling |
A statistical technique used to analyze data with a hierarchical or nested structure, such as data from individuals nested within groups. |
Multinomial logistic regression |
A statistical analysis used to predict the probability of a categorical outcome with more than two categories. |
Multiple regression |
A statistical analysis used to predict the value of a dependent variable based on the values of two or more independent variables. |
Multivariate analysis |
A statistical analysis that involves the simultaneous study of multiple variables. |
Nominal scale |
A scale of measurement in which the categories are mutually exclusive and do not have a numerical order. |
Normality test |
A statistical test used to determine whether a set of data follows a normal distribution. |
One-way ANOVA |
A statistical test used to compare the means of three or more groups, assuming that the data are normally distributed and the variances are equal. |
Ordinal logistic regression |
A statistical analysis used to predict the probability of an ordinal outcome, such as a rating scale. |
Ordinal scale |
A scale of measurement in which the categories have a meaningful order, but the intervals between the categories are not necessarily equal. |
Outlier |
A value that is markedly higher or lower than the other values in a set of data. |
P-value |
The probability of obtaining a result at least as extreme as the one observed, if the null hypothesis is true. |
Panel data analysis |
A statistical technique used to analyze data that are collected from the same subjects over multiple time points. |
Partial correlation coefficient |
A statistical measure of the association between two variables, controlling for the effects of one or more other variables. |
Pearson's correlation coefficient |
A statistical measure of the linear association between two continuous variables, ranging from -1 to 1. |
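A minimal sketch of the calculation, using the usual formula of the summed cross-products of deviations divided by the square root of the product of the summed squared deviations. The function name `pearson_r` and the sample data are illustrative assumptions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences
    (hypothetical helper; assumes neither sequence is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0, a perfect positive linear relationship
```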
Percentile |
The value below which a certain percentage of the data falls. |
Point-biserial correlation coefficient |
A statistical measure of the association between a continuous variable and a dichotomous variable. |
Poisson distribution |
A distribution of probabilities for a discrete variable that represents the number of events occurring in a fixed interval of time or space. |
Power |
The probability of correctly rejecting the null hypothesis, given that it is false. |
Principal component analysis |
A statistical technique used to reduce the dimensionality of a data set by projecting the data onto a lower-dimensional space that retains as much of the variance in the data as possible. |
Probability |
The likelihood or chance of an event occurring, expressed as a number between 0 and 1. |
Quartile |
One of the three points that divide a set of data into four equal parts. |
Random effect |
A variable in a statistical model that is allowed to vary across different levels or groups, but is not considered to be a fixed part of the model. |
Random sampling |
A sampling method in which each member of the population has an equal chance of being selected for the sample. |
Range |
The difference between the highest and lowest values in a set of numbers. |
Rasch model |
A statistical model used in item response theory to measure an individual's ability or trait level based on their responses to a series of items. |
Ratio scale |
A scale of measurement in which the categories have a numerical order, the intervals between the categories are equal, and there is a true zero point. |
Regression |
A statistical analysis used to predict the value of a dependent variable based on the value of one or more independent variables. |
Repeated measures ANOVA |
A statistical test used to compare the means of two or more conditions, where the same subjects are measured under each condition or at multiple time points. |
Repeated measures design |
A research design in which the same subjects are measured under multiple conditions or at multiple time points. |
Sampling |
The process of selecting a subset of a population for study, in order to make inferences about the population as a whole. |
Scatter plot |
A graph showing the relationship between two numerical variables, with each data point represented by a dot plotted on the horizontal and vertical axes. |
Skewness |
A measure of the asymmetry of a distribution, indicating whether it is skewed to the left or right. |
Spearman's rank correlation coefficient |
A statistical measure of the monotonic association between two ordinal or continuous variables, ranging from -1 to 1. |
Standard deviation |
A measure of the dispersion or spread of a set of numbers, calculated as the square root of the variance. |
Stratified sampling |
A sampling method in which the population is divided into subgroups or strata, and a random sample is selected from each stratum. |
Structural equation modeling |
A statistical technique used to test and estimate relationships between variables, both observed and latent. |
Survival analysis |
A statistical technique used to analyze data on the time it takes for an event of interest to occur, such as death or failure. |
T-score |
A standardized score used in hypothesis testing, calculated as the number of estimated standard errors a sample mean is from the hypothesized population mean. |
t-test |
A statistical test used to compare the means of two groups, assuming that the data are normally distributed and the variances are equal. |
Time series analysis |
A statistical technique used to analyze data that are collected at regular intervals over time. |
Type I error |
The error of rejecting the null hypothesis when it is true. |
Type II error |
The error of failing to reject the null hypothesis when it is false. |
Variance |
A measure of the dispersion or spread of a set of numbers, calculated as the average of the squared differences from the mean. |
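The definition above, the average of the squared differences from the mean, can be sketched as follows, together with the standard deviation as its square root. The function names and sample data are illustrative assumptions. This is the population form; when the data are a sample, the sum is commonly divided by n - 1 instead of n.

```python
from math import sqrt


def population_variance(data):
    """Average of the squared differences from the mean (divides by n)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)


def population_std_dev(data):
    """Square root of the population variance."""
    return sqrt(population_variance(data))


values = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(values))  # 4.0
print(population_std_dev(values))   # 2.0
```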
Weibull distribution |
A continuous distribution of probabilities that is often used to model failure times or lifespan data. |
Wilcoxon rank-sum test |
A nonparametric statistical test used to compare the medians of two independent groups, when the data are not normally distributed or the variances are not equal. |
Z-score |
The number of standard deviations a value is from the mean of a distribution. |
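As a short illustration, a z-score is the value minus the mean, divided by the standard deviation. The function name `z_score` and the numbers used are illustrative assumptions.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from mean
    (hypothetical helper)."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (value - mean) / std_dev


# A value of 9 in a distribution with mean 5 and standard deviation 2.
print(z_score(9, 5, 2))  # 2.0
```

A positive result means the value lies above the mean, and a negative result means it lies below.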