Statistics

Probability theory and statistical inference for data science

Statistical Foundations

Statistics provides the theoretical foundation for understanding data distributions, making inferences, and quantifying uncertainty in data science and machine learning.

Core Statistical Concepts

Descriptive Statistics

  • Mean, Median, Mode
  • Variance, Standard Deviation
  • Quartiles, Percentiles
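
As a small illustration of the measures listed above, the following sketch computes each one with NumPy. The `values` array is a hypothetical sample, chosen only so that the results are easy to verify by hand.

```python
import numpy as np

# Hypothetical sample, small enough to check by hand
values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# Measures of central tendency
mean = values.mean()                          # 5.0
median = np.median(values)                    # 4.5
uniq, counts = np.unique(values, return_counts=True)
mode = uniq[np.argmax(counts)]                # 4 (most frequent value)

# Measures of spread
pop_var = values.var()                        # population variance (ddof=0): 4.0
sample_var = values.var(ddof=1)               # sample variance (ddof=1): 32 / 7
sample_std = values.std(ddof=1)               # sample standard deviation

# Quartiles and interquartile range (NumPy's default linear interpolation)
q1, q2, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

print(f"mean={mean}, median={median}, mode={mode}")
print(f"population variance={pop_var}, sample variance={sample_var:.3f}")
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
```

The choice between `ddof=0` and `ddof=1` matters mostly for small samples: the first describes the data at hand, the second estimates the variance of the population the sample was drawn from.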

Inferential Statistics

  • Hypothesis Testing
  • Confidence Intervals
  • P-values
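
The three ideas above are closely linked, which a one-sample t-test computed by hand can show. This is a minimal sketch: the `sample` values, the null value `mu0`, and the significance level `alpha` are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements; the null hypothesis is that the true mean is 5.0
sample = np.array([5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.3, 5.7])
mu0 = 5.0
alpha = 0.05

n = sample.size
mean = sample.mean()
sem = sample.std(ddof=1) / np.sqrt(n)     # standard error of the mean

# Test statistic and two-sided p-value from the t-distribution
t_stat = (mean - mu0) / sem
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

# 95% confidence interval: mean ± critical value × standard error
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
ci_low, ci_high = mean - t_crit * sem, mean + t_crit * sem

# Cross-check against SciPy's built-in one-sample t-test
t_ref, p_ref = stats.ttest_1samp(sample, popmean=mu0)

reject = p_value < alpha
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, reject H0: {reject}")
print(f"95% CI: ({ci_low:.3f}, {ci_high:.3f})")
```

For a two-sided t-test, the null hypothesis is rejected at level `alpha` exactly when `mu0` falls outside the corresponding confidence interval, so the test and the interval always lead to the same decision.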

Probability Distributions

Continuous Distributions

  • Normal Distribution
  • Student's t-Distribution
  • Chi-Square Distribution
  • F-Distribution
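
Each of these distributions is available in `scipy.stats`. The sketch below creates one "frozen" object per distribution and queries a quantile, a mean, or a tail probability. The parameter values (degrees of freedom, the observed F statistic) are illustrative assumptions.

```python
from scipy import stats

# Frozen distribution objects; parameter values are illustrative only
normal = stats.norm(loc=0, scale=1)
student_t = stats.t(df=10)
chi_square = stats.chi2(df=4)
f_dist = stats.f(dfn=3, dfd=20)

# Critical value that cuts off the upper 2.5% of the standard normal
z_crit = normal.ppf(0.975)

# The t-distribution has heavier tails, so its critical value is larger
t_crit = student_t.ppf(0.975)

# The mean of a chi-square distribution equals its degrees of freedom
chi2_mean = chi_square.mean()

# Upper-tail probability (a p-value) for a hypothetical observed F statistic
f_p_value = f_dist.sf(3.1)

print(f"z critical value:      {z_crit:.3f}")
print(f"t critical value:      {t_crit:.3f}")
print(f"chi-square mean:       {chi2_mean:.1f}")
print(f"P(F > 3.1):            {f_p_value:.4f}")
```

The t, chi-square, and F distributions mostly appear as reference distributions for test statistics, which is why the quantile (`ppf`) and survival (`sf`) functions are the calls used most often.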

Discrete Distributions

  • Binomial Distribution
  • Poisson Distribution
  • Geometric Distribution
  • Negative Binomial
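
The discrete distributions above can be explored the same way through `scipy.stats`. In this sketch the parameters are assumptions chosen so that each result has a simple closed form; note that SciPy's `geom` counts trials up to and including the first success, while `nbinom` counts failures before the n-th success.

```python
import math
from scipy import stats

# Binomial: number of successes in n independent trials
binom = stats.binom(n=10, p=0.5)
p_five_heads = binom.pmf(5)           # P(X = 5) = C(10, 5) / 2**10

# Poisson: number of events in a fixed interval with rate mu
poisson = stats.poisson(mu=3)
p_no_events = poisson.pmf(0)          # P(X = 0) = exp(-3)

# Geometric: number of trials up to and including the first success
geom = stats.geom(p=0.2)
mean_trials = geom.mean()             # 1 / p

# Negative binomial: number of failures before the n-th success
nbinom = stats.nbinom(n=3, p=0.5)
mean_failures = nbinom.mean()         # n * (1 - p) / p

print(f"P(5 heads in 10 fair flips): {p_five_heads:.4f}")
print(f"P(no events | rate 3):       {p_no_events:.4f}")
print(f"Expected trials to success:  {mean_trials:.1f}")
print(f"Expected failures:           {mean_failures:.1f}")
```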

Statistical Analysis with Python

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

# Generate sample data (seeded so the results are reproducible)
rng = np.random.default_rng(42)
data = rng.normal(loc=0, scale=1, size=1000)

# Descriptive statistics
mean = np.mean(data)
std = np.std(data, ddof=1)  # sample standard deviation, consistent with stats.sem
quartiles = np.percentile(data, [25, 50, 75])

# Hypothesis testing
# One-sample t-test
t_stat, p_value = stats.ttest_1samp(data, popmean=0)

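# 95% confidence interval for the mean.
# The confidence level is passed positionally because newer SciPy releases
# renamed the keyword from `alpha` to `confidence`.
ci = stats.t.interval(0.95, df=len(data)-1,
                      loc=mean, scale=stats.sem(data))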

# Visualization
plt.figure(figsize=(12, 6))

# Histogram with KDE
sns.histplot(data=data, kde=True)
plt.title('Distribution of Data')
plt.axvline(mean, color='r', linestyle='--', label='Mean')
plt.axvline(ci[0], color='g', linestyle=':', label='95% CI of the mean')
plt.axvline(ci[1], color='g', linestyle=':')
plt.legend()

plt.show()

print(f"Mean: {mean:.2f}")
print(f"Standard Deviation: {std:.2f}")
print(f"95% Confidence Interval: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"P-value: {p_value:.4f}")

Common Statistical Tests

Parametric Tests

  • t-tests
  • ANOVA
  • Pearson Correlation
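
A brief sketch of how these parametric tests might be run with `scipy.stats`. The groups are simulated from normal distributions with assumed means and spreads, so the numbers carry no meaning beyond the illustration. The t-test shown is Welch's variant (`equal_var=False`), which does not assume equal group variances.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Three hypothetical groups; group_c is drawn with a higher mean
group_a = rng.normal(loc=50, scale=5, size=40)
group_b = rng.normal(loc=50, scale=5, size=40)
group_c = rng.normal(loc=58, scale=5, size=40)

# Two-sample t-test (Welch's variant): do two group means differ?
t_stat, t_p = stats.ttest_ind(group_a, group_c, equal_var=False)

# One-way ANOVA: does at least one of several group means differ?
f_stat, anova_p = stats.f_oneway(group_a, group_b, group_c)

# Pearson correlation: strength of a linear relationship
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)
r, r_p = stats.pearsonr(x, y)

print(f"t-test:  t = {t_stat:.2f}, p = {t_p:.3g}")
print(f"ANOVA:   F = {f_stat:.2f}, p = {anova_p:.3g}")
print(f"Pearson: r = {r:.3f}, p = {r_p:.3g}")
```

These tests assume approximately normal data (or large samples). A significant ANOVA result only says that some group differs, not which one.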

Non-parametric Tests

  • Mann-Whitney U
  • Kruskal-Wallis
  • Spearman Correlation
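
The rank-based tests above are the usual alternatives when normality is doubtful. The sketch below applies them to simulated, skewed data; the exponential samples and their scale parameters are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Skewed hypothetical samples; group_c tends to take larger values
group_a = rng.exponential(scale=1.0, size=60)
group_b = rng.exponential(scale=1.0, size=60)
group_c = rng.exponential(scale=4.0, size=60)

# Mann-Whitney U: rank-based alternative to the two-sample t-test
u_stat, u_p = stats.mannwhitneyu(group_a, group_c, alternative="two-sided")

# Kruskal-Wallis: rank-based alternative to one-way ANOVA
h_stat, kw_p = stats.kruskal(group_a, group_b, group_c)

# Spearman correlation: strength of a monotonic (not necessarily linear) relationship
x = rng.uniform(0, 5, size=100)
y = np.exp(x)                       # nonlinear but strictly increasing in x
rho, rho_p = stats.spearmanr(x, y)

print(f"Mann-Whitney U: U = {u_stat:.1f}, p = {u_p:.3g}")
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {kw_p:.3g}")
print(f"Spearman:       rho = {rho:.3f}")
```

Because `y` is a strictly increasing function of `x`, the ranks agree perfectly and Spearman's rho is 1, even though the relationship is far from linear.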