Statistics
Probability theory and statistical inference for data science
Statistical Foundations
Statistics provides the theoretical foundation for understanding data distributions, making inferences, and quantifying uncertainty in data science and machine learning.
Core Statistical Concepts
Descriptive Statistics

- Mean, Median, Mode
- Variance, Standard Deviation
- Quartiles, Percentiles
Inferential Statistics

- Hypothesis Testing
- Confidence Intervals
- P-values
Probability Distributions
Continuous Distributions

- Normal Distribution
- Student's t-Distribution
- Chi-Square Distribution
- F-Distribution
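As a rough illustration of how these continuous distributions can be queried, the sketch below builds each one with `scipy.stats` and looks up its 95th percentile, the kind of critical value used in hypothesis tests. The degrees of freedom and the names `distributions` and `critical_values` are arbitrary choices made for this example, not values from the text.

```python
from scipy import stats

# Frozen distribution objects. The parameters (df=10, df=4, dfn=3, dfd=20)
# are illustrative assumptions.
distributions = {
    "normal": stats.norm(loc=0, scale=1),
    "t": stats.t(df=10),
    "chi2": stats.chi2(df=4),
    "f": stats.f(dfn=3, dfd=20),
}

# ppf is the inverse CDF: ppf(0.95) is the value below which 95% of the mass lies.
critical_values = {name: dist.ppf(0.95) for name, dist in distributions.items()}

for name, value in critical_values.items():
    print(f"{name}: 95th percentile = {value:.3f}")
```

The t-distribution's percentile is larger than the normal's because its tails are heavier; the gap shrinks as the degrees of freedom grow.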
Discrete Distributions

- Binomial Distribution
- Poisson Distribution
- Geometric Distribution
- Negative Binomial
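The discrete distributions above can be explored the same way. This is a minimal sketch using `scipy.stats`; the parameter values (`n`, `p`, `lam`, and three successes for the negative binomial) are assumptions chosen only to make the numbers easy to check by hand.

```python
from scipy import stats

# Illustrative parameters (assumptions, not taken from the text).
n, p = 10, 0.3   # number of trials and success probability
lam = 4.0        # Poisson rate (expected events per interval)

binom = stats.binom(n=n, p=p)
poisson = stats.poisson(mu=lam)
geom = stats.geom(p=p)            # trials up to and including the first success
nbinom = stats.nbinom(n=3, p=p)   # failures before the 3rd success

binom_mean = binom.mean()     # n * p
poisson_var = poisson.var()   # equals the rate lam
geom_mean = geom.mean()       # 1 / p
nbinom_mean = nbinom.mean()   # 3 * (1 - p) / p
p_three = binom.pmf(3)        # probability of exactly 3 successes in 10 trials

print(f"Binomial mean: {binom_mean:.2f}")
print(f"Poisson variance: {poisson_var:.2f}")
print(f"Geometric mean: {geom_mean:.2f}")
print(f"Negative binomial mean: {nbinom_mean:.2f}")
print(f"P(X = 3) for Binomial(10, 0.3): {p_three:.4f}")
```

Note that SciPy's `geom` counts trials while `nbinom` counts failures, so the two are parameterized slightly differently.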
Statistical Analysis with Python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
# Generate sample data
data = np.random.normal(loc=0, scale=1, size=1000)
# Descriptive statistics
mean = np.mean(data)
std = np.std(data, ddof=1)  # sample standard deviation (ddof=1), consistent with stats.sem
quartiles = np.percentile(data, [25, 50, 75])
# Hypothesis testing
# One-sample t-test
t_stat, p_value = stats.ttest_1samp(data, popmean=0)
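# 95% confidence interval for the mean
# (confidence level passed positionally: the keyword was renamed from
# `alpha` to `confidence` in newer SciPy releases)
ci = stats.t.interval(0.95, df=len(data) - 1,
                      loc=mean, scale=stats.sem(data))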
# Visualization
plt.figure(figsize=(12, 6))
# Histogram with KDE
sns.histplot(data=data, kde=True)
plt.title('Distribution of Data')
plt.axvline(mean, color='r', linestyle='--', label='Mean')
plt.axvline(ci[0], color='g', linestyle=':', label='95% CI')
plt.axvline(ci[1], color='g', linestyle=':')
plt.legend()
plt.show()
print(f"Mean: {mean:.2f}")
print(f"Standard Deviation: {std:.2f}")
print(f"95% Confidence Interval: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"P-value: {p_value:.4f}")
Common Statistical Tests
Parametric Tests

- t-tests
- ANOVA
- Pearson Correlation
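One way these parametric tests might be run with `scipy.stats` is sketched below. The groups are synthetic normal samples; the group means, sample sizes, and variable names are assumptions made for the example. Welch's variant of the t-test is used here as a cautious default because it does not assume equal variances.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic, normally distributed groups (illustrative assumption).
group_a = rng.normal(loc=0.0, scale=1.0, size=100)
group_b = rng.normal(loc=0.5, scale=1.0, size=100)
group_c = rng.normal(loc=1.0, scale=1.0, size=100)

# Independent two-sample t-test (Welch's variant).
t_stat, t_p = stats.ttest_ind(group_a, group_b, equal_var=False)

# One-way ANOVA: do the three group means differ?
f_stat, anova_p = stats.f_oneway(group_a, group_b, group_c)

# Pearson correlation between x and a noisy linear function of x.
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.5, size=100)
r, r_p = stats.pearsonr(x, y)

print(f"t-test:  t = {t_stat:.3f}, p = {t_p:.4f}")
print(f"ANOVA:   F = {f_stat:.3f}, p = {anova_p:.4g}")
print(f"Pearson: r = {r:.3f}, p = {r_p:.4g}")
```

These tests assume approximately normal data (and, for Pearson, a linear relationship), so it is worth checking those assumptions before relying on the p-values.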
Non-parametric Tests

- Mann-Whitney U
- Kruskal-Wallis
- Spearman Correlation
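The non-parametric tests listed above are rank-based counterparts to the parametric ones and do not require normally distributed data. The sketch below applies them to skewed synthetic samples using `scipy.stats`; the exponential scales, sample sizes, and variable names are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Skewed synthetic samples (illustrative assumption).
sample_a = rng.exponential(scale=1.0, size=80)
sample_b = rng.exponential(scale=2.0, size=80)
sample_c = rng.exponential(scale=3.0, size=80)

# Mann-Whitney U: rank-based comparison of two independent samples.
u_stat, u_p = stats.mannwhitneyu(sample_a, sample_b, alternative="two-sided")

# Kruskal-Wallis: rank-based analogue of one-way ANOVA.
h_stat, kw_p = stats.kruskal(sample_a, sample_b, sample_c)

# Spearman correlation: captures monotonic (not only linear) association.
x = rng.uniform(0, 5, size=80)
y = np.exp(x) + rng.normal(scale=0.1, size=80)
rho, rho_p = stats.spearmanr(x, y)

print(f"Mann-Whitney U: U = {u_stat:.1f}, p = {u_p:.4f}")
print(f"Kruskal-Wallis: H = {h_stat:.3f}, p = {kw_p:.4g}")
print(f"Spearman:       rho = {rho:.3f}, p = {rho_p:.4g}")
```

Because `y` grows exponentially in `x`, the relationship is strongly monotonic but not linear, which is the situation where Spearman's rank correlation is typically preferred over Pearson's.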