Iris Flower Classification
A classic machine learning project that predicts Iris flower species using scikit-learn.
Project Overview
The Iris dataset is considered the "Hello World" of machine learning. In this project, we'll build a model to classify iris flowers into three species (Setosa, Versicolor, and Virginica) based on the length and width of their sepals and petals.
Learning Objectives
- Load a built-in dataset from scikit-learn
- Split data into training and testing sets
- Train a Random Forest Classifier
- Evaluate model accuracy and read a confusion matrix
1. Loading the Data
We start by importing the necessary libraries and loading the Iris dataset directly from scikit-learn.
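import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the dataset: 150 samples, 4 numeric features, 3 species labels.
iris = load_iris()
# Wrap the feature matrix in a DataFrame so the columns carry readable names.
X = pd.DataFrame(iris.data, columns=iris.feature_names)
# Integer class labels (0, 1, 2) that index into iris.target_names.
y = iris.target
print(f"Dataset shape: {X.shape}")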
print(f"Target classes: {iris.target_names}")
2. Splitting and Training
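Before splitting, it can help to confirm that the three species are equally represented, since a random split of an unbalanced dataset could leave one class under-sampled. The following is a minimal sketch of that check; the `counts` variable is only illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Count how many samples carry each integer species label (0, 1, 2).
counts = np.bincount(iris.target)

# Pair each count with its species name for a readable summary.
for name, count in zip(iris.target_names, counts):
    print(f"{name}: {count} samples")
```

Each species contributes 50 of the 150 samples, so the dataset is balanced.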
Next, we hold out 20% of the data (30 of the 150 samples) to test the model's performance, and train a Random Forest on the remaining 80% (120 samples).
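from sklearn.ensemble import RandomForestClassifier

# Hold out 20% of the rows for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit a forest of 100 decision trees on the training split.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)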
print("Model trained successfully!")
3. Evaluation
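Before scoring the model, it can be informative to see which measurements the trained forest relies on. The sketch below refits the same model so that it runs on its own, then reads scikit-learn's `feature_importances_` attribute; the `importances` name is just illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

# Same split and model settings as in the walkthrough.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Impurity-based importance of each feature, labelled by column name.
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```

The importances sum to 1. On Iris the petal measurements typically carry most of the weight, though the exact values depend on the split and the random seed.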
Finally, we measure how accurately the model predicts the species of the flowers in the held-out test set, and print a per-class report of precision, recall, and F1 score.
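from sklearn.metrics import accuracy_score, classification_report

# Predict species labels for the held-out test rows.
y_pred = model.predict(X_test)
# Fraction of test rows whose predicted label matches the true label.
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%\n")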
print(classification_report(y_test, y_pred, target_names=iris.target_names))
Want to try it yourself?
Download the full interactive Jupyter Notebook for this project to run the code and visualize the results.
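As one extension to try, the confusion matrix named in the learning objectives can be computed with scikit-learn's `confusion_matrix`. This is a minimal, self-contained sketch that repeats the split and training steps; wrapping the result in a labelled DataFrame (`cm_df`) is an illustrative choice, not part of the original walkthrough.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

# Same split and model settings as in the walkthrough.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true species, columns are predicted species.
# Passing labels explicitly keeps the matrix 3x3 in a fixed class order.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])

# Attach species names so the table is readable when printed.
cm_df = pd.DataFrame(cm, index=iris.target_names, columns=iris.target_names)
cm_df.index.name = "actual"
cm_df.columns.name = "predicted"
print(cm_df)
```

To read the matrix, look along the diagonal for correct predictions; any off-diagonal entry counts test flowers of the row's species that were labelled as the column's species.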