Project: Lobster Data Analysis and Modeling
This project analyzed a dataset of lobsters to derive insights about their physical attributes and used statistical and machine learning methods to predict characteristics such as weight from length.
Objectives:
Data Preprocessing:
- Load and clean the dataset using pandas.
- Convert units (millimeters to centimeters, grams to kilograms) so all measurements are on a consistent scale.
- Handle missing data by imputing column medians, and address zero values in columns where zero is not physically meaningful (such as length or weight).
- Identify and remove outliers using the interquartile range (IQR) method.
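The preprocessing steps above can be sketched in pandas as follows. This is a minimal illustration, not the project's actual code: the column names (`length_mm`, `weight_g`, `sex`, `spots`), the choice to treat zero lengths and weights as missing before median imputation, and the conventional 1.5 * IQR fence are all assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical columns where a value of zero is physically impossible.
NONZERO_COLS = ["length_cm", "weight_kg"]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Convert units, impute missing values with medians, and drop IQR outliers."""
    df = df.copy()

    # Unit conversion: mm -> cm and g -> kg (hypothetical source column names).
    df["length_cm"] = df["length_mm"] / 10.0
    df["weight_kg"] = df["weight_g"] / 1000.0
    df = df.drop(columns=["length_mm", "weight_g"])

    # Assumption: zeros in physical measurements are recording errors, so mark them missing.
    df[NONZERO_COLS] = df[NONZERO_COLS].replace(0, np.nan)

    # Median imputation for every numeric column (non-numeric columns like sex are skipped).
    numeric = df.select_dtypes(include="number").columns
    df[numeric] = df[numeric].fillna(df[numeric].median())

    # IQR rule: keep rows whose values all lie within [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    q1 = df[numeric].quantile(0.25)
    q3 = df[numeric].quantile(0.75)
    iqr = q3 - q1
    inside = (df[numeric] >= q1 - 1.5 * iqr) & (df[numeric] <= q3 + 1.5 * iqr)
    return df[inside.all(axis=1)].reset_index(drop=True)


# Small synthetic example: one zero length, one missing weight, one extreme length.
raw = pd.DataFrame({
    "sex": ["M", "F", "M", "F", "M", "F", "M", "F"],
    "length_mm": [250, 240, 0, 260, 255, 245, 250, 2000],
    "weight_g": [500, 480, 510, np.nan, 505, 490, 495, 520],
    "spots": [3, 4, 2, 5, 3, 4, 3, 4],
})
clean = preprocess(raw)
```

Applying the IQR rule jointly across all numeric columns drops a row if any one of its values falls outside the fences; filtering one column at a time in sequence can give different results, because the quartiles shift after each pass.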
Descriptive Analysis:
- Explore the distribution of size and weight variables grouped by the sex of the lobsters.
- Compare lobster size and weight distributions across sexes using boxplots.
- Count the outliers in each numerical variable and remove them before further analysis.
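A sketch of the grouped summaries and per-variable outlier counts, assuming a DataFrame with a `sex` column and hypothetical numeric columns `length_cm` and `weight_kg`. The boxplot helper uses pandas' `DataFrame.boxplot(column=..., by=...)`, which needs matplotlib installed; it is defined here but not called.

```python
import pandas as pd


def summarize_by_sex(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Per-sex summary statistics (count, mean, std, quartiles) for each column."""
    return df.groupby("sex")[cols].describe()


def count_iqr_outliers(df: pd.DataFrame, cols: list[str], k: float = 1.5) -> pd.Series:
    """Number of values outside [Q1 - k*IQR, Q3 + k*IQR] in each column."""
    q1 = df[cols].quantile(0.25)
    q3 = df[cols].quantile(0.75)
    iqr = q3 - q1
    outside = (df[cols] < q1 - k * iqr) | (df[cols] > q3 + k * iqr)
    return outside.sum()


def plot_boxplots(df: pd.DataFrame, cols: list[str]):
    # Requires matplotlib; draws one boxplot panel per column, split by sex.
    return df.boxplot(column=cols, by="sex")


# Synthetic example data (not from the lobster dataset).
df = pd.DataFrame({
    "sex": ["M", "M", "M", "F", "F", "F"],
    "length_cm": [25.0, 26.0, 24.0, 23.0, 22.5, 40.0],
    "weight_kg": [0.50, 0.55, 0.48, 0.45, 0.44, 0.47],
})
summary = summarize_by_sex(df, ["length_cm", "weight_kg"])
counts = count_iqr_outliers(df, ["length_cm", "weight_kg"])
```

Here the 40 cm length is the only value outside its column's fences, so `counts` reports one outlier for `length_cm` and none for `weight_kg`.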
Clustering:
- Use KMeans clustering to group lobsters based on physical features (length, weight, and spots).
- Apply principal component analysis (PCA) to reduce dimensionality so the clusters can be visualized.
- Calculate the silhouette score to evaluate the quality of clustering.
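One way to implement the clustering step with scikit-learn is sketched below. It assumes the features are standardized first (the description does not say whether they were), uses a hypothetical k = 3, and runs on synthetic stand-in data rather than the lobster dataset. The silhouette score is computed on the same scaled features that k-means used, and PCA only supplies 2-D coordinates for plotting.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def cluster_lobsters(X: np.ndarray, k: int = 3, seed: int = 0):
    """Scale features, run k-means, score the clustering, and project to 2-D with PCA."""
    # Assumption: put length, weight, and spots on comparable scales before clustering.
    X_scaled = StandardScaler().fit_transform(X)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = km.fit_predict(X_scaled)
    # Silhouette lies in [-1, 1]; higher means tighter, better-separated clusters.
    score = silhouette_score(X_scaled, labels)
    # Two principal components for a scatter plot colored by cluster label.
    coords_2d = PCA(n_components=2).fit_transform(X_scaled)
    return labels, score, coords_2d


# Synthetic stand-in data: three groups in (length_cm, weight_kg, spots) space.
rng = np.random.default_rng(42)
centers = np.array([[20.0, 0.4, 2.0], [25.0, 0.6, 5.0], [30.0, 0.9, 8.0]])
X = np.vstack([c + rng.normal(scale=[0.5, 0.02, 0.3], size=(50, 3)) for c in centers])

labels, score, coords_2d = cluster_lobsters(X, k=3)
```

With real data, k is usually not known in advance; one common approach is to compare silhouette scores across several candidate values of k.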
Correlation Analysis:
- Compute a correlation matrix to examine pairwise relationships between the numerical features, and visualize it as a heatmap.
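A minimal version of the correlation step, assuming Pearson correlation (the default for pandas' `DataFrame.corr`) and hypothetical column names. Non-numeric columns such as `sex` are excluded before the matrix is computed. The heatmap helper assumes seaborn is installed and is not called in the example.

```python
import pandas as pd


def correlation_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlations between all numeric columns; non-numeric columns are skipped."""
    return df.select_dtypes(include="number").corr()


def plot_heatmap(corr: pd.DataFrame):
    # Requires seaborn and matplotlib; annot=True writes each coefficient in its cell.
    import seaborn as sns
    return sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)


# Synthetic example data (not from the lobster dataset).
df = pd.DataFrame({
    "sex": ["M", "F", "M", "F", "M"],
    "length_cm": [20.0, 22.0, 24.0, 26.0, 28.0],
    "weight_kg": [0.40, 0.44, 0.48, 0.52, 0.56],
    "spots": [5, 3, 4, 1, 2],
})
corr = correlation_matrix(df)
```

In this toy data weight is an exact linear function of length, so their coefficient is 1, while spots trend downward with length and get a negative coefficient.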
Weight-Length Relationship:
- Perform a regression analysis to understand the relationship between lobster length and weight.
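- Apply a logarithmic transformation to both variables to linearize the data: taking logs of the power law \( W = aL^{b} \) gives \( \ln W = \ln a + b \ln L \), a straight line in \( \ln L \) with slope \( b \).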
- Create a model that predicts lobster weight from length using the power law \( W = aL^{b} \), where \( W \) is weight, \( L \) is length, and \( a \) and \( b \) are constants estimated from the data.
- Plot the fitted weight-length curve against the observed data points to assess how well the model fits.
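Because \( \ln W = \ln a + b \ln L \) is linear in \( \ln L \), the power-law parameters can be estimated with an ordinary least-squares line fit on the log-transformed data and then back-transformed. The sketch below uses `numpy.polyfit` for that fit; the function names are hypothetical, and the data are synthetic, generated from known values (a = 4e-5, b = 3) so the recovered estimates can be checked.

```python
import numpy as np


def fit_power_law(length: np.ndarray, weight: np.ndarray) -> tuple[float, float]:
    """Fit W = a * L**b by least squares on ln W = ln a + b * ln L; returns (a, b)."""
    # polyfit with deg=1 returns coefficients highest power first: [slope, intercept].
    b, ln_a = np.polyfit(np.log(length), np.log(weight), deg=1)
    return float(np.exp(ln_a)), float(b)


def predict_weight(length, a: float, b: float) -> np.ndarray:
    """Back-transform the fitted model to the original scale."""
    return a * np.asarray(length, dtype=float) ** b


# Synthetic check: data from known a and b with multiplicative (log-normal) noise.
rng = np.random.default_rng(0)
L = rng.uniform(15.0, 35.0, size=200)
W = 4e-5 * L**3.0 * np.exp(rng.normal(0.0, 0.05, size=L.size))

a_hat, b_hat = fit_power_law(L, W)
W_pred = predict_weight(L, a_hat, b_hat)
```

Fitting on the log scale minimizes squared errors in \( \ln W \), so deviations are treated as relative (multiplicative) rather than absolute. Plotting `W_pred` against the observed `W` on the original scale gives the visual comparison described above.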
Model Evaluation: