
Project: Shark incidents • 2000-2024
Tools: Tableau, MySQL (ETL), DBeaver
Description: Independent, pro bono analysis based on the Global Shark Attack File (GSAF) – from importing and cleaning data (ETL in MySQL) to building an interactive dashboard in Tableau.
Objective: Education and context, not fear – emphasizing seasonality, human activity and geography (with clear definitions: fatal vs non-fatal and unprovoked).
Data: Global Shark Attack File (GSAF) • snapshot 2025-09-24 • incidents 2000-2024.
Repository:
See on GitHub
View the reference letter from GSAF
Introduction
Sharks have always sparked strong emotions — and data helps turn those emotions into understanding. This project was created as part of my volunteer collaboration with the Global Shark Attack File (GSAF). My goal was to tell this story in a calm and responsible way — showing that most incidents do not result in death, and that risk depends mainly on the time of year, the type of activity, and the location, rather than on “dangerous species.”
This was not only a data analysis project, but also a process of improving data quality and practicing ethical data visualization. The dashboard was designed to be clear, minimalist, and free from sensationalism.
Data
- Source: Global Shark Attack File (GSAF)
- Time range: 2000-2024 (snapshot)
- Fields included in the dataset: incident date and location, human activity, shark species (if known), outcome (fatal / non-fatal), type (provoked / unprovoked), season, and country.
- Note: The dataset contains historical records that may vary in level of detail or accuracy depending on how and where incidents were reported. The goal of this project was clarity and structure, not sensationalism.
Process
1) Data preparation (MySQL)
- Importing the source file and creating backups.
- Removing duplicates and empty rows, and standardizing text case and formats.
- Parsing dates from different formats (YYYY-MMM-DD, DD-MMM-YY, MMM-YYYY); when the day of the month was missing, a default value of 15 was used.
- Standardizing country and state names (ISO codes, aliases) and cleaning inconsistent location names.
- Creating derived fields such as IsFatal, IsProvoked, ActivityCategory, and SpeciesCategory.
- Performing data quality checks to ensure logical ranges and consistent record counts.
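The date rule from the list above was implemented in MySQL; the sketch below shows the same logic in Python purely as an illustration. The function name `parse_incident_date`, the three-letter English month abbreviations, and the mapping of two-digit years to 20YY (reasonable only because the analysis covers 2000-2024) are assumptions, not taken from the project's SQL.

```python
import re
from datetime import date
from typing import Optional

# Assumed: months appear as three-letter English abbreviations.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}

# Day used when the source gives only month and year (stated in the text).
DEFAULT_DAY = 15

# One pattern per format named in the text.
PATTERNS = [
    re.compile(r"(?P<y>\d{4})-(?P<m>[A-Za-z]{3})-(?P<d>\d{1,2})"),  # YYYY-MMM-DD
    re.compile(r"(?P<d>\d{1,2})-(?P<m>[A-Za-z]{3})-(?P<y>\d{2})"),  # DD-MMM-YY
    re.compile(r"(?P<m>[A-Za-z]{3})-(?P<y>\d{4})"),                 # MMM-YYYY
]


def parse_incident_date(raw: str) -> Optional[date]:
    """Return a date for a recognized format, or None if parsing fails."""
    text = raw.strip()
    for pattern in PATTERNS:
        match = pattern.fullmatch(text)
        if match is None:
            continue
        parts = match.groupdict()
        month = MONTHS.get(parts["m"].lower())
        if month is None:
            return None
        year = int(parts["y"])
        if year < 100:
            # Assumption: two-digit years belong to the 2000s.
            year += 2000
        day = int(parts.get("d", DEFAULT_DAY))
        try:
            return date(year, month, day)
        except ValueError:
            # Impossible calendar date, e.g. 30 February.
            return None
    return None
```

Returning `None` instead of raising keeps unparseable rows visible for a later quality check rather than silently dropping them.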
2) Visualization (Tableau)
- KPIs: total number of incidents, non-fatal incidents, unprovoked incidents, and the country with the highest number of cases.
- Trend chart for 2000-2024 (fatal vs non-fatal vs unknown).
- Share of different activities (e.g., board sports, swimming, fishing).
- Seasonality by month, with filters for country, activity, and incident type.
- Tooltips including definitions and methodology notes.
- The color palette and layout are focused on clarity and balance — without dramatization.
Technologies and tools: MySQL, DBeaver, Tableau, SQL/ETL, methodology documentation.
Results
- Most incidents do not result in death.
- Seasonality: more incidents occur in summer and in areas with high levels of water activity.
- Activity matters: board sports, swimming, and fishing are the most frequently reported activities.
- The fatality rate did not increase between 2000 and 2024.
- The interactive dashboard makes it easy to explore trends over time, by activity, and by geography.
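To make the fatality-rate finding concrete, here is a minimal Python sketch of one way a yearly rate could be computed. It assumes each record carries a year and an IsFatal-style flag, and that records with an unknown outcome are excluded from the denominator. That exclusion is an assumption of this sketch, not a documented choice of the project, and `fatality_rate_by_year` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def fatality_rate_by_year(
    incidents: Iterable[Tuple[int, Optional[bool]]]
) -> Dict[int, float]:
    """Map each year to fatal / (fatal + non-fatal).

    Each incident is (year, is_fatal); is_fatal is None when the
    outcome is unknown, and such rows are skipped (assumption).
    """
    fatal = defaultdict(int)
    known = defaultdict(int)
    for year, is_fatal in incidents:
        if is_fatal is None:
            continue
        known[year] += 1
        if is_fatal:
            fatal[year] += 1
    return {year: fatal[year] / known[year] for year in sorted(known)}


# Made-up rows for illustration only; these are not GSAF figures.
sample = [(2000, True), (2000, False), (2000, False), (2000, None), (2001, False)]
rates = fatality_rate_by_year(sample)
```

With few incidents per year the rate is noisy, so a trend claim is better read from the whole 2000-2024 series than from any single year.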
Reflections
- Data quality first: The most time-consuming part was standardizing locations. Many reports contained descriptive place names (beaches, bays, reefs, regions) that Tableau couldn’t automatically recognize. I applied regex-based rules (spelling normalization, removing abbreviations/special characters, and standardizing the format) to harmonize country, state, and location names, which allowed the map to display correctly.
- Clear definitions prevent misinterpretation: Distinguishing between provoked and unprovoked incidents, as well as fatal and non-fatal cases, is essential for responsible analysis.
- Ethical visualization: Muted colors, clear labels, and avoiding rankings of the “most dangerous sharks” ensure the message remains informative rather than alarmist.
- Acknowledging limitations: Results depend on how incidents are reported in different regions — the numbers should always be interpreted in the appropriate context.
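The regex-based location cleanup described in the first reflection can be illustrated with a small Python sketch. The actual rules lived in MySQL and are not reproduced here. The alias table, the function name `normalize_country`, and the exact character classes are hypothetical examples of the three steps named above (removing special characters, standardizing the format, resolving aliases).

```python
import re

# Hypothetical alias table; a real one would be much longer.
COUNTRY_ALIASES = {
    "usa": "United States",
    "united states of america": "United States",
    "uk": "United Kingdom",
}


def normalize_country(raw: str) -> str:
    """Clean a free-text country value into a consistent display name."""
    text = raw.strip()
    # Drop special characters such as '.', '?' or '/', keeping letters,
    # digits, whitespace and hyphens.
    text = re.sub(r"[^\w\s-]", "", text)
    # Collapse runs of whitespace left over from the source.
    text = re.sub(r"\s+", " ", text).strip()
    # Resolve known aliases; otherwise fall back to title case.
    return COUNTRY_ALIASES.get(text.lower(), text.title())
```

For example, `normalize_country("U.S.A.")` and `normalize_country("usa")` both resolve to the same name, which is the property a map layer needs in order to match records to one geography.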
