Olympics Data Analysis

Project ID: 22

About Us

Victoria Roh (wroh@nd.edu) & Brendan Ye (bye@nd.edu)

Olympics Data Analysis - Our Journey

Project Topic Choice and Exploration Questions

Motivation for Choosing This Topic: The primary motivation for this project was to explore the relationship between countries' socio-economic and geographical factors and their Olympic performance. Understanding these relationships offers insight into how different factors might influence a nation's success at the Olympics.

Reason for Choosing This Topic: The intersection between sports performance and socio-economic/geographical factors is a fascinating area of study that has not been extensively explored in popular media. This topic offers a unique perspective on the Olympics, going beyond just the sports statistics and delving into how a country's characteristics might influence its athletes' performances.

Discussion Questions:

1. What is the relationship between a country's Olympic performance and socioeconomic factors?
2. Is there a correlation between Olympic performance and economic metrics such as GDP, CPI, and unemployment rate?
3. Is there a correlation between Olympic performance and geographic data such as forested area, land area, and agricultural land area?
4. Is there a correlation between Olympic performance and demographic data such as birth rate, life expectancy, and maternal mortality ratio?
5. Which measures are the most accurate predictors of a country's Olympic performance?
6. What can countries do to improve their performance in the Olympics?
7. Which countries have the best/worst performance in the Olympics relative to their socioeconomic circumstances?
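
Several of these questions ask whether a correlation exists between a performance measure and a country-level indicator. As a minimal sketch of what that check could look like, the snippet below computes a Pearson correlation coefficient in plain Python. The field names (total_medals, gdp) and all values are made-up placeholders for illustration, not figures from either dataset, and Pearson correlation is only one possible choice of measure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up placeholder rows (assumption): one row per country.
rows = [
    {"country": "A", "total_medals": 2, "gdp": 10.0},
    {"country": "B", "total_medals": 15, "gdp": 40.0},
    {"country": "C", "total_medals": 4, "gdp": 20.0},
    {"country": "D", "total_medals": 9, "gdp": 30.0},
]

medals = [row["total_medals"] for row in rows]
gdp = [row["gdp"] for row in rows]

r = pearson(medals, gdp)
print(f"Pearson r between total medals and GDP: {r:.3f}")
```

A coefficient near 1 or -1 indicates a strong linear association and a value near 0 indicates a weak one. It does not by itself show that the indicator causes the performance.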

Data Source

The primary data sources for this project were:

Olympic Results Dataset (https://www.kaggle.com/datasets/heesoo37/120-years-of-olympic-history-athletes-and-results): Contains detailed information about Olympic events, participants, and outcomes.
World Data 2023 Dataset (https://www.kaggle.com/datasets/nelgiriyewithana/countries-of-the-world-2023): Offers comprehensive socio-economic and geographical data for countries worldwide.

These datasets were chosen because they provide a rich and diverse set of variables that can be used to analyze Olympic performance. The Olympic Results Dataset brings in the sports-specific data, while the World Data 2023 Dataset provides a broader context by including various country-specific factors.

Potential Users and Their Reasons

The project's findings would be valuable for several groups:

Sports Analysts and Commentators: They could use this analysis to deepen their discussions about the factors contributing to countries' Olympic successes.

Policy Makers and Sports Administrators: Understanding these relationships could help in formulating policies to improve their countries' sports performance.

Academic Researchers: Researchers in the fields of sports science, sociology, or economics might use this analysis as a basis for further studies.

General Public and Sports Enthusiasts: For anyone interested in the Olympics, this analysis provides an engaging way to understand the dynamics behind Olympic success.

Data Source Description and Data-to-Information Transformation

The data sources mentioned above were extensive and detailed. The transformation from data to information involved several key steps:

Data Cleaning and Preparation: This involved handling missing values, ensuring consistent data formats, and merging datasets based on common columns like country names.

Data Analysis and Aggregation: We grouped the data by country, calculated aggregate metrics such as total medals won and average rank position, and categorized values into bins or quartiles for clearer visualization.

Visualization: By employing different types of visualizations (scatter plots, bar plots, box plots, line plots, and histograms), we transformed raw data into insightful information that reveals patterns and relationships between different variables and Olympic performance.

Each step in this process was crucial in converting raw data into meaningful information that could be easily interpreted and understood by the target audience, ultimately fulfilling the project's objective to explore the factors influencing Olympic success.
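
The aggregation, merging, and quartile-binning steps described above can be sketched as follows. This is an illustration in plain Python rather than the project's actual code: the record layout, the field names (country, medal, gdp), and every value are hypothetical placeholders, and it assumes missing values have already been handled.

```python
import bisect
import statistics
from collections import Counter

# Made-up placeholder records (assumption): one row per athlete result.
medal_rows = [
    {"country": "A", "medal": "Gold"},
    {"country": "A", "medal": None},  # took part but won no medal
    {"country": "B", "medal": "Silver"},
    {"country": "B", "medal": "Bronze"},
    {"country": "B", "medal": "Gold"},
    {"country": "C", "medal": None},
    {"country": "D", "medal": "Bronze"},
]

# Made-up placeholder records (assumption): one row per country.
country_rows = [
    {"country": "A", "gdp": 10.0},
    {"country": "B", "gdp": 40.0},
    {"country": "C", "gdp": 20.0},
    {"country": "D", "gdp": 30.0},
]

# Aggregation: count the medals won by each country.
total_medals = Counter(
    row["country"] for row in medal_rows if row["medal"] is not None
)

# Merge: attach the medal total to each country row, keyed on country name.
# Countries without any medal get a total of 0.
merged = [
    {**row, "total_medals": total_medals.get(row["country"], 0)}
    for row in country_rows
]

# Binning: compute the three quartile cut points, then label each
# country with a quartile from 1 (lowest GDP) to 4 (highest GDP).
cuts = statistics.quantiles([row["gdp"] for row in merged], n=4)
for row in merged:
    row["gdp_quartile"] = bisect.bisect_left(cuts, row["gdp"]) + 1

for row in sorted(merged, key=lambda r: r["gdp"]):
    print(row)
```

Grouping countries into quartiles like this makes it possible to compare medal totals across broad bands of an indicator, for example with a box plot per quartile.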

Challenges of the Project

The most challenging aspect of the Olympic data analysis project was undoubtedly the data cleaning and merging process. Here's why:

Data Inconsistency: The datasets from different sources often had inconsistencies in how countries were named or coded. Aligning them correctly to ensure accurate merging required meticulous attention.

Handling Missing or Incomplete Data: Both datasets contained missing or incomplete entries. Deciding how to handle these (whether to fill in missing values, drop them, or find alternate sources) was a complex task that required careful consideration to avoid skewing the results.

Complexity of Data: The datasets were extensive and rich with various types of data. Extracting the relevant information and transforming it into a format suitable for analysis while maintaining data integrity was a significant challenge.
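
To illustrate the first two challenges, here is a small sketch of one way to reconcile country names before merging and to flag incomplete rows. The alias table, the sample names, and the gdp field are hypothetical examples, not the actual spellings or columns found in the datasets. Dropping incomplete rows is only one of the options mentioned above.

```python
def normalize(name):
    """Lowercase a country name and collapse extra whitespace."""
    return " ".join(name.casefold().split())


# Hypothetical alias table (assumption): maps a normalized variant
# spelling to the spelling chosen as canonical.
ALIASES = {
    "usa": "united states",
    "great britain": "united kingdom",
}


def canonical(name):
    """Return the canonical key used to match a country across datasets."""
    key = normalize(name)
    return ALIASES.get(key, key)


def to_float(text):
    """Parse a numeric field; return None when it is blank or unparseable."""
    try:
        return float(text.replace(",", ""))
    except (AttributeError, ValueError):
        return None


# Made-up sample values (assumption), chosen to show typical mismatches.
olympic_names = ["USA", " Great  Britain", "Japan"]
world_rows = [
    {"country": "United States", "gdp": "1,000"},
    {"country": "United Kingdom", "gdp": ""},
    {"country": "japan", "gdp": "500"},
    {"country": "Iceland", "gdp": "20"},
]

olympic_keys = {canonical(name) for name in olympic_names}
world_keys = {canonical(row["country"]) for row in world_rows}

# Names that appear in only one dataset need manual review before merging.
unmatched_olympic = sorted(olympic_keys - world_keys)
unmatched_world = sorted(world_keys - olympic_keys)

# One option for missing values: keep only rows with a usable GDP figure.
complete_rows = [row for row in world_rows if to_float(row["gdp"]) is not None]

print("Only in Olympic data:", unmatched_olympic)
print("Only in world data:", unmatched_world)
print("Rows with usable GDP:", len(complete_rows))
```

Listing the unmatched names on both sides before merging makes silent data loss visible, since a join on raw names would otherwise drop those countries without warning.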

Rewarding Aspects of the Project

The most rewarding aspect of this project was the ability to derive meaningful insights and narratives from complex datasets. Specifically:

Discovery of Patterns and Trends: Unveiling potential correlations between socio-economic/geographical factors and Olympic performance was deeply satisfying. These insights could provide valuable perspectives for sports analysts, policymakers, and the general public.

Visualization Impact: Seeing the raw, complex data transformed into clear, understandable, and visually appealing charts and graphs was highly gratifying. It made the data accessible to a broader audience, not just those with a technical background.

Potential for Real-World Impact: The project's findings could have implications in sports science, policy-making, and even education, providing a sense of contributing to broader societal understanding and development.