The datasets that data professionals use to solve problems typically contain missing values, which must be dealt with in order to achieve clean, useful data. This is particularly crucial in exploratory data analysis (EDA). In this activity, you will learn how to address missing data.
You are a financial data consultant, and an investor has tasked your team with identifying new business opportunities. To help them decide which future companies to invest in, you will provide a list of current businesses valued at more than $1 billion. These are sometimes referred to as "unicorns." Your client will use this information to learn about profitable businesses in general.
The investor has asked you to provide them with the following data:
- Companies in the `hardware` industry based in either `Beijing`, `San Francisco`, or `London`
- Companies in the `artificial intelligence` industry based in `London`
- A list of the top 20 countries sorted by sum of company valuations in each country, excluding `United States`, `China`, `India`, and `United Kingdom`
- A global valuation map of all countries with companies that joined the list after 2020
- A global valuation map of all countries except `United States`, `China`, `India`, and `United Kingdom` (a separate map for Europe is also required)
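The filtering and ranking requests above could be sketched with pandas roughly as follows. This is a minimal illustration, not the lab's actual solution: the toy rows and the column names (`Company`, `Industry`, `City`, `Country`, `Valuation`) are assumptions made for the example and may not match the real dataset.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative assumptions.
companies = pd.DataFrame({
    "Company": ["A", "B", "C", "D", "E", "F"],
    "Industry": ["Hardware", "Artificial intelligence", "Hardware",
                 "Fintech", "Hardware", "Artificial intelligence"],
    "City": ["Beijing", "London", "Paris", "Berlin", "London", "San Francisco"],
    "Country": ["China", "United Kingdom", "France", "Germany",
                "United Kingdom", "United States"],
    "Valuation": [5, 2, 3, 4, 1, 7],  # assumed to be in billions of USD
})

# Hardware companies in Beijing, San Francisco, or London
cities = ["Beijing", "San Francisco", "London"]
hardware_mask = (companies["Industry"] == "Hardware") & companies["City"].isin(cities)

# Artificial intelligence companies in London
ai_london_mask = (
    (companies["Industry"] == "Artificial intelligence")
    & (companies["City"] == "London")
)

# Rows matching either request
shortlist = companies[hardware_mask | ai_london_mask]

# Top 20 countries by summed valuation, excluding the four named countries
excluded = ["United States", "China", "India", "United Kingdom"]
top_countries = (
    companies[~companies["Country"].isin(excluded)]
    .groupby("Country")["Valuation"]
    .sum()
    .sort_values(ascending=False)
    .head(20)
)
```

Note that a comparison such as `companies["City"] == "London"` evaluates to `False` for rows where the city is missing, so those rows drop out of the filtered result without any warning. This is one reason to inspect missing values before filtering.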
What are some key takeaways that you learned during this lab?
- Missing data is a common problem that data professionals encounter whenever they work with a dataset.
- Addressing missing values is a part of the data-cleaning process and an important step in EDA.
- Address missing values by either removing them or filling them in.
- When considering how to address missing values, keep in mind the business, the data, and the questions to be answered. Always ensure you are not introducing bias into the dataset.
- Addressing the missing values enabled you to answer your investor's questions.
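The two strategies named in the takeaways, removing missing values or filling them in, might look like this in pandas. This is a sketch on hypothetical toy data: the column names, the `"Unknown"` placeholder, and the choice of the median as a fill value are all assumptions for illustration, not the lab's prescribed approach.

```python
import pandas as pd

# Hypothetical toy data with gaps; names and values are illustrative assumptions.
df = pd.DataFrame({
    "Company": ["A", "B", "C", "D"],
    "City": ["London", None, "Beijing", None],
    "Valuation": [2.0, 5.0, None, 1.0],
})

# Count missing entries per column before deciding how to handle them
missing_per_column = df.isna().sum()

# Option 1: remove every row that contains at least one missing value
dropped = df.dropna()

# Option 2: fill the gaps instead, using a placeholder for text
# and the column median for numbers
filled = df.fillna({
    "City": "Unknown",
    "Valuation": df["Valuation"].median(),
})
```

Neither option is neutral. Dropping rows here discards three of the four companies, and filling with the median invents valuations that were never observed. Which trade-off is acceptable depends on the business question, as the takeaway about bias points out.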