Let’s get Organized!
Organizing a data science project
Many students approach a data science project with the idea that all of the data in the world is available and could be used in a single project. If you sit with that thought for a moment it doesn’t make sense, but for some reason many students feel this way. I think it stems from how unfamiliar actual data sets are to end users. It is important to start a data science project with a particular data set in mind. Data science is nothing more than using data to infer something about the world; if no data has been collected about a subject, then clearly there cannot be a data science project on it. As you think about structuring your data science project, here are a few things to think about:
What data is available?
- We have now discovered that there is a lot of NBA data that is easily accessible. Figuring out which of it is useful is my next task.
What does that data look like?
- The data is in a format that is easy to extract into an Excel-type file, but I have not yet been able to find the x, y axis. I will find it!
What assumptions can we make about that data?
- My assumptions are that I can easily determine point guard shooting performance by looking at playoff data, and that the data is well organized and ready for me to use.
What additional data might be useful to validate our assumptions?
- I think additional data will help validate who is the best point guard shooter of all time.
How can we turn our assumptions into a model?
- We can develop an algorithm that extracts the data and loads it into our platform, similar to what the NBA does, so that for our athletes we can use our own systems.
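To make the point guard assumption concrete, here is a minimal sketch of how playoff shooting could be summarized once the data is exported. Everything specific in it is an assumption for illustration: the column names (`player`, `position`, `season_type`, `shot_made`), the placeholder players, and the helper `playoff_fg_pct` are hypothetical and are not taken from any real NBA data source.

```python
import csv
import io
from collections import defaultdict

# Hypothetical shot log: the column names and rows are invented for
# illustration only, one row per shot attempt.
SAMPLE_CSV = """player,position,season_type,shot_made
Player A,PG,Playoffs,1
Player A,PG,Playoffs,0
Player A,PG,Regular,1
Player B,PG,Playoffs,1
Player B,PG,Playoffs,1
Player C,C,Playoffs,1
"""


def playoff_fg_pct(csv_text, position="PG"):
    """Return {player: made / attempted} for playoff shots at one position."""
    made = defaultdict(int)
    attempts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Keep only playoff shots taken by players at the requested position.
        if row["position"] != position or row["season_type"] != "Playoffs":
            continue
        attempts[row["player"]] += 1
        made[row["player"]] += int(row["shot_made"])
    return {player: made[player] / attempts[player] for player in attempts}


if __name__ == "__main__":
    for player, pct in sorted(playoff_fg_pct(SAMPLE_CSV).items()):
        print(f"{player}: {pct:.1%}")
```

A real export would probably need different column names and some cleaning first, but the shape of the question stays the same: filter to the shots that matter, then divide makes by attempts for each player.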