Understanding Data Science's Purpose

Please watch the video in the link above. It will help to explain why this course is useful and how data science is a unique blend of computer science, data analytics, and real-world knowledge. Data science has become applicable in many different fields, and we want to explore why and how those fields are connected.

  • Data Science is more than a combination of statistics and computing.
  • In data science you don't learn programming to develop apps and games. You use programming to understand the world around you.
  • Learn machine learning, statistics, and computing
  • It is an overview of the data8 program
  • I am confused about which program I should focus on, as we have several, including Microsoft, Cal, DataCamp, etc.

There are many other videos which explore data science from the UC Berkeley Division of Computing, Data Science, and Society. Instead of just watching the videos, please practice using the tools and applications they describe. Write down what the tools are used for in a blog post, and screenshot an application that you create using each tool.

  • I have to carve out some time to do this
  • DataCamp has been my focus

After working with one student and reviewing the day one content, here are my take-aways from that day.

Understanding what GitHub is and how it can be useful is challenging and confusing. When thinking about structuring a data science project, where do you store your information and how can you access that information in the future? What is the difference between a repository and a GitHub Pages site? Which part of GitHub holds the code, the data, and the actual blogging?

  • I understand GitHub from a general perspective now, but I need to understand it from a more micro perspective

What are GitHub Pages used for?

  • GitHub Pages takes the HTML, CSS, and JavaScript files from a repository and publishes them as a site for your blog or website.

Why isn’t my project description specific enough? How do I define the question I am looking to answer? Your project description was not specific enough. I understand that you are working with NBA data and you want to understand how individual players get better. To create a data science project around this topic, you need to be more specific about the time period and your definition of improvement. After talking through your problem statement, we looked at a dataset containing season averages and found three columns which, when significantly high, would indicate that the player had the ‘best’ season. You need to explain why these fields indicate that the season is the best, and what the fields are tracking. You then explained that finding complementary data for that season (additional players, an improved practice regimen, etc.) could help explain why the season was so good. You need to find such a dataset and match it to the player for that year. How would you do that? This is something to think through and explain in your first blog post.

  • I want to challenge the assumption that Steph Curry is the best shooter of all time. There are 8 players who have had the best shooting seasons in league history. Three of them were point guards: Steph Curry, Mark Price, and Steve Nash. I will work out which one of them is the best shooter by looking in depth at the determining factors.
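
One possible way to match a complementary dataset to a player for a given year is to join the two tables on the player and season columns. The sketch below uses only Python's standard library. The column names (`fg_pct`, `three_pct`, `ft_pct`, `new_teammates`), the players, and every value are hypothetical placeholders, not real NBA statistics, and `join_on_player_season` is a helper invented for this illustration.

```python
import csv
import io

# Hypothetical placeholder data: names, columns, and values are invented
# for illustration and are not real NBA statistics.
SEASON_AVERAGES = """player,season,fg_pct,three_pct,ft_pct
Player A,2015,0.50,0.45,0.91
Player A,2016,0.47,0.41,0.90
Player B,2015,0.49,0.40,0.89
"""

# Hypothetical complementary dataset (for example, roster changes).
CONTEXT = """player,season,new_teammates
Player A,2015,2
Player B,2015,0
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_player_season(left, right):
    """Left-join two row lists on the (player, season) pair."""
    lookup = {(r["player"], r["season"]): r for r in right}
    extra_columns = [k for k in right[0] if k not in ("player", "season")]
    joined = []
    for row in left:
        match = lookup.get((row["player"], row["season"]), {})
        merged = dict(row)
        # Copy each extra column; use None when there is no matching row.
        for key in extra_columns:
            merged[key] = match.get(key)
        joined.append(merged)
    return joined


seasons = read_rows(SEASON_AVERAGES)
context = read_rows(CONTEXT)
combined = join_on_player_season(seasons, context)

for row in combined:
    print(row)
```

Rows without a match keep their season averages and get `None` for the extra columns, which makes it easy to see where the complementary dataset has gaps.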

How can I ensure that the data I have will help answer my questions? You need to contextualize your dataset.

  • This is still something I'm starting to understand

A few take-aways from today -

GitHub Pages is a unique feature, and we need to demonstrate that it provides specific value. The project description needs to be data-driven. It is useful for the student to describe the project to another student so that the listening student can pick holes in the project description. When a student talks through the project out loud, it becomes clear which parts are unclear. This could be a daily Zoom exercise. Students seem to have a general understanding of available datasets and think that all of the relevant data would be useful for the project, even though only a small subset of it is.

  • I think I have figured this out.

Exploring CSV files and looking at what the columns mean will have a significant impact.

  • I understand what CSV files are and why they are important in data science.
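
As a rough illustration of exploring a CSV file column by column, the sketch below lists each column with a count of filled and missing values and one example value. It uses Python's built-in `csv` module. The sample data and the `describe_columns` helper are hypothetical and only stand in for a real dataset.

```python
import csv
import io

# Hypothetical sample data; the second row has a missing three_pct value.
SAMPLE = """player,season,games,three_pct
Player A,2015,79,0.45
Player B,2015,80,
"""


def describe_columns(text):
    """Summarize each column: filled count, missing count, example value."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    summary = {}
    for name in reader.fieldnames:
        values = [row[name] for row in rows]
        filled = [v for v in values if v != ""]
        summary[name] = {
            "filled": len(filled),
            "missing": len(values) - len(filled),
            "example": filled[0] if filled else None,
        }
    return summary


summary = describe_columns(SAMPLE)
for name, info in summary.items():
    print(name, info)
```

For a file on disk, the same function could be given the file's contents, for example by reading it with `open(path, newline="")` first.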

Skills Acquired

Create a technical blog

  • This is dope and I know how :)

Designing a Data Science project

  • This has been a process but I am here now

Sourcing credible datasets

  • I have found numerous datasets. Now it is time for the next phase.

MetaData to Collect

  • A dataset that gives you data on another dataset.
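
To make the idea of "data about another dataset" concrete, here is a minimal sketch that builds a small metadata record for a CSV dataset. The chosen fields (`name`, `source`, `columns`, `row_count`) are an assumption about what might be worth collecting, not a required schema, and the sample data is hypothetical.

```python
import csv
import io
import json

# Hypothetical sample dataset used only to demonstrate the idea.
SAMPLE = """player,season,three_pct
Player A,2015,0.45
Player B,2015,0.40
"""


def build_metadata(text, name, source):
    """Return a dictionary describing the dataset rather than its contents."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return {
        "name": name,
        "source": source,
        "columns": list(reader.fieldnames),
        "row_count": len(rows),
    }


metadata = build_metadata(
    SAMPLE,
    name="season_averages_sample",
    source="hypothetical example",
)
print(json.dumps(metadata, indent=2))
```

A record like this could be saved next to the dataset so that a reader can see where the data came from and what it contains before opening it.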

User Account - Portfolio Link

Skills Model - Add these skills one by one. This is a cool feature I think I will prototype.

  • I am not sure what these two mean yet because I have not gotten into MetaData