Monday 7 April 2014

What do data scientists do?

In general terms, data scientists get data and convert it into information or predictions.

Data scientists collect data from the real world (often big data from the internet), process it, and convert it into a dataset that can be analyzed. They analyze this dataset using statistical models or machine learning, and produce results and reports that can be useful for data-driven products, or even for the general public.

The step-by-step process they follow with data:
  • Define the question
  • Define the ideal dataset
  • Determine what data you can access
  • Obtain the data
    • Reading data: Excel, XML, JSON, web, ...
    • Merging data
  • Clean the data
    • Reshaping data
    • Summarizing data
  • Exploratory data analysis
    • Graphs
    • Plotting systems
    • Clustering
  • Statistical prediction/modeling
    • Extracting generalizable information from data
  • Interpret results
  • Challenge results
  • Synthesize/Write up results
  • Create reproducible code
    • Make every document fully reproducible so that you can share it with other people
  • Distribute results to other people
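
To make a few of these steps concrete, here is a minimal Python sketch of obtaining, merging, cleaning, summarizing, and modeling a dataset. Everything in it is an assumption made for illustration: the tiny inline CSV and JSON inputs are made-up toy data, and the function names (obtain, merge, clean, summarize, fit_line) are hypothetical. It uses only the standard library; a real project would more likely read files or web sources and use a data analysis library.

```python
import csv
import io
import json
from statistics import mean

# Made-up toy inputs standing in for real sources (files, web APIs, ...).
CSV_TEXT = "id,x\na,1\nb,2\nc,\nd,4\n"
JSON_TEXT = (
    '[{"id": "a", "y": 2.0}, {"id": "b", "y": 4.0},'
    ' {"id": "c", "y": 6.0}, {"id": "d", "y": 8.0}]'
)


def obtain():
    """Obtain the data: read one source as CSV and another as JSON."""
    rows_x = list(csv.DictReader(io.StringIO(CSV_TEXT)))
    rows_y = json.loads(JSON_TEXT)
    return rows_x, rows_y


def merge(rows_x, rows_y):
    """Merge the two sources on their shared 'id' field."""
    y_by_id = {r["id"]: r["y"] for r in rows_y}
    return [
        {"id": r["id"], "x": r["x"], "y": y_by_id[r["id"]]}
        for r in rows_x
        if r["id"] in y_by_id
    ]


def clean(rows):
    """Clean the data: drop rows with a missing x and convert to floats."""
    cleaned = []
    for r in rows:
        if r["x"] == "":
            continue  # skip the incomplete record
        cleaned.append({"id": r["id"], "x": float(r["x"]), "y": float(r["y"])})
    return cleaned


def summarize(rows):
    """Summarize the data: row count and the mean of each column."""
    return {
        "n": len(rows),
        "mean_x": mean(r["x"] for r in rows),
        "mean_y": mean(r["y"] for r in rows),
    }


def fit_line(rows):
    """Statistical modeling: fit y = slope * x + intercept by least squares."""
    mx = mean(r["x"] for r in rows)
    my = mean(r["y"] for r in rows)
    sxy = sum((r["x"] - mx) * (r["y"] - my) for r in rows)
    sxx = sum((r["x"] - mx) ** 2 for r in rows)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


if __name__ == "__main__":
    rows_x, rows_y = obtain()
    rows = clean(merge(rows_x, rows_y))
    print(summarize(rows))
    print(fit_line(rows))
```

Keeping each step in its own function also helps with the "create reproducible code" step: anyone who has the same inputs can rerun the script and get the same results.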
