
Chapter I - The Beginning

Data science, in simple terms, is where large volumes of data meet insight. By a commonly cited estimate, roughly 2.5 quintillion bytes (2.5 exabytes) of data are generated every day. This data is of little significance until we can extract meaningful information from it. The insights drawn from the available data can be leveraged to make informed decisions.


Image created using wordcloud2 package in R


A few varied applications of data science are -
1. Companies analyzing customer data to understand market preferences
2. Banks drawing insights from customer transactions to estimate the probability of churn
3. Teams analyzing players' statistics to improve team dynamics
4. Google using data science to deliver the best search results
5. E-commerce websites using recommender systems to attract customers
6. Facebook using face recognition algorithms to enable the 'tag your friend' feature, and many more

The data available in the ecosystem can be structured or unstructured. Unstructured (unorganized) data takes the form of text and multimedia content; e-mail messages, Word documents, audio files and web pages are some examples. Structured data, on the other hand, is organised into a fixed layout, such as rows and columns, that can be easily read and queried. Examples of structured data include data stored in databases, Excel files and CSV files.
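To make the difference concrete, here is a minimal Python sketch. The CSV rows and the e-mail sentence are made-up sample data, and the regular expression is just one assumed pattern for illustration. It shows that a structured source lets us ask for a column by name, while with unstructured text we first have to guess where the information sits.

```python
import csv
import io
import re

# Structured data: a small CSV with named columns (made-up sample rows).
csv_text = """customer_id,age,city
101,34,Pune
102,45,Mumbai
103,29,Delhi
"""

# Each row becomes a dict keyed by column name, so "age" can be read directly.
rows = list(csv.DictReader(io.StringIO(csv_text)))
ages = [int(row["age"]) for row in rows]
average_age = sum(ages) / len(ages)

# Unstructured data: free text with no fixed fields (made-up e-mail body).
email_text = "Hi team, I am 34 years old and moved to Pune last year."

# There is no "age" column here; we must assume a pattern and search for it.
match = re.search(r"(\d+)\s+years old", email_text)
age_from_text = int(match.group(1)) if match else None

print("Average age from CSV:", average_age)
print("Age found in e-mail text:", age_from_text)
```

The CSV half stays the same no matter how many rows are added, whereas the text half breaks as soon as the sentence is worded differently. That is why unstructured data usually needs more preparation before analysis.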


Analyzing data is a comprehensive process. One must dive deep into the data pool to extract interesting and useful insights. Data processing is an extensive, iterative process that leads us to meaningful interpretations.


Steps that are commonly followed during data analysis are -

1. Data gathering
2. Data understanding
3. Data cleaning
4. Exploratory data analysis to draw insights
5. Machine learning (advanced analytics)
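
As a rough illustration of how the five steps above fit together, here is a minimal Python sketch on a tiny made-up dataset. The records, the field names (`orders`, `spend`) and the choice of a simple least-squares line for the machine learning step are all assumptions for this example, not a prescribed method.

```python
from statistics import mean

# 1. Data gathering: made-up records (number of orders and total spend).
records = [
    {"customer": "A", "orders": 2, "spend": 40.0},
    {"customer": "B", "orders": 4, "spend": 80.0},
    {"customer": "C", "orders": None, "spend": 55.0},  # missing value
    {"customer": "D", "orders": 6, "spend": 120.0},
]

# 2. Data understanding: which fields exist, and how many values are missing?
columns = sorted(records[0].keys())
missing_orders = sum(1 for r in records if r["orders"] is None)

# 3. Data cleaning: here we simply drop rows with a missing "orders" value.
clean = [r for r in records if r["orders"] is not None]

# 4. Exploratory data analysis: a simple summary statistic.
avg_spend = mean(r["spend"] for r in clean)

# 5. Machine learning: fit a straight line spend = intercept + slope * orders
#    using the ordinary least-squares formulas.
xs = [r["orders"] for r in clean]
ys = [r["spend"] for r in clean]
x_bar, y_bar = mean(xs), mean(ys)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar
predicted_spend = intercept + slope * 5  # prediction for a customer with 5 orders

print("Columns:", columns)
print("Missing 'orders' values:", missing_orders)
print("Average spend after cleaning:", avg_spend)
print("Predicted spend for 5 orders:", predicted_spend)
```

In practice each step is far more involved, and findings from a later step often send us back to an earlier one, which is what makes the process iterative.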


P.S. - We will discuss all of these steps in detail in upcoming blog posts.

Comments

  1. Good start Snigdha :) Love to visit your blog now and then.. please continue to post often :)

  2. Good Work Snigdha & Prachi.. look forward to more interesting articles.

