

Showing posts from 2018

Chapter III - Data Understanding

Data understanding is one of the key phases of the CRISP-DM framework. CRISP-DM stands for Cross-Industry Standard Process for Data Mining. Data understanding helps us decide whether the data acquired during data collection satisfies the business requirements and is useful for further analysis. The data understanding process can be subdivided into the following steps:

1. Describing the data - In this step, we get a feel for the data by preparing a data description report. This report describes the variables in the data, their data types and so on.
2. Exploring and verifying the data - The variables in the data set can be analyzed further by creating univariate and bivariate plots. These plots help us identify the key variables and the target variable. We can also gain insight into certain data quality issues while performing the exploratory data analysis. Certain data quality issues can be missing values, vagu...
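As a rough illustration of the "describing the data" step and of spotting missing values, here is a minimal sketch in plain Python. It assumes the data has already been loaded as a list of dictionaries (one per row) and treats `None` or an empty string as missing; the function name `describe` and the sample columns are hypothetical, and the sketch does not cover the univariate and bivariate plots mentioned above.

```python
from collections import Counter

def describe(records):
    """Return a simple data description report for a list of dict rows.

    For each variable it records the type(s) of the non-missing values,
    the number of missing values (None or empty string), the number of
    distinct non-missing values and the most frequent value.
    """
    # Collect column names in the order they first appear.
    columns = []
    for row in records:
        for key in row:
            if key not in columns:
                columns.append(key)

    report = {}
    for col in columns:
        # A key absent from a row is returned as None and counted as missing.
        values = [row.get(col) for row in records]
        present = [v for v in values if v is not None and v != ""]
        counts = Counter(present)
        report[col] = {
            "types": sorted({type(v).__name__ for v in present}),
            "missing": len(values) - len(present),
            "distinct": len(counts),
            "most_common": counts.most_common(1)[0][0] if counts else None,
        }
    return report

# Hypothetical sample rows; None and "" stand in for missing values.
rows = [
    {"customer_id": 1, "age": 34, "city": "Pune"},
    {"customer_id": 2, "age": None, "city": "Mumbai"},
    {"customer_id": 3, "age": 29, "city": ""},
    {"customer_id": 4, "age": 41, "city": "Pune"},
]

report = describe(rows)
for column, summary in report.items():
    print(column, summary)
```

A non-zero missing count, as for `age` and `city` in this sample, is exactly the kind of data quality issue the exploration step is meant to surface before further analysis.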

Chapter II - Data Gathering

The first step in a data science journey is to decide on the domain of analysis. Based on the domain you choose, you must gather relevant data from different sources. Data gathering (or collection) is the process of gathering information from various sources in order to extract significant information. Here, we will discuss a few techniques of data collection.

1. Collecting survey data - Such data can be collected by circulating a questionnaire to the audience. This is handy if you want to limit the scope of analysis. For example, suppose you want to find the age distribution and the number of dependents in each household of your society. You can circulate a questionnaire asking the residents about their age and number of dependents, and then draw an analysis from the data collected. The drawbacks of this method are that it is a tedious process, the audience may not be interested in the exercise, and it limits the amount of data that can be collected. Other examples of this method are: ...
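To make the survey example a little more concrete, the sketch below shows one way the collected answers might be summarized into an age distribution and the number of dependents per household. The responses are invented for illustration, and it assumes one respondent per household; the helper `age_band` and the 10-year band width are hypothetical choices, not part of the original post.

```python
from collections import Counter

# Hypothetical questionnaire responses, assuming one respondent per household.
responses = [
    {"household": "A-101", "age": 34, "dependents": 2},
    {"household": "A-102", "age": 58, "dependents": 1},
    {"household": "B-201", "age": 27, "dependents": 0},
    {"household": "B-202", "age": 39, "dependents": 3},
    {"household": "C-301", "age": 61, "dependents": 2},
]

def age_band(age, width=10):
    """Map an age to a band label such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

# Age distribution: number of respondents in each age band.
age_distribution = Counter(age_band(r["age"]) for r in responses)

# Number of dependents reported for each household.
dependents_by_household = {r["household"]: r["dependents"] for r in responses}
mean_dependents = sum(dependents_by_household.values()) / len(dependents_by_household)

# Print bands in numeric order of their lower bound.
for band in sorted(age_distribution, key=lambda b: int(b.split("-")[0])):
    print(band, age_distribution[band])
print("Mean dependents per household:", mean_dependents)
```

If several residents of the same household answered, the responses would need to be grouped or de-duplicated by household first; the dictionary above would otherwise keep only the last answer for each household.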

Chapter I - The Beginning

Data science, in basic terms, is where voluminous data meets insight. At present, roughly 2.5 quintillion bytes of data are generated every day. This data is of little significance until we can extract meaningful information from it. The insights drawn from the available data can be leveraged to make informed decisions.

Image created using the wordcloud2 package in R

A few varied applications of data science are:

1. Companies analyzing customer data to understand market preferences
2. Banks drawing insights from customer transactions to estimate the probability of churn
3. Teams analyzing players' statistics to improve team dynamics
4. Google using data science to deliver the best search results
5. E-commerce websites using recommender systems to attract customers
6. Facebook using a face recognition algorithm to enable its 'tag your friend' feature, and many more

The data available in the ecosyste...