The first step in a data science journey is to decide the domain of your analysis. Based on the domain you choose, you must gather relevant data from different sources. Data gathering (or data collection) is the process of pulling information from various sources so that meaningful insights can be drawn from it.
Here, we will discuss a few techniques of data collection.
1. Collecting survey data -
- Such data can be collected by circulating a questionnaire to the audience.
- This is handy when you want to limit the scope of the analysis. For example, suppose you want to find the age distribution and the number of dependents in each household of your society. You can circulate a questionnaire asking the residents for their age and number of dependents, and then analyse the data collected.
- The drawbacks of this method are that it is a tedious process, the audience may not be interested in taking part, and the amount of data you can collect is limited.
- Other examples of this method are:
- Customers are asked to leave comments on the food, service and ambiance of a restaurant.
- Instructors ask learners to share feedback in the form of ratings after an online course.
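
To make the household example above concrete, here is a minimal sketch of how collected questionnaire responses might be summarised. The responses, the field names (`household`, `age`, `dependents`) and the helper `age_group` are all made up for illustration; real survey data would normally be read from a file or a form export.

```python
from collections import Counter
from statistics import mean

# Hypothetical questionnaire responses: one dict per respondent.
responses = [
    {"household": "A-101", "age": 34, "dependents": 2},
    {"household": "A-102", "age": 61, "dependents": 0},
    {"household": "B-201", "age": 28, "dependents": 1},
    {"household": "B-202", "age": 45, "dependents": 3},
    {"household": "C-301", "age": 39, "dependents": 2},
]


def age_group(age, width=10):
    """Return a label such as '30-39' for the bucket that contains `age`."""
    start = (age // width) * width
    return f"{start}-{start + width - 1}"


# Age distribution: number of respondents in each 10-year bucket.
age_distribution = Counter(age_group(r["age"]) for r in responses)

# Average number of dependents across the surveyed households.
average_dependents = mean(r["dependents"] for r in responses)

for group in sorted(age_distribution):
    print(group, age_distribution[group])
print("average dependents:", average_dependents)
```

With only a handful of responses the summary is trivial, but the same two steps (bucket the ages, average the dependents) apply unchanged to a larger survey.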
2. Collecting data from public data sets -
- Data can be collected from various open sources, such as India's Open Government Data Platform.
- The UCI Machine Learning Repository also has a large amount of data across various domains.
- Many data science enthusiasts also share useful data sets that are publicly accessible.
- Such data sets are called public data sets.
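
Public data sets are often distributed as CSV files. The sketch below shows one way to parse such a file with Python's standard `csv` module. The sample text, its column names and its values are invented placeholders standing in for a downloaded file, and `load_rows` is a hypothetical helper rather than part of any data portal's API.

```python
import csv
import io

# Made-up stand-in for the contents of a downloaded CSV file.
SAMPLE_CSV = """region,year,rainfall_mm
North,2020,812
South,2020,1040
East,2020,965
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


rows = load_rows(SAMPLE_CSV)

# Values arrive as strings, so convert them before doing arithmetic.
total_rainfall = sum(int(row["rainfall_mm"]) for row in rows)

print(len(rows), "rows; columns:", list(rows[0].keys()))
print("total rainfall (mm):", total_rainfall)
```

For a file on disk, the same `csv.DictReader` can be given an open file object instead of the `io.StringIO` wrapper.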
3. Collecting data by web crawling -
- Often, data scientists want to extract some specific data from websites.
- In such cases, they design their own web crawlers to pull out the required data.
- You can check one such crawler here; it parses the Quora (a leading question-and-answer site) website and extracts data from the available profiles.
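
The core of a crawler is pulling structured pieces, such as links, out of a page's HTML. The following is a minimal sketch of that single step using only Python's standard library; it is not the crawler mentioned above. The HTML snippet, the `example.com` base URL and the names `LinkExtractor` and `extract_links` are assumptions made for illustration, and the page is an inline string so the example runs without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the absolute URL of every <a href="..."> tag in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # `attrs` is a list of (name, value) pairs for the tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's base URL.
                self.links.append(urljoin(self.base_url, href))


# Made-up page content standing in for a fetched HTML document.
SAMPLE_HTML = """
<html><body>
  <a href="/profile/alice">Alice</a>
  <a href="/profile/bob">Bob</a>
  <a>no link here</a>
</body></html>
"""


def extract_links(html, base_url):
    """Return all links found in `html`, resolved against `base_url`."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


links = extract_links(SAMPLE_HTML, "https://example.com")
print(links)
```

A full crawler would additionally fetch each page, queue the links it finds, and keep track of pages already visited; those parts are left out here.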
4. Collecting data from private data sets -
- Organisations generate and collect enormous amounts of data, which can be analysed to derive insights.
- The limitation of such data is that it is confidential, and access is restricted to only a few people.
- Such data sets are called private data sets.