Data for Artificial Intelligence and Machine Learning - Part 1
Data for Machine Learning
Raw Data
- Raw data is a term used to describe data that has been collected and stored but has not yet been processed.
- Data is often defined as facts or numbers, but it can also be non-numeric and nonfactual.
- It can be quantitative or qualitative data.
- Raw data can be collected from sensors, user interactions with machine interfaces (e.g., mobile devices, touch screens, keyboards, websites), journal entries, event recordings, reference information, medical diagnoses, recordings of human observations, experiments, communication, market behaviour, business transactions, social media and so on.
- Raw data is sometimes called source data or atomic data.
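To make the distinction between raw and processed data concrete, here is a minimal sketch in Python. The reading format (`timestamp,value`), the sample values and the `process` helper are all assumptions made for illustration; they do not come from any particular sensor or library.

```python
from statistics import mean

# Hypothetical raw sensor readings, stored exactly as captured.
# Assumed line format: "timestamp,value"; one reading has no value.
raw_readings = [
    "2019-01-01T10:00:00,21.5",
    "2019-01-01T10:01:00,21.7",
    "2019-01-01T10:02:00,",
    "2019-01-01T10:03:00,22.1",
]


def process(lines):
    """Parse raw lines into (timestamp, float) pairs, skipping empty values."""
    records = []
    for line in lines:
        timestamp, _, value = line.partition(",")
        if value.strip():
            records.append((timestamp, float(value)))
    return records


processed = process(raw_readings)
average = mean(value for _, value in processed)
print(len(processed), "usable readings, average", round(average, 2))
```

The raw list is kept untouched, so the processing step can be changed or rerun later without losing the source data.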
Raw Data – Basic Classifications
Fundamentally, raw data falls into two classes: primary and secondary. Both primary and secondary data may be quantitative or qualitative.
Primary data is original data collected by the researcher. Its collection is often undertaken after the researcher has gained some insight into the problem by reviewing existing research work or by analysing previously collected primary data.
Secondary Data
Secondary data is data that was previously collected (as primary data) by someone other than the current user and is readily available from other sources. It is often used in social and economic analysis, especially when primary data is unavailable.
Data Stores, Warehouses and Marts
Only a small fraction of the data that is captured, processed and stored is actually used by decision makers. Data stores, warehouses and marts provide facilities to integrate the data generated by an un-integrated environment. A data warehouse organizes and stores all available data for further analytical processing and information extraction, and preserves its historical perspective.
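The integration idea can be sketched in a few lines of Python. Everything here is hypothetical: the two sources (`web_orders`, `store_sales`), their field names and the helper functions are invented for illustration. The sketch also assumes the common description of a data mart as a smaller, subject-specific view derived from the warehouse.

```python
from collections import defaultdict

# Two hypothetical, un-integrated sources that record sales in different shapes.
web_orders = [{"sku": "A1", "qty": 2, "date": "2019-03-01"}]
store_sales = [("2019-03-01", "A1", 5), ("2019-03-02", "B2", 1)]


def load_warehouse(web, store):
    """Integrate both sources into one common schema, keeping every row."""
    warehouse = []
    for row in web:
        warehouse.append({"date": row["date"], "product": row["sku"],
                          "units": row["qty"], "source": "web"})
    for date, product, units in store:
        warehouse.append({"date": date, "product": product,
                          "units": units, "source": "store"})
    return warehouse


def build_sales_mart(warehouse):
    """Derive a subject-specific view: total units sold per product."""
    totals = defaultdict(int)
    for row in warehouse:
        totals[row["product"]] += row["units"]
    return dict(totals)


warehouse = load_warehouse(web_orders, store_sales)
sales_mart = build_sales_mart(warehouse)
print(sales_mart)
```

Keeping the `date` and `source` fields on every warehouse row is what preserves the historical perspective; the mart discards them because its single purpose is per-product totals.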
Metadata
Metadata is data that describes the real data. It describes characteristics such as the identity of the data, storage location details, where the data originated and how it can be accessed. To provide easy access, it is necessary to maintain a form of data directory that holds information about the data. Metadata is an abstraction from the data, or high-level data, that provides a concise description. It forms an important component of a data warehouse (DW) environment.
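A data directory of this kind could be sketched as follows. The class names, field names and the sample entry are assumptions chosen to mirror the four characteristics named above (identity, storage location, origin, access); this is not the layout of any specific data warehouse product.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class MetadataEntry:
    identity: str   # what the data set is
    location: str   # where it is stored
    origin: str     # where it came from
    access: str     # how it can be reached


class DataDirectory:
    """A minimal in-memory directory mapping data set names to metadata."""

    def __init__(self) -> None:
        self._entries: Dict[str, MetadataEntry] = {}

    def register(self, entry: MetadataEntry) -> None:
        self._entries[entry.identity] = entry

    def lookup(self, identity: str) -> Optional[MetadataEntry]:
        # Returns None when no metadata has been registered for the name.
        return self._entries.get(identity)


directory = DataDirectory()
directory.register(MetadataEntry(
    identity="vehicle_sales_2019_h1",          # hypothetical data set name
    location="warehouse/sales/2019_h1.csv",    # hypothetical path
    origin="dealer reports",
    access="read-only SQL view",
))
print(directory.lookup("vehicle_sales_2019_h1"))
```

A user who knows only the name of a data set can ask the directory where it lives and how to reach it, without opening the data itself.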
Consider the following three groups of data.
1. 23489, 33765, 27668.
2. An auto magazine reports that vehicle sales dipped by 25% in India during the first half of 2019.
3. Sales figures of leading manufacturers have dipped: Maruti 20%, Hyundai 30%, Mahindra 25%.
The first group conveys no information on its own, because nothing says what the numbers measure. The second group is descriptive text and is straightforward to interpret. The third is concise and contains metadata: the manufacturer names identify what each figure refers to.
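The difference between the first and third groups can be shown with a short Python sketch. The figures are those of the third group; the `metadata` dictionary layout and the `annotate` helper are assumptions made for illustration.

```python
# Bare values: on their own they convey no information.
values = [20, 30, 25]

# Metadata describing the values (layout assumed for this sketch).
metadata = {
    "measure": "dip in sales",
    "unit": "%",
    "labels": ["Maruti", "Hyundai", "Mahindra"],
}


def annotate(values, metadata):
    """Pair each value with its label, measure and unit."""
    return [
        f"{label}: {metadata['measure']} of {value}{metadata['unit']}"
        for label, value in zip(metadata["labels"], values)
    ]


for line in annotate(values, metadata):
    print(line)
```

The same three numbers become readable statements only once the labels, measure and unit are attached.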
Consider the case of book manufacturers. Some examples of metadata are as follows:
- Author, title, ISBN number, publisher name
- Headlines of stories
- Definitions of data elements in a DW
- Road maps