Data Theory
Outline of lecture for Proseminar on Data Analysis and Visualization.
Data Theory
A data theory is an abstract representation of the varying aspects of data which classifies all possible data situations according to a few parsimonious concepts.
There are two major aspects of a data theory:
- Empirical Design Theory
- Measurement Theory
We discuss these aspects in this lecture.
For most empirical design notions, there is a corresponding measurement theory notion.
Empirical Design Theory
Many of the following definitions are the usual ones from experimental design.
However, ways, modes and categories deserve special attention here.
- Conditions - The identified empirical aspects of the situation.
- Ways - Has the usual experimental design meaning: the number of manipulated experimental conditions. The experimental design entails the simultaneous conjunction of the several manipulated experimental conditions, which are the ways of the design. The design is the Cartesian product of the ways.
- Levels - The number of different values of a way of the design.
- Replications - A special way of the design that is manipulated but that simply repeats the Cartesian combination of all other ways of the design.
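- Modes - Has a special meaning here: modes refers to the number of distinct kinds of objects indexed by the ways of the design. Two ways that index the same set of objects (for example, the rows and columns of a stimulus-by-stimulus similarity matrix) count as a single mode. This allows for relational data.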
- Shape - Data may be square (two-way, one-mode) or rectangular (two-way, two-mode).
- Symmetry/Asymmetry - Square data can be symmetric or asymmetric. Symmetric data are also called triangular data.
- Multivariate Data - These data are rectangular. One mode is often subjects, the other variables. The data are often column conditional.
- Completeness - If there are no missing data the data are said to be complete.
- Observation Categories
- All observations are categorical - Don't confuse nominal/discrete/categorical: Nominal refers to a measurement level (see below). Discrete refers to a measurement process (see below). Categorical refers to the basic nature of the data.
- Data parameters - Yes, data parameters! We can think of each observation category as having a parameter whose value we wish to estimate. This is called the observation's scale value.
We are verging on measurement theory here.
- The observation category notion is an elegant and parsimonious notion that allows discussion of the measurement characteristics of any data in a particularly simple way (see below).
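The design notions above can be made concrete in a few lines of Python. This is a minimal sketch under stated assumptions: the way names (`stimulus`, `condition`), their levels, and the helpers `design_cells` and `describe_two_way` are hypothetical illustrations, not part of any established package. It builds the design as the Cartesian product of the ways, repeats it once per replication, and classifies two-way data by modes, shape, and symmetry.

```python
from itertools import product

# Hypothetical ways of a design and their levels (names are illustrative only).
ways = {
    "stimulus": ["A", "B", "C"],      # 3 levels
    "condition": ["quiet", "noisy"],  # 2 levels
}
n_replications = 2  # replications repeat every combination of the other ways


def design_cells(ways, n_replications):
    """Return every cell of the design: the Cartesian product of the ways,
    repeated once per replication."""
    names = list(ways)
    cells = []
    for rep in range(1, n_replications + 1):
        for combo in product(*(ways[name] for name in names)):
            cells.append({**dict(zip(names, combo)), "replication": rep})
    return cells


def describe_two_way(row_objects, col_objects, matrix):
    """Classify two-way data by number of modes, shape, and symmetry.

    Assumes that one-mode data list the same objects, in the same order,
    for rows and columns. Otherwise the data are treated as two-mode
    (rectangular), whatever the size of the matrix.
    """
    if list(row_objects) != list(col_objects):
        return {"modes": 2, "shape": "rectangular", "symmetric": None}
    n = len(row_objects)
    # Symmetric if every entry equals its mirror across the diagonal.
    symmetric = all(
        matrix[i][j] == matrix[j][i] for i in range(n) for j in range(i + 1, n)
    )
    return {"modes": 1, "shape": "square", "symmetric": symmetric}


cells = design_cells(ways, n_replications)
print(len(cells))  # prints 12 (3 levels x 2 levels x 2 replications)
print(cells[0])
print(describe_two_way("ABC", "ABC", [[0, 2, 5], [2, 0, 3], [5, 3, 0]]))
print(describe_two_way(["s1", "s2"], ["x", "y", "z"], [[1, 2, 3], [4, 5, 6]]))
```

Here shape follows modes rather than matrix dimensions: a 3 by 3 table of three subjects by three variables would still be classed as rectangular, because its rows and columns index different objects.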
Measurement Theory
Definitions
- S.S. Stevens
- The assignment of numerals to objects or events according to a rule.
- Warren Torgerson
- The assignment of numbers to perceived attributes of objects or events according to a rule.
- Lyle Jones
- Measurement is the determination of the magnitude of a specified attribute of an object or event in terms of a unit of measurement. Classification (including ordering) is not measurement.
- Norman Cliff
- Good measurement is the assignment of numbers to perceived attributes of objects or events according to rules that
- are easily understood;
- are easily used;
- yield numbers that are as simply related as possible to as many other sets of measurements as possible.
- Forrest Young
- My definition is the same as Norm Cliff's.
Characteristics
- We measure perceived attributes or differences of perceived attributes.
- Basic tenet: All observations are categorical. (Don't confuse discrete/nominal/categorical. To me they all mean different things.)
- Measurement characteristics. There are three basic measurement characteristics (they apply to differences as well as attributes).
- Measurement Level - refers to the nature of permissible relationships among observations in different categories (between-category restrictions). There are many measurement levels. The most basic are:
- Nominal - No measurement level (between-category) restrictions.
- Ordinal - Observations in one category are ordered relative to those in another.
- Numerical - Observations in one category are functionally related to those in another. This includes the familiar interval and ratio levels, as well as less familiar levels such as absolute and log-interval. These various levels differ in terms of the function relating observations between categories.
- Measurement Process - refers to the nature of permissible relationships among observations in the same category (within-category restrictions). There are two measurement processes:
- Discrete - all observations in a category are represented by the same number.
- Continuous - all observations in a category are represented by an interval of numbers.
- Measurement Conditionality - refers to the nature of permissible relationships among observations in sets of categories (between-set restrictions). Commonly encountered (though not commonly identified) varieties include:
- Unconditional - All observations are comparable.
- Row/column conditional - Only observations within the same row (or column) of a data matrix are comparable. Each row (or column) forms one subset of comparable observations, and together these subsets partition the data. Multivariate data are usually column conditional.
- Matrix conditional - Only observations within matrices are comparable. When there are multiple matrices (as when there are multiple times or conditions) this is usually the case.
- Missing Data. Missing data form their own separate subset (partition) of the data. Because this subset contains only one observation category, no between-category relationships can be defined, and so no measurement level can be defined. Therefore, missing data are at the weakest level of measurement. However, they can have either measurement process.
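The three kinds of restriction (within-category process, between-category level, between-set conditionality) can be sketched as simple checks on a proposed assignment of numbers to observations. This is a hedged illustration, not Young's own algorithm: the function names are hypothetical, only the nominal and ordinal levels are covered (numerical levels are omitted for brevity), observation categories are assumed to be the distinct raw values, and the ordinal check uses a weak order, so adjacent categories may share a value. Missing observations are marked with `None` and set aside as their own subset.

```python
from collections import defaultdict


def group_by_category(raw, scaled):
    """Collect the scaled values assigned to each observation category,
    taking the categories to be the distinct raw values."""
    groups = defaultdict(list)
    for r, s in zip(raw, scaled):
        groups[r].append(s)
    return groups


def satisfies(raw, scaled, level, process):
    """Check a proposed scaling against a measurement level and process."""
    groups = group_by_category(raw, scaled)
    # Measurement process: restrictions within a category.
    if process == "discrete":
        if any(len(set(values)) > 1 for values in groups.values()):
            return False  # a category was given more than one number
    elif process != "continuous":
        raise ValueError(f"unknown process: {process!r}")
    # Measurement level: restrictions between categories.
    if level == "ordinal":
        ordered = sorted(groups)
        for lower, higher in zip(ordered, ordered[1:]):
            if max(groups[lower]) > min(groups[higher]):
                return False  # intervals of adjacent categories cross
    elif level != "nominal":
        raise ValueError(f"unsupported level: {level!r}")
    return True  # nominal: no between-category restrictions


def partition(matrix, conditionality):
    """Split cell positions into subsets of mutually comparable observations.
    Missing observations (None) form their own separate subset."""
    keys = {
        "unconditional": lambda i, j: 0,
        "row": lambda i, j: i,
        "column": lambda i, j: j,
    }
    key = keys[conditionality]
    subsets = defaultdict(list)
    missing = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is None:
                missing.append((i, j))
            else:
                subsets[key(i, j)].append((i, j))
    return list(subsets.values()), missing


def check_matrix(raw, scaled, level, process, conditionality):
    """Apply the level and process checks only within comparable subsets."""
    subsets, _missing = partition(raw, conditionality)
    for cells in subsets:
        r = [raw[i][j] for i, j in cells]
        s = [scaled[i][j] for i, j in cells]
        if not satisfies(r, s, level, process):
            return False
    return True


raw = [[1, 2], [1, 2]]
scaled = [[0.0, 1.0], [5.0, 9.0]]
print(check_matrix(raw, scaled, "ordinal", "discrete", "row"))            # True
print(check_matrix(raw, scaled, "ordinal", "discrete", "unconditional"))  # False
```

The same scaling is acceptable under row conditionality but not unconditionally: once the two rows are pooled, raw category 1 has been given two different numbers (0.0 and 5.0), which a discrete process forbids.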
Correspondence between Empirical and Measurement aspects
The correspondence between the two major aspects of Data Theory is explained in the accompanying tables, Table 1 and Table 2 (available as Adobe Acrobat PDF files, which can be read with Adobe Acrobat Reader).
Philosophy of Measurement
Measurement properties are not properties of the data themselves.
- They depend on interaction of the data with a model of the data.
- Empirical information about the measurement characteristics of the data can be obtained as the data interact with the model.
- We can analyze the same data with the same model, but with different measurement assumptions.
- If two analyses give the same results, this means that the stricter measurement assumptions are appropriate.
Measurement level is not discrete; it is continuous.
- Some levels are axiomatizable, some are not.
- If two analyses that are identical except for their measurement level assumptions give approximately the same results, the measurement level lies between the two that were used.
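The comparison of analyses under different measurement assumptions can be sketched in Python. This is a simplified illustration under explicit assumptions: the model's predictions are held fixed (a full optimal-scaling analysis would also re-estimate the model), the raw values are assumed distinct and the model values not all equal, "numerical" is taken to mean the interval level (a least-squares linear transformation of the data), and "ordinal" means a least-squares monotone transformation computed with the pool-adjacent-violators algorithm. The function names are hypothetical. Misfit is the residual sum of squares divided by the total sum of squares of the model values.

```python
def block_mean(block):
    total, count = block
    return total / count


def pava(y):
    """Pool-adjacent-violators: least-squares nondecreasing fit to y."""
    blocks = []  # each block is [sum of values, count]
    for v in y:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and block_mean(blocks[-2]) > block_mean(blocks[-1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)
    return fit


def misfit(raw, model, level):
    """Normalized misfit between the model and the best transformation of
    the data permitted by the assumed measurement level."""
    n = len(raw)
    mean_m = sum(model) / n
    tss = sum((m - mean_m) ** 2 for m in model)  # assumed nonzero
    if level == "interval":
        # Best linear transformation a + b * raw, by least squares.
        mean_r = sum(raw) / n
        sxy = sum((r - mean_r) * (m - mean_m) for r, m in zip(raw, model))
        sxx = sum((r - mean_r) ** 2 for r in raw)
        b = sxy / sxx
        a = mean_m - b * mean_r
        rss = sum((m - (a + b * r)) ** 2 for r, m in zip(raw, model))
    elif level == "ordinal":
        # Best monotone transformation: isotonic regression in raw-data order.
        order = sorted(range(n), key=lambda k: raw[k])
        y = [model[k] for k in order]
        rss = sum((v - f) ** 2 for v, f in zip(y, pava(y)))
    else:
        raise ValueError(f"unsupported level: {level!r}")
    return rss / tss


def compare_levels(raw, model):
    return {level: misfit(raw, model, level) for level in ("interval", "ordinal")}


raw = [1, 2, 3, 4, 5]
nearly_linear = [1.0, 2.1, 2.9, 4.2, 5.0]  # hypothetical model predictions
curved = [1.0, 1.5, 2.0, 8.0, 20.0]        # hypothetical model predictions
print(compare_levels(raw, nearly_linear))
print(compare_levels(raw, curved))
```

For the nearly linear predictions both assumptions fit almost equally well (interval misfit under 0.01), which would favor the stricter interval assumption. For the curved but monotone predictions only the ordinal assumption fits closely (interval misfit above 0.2), suggesting that, with respect to this model, the data support ordinal but not interval measurement.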
To get in touch:
email: forrest@unc.edu
WWW:
http://forrest.psych.unc.edu