Stata's mishandling of missing data: A problem and two solutions
The design decisions made by Stata in handling missing data in relational and logical expressions have, for the user, complex, pernicious, and poorly understood consequences. This presentation intends to substantiate that claim and to present two possible resolutions to the problem. As is well documented and reasonably well known, Stata considers p & q (and p | q) to be true when both p and q are indeterminate (missing), because a missing value is nonzero and anything nonzero counts as true. This interpretation is counterintuitive and at odds with the formal-logic definition of these operators. To assert two unknowns is not to assert truth. Nevertheless, introductions to Stata characteristically present this as merely a "feature" and suggest that the obligation imposed on users (us) to test explicitly for missing data is straightforward to implement. Simple cases are indeed simple but, it will be argued, do not readily scale up to complex, real-life instances. For example, the one-line Stata command that expresses the intention, "generate v = p|q", becomes "generate v = p|q if !mi(p,q)|(p&!mi(p))|(q&!mi(q))", and so forth. Such coding is a problem, not a feature, so solutions should be sought. One solution (really a work-around) introduces my command, validly, which allows expressions such as "validly generate v = p|q" and correctly, without fuss, interprets the logical or relational operators (here returning true if p is true but q indeterminate, and indeterminate if p is false but q indeterminate). More generally, validly serves as a "wrapper" for any standard conditional command. So, for example, "validly reg a b c if p|q" is handled correctly. But validly (its code deploys nested calls to cond()) is computationally expensive. The better resolution would be for Stata, in its next release, to redesign its core code so that logical and relational operators would (as algebraic operators currently do) handle missing data appropriately. (Objections to this strategy are examined and deemed to lack force.)
I would like to enlist the informed and active judgment of the participants of the 14th Users Group meeting to help bring this about.
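The behavior described in the abstract can be reproduced with a small do-file. This is a minimal sketch, not the author's code: the toy data and the variable names v_naive, v_guard, and v_cond are assumptions made for illustration, and the cond() expression only imitates the idea of nested cond() calls that the abstract attributes to validly; it is not validly's actual implementation.

```stata
* Toy data covering true, false, and missing combinations (illustrative values)
clear
input p q
1 1
1 0
0 0
1 .
0 .
. .
end

* Built-in semantics: missing is nonzero, so it counts as true
generate v_naive = p | q

* The guarded form quoted in the abstract: v is left missing unless the
* result is determined (both known, or at least one operand known and true)
generate v_guard = p | q if !mi(p,q) | (p & !mi(p)) | (q & !mi(q))

* Hypothetical cond()-based equivalent: true if either operand is known
* and true, missing if any operand is missing, otherwise false
generate v_cond = cond((p & !mi(p)) | (q & !mi(q)), 1, cond(mi(p,q), ., 0))

list, clean
```

In the last two rows (p false with q missing, and both missing) v_naive is 1, whereas v_guard and v_cond are missing; in the row where p is true and q is missing, all three variables are 1.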
Year of publication: | 2008-09-11 |
---|---|
Authors: | MacDonald, Kenneth I. |
Institutions: | Stata User Group |
freely available
Similar items by person
- Campbell, Lisa M., (2014)
- Serving sahibs with pony and pen: The discursive uses of 'Native authenticity'
  Butz, David, (2001)
- Use and Valuation: Information in the City
  Macdonald, Kenneth I., (2000)