
Why Do the Number of Records in the Core Files and the Digitized Data Differ? (FAQ)


Background

NCCS has created "Core files" containing 30-100 financial and descriptive variables for every year since FY 1989. For returns received by the IRS since July 1998, NCCS has also been working with GuideStar data to create annual files that are roughly comparable to the Core files but that contain many more data elements.

The Difference: An Example

A researcher wishing to calculate the number of 501(c)(3) public charities and their total public support in calendar year 1999 can draw from either the Core 2000 file or the 1999 Digitized Data Revenue file:

  • Excluding several thousand "out-of-scope" foreign and governmental organizations, the Core file contains more than 236 thousand records and total public support is nearly $166 billion.
  • In version 1 of the digitized data, the researcher would find 206 thousand records accounting for $150 billion in public support. Measured against these digitized totals, the Core file has approximately 15% more records and 11% more public support.
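The size of the gap depends on which file is taken as the base. The following is a small arithmetic sketch that uses only the rounded figures quoted above, so its results are approximate; the variable and function names are illustrative and are not NCCS field names.

```python
# Rounded figures from the example above: records in thousands, support in $ billions.
core_records, core_support = 236, 166
digitized_records, digitized_support = 206, 150


def pct_gap(larger, smaller, base):
    """Return the gap between two totals as a percentage of the chosen base."""
    return 100 * (larger - smaller) / base


# Gap expressed relative to the digitized data (how much more the Core file holds).
records_vs_digitized = pct_gap(core_records, digitized_records, digitized_records)
support_vs_digitized = pct_gap(core_support, digitized_support, digitized_support)

# Gap expressed relative to the Core file (how much less the digitized data holds).
records_vs_core = pct_gap(core_records, digitized_records, core_records)
support_vs_core = pct_gap(core_support, digitized_support, core_support)

print(f"Relative to digitized: {records_vs_digitized:.1f}% records, "
      f"{support_vs_digitized:.1f}% support")
print(f"Relative to Core:      {records_vs_core:.1f}% records, "
      f"{support_vs_core:.1f}% support")
```

With the rounded inputs, the gap is about 14.6% (records) and 10.7% (support) relative to the digitized data, and about 12.7% and 9.6% relative to the Core file.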

Explaining the Differences

Some of the differences in the files are expected and readily explainable. Others are more complex and less fully understood.

The major difference in the design of the files is that the digitized data files contain only records for a single fiscal year, while the Core files also include data from returns filed in the two preceding years for organizations that lack a current-year return but are not known to be defunct.
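The two designs can be contrasted with a minimal sketch. This is not NCCS's actual processing code: the record layout (dictionaries with "ein", "fisyr" and "support" keys), the function names, the integer years and the sample data are all assumptions made for illustration.

```python
def build_single_year_file(returns, target_year):
    """Digitized-data style: keep only returns for the target fiscal year."""
    return {r["ein"]: r for r in returns if r["fisyr"] == target_year}


def build_core_style_file(returns, target_year, known_defunct=frozenset(), lookback=2):
    """Core style: one return per organization, preferring the target year and
    otherwise falling back to the most recent of up to `lookback` earlier years.
    Organizations known to be defunct are left out."""
    selected = {}
    for ret in returns:
        ein, year = ret["ein"], ret["fisyr"]
        if ein in known_defunct:
            continue
        if not (target_year - lookback <= year <= target_year):
            continue
        if ein not in selected or year > selected[ein]["fisyr"]:
            selected[ein] = ret
    return selected


# Hypothetical returns, for illustration only.
returns = [
    {"ein": "A", "fisyr": 1999, "support": 100},
    {"ein": "B", "fisyr": 1998, "support": 50},  # no 1999 return on file
    {"ein": "C", "fisyr": 1997, "support": 20},  # known to be defunct
    {"ein": "D", "fisyr": 1995, "support": 10},  # older than the lookback window
]

single_year = build_single_year_file(returns, 1999)
core_style = build_core_style_file(returns, 1999, known_defunct={"C"})

print(sorted(single_year))  # organizations in the single-year file
print(sorted(core_style))   # organizations in the Core-style file
```

In this toy data the single-year file holds only organization A, while the Core-style file also picks up B through its earlier return.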

Each approach is useful. The narrow fiscal-year approach used in the Digitized Data facilitates year-to-year comparisons at the organization level, so a statistic such as the median change in public support for arts organizations is readily calculated from the fiscal year files. In contrast, for a significant number of organizations the same fiscal year's return appears in two different Core files, so an apparent zero change from one year to the next merely reflects the fact that the same return was found in both files. For example, a 1998 return took the place of the 1999 return in the Core 2000 file for Organization X, so that record shows no change from the Core 1999 file to the Core 2000 file.
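The effect of a carried-forward return on a year-to-year statistic can be illustrated with a short sketch. The figures, the organization labels and the helper name are hypothetical; only the mechanism (a repeated return producing a spurious zero change) comes from the text above.

```python
from statistics import median


def median_change(earlier, later):
    """Median change in public support for organizations present in both files."""
    changes = [later[ein] - earlier[ein] for ein in earlier if ein in later]
    return median(changes)


# Fiscal-year files: each holds only returns for that fiscal year.
fy_1998 = {"X": 100, "Y": 200, "Z": 300}
fy_1999 = {"Y": 220, "Z": 330}  # X has no 1999 return on file

# Core-style files: X's 1998 return is carried forward into the later file.
core_earlier = {"X": 100, "Y": 200, "Z": 300}
core_later = {"X": 100, "Y": 220, "Z": 330}

fy_median = median_change(fy_1998, fy_1999)            # changes: 20, 30
core_median = median_change(core_earlier, core_later)  # changes: 0, 20, 30

print(fy_median, core_median)
```

Organization X contributes a change of zero in the Core-style comparison only because the same return is counted twice, which pulls the median down in this example.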

Which total is most useful for estimating total public support to public charities? The approach used in the Digitized Data leads to some underreporting for recent fiscal years, since returns continue to dribble into the IRS for several years after a fiscal year has ended. There may also be some organizations that either did not file or whose returns were never entered into our databases for unknown reasons.

On the other hand, looking back to earlier Core files, we see that an average of 47% (with wide variation across the years) of organizations with prior-year returns in the Core files appear to be defunct, since they do not appear in any subsequent files. Unfortunately, we cannot know which specific organizations ought to be excluded from a given year's file until several years after its creation. If we adjust all prior-year files based on this knowledge, then the current-year file will no longer be consistent with the prior years' files.
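The retrospective check described above can be sketched as follows. This is an illustrative assumption about how such a share might be computed, not the procedure NCCS used; the function name and the sample identifiers are hypothetical, and the 47% figure is not reproduced here.

```python
def apparent_defunct_share(carried_forward, later_files):
    """Share of organizations with carried-forward (prior-year) returns that
    do not appear in any of the subsequent files."""
    if not carried_forward:
        return 0.0
    seen_later = set().union(*later_files)
    missing = carried_forward - seen_later
    return len(missing) / len(carried_forward)


# Hypothetical identifiers: organizations represented by a prior-year return
# in one Core-style file, and the organizations found in two later files.
carried_forward = {"A", "B", "C", "D"}
later_files = [{"A", "E"}, {"C"}]

share = apparent_defunct_share(carried_forward, later_files)
print(f"{share:.0%} of carried-forward organizations never reappear")
```

Because the calculation needs the later files, the share for a given year can only be estimated several years after that year's file was created.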

Options and Next Steps

NCCS has added a "Latest" field that indicates the most recent fiscal year for which a digitized data record is available. A researcher wishing to compare Core and Digitized data can use "Latest in ('1999','2000')" instead of "fisyr = '2000'" to produce a data set that is closer to the Core files.
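The two selection conditions can be compared on a toy data set. The sketch below assumes, purely for illustration, that each row is an organization's most recent digitized record and that the fields are stored as strings under the lowercase keys "fisyr" and "latest"; the actual file layout may differ.

```python
# Hypothetical rows: one per organization, holding its most recent digitized record.
records = [
    {"ein": "A", "fisyr": "2000", "latest": "2000"},
    {"ein": "B", "fisyr": "1999", "latest": "1999"},
    {"ein": "C", "fisyr": "1998", "latest": "1998"},
]

# Equivalent of: fisyr = '2000'
single_year = [r["ein"] for r in records if r["fisyr"] == "2000"]

# Equivalent of: Latest in ('1999','2000')
core_like = [r["ein"] for r in records if r["latest"] in ("1999", "2000")]

print(single_year)
print(core_like)
```

Under these assumptions the "Latest" condition also retains organization B, whose most recent record is from the earlier year, which is the behavior that brings the selection closer to the Core files.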

----

For further information, see "Why Do the Number of Records in the Core Files and the Digitized Data Differ When LATEST Is Used? (FAQ)".


Added 10/03/2002 by tpollak, Modified 07/25/2006 by tpollak
