Why Do the Number of Records in the Core Files and the Digitized Data Differ when LATEST is used? (FAQ)


The differences between the Core files and the GuideStar-NCCS National Nonprofit Database (hereafter called simply "the Digitized Data" or DD) depend on how you use the DD. If you use the DD for a single fiscal year, the number of records will be smaller than in the Core. Alternatively, you can combine two or three fiscal years (we recommend two) and take only the LATEST return. The standard SAS or DataWeb query for 2003 would be: LATEST='Y' AND FISYR IN ('2002','2003')

If you use the LATEST returns for a two-year period in the DD, the totals SHOULD be close to the Core numbers. However, differences remain.
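
As an illustration only, the two-year LATEST selection can be sketched in Python. The field names LATEST and FISYR come from the query above; the sample rows, the EIN values, and the helper name select_latest are hypothetical, and the sketch assumes LATEST='Y' marks an organization's most recent return in the DD.

```python
# Hypothetical sample rows; only the LATEST and FISYR field names
# come from the query in the text. EIN values are made up.
records = [
    {"EIN": "000000001", "FISYR": "2003", "LATEST": "Y"},
    {"EIN": "000000001", "FISYR": "2002", "LATEST": "N"},
    {"EIN": "000000002", "FISYR": "2002", "LATEST": "Y"},
    {"EIN": "000000003", "FISYR": "2001", "LATEST": "Y"},
]


def select_latest(rows, fiscal_years):
    """Keep rows flagged LATEST='Y' whose FISYR is in fiscal_years."""
    years = set(fiscal_years)
    return [r for r in rows if r["LATEST"] == "Y" and r["FISYR"] in years]


# Equivalent in spirit to: LATEST='Y' AND FISYR IN ('2002','2003')
selected = select_latest(records, ["2002", "2003"])
selected_eins = [r["EIN"] for r in selected]
print(selected_eins)
```

The organization whose latest return is for fiscal year 2001 drops out of the selection, which is one reason a two-year window may still fall short of the Core count.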

Different Origins

The Digitized Data and Core files are created through largely separate processes, which accounts for the different results. The IRS, NCCS, and GuideStar have all tried hard to build in strong quality controls, but the processes are still not perfect.

The returns are scanned as images by the IRS when they are first received at the IRS Service Center in Ogden, Utah. From there, the images are sent in monthly shipments to NCCS and GuideStar. GuideStar transcribes the data; NCCS then receives it, makes corrections to financial and other variables, adds some new fields, and creates the Digitized Data.

The Core Files are created by NCCS by combining the Return Transaction Files (which are transcribed by IRS staff after returns have left the imaging department) with a series of IRS Business Master Files.

Sources of Difference

When we looked at this years ago, we couldn't find a single reason why they are different. A combination of factors contributes to the differences:

- Core files are fixed in time. We receive the Return Transaction File for a given year from the IRS each January. We cull out the latest returns and add in returns from prior Core files for organizations that are likely to still be alive. However, we do NOT add new returns to the file after it is created, although a small number of organizations are likely, for a variety of reasons, to be VERY late in filing their returns. By pulling returns that are up to two years old into the Core file when the current-year return is missing, we should be able to capture the vast majority of these late filers. However, the return for a new organization (with no previous returns available) that filed late might be contained in the DD but not in the Core file.

- ZIP code errors by filers, by the IRS, or by GuideStar. An incorrect ZIP code, for example, could put an organization in a different county.

- Misclassification of an organization's IRS subsection code. An organization, for example, might inadvertently identify itself as a 501(c)(3) while the IRS granted it exemption as a 501(c)(8). Until recently, that would have led the organization to be included in the DD but not in the Core Public Charity file.

- Errors entering financial data -- not relevant to your initial analysis -- also occur.

- Two organizations may have put the same EIN on their returns, either due to a typo or due to confusion about whether, for example, a "child" organization has the same EIN as its parent. The data underlying the Core files may exclude one return while the Digitized Data excludes the other.

- Some returns simply appear to be missing from one or the other file for reasons we cannot explain.
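
To see which of these factors affects a particular extract, one option is to compare the EINs in the two files directly. The following is a minimal sketch in Python, not an NCCS procedure: the function names compare_eins and duplicated_eins and the sample EIN lists are hypothetical, and it assumes each file has already been reduced to a list of EIN strings.

```python
from collections import Counter


def compare_eins(core_eins, dd_eins):
    """Split EINs into those found only in the Core, only in the DD, or in both."""
    core, dd = set(core_eins), set(dd_eins)
    return {
        "core_only": sorted(core - dd),
        "dd_only": sorted(dd - core),
        "both": sorted(core & dd),
    }


def duplicated_eins(eins):
    """Return EINs that appear more than once in a single file."""
    counts = Counter(eins)
    return sorted(ein for ein, n in counts.items() if n > 1)


# Hypothetical EIN lists standing in for the two extracts.
core_eins = ["000000001", "000000002", "000000004"]
dd_eins = ["000000001", "000000003", "000000003", "000000004"]

result = compare_eins(core_eins, dd_eins)
dups = duplicated_eins(dd_eins)
print(result)
print(dups)
```

EINs that appear only in the DD are candidates for late-filing new organizations or subsection misclassification, while repeated EINs within one file may point to the shared-EIN problem described above.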

-----

For further information, see "Why Do the Number of Records in the Core Files and the Digitized Data Differ? (FAQ)".


Added 07/25/2006 by tpollak, Modified 07/25/2006 by tpollak
