Our client is a leading credit reporting agency in the US that offers credit bureau and decision analytics services to businesses and provides credit reports to individual customers.
The client maintains credit information on over 40 million active US businesses. This data is collected from a variety of sources: public records, self-reported information, the US census, phone directories, and other commercial entities.
Our objective was to map the source-to-target flow for all the bureau attributes from the small business portfolio. These attributes included several legacy batch and online attributes with an outdated data dictionary. The exercise included finding the data at rest and in motion, refining the transformation logic, and updating the data dictionary for these attributes.
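A source-to-target mapping like this can be captured as a simple lineage record per attribute. The sketch below is a minimal illustration, not the client's actual tooling; the attribute, field, and table names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class LineageRecord:
    """One bureau attribute traced from source to target."""
    attribute: str        # hypothetical attribute name
    source_system: str    # e.g. Mainframe, Unix, RDBMS, Hadoop
    source_field: str     # field in the source layout
    transformation: str   # pseudo-code of the derivation logic
    target_table: str     # where the attribute lands

def to_dictionary_entry(rec: LineageRecord) -> dict:
    """Render a lineage record as a data-dictionary entry."""
    return {
        "attribute": rec.attribute,
        "lineage": f"{rec.source_system}.{rec.source_field} -> {rec.target_table}",
        "logic": rec.transformation,
    }

# Hypothetical example: a trade-count attribute sourced from a mainframe file
rec = LineageRecord(
    attribute="NUM_TRADES_12M",
    source_system="MAINFRAME",
    source_field="TRADE_HIST.TRD_CNT",
    transformation="count trades where open_date within last 12 months",
    target_table="SB_BUREAU_ATTR",
)
entry = to_dictionary_entry(rec)
print(entry["lineage"])  # MAINFRAME.TRADE_HIST.TRD_CNT -> SB_BUREAU_ATTR
```

Keeping one such record per attribute makes the data dictionary regenerable from the lineage inventory rather than maintained by hand.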
- The process involved auditing several thousand attributes and their transformations
- Inconsistent data definitions and layout changes as data moved across several Mainframe, Unix, RDBMS, and Hadoop systems
- Different load frequencies and update schedules across various source systems
- Outdated or missing specification documents – several attributes had inadequate definitions
- Automated the standardization of code written in various languages such as COBOL, JCL, SAS, and SQL
- Processed data iteratively to create a data flow diagram for all major groups of attributes
- Created standardized scripts to document attributes using their pseudo-code
- Worked with business units to correct and standardize snapshot and trend variables that had lower confidence
- Increased accuracy and confidence for hundreds of attributes for downstream analysis and model development
- Established standard documentation for all attributes, making them consistent with the enhanced data governance practice
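The automated code-standardization scripts mentioned above could, in a much simplified form, scan legacy code for attribute definitions so that each attribute's derivation can be pulled into the data dictionary. The snippet below is a hedged sketch using hypothetical SAS and SQL fragments and deliberately simplified patterns; real COBOL/JCL/SAS/SQL parsing would need proper lexing:

```python
import re

# Hypothetical snippets of legacy code that derive bureau attributes
LEGACY_CODE = {
    "sas": "if trd_cnt > 0 then ACT_TRADES_FLAG = 1; else ACT_TRADES_FLAG = 0;",
    "sql": "SELECT cust_id, SUM(bal_amt) AS TOTAL_BALANCE FROM trades GROUP BY cust_id",
}

# Simplified patterns locating attribute definitions in each language:
# SAS assigns with `NAME =`, SQL aliases with `AS NAME` (uppercase names assumed)
PATTERNS = {
    "sas": re.compile(r"([A-Z][A-Z0-9_]+)\s*="),
    "sql": re.compile(r"\bAS\s+([A-Z][A-Z0-9_]+)"),
}

def find_defined_attributes(code: dict) -> dict:
    """Map each language to the attribute names its code defines."""
    found = {}
    for lang, src in code.items():
        found[lang] = sorted(set(PATTERNS[lang].findall(src)))
    return found

attrs = find_defined_attributes(LEGACY_CODE)
print(attrs)  # {'sas': ['ACT_TRADES_FLAG'], 'sql': ['TOTAL_BALANCE']}
```

A scan like this gives an inventory of where each attribute is defined, which is the starting point for reconciling definitions across systems and flagging attributes whose documented logic no longer matches the code.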