PRD - Content Ingestion Automation - ETL

Introduction

The product requirements document (PRD) is a central document used to align all stakeholders (product management, engineering, QA, designers, and leadership) on how we will solve a specific problem with the proposed solution.

When creating the PRD, provide just as much information as needed and nothing more. If the document is too long and complex, it will quickly become outdated, and readers will lose interest.

Strategic Planning and Decision Making

Describe the problem we are solving, the high-level approach, and the goals so that readers have a good understanding of where we are headed before we get too far into the details.

What problem are we trying to solve?


We currently rely on a team of expert mappers to meticulously add content into the UCF. The process works well but is slow. With the advent of automation and AI, Unified Compliance risks attacks from competitors who will use technology to accelerate content acquisition.

Customers are not able to keep up to date with compliance requirements in the face of quickly evolving best practices, implementation guides, new segments, etc.

We will also find it difficult to take on new market segments without automation.

Briefly describe the approach you’re taking to solve this problem. Provide enough information for the reader to imagine possible solution directions and get a rough sense of the scope of this proposal.


The approach is to start with the “left-hand” side of the end-to-end content ingestion process, focusing on bringing in compliance content through ETL (extract, transform, load) for an identified list of four (4) compliance content providers.

For each content provider, Authority Documents will be identified, then Citations and Glossaries will be extracted from the Authority Documents, transformed into the Common Data Format specification, and loaded into the UC platform.
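
As a rough illustration of the extract, transform, and load flow described above, the following Python sketch walks one sample document through all three steps. It is a minimal sketch under stated assumptions, not the intended implementation: the Common Data Format is not defined in this document, so the record fields (authority_document, citations, glossary), the sample text layout, and the in-memory list standing in for the UC platform are all hypothetical placeholders.

```python
import re
from dataclasses import dataclass, field, asdict


# Hypothetical stand-in for the Common Data Format; the real specification
# is defined elsewhere and its field names may differ.
@dataclass
class CommonDataRecord:
    authority_document: str
    citations: list = field(default_factory=list)
    glossary: dict = field(default_factory=dict)


def extract(raw_text):
    """Pull numbered citations and 'TERM name: definition' lines (assumed layout)."""
    citations = re.findall(r"^(\d+(?:\.\d+)*)\s+(.+)$", raw_text, flags=re.MULTILINE)
    terms = re.findall(r"^TERM\s+(.+?):\s+(.+)$", raw_text, flags=re.MULTILINE)
    return citations, terms


def transform(name, citations, terms):
    """Map the extracted pieces into the placeholder record structure."""
    return CommonDataRecord(
        authority_document=name,
        citations=[{"reference": ref, "text": text.strip()} for ref, text in citations],
        glossary={term.strip(): definition.strip() for term, definition in terms},
    )


def load(record, platform):
    """Append the record to a list that stands in for the UC platform."""
    platform.append(asdict(record))


# Invented sample text, used only to exercise the three steps.
SAMPLE = """\
1.1 Passwords must be at least 12 characters.
1.2 Accounts must lock after five failed logins.
TERM Account: A set of credentials identifying a user.
"""

platform = []
citations, terms = extract(SAMPLE)
load(transform("Sample Authority Document", citations, terms), platform)
```

In practice the extract step is where AI assistance and human review would sit, since real Authority Documents rarely follow a layout this regular.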

Automation tools and AI will be used to accelerate the end-to-end process. Humans will review and approve the critical steps, focusing on reviewing and updating AI suggestions.

Once the process is proven out, the intent is to extend the solution to many additional compliance content providers.

What does success look like? What metrics do we measure today that we can affect? What metrics should we absolutely add? Why is it important to affect those metrics?

Goal

Metric

Why Important?

Reliably extract citations from Authority Documents

When AI is not needed, 100% accuracy.

When AI is utilized, greater than 80% accuracy, meaning fewer than 20% of Citations need to be reworked (e.g., split, merged, or rejected).

If accuracy is poor and extensive human correction is required, the automation adds little value.

Reliably extract glossaries from Authority Documents

When AI is not needed, 100% accuracy.

When AI is utilized, greater than 80% accuracy, meaning fewer than 20% of term-definition pairs need to be reworked. Glossaries are substantially easier to identify and extract than citations.

If accuracy is poor and extensive human correction is required, the automation adds little value.

Reliably automate the end-to-end process of capturing, transforming, and loading STIG, NIST 800-53, FedRAMP, and eCFR compliance content into the Unified Compliance platform.

100% of identified Authority Documents for all four compliance content contributor sources are loaded into the UCF.

All four Authority Document sources relate to securing and hardening IT infrastructure for both the private and public sectors.

To provide value to customers with Security Operations requirements, UC needs to maximize the breadth of its security coverage so that we can provide security guidance for as many IT assets as possible.

Ingested compliance content, including Authority Documents, Citations, and Glossaries, is available via the UC 4.0 API Gateway

100% of identified Authority Documents for all four compliance content contributor sources are available via the UC 4.0 API Gateway

We are in the migration phase from CCH to UC 4.0. To ensure we don’t prolong the migration process, all new content must come into UC 4.0 and be served through the API Gateway.

Reliably catalog Authority Documents, track versions, and detect changes

100% of identified Authority Documents from the four (4) source sites are automatically cataloged, and 0 documents move further into the pipeline when no metadata changes are detected.

Before content is extracted, the Authority Documents must be inventoried and cataloged, and they should be reprocessed only if changes are detected, to limit the use of expensive AI processing resources.
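
The catalog-and-detect gate in the last goal above could be sketched as follows. This is one possible approach under stated assumptions, not a committed design: it fingerprints each document's metadata with a SHA-256 hash and forwards only new or changed documents. The metadata fields (version, published), the document ids, and the dictionary used as the catalog are hypothetical.

```python
import hashlib
import json


def metadata_fingerprint(metadata):
    """Return a stable SHA-256 hash of the metadata (keys sorted so order does not matter)."""
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def documents_to_process(catalog, discovered):
    """Update the catalog in place and return ids of new or changed documents only."""
    changed = []
    for doc_id, metadata in discovered.items():
        fingerprint = metadata_fingerprint(metadata)
        if catalog.get(doc_id) != fingerprint:
            catalog[doc_id] = fingerprint
            changed.append(doc_id)
    return changed


# Hypothetical catalog and metadata; real field names will depend on each source site.
catalog = {}
snapshot = {
    "doc-a": {"version": "1", "published": "2024-01-01"},
    "doc-b": {"version": "3", "published": "2024-02-15"},
}

first_run = documents_to_process(catalog, snapshot)   # both documents are new
second_run = documents_to_process(catalog, snapshot)  # nothing changed, nothing forwarded

snapshot["doc-a"] = {"version": "2", "published": "2024-03-01"}
third_run = documents_to_process(catalog, snapshot)   # only doc-a changed
```

The second run is the case the metric targets: with no metadata changes, the function returns an empty list, so 0 documents move on to the AI processing steps.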

Scope and Features

Milestones and Launch Checklist

Additional References