Vision and Initiative Alignment

How does this proposal fit into our overall vision, and which specific initiative does it align with and how?

The UC Strategic Plan for 2024 has two foci:

  1. content and

  2. the sale of that content

This project fits squarely into the focus on bringing in additional content.

Content Ingestion Automation - ETL is a critical aspect of the initiative to “Partner with 3rd Party to Develop Automated Content Mapping”.

Automated content mapping is the complete end-to-end process of content capture, ETL, and mapping to the common controls. This product proposal covers the “left-hand” side, from capture through ETL.

The Problem

What problem are we trying to solve, and why is it important to our customers and/or to Unified Compliance?

We currently rely on a team of expert mappers to meticulously add content into the UCF. The process works well but is slow. With the advent of automation and AI, Unified Compliance faces competitors who use technology to accelerate content acquisition.

We risk losing customers to other platforms if our content coverage falls behind.

We will also find it difficult to take on new market segments without automation.

High-level Approach

Briefly describe the approach you’re taking to solve this problem. Provide enough information for the reader to imagine possible solution directions and get a rough sense of the scope of this proposal.

The approach is to start with the “left-hand” side of the automation process, where compliance content is captured from a small set of compliance content providers.

Authority Documents, Citations, and Glossaries will be extracted from the content providers, transformed into the Common Data Format specification, and loaded into the UC platform. Automation tools and AI will be used to accelerate the end-to-end process, with human review and approval built in.

Once the process is proven, the intent is to extend it to many additional compliance content providers.

Goals

What does success look like? What metrics can we affect, and why is it important to affect those metrics?

STIG

Goal: Automate an end-to-end process to capture all STIG content (approximately 457 documents), perform ETL, and load it into the UCF in the Common Data Format.

Metrics:

  • All 457 STIGs, as Authority Documents, are available for customer consumption via API from the UC 4.0 API Gateway.

  • All Citations that are part of the 457 STIGs are available for customer consumption via API from the UC 4.0 API Gateway.

  • All Glossaries, with their term-definition pairs, related to the 457 STIGs are available for customer consumption via API from the UC 4.0 API Gateway.

Why important: STIGs sit at the intersection of SecOps and GRC. Organizations need to harden their security posture with DoD-approved security measures that align with their software and hardware vendors.

IT departments use a variety of software and hardware in their data centers. UC needs to maximize the breadth of STIG coverage so we can provide security guidance for as many IT assets as possible.

NIST 800-53

Goal: Automate an end-to-end process to capture all NIST 800-53 content (approximately 36 files in a mixture of JSON, YAML, and XML), perform ETL, and load it into the UCF in the Common Data Format.

Metrics:

  • All NIST 800-53 content, as Authority Documents, is available for customer consumption via API from the UC 4.0 API Gateway.

  • All Citations that are part of the NIST 800-53 documents are available for customer consumption via API from the UC 4.0 API Gateway.

  • All Glossaries, with their term-definition pairs, related to the NIST 800-53 content are available for customer consumption via API from the UC 4.0 API Gateway.

Why important: NIST 800-53 helps IT departments implement proper security controls to proactively protect their organization's infrastructure.

As with STIGs, broader coverage will help IT and security departments secure their assets.

FedRAMP

Goal: Automate an end-to-end process to capture all FedRAMP content (approximately 32 files in a mixture of JSON, YAML, and XML), perform ETL, and load it into the UCF in the Common Data Format.

Metrics:

  • All FedRAMP content, as Authority Documents, is available for customer consumption via API from the UC 4.0 API Gateway.

  • All Citations that are part of the FedRAMP documents are available for customer consumption via API from the UC 4.0 API Gateway.

  • All Glossaries, with their term-definition pairs, related to the FedRAMP content are available for customer consumption via API from the UC 4.0 API Gateway.

Why important: FedRAMP is a government-wide program that promotes the adoption of secure cloud services across the federal government by providing a standardized approach to security and risk assessment for cloud technologies and federal agencies.

UC can help federal agencies, and organizations working with federal agencies, adopt and use secure cloud technologies.

eCFR

New Customers

New Markets

...

Requirements

Describe the product requirements that will fulfill the underserved need(s), starting with the use cases and then the specific functionality.

Requirement

Importance

Comments

STIG Pipeline

Scrape the STIG document library to download all zip files.

High

The zip files contain multiple levels of nested zip files.

Unzip each STIG file to retrieve the XML files.

High

Store the XML files for later use.

High

The hierarchy of zip files must be maintained to ensure follow-on functions have context.

Identify which documents within the hierarchy are Authority Documents.

High

The zip files may contain README files or other files that do not constitute Authority Documents.

Identify which files are Glossary-specific.

High

Some files may solely be Glossaries with term-definition pair entries. Ensure those documents are also processed for follow-on steps.

Detect file metadata changes from prior processing.

High

Potential metadata changes could be a new document or a new version of an old document.

Only pass new or changed documents further down the pipeline.

High
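The Python sketch below illustrates the unzip step using only the standard library `zipfile` module. It assumes each downloaded STIG archive fits in memory and that XML files can be recognized by their `.xml` extension. The function name `iter_xml_in_nested_zip` is hypothetical. Each XML file is returned together with the chain of enclosing zip member names, which is one way to keep the zip hierarchy that follow-on steps need for context.

```python
import io
import zipfile
from typing import Iterator, Tuple


def iter_xml_in_nested_zip(
    data: bytes, parents: Tuple[str, ...] = ()
) -> Iterator[Tuple[Tuple[str, ...], bytes]]:
    """Yield (hierarchy_path, xml_bytes) for every .xml file in a zip archive,
    descending into nested zip archives of any depth.

    hierarchy_path lists each enclosing zip member name, outermost first,
    so follow-on steps know where a file came from.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            name = info.filename
            path = parents + (name,)
            lowered = name.lower()
            if lowered.endswith(".zip"):
                # Recurse into the nested archive, extending the hierarchy path.
                yield from iter_xml_in_nested_zip(archive.read(info), path)
            elif lowered.endswith(".xml"):
                yield path, archive.read(info)
            # Other members (READMEs, PDFs, etc.) are skipped in this sketch.
```

You could store each returned path tuple alongside its XML file, for example as part of the storage key. Later steps, such as Authority Document identification, could then see which parent archive a file came from.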

NIST 800-53 Pipeline

Access the GitHub repository for all NIST 800-53 content.

High

Retrieve and store the XML, JSON, and YAML files for later use.

High

Identify which documents are Authority Documents.

High

Identify which files are Glossary-specific.

High

Some files may solely be Glossaries with term-definition pair entries. Ensure those documents are also processed for follow-on steps.

Detect file metadata changes from prior processing.

High

Only pass new or changed documents further down the pipeline.

High
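The change-detection requirements recur in every source pipeline. One way to meet them is to compare each run against a manifest saved by the previous run. This sketch assumes a SHA-256 content hash is an acceptable stand-in for “file metadata”. A real pipeline might instead compare release versions or publication dates reported by the source. The function names `fingerprint` and `select_new_or_changed` are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def fingerprint(content: bytes) -> str:
    """SHA-256 hex digest, used here as a proxy for version metadata."""
    return hashlib.sha256(content).hexdigest()


def select_new_or_changed(
    current: Dict[str, bytes], previous_manifest: Dict[str, str]
) -> Tuple[List[str], Dict[str, str]]:
    """Compare retrieved files against the manifest from the prior run.

    Returns (paths to pass down the pipeline, updated manifest to persist).
    A path is passed on if it is new (absent from the old manifest) or its
    fingerprint differs from the one recorded last time.
    """
    manifest = {path: fingerprint(data) for path, data in current.items()}
    to_process = sorted(
        path
        for path, digest in manifest.items()
        if previous_manifest.get(path) != digest  # .get() returns None for new files
    )
    return to_process, manifest
```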

FedRAMP Data Pipeline

Access the GitHub repository for all FedRAMP content.

Retrieve and store the XML, JSON, and YAML files for later use.

Identify which documents are Authority Documents.

Identify which files are Glossary-specific.

High

Some files may solely be Glossaries with term-definition pair entries. Ensure those documents are also processed for follow-on steps.

Detect file metadata changes from prior processing.

Only pass new or changed documents further down the pipeline.
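The NIST 800-53 and FedRAMP sources mix XML, JSON, and YAML. Each retrieved file therefore has to be parsed according to its format before documents can be identified. Below is a stdlib-only sketch that assumes the file extension reliably indicates the format. YAML is left unimplemented because Python's standard library has no YAML parser, so a third-party library such as PyYAML would be needed. `load_source_file` is a hypothetical name.

```python
import json
import xml.etree.ElementTree as ET
from pathlib import PurePath
from typing import Any


def load_source_file(path: str, content: bytes) -> Any:
    """Parse a retrieved file according to its extension.

    JSON becomes Python dicts/lists; XML becomes an ElementTree root element.
    """
    suffix = PurePath(path).suffix.lower()
    if suffix == ".json":
        return json.loads(content)
    if suffix == ".xml":
        return ET.fromstring(content)
    if suffix in (".yaml", ".yml"):
        # YAML needs a third-party parser (e.g. PyYAML's yaml.safe_load);
        # it is omitted to keep this sketch standard-library only.
        raise NotImplementedError("YAML support requires a YAML parser")
    raise ValueError(f"Unsupported file type: {path}")
```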

eCFR Data Pipeline

Access the eCFR files via the eCFR APIs.

Store files for later use.

Identify which documents are Authority Documents.

Identify which files are Glossary-specific.

High

Some files may solely be Glossaries with term-definition pair entries. Ensure those documents are also processed for follow-on steps.

Detect file metadata changes from prior processing.

Only pass new or changed documents further down the pipeline.

General Pipeline (after initial source-specific tasks are completed, if possible)

Catalog each source Authority Document.

High

Gather all information required by the Common Data Format.

Identify and extract Citations from the Authority Document

High

Citations are passages in the Authority Document that:

  1. contain Mandates (requirements) OR

  2. contain related contextual information, such as stubs, informational passages, and information-gathering passages.

Maintain Citation structure.

High

Because Authority Documents contain multiple Citations, and passages may have related Citations, this structure must be maintained to preserve the relationships between Citations.
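One possible in-memory shape for the Citation structure is a tree. Each Citation links to its parent and children, and the tree is traversed in document order. The class and field names below are hypothetical and are not taken from the Common Data Format. `CitationKind` simply mirrors the two kinds of passages listed above: Mandates and related contextual information.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator, List, Optional, Tuple


class CitationKind(Enum):
    MANDATE = "mandate"        # passage containing a requirement
    CONTEXTUAL = "contextual"  # stub, informational, or information-gathering passage


@dataclass
class Citation:
    reference: str  # the passage's identifier in the source (e.g. a section number)
    text: str
    kind: CitationKind
    children: List["Citation"] = field(default_factory=list)
    # Back-link to the parent; excluded from repr/eq to avoid infinite recursion.
    parent: Optional["Citation"] = field(default=None, repr=False, compare=False)

    def add_child(self, child: "Citation") -> "Citation":
        """Attach a related Citation beneath this one and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "Citation"]]:
        """Yield (depth, Citation) pairs depth-first, preserving document order."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)
```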

Extract Glossary from within Authority Documents.

High

Some Authority Documents may include a glossary within the document. It is typically near the end of the file.

Extract the Glossary details including the Title, source, and all term-definition pairs.

Extract Glossary from glossary-specific files.

High

Some files may only have Glossary entries with term-definition pairs.

Extract the Glossary details including the Title, source, and all term-definition pairs.
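Glossary extraction depends heavily on each source's layout, which this proposal does not specify. Purely as an illustration, the sketch below assumes a hypothetical layout: one “Term: definition” entry per line, with wrapped definitions continuing on the following lines. It collects the Title, source, and term-definition pairs named in the requirements. The `Glossary` class and `parse_glossary_lines` function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class Glossary:
    title: str
    source: str
    entries: Dict[str, str] = field(default_factory=dict)  # term -> definition


def parse_glossary_lines(
    title: str, source: str, lines: Iterable[str], separator: str = ":"
) -> Glossary:
    """Build a Glossary from lines shaped like 'Term: definition'.

    Lines without the separator are appended to the previous definition;
    blank lines are ignored. A wrapped line that itself contains the
    separator would be misread as a new term, so real sources need
    format-specific rules.
    """
    glossary = Glossary(title=title, source=source)
    last_term: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        term, sep, definition = line.partition(separator)
        if sep and term.strip():
            last_term = term.strip()
            glossary.entries[last_term] = definition.strip()
        elif last_term is not None:
            glossary.entries[last_term] += " " + line
    return glossary
```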

Detect content changes from prior loads.

High

Needs discussion. See the questions section.

Transform the Authority Document into the Common Data Format

High

The transformation documentation must be used as the reference for how source document schema structures map to the Common Data Format.

Transform the Authority Document related Citations into the Common Data Format

High

As above, use the CDF transformation document as reference.

Transform the Glossaries into the Common Data Format

High

As above, use the CDF transformation document as reference.
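The transformation rows all defer to the CDF transformation documentation. One option is to encode that documentation's field mappings as data and apply them with a single generic function. Authority Documents, Citations, and Glossaries would then differ only in their mapping tables. The sketch assumes flat source records. The target field names in `AUTHORITY_DOCUMENT_MAPPING` are invented for illustration and are not the actual Common Data Format schema.

```python
from typing import Any, Callable, Dict, Mapping, Optional, Tuple

# A rule names the source field and an optional converter for its value.
Rule = Tuple[str, Optional[Callable[[Any], Any]]]


def transform_record(
    source: Mapping[str, Any], mapping: Mapping[str, Rule]
) -> Dict[str, Any]:
    """Build a target record by applying one rule per target field.

    Raises KeyError naming the missing source field, so gaps between the
    mapping document and a source file surface early instead of silently.
    """
    target: Dict[str, Any] = {}
    for target_field, (source_field, convert) in mapping.items():
        if source_field not in source:
            raise KeyError(f"{target_field}: source field '{source_field}' missing")
        value = source[source_field]
        target[target_field] = convert(value) if convert is not None else value
    return target


# Hypothetical field names for illustration only; not the real CDF schema.
AUTHORITY_DOCUMENT_MAPPING: Dict[str, Rule] = {
    "title": ("title", str.strip),
    "version": ("version", None),
    "source_id": ("id", str),
}
```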

Load Authority Documents into the Unified Compliance Platform

High

The UCF engineering team will determine the optimal approach for loading (API, service, …).

Load Citations into the Unified Compliance Platform

High

Same as above

Load Glossaries into the Unified Compliance Platform

High

Same as above

Human Validation via a simple front-end

Since this is all back-end pipeline work with no customer interaction, the user experience only needs to be good enough for us to “dogfood”; it won’t be exposed to customers or partners in this release.

Allow human experts to view each Citation and relationships between Citations.

High

Allow human experts to reject a Citation.

This could apply to a single Citation, among many, that the process got “wrong”.

Allow human experts to reject the entire Authority Document.

If the entire document wasn’t extracted properly, there is no need to reject it Citation by Citation.

Allow human experts to approve all or individual Citations.

Allow human experts to change Citations.

What might they do here?
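The approve and reject actions above can be modeled as a small review state per Authority Document. This is a hypothetical sketch built on three assumptions:

  • Each Citation starts as pending.
  • “Approve all” applies only to Citations not yet reviewed.
  • A document is ready to load only when it is not rejected and no Citation is pending.

Editing Citations is left out, since that requirement is still an open question.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Review(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class DocumentReview:
    """Hypothetical review state for one Authority Document and its Citations."""

    document_id: str
    citations: Dict[str, Review] = field(default_factory=dict)
    document_rejected: bool = False

    def _require(self, citation_id: str) -> None:
        if citation_id not in self.citations:
            raise KeyError(f"Unknown Citation: {citation_id}")

    def reject_citation(self, citation_id: str) -> None:
        """Reject a single Citation that the process got wrong."""
        self._require(citation_id)
        self.citations[citation_id] = Review.REJECTED

    def approve(self, *citation_ids: str) -> None:
        """Approve individual Citations."""
        for citation_id in citation_ids:
            self._require(citation_id)
            self.citations[citation_id] = Review.APPROVED

    def approve_all_pending(self) -> None:
        """Approve every Citation that has not been reviewed yet."""
        for citation_id, state in self.citations.items():
            if state is Review.PENDING:
                self.citations[citation_id] = Review.APPROVED

    def reject_document(self) -> None:
        """Reject the whole Authority Document instead of Citation by Citation."""
        self.document_rejected = True

    def ready_to_load(self) -> bool:
        return not self.document_rejected and all(
            state is not Review.PENDING for state in self.citations.values()
        )

    def approved_citations(self) -> List[str]:
        return [cid for cid, state in self.citations.items() if state is Review.APPROVED]
```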

...