...

The Problem

What problem are we trying to solve, and why is it important to our customers and/or to Unified Compliance?

We currently rely on a team of expert mappers to meticulously add content into the UCF. The process works well but is slow. With the advent of automation and AI, Unified Compliance risks attacks from competitors who will use technology to accelerate content acquisition.

We risk losing customers to other platforms if we fall behind on the extent of coverage.

We will also find it difficult to take on new market segments without automation.

...

Goals

What does success look like? Which metrics can we affect, and why is it important to affect those metrics?

Goal

Metric

Why Important?

Reliably extract citations from Authority Documents

>= 80% accuracy, where no more than 20% of Citations need to be reworked (e.g., split, merged, rejected …)

If there is poor accuracy requiring extensive human correction, then there is little value.

Reliably extract glossaries from Authority Documents

>= 95% accuracy, where no more than 5% of term-definition pairs need to be reworked. Glossaries are substantially easier to identify and extract than citations.

If there is poor accuracy requiring extensive human correction, then there is little value.

Reliably automate the end-to-end process of capturing, transforming, and loading STIG, NIST 800-53, FedRAMP, eCFR compliance content into the Unified Compliance platform.

100% of the content from all four compliance content sources is loaded into the UCF.

All four Authority Document sources relate to securing and hardening IT infrastructure for both the private and public sectors.

To provide value to customers with Security Operations requirements, UC needs to maximize the breadth of IT security coverage to ensure we can provide security guidance for as many IT assets as possible.

Automate an end-to-end process to capture all STIG content.

All 457 STIGs, as Authority Documents, are available for customer consumption via API from the UC 4.0 API Gateway

All Citations as part of the 457 STIGs are available for customer consumption via API from the UC 4.0 API Gateway

All Glossaries with term-definition pairs as they relate to the 457 STIGs are available for customer consumption via API from the UC 4.0 API Gateway

STIGs sit at the intersection of Sec Ops and GRC. Organizations need to harden their security posture with DoD-approved security measures that are in alignment with the software and hardware vendors.

IT departments will utilize a variety of software and hardware in their data centers. UC needs to maximize the breadth of STIG coverage to ensure we can provide security guidance for as many IT assets as possible.

Automate an end-to-end process to capture all NIST 800-53 content (approximately 36 files with a mixture of JSON, YAML, and XML documents), perform ETL, and load into the UCF in the Common Data Format.

All NIST 800-53 content, as Authority Documents, is available for customer consumption via API from the UC 4.0 API Gateway

All Citations as part of the NIST 800-53 documents are available for customer consumption via API from the UC 4.0 API Gateway

All Glossaries with term-definition pairs as they relate to the NIST 800-53 content are available for customer consumption via API from the UC 4.0 API Gateway

NIST 800-53 helps IT departments implement proper security controls to proactively take care of their organization's infrastructure.

As is the case with STIGs, broader coverage will help IT and security departments secure their assets.

Automate an end-to-end process to capture all FedRAMP content (approximately 32 files with a mixture of JSON, YAML, and XML documents), perform ETL, and load into the UCF in the Common Data Format.

FedRAMP is a government-wide program that promotes the adoption of secure cloud services across the federal government by providing a standardized approach to security and risk assessment for cloud technologies and federal agencies.

UC can assist federal agencies, or organizations working with federal agencies, to grow and use secure cloud technologies.

eCFR

New Customers

New Markets

Preparation for follow-on AI projects

Scope and Requirements

Section Explanation

The intent of this section is the following:

Scope Definition: defines the scope of the proposed product (or features), including what will and will not be included, helping manage expectations and focus development efforts.

Guideline for Development: provides detailed information on the product’s features, functionalities, user flow, and interface to guide the development team in building the product.

Framework: provides high-level evaluation criteria for alternative solutions (build, buy, partner) to evaluate different routes to success.

Requirements

Describe the product requirements that will fulfill the underserved need(s).

Requirement

Importance

Comments

STIG Pipeline and Goal:

  • All 457 STIGs, as Authority Documents, are available for customer consumption via API from the UC 4.0 API Gateway

  • All Citations as part of the 457 STIGs are available for customer consumption via API from the UC 4.0 API Gateway

  • All Glossaries with term-definition pairs as they relate to the 457 STIGs are available for customer consumption via API from the UC 4.0 API Gateway

Scrape the STIG document library to download all zip files.

High

The zip files are multi-level nested zip files.

Unzip each STIG file to retrieve the XML files.

High

Store the XML files for later use.

High

The hierarchy of zip files must be maintained to ensure follow-on functions have context.

Identify which documents within the hierarchy are Authority Documents.

High

The zip files may contain README files or other files that do not constitute Authority Documents.

Identify which files are Glossary-specific.

High

Some files may solely be Glossaries with term-definition pair entries. Ensure those documents are also processed for follow-on steps.

Detect file metadata changes from prior processing.

High

Potential metadata changes could be a new document or a new version of an old document.

Only pass new or changed documents further down the pipeline.

High

No need to promote unchanged files.

What to do with deprecated documents is tracked in the Open Questions section.
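
The unzip requirement above (multi-level nested zip files whose hierarchy must be preserved) can be sketched as follows. This is a minimal illustration, not the pipeline's implementation: the function name `extract_xml` and the choice to represent the hierarchy as a slash-joined path are assumptions made for the example.

```python
import io
import zipfile


def extract_xml(zip_bytes, prefix=""):
    """Yield (hierarchy_path, xml_bytes) for every XML file in a zip,
    descending into nested zips. hierarchy_path records each enclosing
    archive so later steps keep the context of where a file came from."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            path = f"{prefix}/{name}" if prefix else name
            data = archive.read(name)
            if name.lower().endswith(".zip"):
                # Recurse with the inner zip's path as the new prefix.
                yield from extract_xml(data, prefix=path)
            elif name.lower().endswith(".xml"):
                yield path, data
            # Anything else (README files and so on) is ignored here.
```

Non-XML members are skipped in this sketch; whether they should instead be stored and classified later is a design decision the requirements leave open.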

NIST 800-53 Pipeline and Goal:

  • All NIST 800-53 content, as Authority Documents, is available for customer consumption via API from the UC 4.0 API Gateway

  • All Citations as part of the NIST 800-53 documents are available for customer consumption via API from the UC 4.0 API Gateway

  • All Glossaries with term-definition pairs as they relate to the NIST 800-53 content are available for customer consumption via API from the UC 4.0 API Gateway

Access the GitHub repository for all NIST 800-53 content.

High

Retrieve and store the XML, JSON, and YAML files for later use.

High

Identify which documents are Authority Documents.

High

Identify which files are Glossary-specific.

High

Some files may solely be Glossaries with term-definition pair entries. Ensure those documents are also processed for follow-on steps.

Detect file metadata changes from prior processing.

High

Only pass new or changed documents further down the pipeline.

High
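
The two change-detection requirements above (detect changes from prior processing, then pass only new or changed documents down the pipeline) could look roughly like the sketch below. It assumes a SHA-256 content fingerprint stands in for whatever file metadata is ultimately compared, and it assumes a document missing from the current run is labeled deprecated; both are illustrative choices, and the handling of deprecated documents is still an open question.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Stand-in for the compared metadata (assumption: a content hash)."""
    return hashlib.sha256(content).hexdigest()


def classify_changes(previous: dict, current: dict) -> dict:
    """Map each document path to same, new, updated, or deprecated.

    previous and current both map document path -> fingerprint."""
    status = {}
    for path, digest in current.items():
        if path not in previous:
            status[path] = "new"
        elif previous[path] != digest:
            status[path] = "updated"
        else:
            status[path] = "same"
    for path in previous.keys() - current.keys():
        status[path] = "deprecated"  # assumption: absent now means deprecated
    return status


def to_promote(status: dict) -> list:
    """Only new or updated documents continue down the pipeline."""
    return sorted(p for p, s in status.items() if s in ("new", "updated"))
```

The four status values match the metadata change values named in the Automation and Monitoring requirements, so the same result can feed the capture-step logs.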

FedRAMP Data Pipeline and Goal:

  • All FedRAMP content, as Authority Documents, is available for customer consumption via API from the UC 4.0 API Gateway

  • All Citations as part of the FedRAMP documents are available for customer consumption via API from the UC 4.0 API Gateway

  • All Glossaries with term-definition pairs as they relate to the FedRAMP content are available for customer consumption via API from the UC 4.0 API Gateway

Access the GitHub repository for all FedRAMP content.

High

Retrieve and store the XML, JSON, and YAML files for later use.

High

Identify which documents are Authority Documents.

High

Identify which files are Glossary-specific.

High

Some files may solely be Glossaries with term-definition pair entries. Ensure those documents are also processed for follow-on steps.

Detect file metadata changes from prior processing.

High

Only pass new or changed documents further down the pipeline.

High

eCFR Data Pipeline and Goal:

  • a

  • b

  • c

Access the eCFR files via the eCFR APIs.

High

Store files for later use.

High

Identify which documents are Authority Documents.

High

Identify which files are Glossary-specific.

High

Some files may solely be Glossaries with term-definition pair entries. Ensure those documents are also processed for follow-on steps.

Detect file metadata changes from prior processing.

High

Only pass new or changed documents further down the pipeline.

High

No need to pass along files with no changes.

General Pipeline (after initial source-specific tasks are completed, if possible)

  • Reliably extract citations from Authority Documents with at least 80% accuracy, where no more than 20% of Citations need to be reworked (e.g., split, merged, rejected …)

  • Reliably extract glossaries from Authority Documents with at least 95% accuracy, where no more than 5% of term-definition pairs need to be reworked. Glossaries are substantially easier to identify and extract than citations.

Catalog each source Authority Document.

High

Gather all information required by the Common Data Format.

Identify and extract Citations from the Authority Document

Critical

Citations are passages in the Authority Document that:

  1. contain Mandates (requirements) OR

  2. contain related contextual information such as stubs, informational, and informational gathering.

This requirement is the “brains” of the solution: the entire project’s success hinges on the success of this particular requirement.

Maintain Citation structure.

High

Since Authority Documents will contain multiple Citations and passages may have related Citations, that structure must be maintained to know the relationship between Citations.
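
One way to keep that structure is to store each Citation with its type, its position in the document, and explicit links to related Citations. The sketch below is only an assumption about shape: the class and field names are hypothetical and are not taken from the Common Data Format, and the sample passages are invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class CitationKind(Enum):
    MANDATE = "mandate"
    STUB = "stub"
    INFORMATIONAL = "informational"
    INFORMATION_GATHERING = "information gathering"


@dataclass
class Citation:
    citation_id: str
    text: str
    kind: CitationKind
    position: int  # order of the passage within the Authority Document
    related_ids: list = field(default_factory=list)  # ids of related Citations


def related_citations(citations, citation_id):
    """Return the Citations that the given Citation links to, in document order."""
    by_id = {c.citation_id: c for c in citations}
    linked = [by_id[i] for i in by_id[citation_id].related_ids]
    return sorted(linked, key=lambda c: c.position)


# Hypothetical section: an informational intro that links to two mandates.
section = [
    Citation("c1", "Introductory passage.", CitationKind.INFORMATIONAL, 0, ["c2", "c3"]),
    Citation("c2", "The system must lock after 15 minutes.", CitationKind.MANDATE, 1),
    Citation("c3", "The system must log failed logins.", CitationKind.MANDATE, 2),
]
```

Links are stored as a flat list of ids rather than a tree, so relationships other than parent and child can be represented.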

Extract Glossary from within Authority Documents.

High

Some Authority Documents may have a glossary within the document. This will typically be near the end of the file.

Extract the Glossary details including the Title, source, and all term-definition pairs.

Extract Glossary from glossary-specific files.

High

Some files may only have Glossary entries with term-definition pairs.

Extract the Glossary details including the Title, source, and all term-definition pairs.

Detect content changes from prior loads.

High

Needs discussion. See the Open Questions section.

Transform the Authority Document into the Common Data Format

High

Use the transformation documentation as the reference for how the source document schema structures are transformed into the Common Data Format.

Transform the Authority Document related Citations into the Common Data Format

High

As above, use the CDF transformation document as reference.

Transform the Glossaries into the Common Data Format

High

As above, use the CDF transformation document as reference.

Load Authority Documents into the Unified Compliance Platform

High

UCF engineering team will determine the optimal approach for loading (API, service, …)

Load Citations into the Unified Compliance Platform

High

Same as above

Load Glossaries into the Unified Compliance Platform

High

Same as above

Human Validation via a simple front-end

Since this is all back-end pipeline work with no customer interaction (for this release), the user experience only needs to be good enough for us as “dog food”.

Allow human experts to view the proposed Citations along with the relationships between Citations.

High

This need not be a hierarchical structure. The solution needs to inform the user of the proposed Citations by type and the relationships between them.

As an example, a section in an Authority Document may have an introductory passage that is designated as an informational Citation, followed by three (3) Citations that are identified as mandates/requirements, followed by a final passage that is also informational.

Allow human experts to change Citations.

High

Changes need to include:

  1. splitting suggested citations

  2. combining suggested citations

  3. changing the citation type (mandate, informational, stub …)

  4. adding missing suggested citations

  5. removing suggested citations, for example Citations that were improperly identified (e.g., the passage is not a mandate and does not include contextual information)

Allow human experts to approve or reject the entire Authority Document.

High

Once all the Citations have been reviewed and updated, the expert needs to be able to approve the document for further steps

OR

reject the suggested Authority Document and Citations with reasons such as

  1. not an Authority Document

  2. too many Citations are improperly identified
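
The split, combine, and retype operations on suggested Citations could be modeled as pure functions over an ordered list, as in the sketch below. The `Suggested` class and the function names are hypothetical, and splitting by character offset is an assumption about how the front-end would report where the expert cut a passage.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Suggested:
    text: str
    kind: str  # e.g. "mandate", "informational", "stub"


def split(citations, index, offset):
    """Split the citation at index into two at the given character offset."""
    c = citations[index]
    left = replace(c, text=c.text[:offset].strip())
    right = replace(c, text=c.text[offset:].strip())
    return citations[:index] + [left, right] + citations[index + 1:]


def merge(citations, index):
    """Combine the citation at index with the one that follows it."""
    a, b = citations[index], citations[index + 1]
    merged = replace(a, text=f"{a.text} {b.text}")
    return citations[:index] + [merged] + citations[index + 2:]


def retype(citations, index, kind):
    """Change the citation type without touching its text."""
    changed = replace(citations[index], kind=kind)
    return citations[:index] + [changed] + citations[index + 1:]
```

Each function returns a new list and leaves its input unchanged, which makes it straightforward to record every change by type for the validation statistics.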

Automation and Monitoring

Produce logs for each step in the end-to-end process.

High

In the content capture step, capture the following statistical information:

  1. Complete count of documents.

  2. Count of identified Authority Documents.

  3. Count of identified Glossaries.

  4. List of Authority Documents including name, location, change date, and metadata change value (same, new, updated, or deprecated).

  5. List of Glossaries including name, location, change date, and metadata change value (same, new, updated, or deprecated).

  6. Count of ADs per metadata change value (same, new, updated, or deprecated).

  7. Count of Glossaries per metadata change value (same, new, updated, or deprecated).

High
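
As a rough sketch of the content capture statistics, the counts could be derived from a list of cataloged documents and emitted as one structured log record. The record keys, the `kind` values, and the function names are assumptions made for this example, not an agreed log schema.

```python
import json
from collections import Counter


def capture_stats(documents):
    """Summarize the content capture step.

    documents: list of dicts with 'name', 'kind' ('authority_document',
    'glossary', or 'other'), and 'change' (same, new, updated, deprecated)."""
    ads = [d for d in documents if d["kind"] == "authority_document"]
    glossaries = [d for d in documents if d["kind"] == "glossary"]
    return {
        "step": "content_capture",
        "total_documents": len(documents),
        "authority_document_count": len(ads),
        "glossary_count": len(glossaries),
        "authority_documents_by_change": dict(Counter(d["change"] for d in ads)),
        "glossaries_by_change": dict(Counter(d["change"] for d in glossaries)),
    }


def log_line(stats):
    """Serialize the record as a single JSON log line with stable key order."""
    return json.dumps(stats, sort_keys=True)
```

The per-document lists (name, location, change date) are omitted here for brevity; they would be additional fields on the same record.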

In the citation extraction step, capture the following statistical information per AD:

  1. Total count of citations extracted.

  2. Count of citations per type: requirement/mandate, stub, informational, and information gathering.

High

In the human validation step, capture the following statistical information per AD:

  1. Total count of ADs approved and rejected.

  2. Total count of suggested Citations per AD.

  3. Total count of Citations changed and by change type (split, merged, removed …).

Critical

This is critical information to capture since the measurement of the success of the project hinges on the correctness of suggested citations.
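
These counts are what the accuracy goals would be measured against. The sketch below assumes accuracy is defined as the share of suggested items that needed no rework, which is one reading of the 80% and 95% targets; the function names are hypothetical, and how added (missing) Citations should count is not settled here.

```python
def accuracy_percent(suggested: int, changed: int) -> float:
    """Share of suggested items that needed no rework, as a percentage."""
    if suggested <= 0:
        raise ValueError("suggested must be positive")
    return 100.0 * (suggested - changed) / suggested


def meets_target(suggested: int, changed: int, target_percent: int) -> bool:
    """True when the unchanged share is at least target_percent.

    Uses integer arithmetic so boundary cases such as exactly 80% are exact."""
    if suggested <= 0:
        raise ValueError("suggested must be positive")
    return (suggested - changed) * 100 >= target_percent * suggested
```

For example, 100 suggested Citations with 20 changed sits exactly on the 80% target, while 21 changed falls below it.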

In the Glossary extraction step, capture the following statistical information per AD or Glossary:

  1. Total count of Glossaries extracted (assumed to be 1).

  2. Total count of term-definition pairs extracted.

In the human validation step, capture the following statistical information for Glossaries per AD:

  1. Total count of Glossaries approved or rejected.

  2. Total count of term-definition pairs extracted.

  3. Total count of term-definition pairs changed and by change type (split, merged, removed …).

Critical

This is critical information to capture since the measurement of the success of the project hinges on the correctness of suggested glossaries and term-definition pairs.

In the transformation step, capture the following statistical information:

  1. Total count of ADs, Citations, Glossaries, and term-definition pairs entering the process.

  2. Total count of ADs completing the process by TBD status (e.g., success, failure, …).

High

In the load step, capture the following statistical information:

  1. Total count of ADs, Citations, Glossaries, and term-definition pairs entering the process.

  2. Total count of ADs completing the process by TBD status (e.g., success, failure, …).

High

Allow administrators to monitor, start, and stop each step in the end-to-end process, including content capture, extraction, transformation, and load.

High

Out of Scope / Future Functionality

List the known features that are out of scope for this project or might be revisited at a later time.

As is the case with the assumptions, it is important to list these out so that architects and engineers can plan accordingly for these later updates.

Requirement

Comments

Tagging and mapping of STIG, NIST, FedRAMP, or eCFR content to the Common Controls.

This project ends at the AD, Citation, and Glossary extraction, transformation, and load.

Follow-on projects will include the tagging and mapping.

Human validation of the content capture.

Later projects can include additional human validation. To limit the scope of this project and get it out quickly, steps such as metadata change detection can be reviewed and validated after the fact by looking at logs and other information.

Human validation of the AD cataloging.

Same as above

Human validation of the transformation into the common data format

Same as above

Human validation of loading into the UCF

Same as above

Productization of the Corpora

A non-production version of the corpora exists. Any work on the corpora is out of scope for this product.

Loading of data into a corpus, data lake, or any other target other than UC 4.0

The focus of this PRD is to load compliance content into the UCF 4.0 application for customer consumption over the API.

Follow-on projects can tap into the pipeline and use the content for other purposes.

User Interaction and Design

Link to mockups, prototypes, or screenshots related to the requirements.

...

Open Questions

List any open questions that come to mind throughout the lifecycle of this initiative.

Question

Answer

Date Answered

What do we do with deprecated authority documents?

For STIGs, how do we identify which files are authority documents?

For NIST 800-53, how do we identify which files are authority documents?

For FedRAMP, how do we identify which files are authority documents?

For eCFRs, how do we identify which files are authority documents?

Specifically, what is required to catalog an AD?

In this first pass, what should constitute content changes?

We don’t want to get too crazy and make this a massive project.

Need to discuss.

How do we identify a citation that includes a mandate or related contextual information such as stubs, informational, and informational gathering?

Alternative Solutions

Provide high-level evaluation criteria for alternative solutions (build, buy, partner) to evaluate different routes to success.

...