Requirement | Importance | Comments |
---|---|---|
STIG Pipeline and Goal: (1) All 457 STIGs, as Authority Documents, are available for customer consumption via API from the UC 4.0 API Gateway. (2) All Citations that are part of the 457 STIGs are available for customer consumption via API from the UC 4.0 API Gateway. (3) All Glossaries with term-definition pairs, as they relate to the 457 STIGs, are available for customer consumption via API from the UC 4.0 API Gateway. |
Scrape the STIG document library to download all zip files. | High | The zip files are multi-level nested zip files. |
Unzip each STIG file to retrieve the XML files. | High | |
Store the XML files for later use. | High | The hierarchy of zip files must be maintained to ensure follow-on functions have context. |
Identify which documents within the hierarchy are Authority Documents. | High | The zip files may contain README files or other files that do not constitute Authority Documents. |
Identify which files are Glossary-specific. | High | Some files may solely be Glossaries with term-definition pair entries. Ensure those documents are also processed for follow-on steps. |
Detect file metadata changes from prior processing. | High | Potential metadata changes could be a new document or a new version of an old document. |
Only pass new or changed documents further down the pipeline. | High | |
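
The unzip and store requirements above can be sketched as follows. This is a minimal illustration, not the team's implementation: the function name `extract_nested` and the choice to turn each inner archive into a directory named after it are assumptions made here to show one way of keeping the zip hierarchy as context for follow-on steps.

```python
import io
import zipfile
from pathlib import Path


def extract_nested(zip_bytes: bytes, dest: Path) -> list[Path]:
    """Recursively unpack a zip archive, mirroring its hierarchy under dest.

    Each inner .zip member becomes a directory named after the archive, so
    every extracted file keeps the context of the archives that held it.
    Returns the paths of all extracted XML files.
    """
    root = dest.resolve()
    xml_files: list[Path] = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            target = (root / info.filename).resolve()
            # Refuse member names that would escape the destination folder.
            if not target.is_relative_to(root):
                raise ValueError(f"unsafe member path: {info.filename}")
            data = archive.read(info)
            if target.suffix.lower() == ".zip":
                # Descend into the inner archive (assumed naming convention).
                xml_files += extract_nested(data, target.with_suffix(""))
            else:
                target.parent.mkdir(parents=True, exist_ok=True)
                target.write_bytes(data)
                if target.suffix.lower() == ".xml":
                    xml_files.append(target)
    return xml_files
```

Non-XML members such as README files are still written to disk, so the later step that decides which files are Authority Documents can see everything that was in the archive.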
NIST 800-53 Pipeline and Goal: (1) All NIST 800-53 content, as Authority Documents, is available for customer consumption via API from the UC 4.0 API Gateway. (2) All Citations that are part of the NIST 800-53 documents are available for customer consumption via API from the UC 4.0 API Gateway. (3) All Glossaries with term-definition pairs, as they relate to the NIST 800-53 content, are available for customer consumption via API from the UC 4.0 API Gateway. |
Access the GitHub repository for all NIST 800-53 content. | High | |
Retrieve and store the XML, JSON, and YAML files for later use. | High | |
Identify which documents are Authority Documents. | High | |
Identify which files are Glossary-specific. | High | Some files may solely be Glossaries with term-definition pair entries. Ensure those documents are also processed for follow-on steps. |
Detect file metadata changes from prior processing. | High | |
Only pass new or changed documents further down the pipeline. | | |
FedRAMP Data Pipeline and Goal: (1) All FedRAMP content, as Authority Documents, is available for customer consumption via API from the UC 4.0 API Gateway. (2) All Citations that are part of the FedRAMP documents are available for customer consumption via API from the UC 4.0 API Gateway. (3) All Glossaries with term-definition pairs, as they relate to the FedRAMP content, are available for customer consumption via API from the UC 4.0 API Gateway. |
Access the GitHub repository for all FedRAMP content. | | |
Retrieve and store the XML, JSON, and YAML files for later use. | | |
Identify which documents are Authority Documents. | | |
Identify which files are Glossary-specific. | High | Some files may solely be Glossaries with term-definition pair entries. Ensure those documents are also processed for follow-on steps. |
Detect file metadata changes from prior processing. | | |
Only pass new or changed documents further down the pipeline. | | |
eCFR Data Pipeline and Goal: |
Access the eCFR files via the eCFR APIs. | | |
Store files for later use. | | |
Identify which documents are Authority Documents. | | |
Identify which files are Glossary-specific. | High | Some files may solely be Glossaries with term-definition pair entries. Ensure those documents are also processed for follow-on steps. |
Detect file metadata changes from prior processing. | | |
Only pass new or changed documents further down the pipeline. | | No need to pass along unchanged files. How to handle deprecated documents is an open question (see the questions section). |
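
Each source pipeline above ends with the same two requirements: detect changes since the prior run, and pass along only new or changed documents. A minimal sketch of that comparison is shown here. The requirements do not define what "file metadata" covers, so this example assumes a SHA-256 content hash as a stand-in, and it assumes a document is "deprecated" when it was present in the prior run but is absent now. The names `fingerprint`, `classify`, and `to_promote` are hypothetical.

```python
import hashlib
from enum import Enum


class Change(str, Enum):
    SAME = "same"
    NEW = "new"
    UPDATED = "updated"
    DEPRECATED = "deprecated"


def fingerprint(content: bytes) -> str:
    """Stand-in for file metadata (assumption): a SHA-256 content hash."""
    return hashlib.sha256(content).hexdigest()


def classify(previous: dict[str, str], current: dict[str, str]) -> dict[str, Change]:
    """Compare path -> fingerprint manifests from the prior and current runs."""
    changes: dict[str, Change] = {}
    for path, digest in current.items():
        if path not in previous:
            changes[path] = Change.NEW
        elif previous[path] == digest:
            changes[path] = Change.SAME
        else:
            changes[path] = Change.UPDATED
    # Anything seen before but missing now is treated as deprecated.
    for path in previous.keys() - current.keys():
        changes[path] = Change.DEPRECATED
    return changes


def to_promote(changes: dict[str, Change]) -> list[str]:
    """Only new or updated documents continue down the pipeline."""
    return sorted(p for p, c in changes.items() if c in (Change.NEW, Change.UPDATED))
```

The four change values match the ones the content capture statistics are expected to report, so the same `classify` result could feed both the promotion decision and the per-value counts.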
General Pipeline (after initial source-specific tasks are completed, if possible) |
Catalog each source Authority Document. | High | Gather all information required by the Common Data Format. |
Identify and extract Citations from the Authority Document | High | Citations are passages in the Authority Document that contain Mandates (requirements) or related contextual information, such as stubs, informational passages, and information-gathering passages. |
Maintain Citation structure. | High | Since Authority Documents will contain multiple Citations and passages may have related Citations, that structure must be maintained to know the relationship between Citations. |
Extract Glossary from within Authority Documents. | High | Some Authority Documents may contain a glossary, typically near the end of the file. Extract the Glossary details, including the title, source, and all term-definition pairs. |
Extract Glossary from glossary-specific files. | High | Some files may contain only Glossary entries with term-definition pairs. Extract the Glossary details, including the title, source, and all term-definition pairs. |
Detect content changes from prior loads. | High | Needs discussion. See the questions section. |
Transform the Authority Document into the Common Data Format | High | Use the transformation documentation as the reference for how source document schema structures map to the Common Data Format. |
Transform the Authority Document related Citations into the Common Data Format | High | As above, use the CDF transformation document as reference. |
Transform the Glossaries into the Common Data Format | High | As above, use the CDF transformation document as reference. |
Load Authority Documents into the Unified Compliance Platform | High | UCF engineering team will determine the optimal approach for loading (API, service, …) |
Load Citations into the Unified Compliance Platform | High | Same as above |
Load Glossaries into the Unified Compliance Platform | High | Same as above |
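
The "Maintain Citation structure" requirement above implies that extracted Citations are kept as a hierarchy, not a flat list. The sketch below shows one possible in-memory shape. It is not the Common Data Format, whose schema is not described here; the `Citation` class, its field names, and the type labels are assumptions chosen to mirror the four Citation types named in this document.

```python
from collections import Counter
from collections.abc import Iterator
from dataclasses import dataclass, field

# Type labels assumed from the Citation types listed in the requirements.
CITATION_TYPES = {"mandate", "stub", "informational", "information_gathering"}


@dataclass
class Citation:
    reference: str  # position in the source document, e.g. "1.2"
    text: str
    kind: str
    children: list["Citation"] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.kind not in CITATION_TYPES:
            raise ValueError(f"unknown citation type: {self.kind}")

    def add(self, child: "Citation") -> "Citation":
        """Attach a related Citation beneath this one and return it."""
        self.children.append(child)
        return child

    def walk(self) -> Iterator["Citation"]:
        """Yield this Citation and all descendants in document order."""
        yield self
        for child in self.children:
            yield from child.walk()


def count_by_type(roots: list[Citation]) -> Counter:
    """Per-type Citation counts for one Authority Document."""
    return Counter(c.kind for root in roots for c in root.walk())
```

Keeping the parent-child links at extraction time means the later transformation and human validation steps can show how Citations relate to one another without re-parsing the source.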
Human Validation via a simple front-end |
Since this is all back-end pipeline work with no customer interaction, the user experience only needs to be good enough for internal use ("dogfooding"); it will not be exposed to customers or partners in this release. |
Allow human experts to view each Citation and relationships between Citations. | High | |
Allow human experts to approve or reject Citations. | High | Rejection reasons could include Citations that were improperly identified (e.g., it is not a mandate and/or does not include contextual information). |
Allow human experts to approve or reject the entire Authority Document. | High | Rejection reasons could be if the entire document was extracted improperly (e.g., is not an authority document, no changes, too many citations are improperly identified …) |
Allow human experts to change Citations. | High | What might they do here? Combine passages? Split out passages into other citations? |
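
The approve and reject requirements above can be modeled with a small review state per Authority Document. The following is a sketch under assumptions: the class and method names are hypothetical, and it deliberately leaves out the "change Citations" operations (split, merge) because their behavior is still an open question in this document.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class CitationReview:
    citation_id: str
    status: Status = Status.PENDING
    reason: str = ""


@dataclass
class DocumentReview:
    document_id: str
    citations: dict[str, CitationReview] = field(default_factory=dict)
    status: Status = Status.PENDING
    reason: str = ""

    def add(self, citation_id: str) -> None:
        self.citations[citation_id] = CitationReview(citation_id)

    def reject_citation(self, citation_id: str, reason: str) -> None:
        item = self.citations[citation_id]
        item.status, item.reason = Status.REJECTED, reason

    def approve_pending(self) -> None:
        """Approve every Citation that has not been decided yet."""
        for item in self.citations.values():
            if item.status is Status.PENDING:
                item.status = Status.APPROVED

    def reject_document(self, reason: str) -> None:
        """Reject the whole document without deciding each Citation."""
        self.status, self.reason = Status.REJECTED, reason

    def counts(self) -> Counter:
        """Citation counts per status, usable for validation statistics."""
        return Counter(item.status.value for item in self.citations.values())
```

Rejecting a document leaves its Citations undecided in this sketch, which reflects the idea that reviewers should not have to reject Citation by Citation when the whole extraction is unusable.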
Automation and Monitoring |
Produce logs for each step in the end-to-end process. | | |
In the content capture step, capture the following statistical information: (1) complete count of documents; (2) count of identified Authority Documents (ADs); (3) count of identified Glossaries; (4) list of ADs, including name, location, change date, and metadata change value (same, new, updated, or deprecated); (5) list of Glossaries, including name, location, change date, and metadata change value (same, new, updated, or deprecated); (6) count of ADs per metadata change value; (7) count of Glossaries per metadata change value. | | |
In the citation extraction step, capture the following statistical information per AD: (1) total count of Citations extracted; (2) count of Citations per type (requirement/mandate, stub, informational, and information gathering). | | |
In the Glossary extraction step, capture the following statistical information per AD or Glossary: (1) total count of Glossaries extracted (assumed to be 1); (2) total count of term-definition pairs extracted. | | |
In the human validation step, capture the following statistical information per AD: (1) total count of ADs approved; (2) total count of ADs rejected; (3) total count of Citations approved; (4) total count of Citations rejected; (5) total count of Citations changed, by change type (split, merged, removed, …). | | |
In the transformation step, capture the following statistical information: (1) total count of ADs entering the process; (2) total count of ADs completing the process, by TBD status (e.g., success vs. failure); (3) total count of Citations completing the process, by TBD status (e.g., success vs. failure); (4) total count of Glossaries completing the process, by TBD status (e.g., success vs. failure). | | |
Allow administrators to monitor, start, and stop each step in the end-to-end process, including: Content capture | | |
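
One way to satisfy the logging and statistics requirements above is to wrap each pipeline step so that it always emits one structured log record, whether it succeeds or fails. This is a minimal sketch using only the Python standard library; the `step` helper, the logger name, and the record fields are assumptions, and the status values simply follow the "success vs. failure" example given for the transformation step.

```python
import json
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("pipeline")  # hypothetical logger name


@contextmanager
def step(name: str, sink: list):
    """Run one pipeline step, then log and store a record with its stats.

    The caller fills in the yielded dict with whatever counts the step
    produces (documents seen, Citations extracted, and so on).
    """
    record = {"step": name, "status": "success", "stats": {}}
    started = time.monotonic()
    try:
        yield record["stats"]
    except Exception as exc:
        record["status"] = "failure"
        record["error"] = repr(exc)
        raise
    finally:
        record["seconds"] = round(time.monotonic() - started, 3)
        sink.append(record)
        logger.info(json.dumps(record))
```

A step would then be written as `with step("content_capture", records) as stats:` followed by assignments such as `stats["documents"] = 5`, which keeps the statistics next to the code that computes them.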