
Section Explanation

The intent of this section is for the following:

Scope Definition: defines the scope of the proposed product (or features), including what will and will not be included, which helps manage expectations and focus development efforts.

Guideline for Development: provides detailed information on the product’s features, functionalities, user flow, and interface to guide the development team in building the product.

Framework: provides high-level evaluation criteria for alternative solutions (build, buy, partner) to evaluate different routes to success.

Report and Analytics on AI Correctness

Requirements and Features

Describe the product requirements and features that will bring value to customers and fulfill the underserved need(s).

Feature

Comment

Content Pipeline for STIGs

Content Pipeline for eCFR

Monitoring and Logging

Automatic Citation Extraction

Automatic Glossary and Terms Extraction

The hierarchy of zip files must be maintained to ensure follow-on functions have context.
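
As a minimal sketch of how the zip hierarchy could be preserved while unpacking multi-level nested archives, the Python example below walks an archive in memory and yields every non-zip file together with its full path through the nested zips. The function names, the `!/` separator, and the sample file names are hypothetical choices for illustration, not part of any agreed design.

```python
import io
import zipfile
from typing import Dict, Iterator, Tuple


def walk_nested_zip(data: bytes, prefix: str = "") -> Iterator[Tuple[str, bytes]]:
    """Yield (hierarchy_path, file_bytes) for every non-zip member.

    Nested zip files are opened recursively. The name of each enclosing
    archive stays in the path (joined with the assumed "!/" separator),
    so follow-on steps can see where a file came from.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            member_path = f"{prefix}{info.filename}"
            payload = archive.read(info)
            if info.filename.lower().endswith(".zip"):
                # Recurse into the nested archive, keeping its name as context.
                yield from walk_nested_zip(payload, prefix=member_path + "!/")
            else:
                yield member_path, payload


def build_zip(files: Dict[str, bytes]) -> bytes:
    """Demo helper only: build an in-memory zip from a name -> bytes mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical file names, used only to exercise the sketch.
    inner = build_zip({"U_Example_STIG.xml": b"<Benchmark/>"})
    outer = build_zip({"library/U_Example.zip": inner, "readme.txt": b"notes"})
    for path, payload in walk_nested_zip(outer):
        print(path, len(payload))
```

Keeping the enclosing archive names in the path is one way to retain context; storing a separate parent-child record per file would work as well.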

Requirement / Feature	Importance	Comments
STIG Pipeline and Goal: All STIGs (approximately 457), including Authority Documents, related Citations, and Glossaries with term-definition pairs, are available for customer consumption via API from the UC 4.0 API Gateway.
Scrape the STIG document library to download all zip files.	High	The zip files are multi-level nested zip files.
Unzip each STIG file to retrieve the XML files.	High
Store the XML files for later use.	High
Identify which documents within the hierarchy are Authority Documents.	High	The zip files may contain README files or other files that do not constitute Authority Documents.
Identify which files are Glossary-specific.	High	Some files may solely be Glossaries with term-definition pair entries. Ensure those documents are also processed for follow-on steps.
Detect file metadata changes from prior processing.	High	Potential metadata changes could be a new document or a new version of an old document.
Only pass new or changed documents further down the pipeline.	High	No need to promote unchanged files. Open question: what to do with deprecated documents.
NIST 800-53 Pipeline and Goal: All NIST 800-53 (approximately 36), including Authority Documents, related Citations, and Glossaries with term-definition pairs, are available for customer consumption via API from the UC 4.0 API Gateway.
Access the GitHub repository for all NIST 800-53 content.	High
Retrieve and store the XML, JSON, and YAML files for later use.	High
Identify which documents are Authority Documents.	High
Identify which files are Glossary-specific.	High	Some files may solely be Glossaries with term-definition pair entries. Ensure those documents are also processed for follow-on steps.
Detect file metadata changes from prior processing.	High
Only pass new or changed documents further down the pipeline.	High
FedRAMP Data Pipeline and Goal: All FedRAMP (approximately 32), including Authority Documents, related Citations, and Glossaries with term-definition pairs, are available for customer consumption via API from the UC 4.0 API Gateway.
Access the GitHub repository for all FedRAMP content.	High
Retrieve and store the XML, JSON, and YAML files for later use.	High
Identify which documents are Authority Documents.	High
Identify which files are Glossary-specific.	High	Some files may solely be Glossaries with term-definition pair entries. Ensure those documents are also processed for follow-on steps.
Detect file metadata changes from prior processing.	High
Only pass new or changed documents further down the pipeline.	High
eCFR Data Pipeline and Goal: All eCFR content (TBD count), including Authority Documents, related Citations, and Glossaries with term-definition pairs, are available for customer consumption via API from the UC 4.0 API Gateway.
Access the eCFR files via the eCFR APIs.	High
Store files for later use.	High
Identify which documents are Authority Documents.	High
Identify which files are Glossary-specific.	High	Some files may solely be Glossaries with term-definition pair entries. Ensure those documents are also processed for follow-on steps.
Detect file metadata changes from prior processing.	High
Only pass new or changed documents further down the pipeline.	High	No need to pass along files with no changes.
General Pipeline (after initial source-specific tasks are completed, if possible). Goals: Reliably extract Citations from Authority Documents with at least 80% accuracy, meaning no more than 20% of Citations need to be reworked (e.g., split, merged, rejected …). Reliably extract Glossaries from Authority Documents with at least 95% accuracy, meaning no more than 5% of term-definition pairs need to be reworked. Glossaries are substantially easier to identify and extract than Citations.
Catalog each source Authority Document.	High	Gather all information as is required by the Common Data format.
Identify and extract Citations from the Authority Document.	Critical	Citations are passages in the Authority Document that contain Mandates (requirements) OR related contextual information such as stubs, informational passages, and information gathering. This requirement is the “brains” of the solution: the success of the entire project hinges on it.
Maintain Citation structure.	High	Since Authority Documents will contain multiple Citations and passages may have related Citations, that structure must be maintained to know the relationship between Citations.
Extract Glossary from within Authority Documents.	High	Some Authority Documents may have a glossary within the document. This will typically be near the end of the file. Extract the Glossary details including the title, source, and all term-definition pairs.
Extract Glossary from glossary-specific files.	High	Some files may only have Glossary entries with term-definition pairs. Extract the Glossary details including the title, source, and all term-definition pairs.
Detect content changes from prior loads.	TBD	Open questions on extent of this requirement and priority.
Transform the Authority Document into the Common Data Format.	High	Use the transformation documentation as a reference for how the source document schema structures are transformed into the Common Data Format.
Transform the Citations related to the Authority Document into the Common Data Format.	High	As above, use the transformation documentation as a reference.
Transform the Glossaries into the Common Data Format.	High	As above, use the transformation documentation as a reference.
Load Authority Documents into the Unified Compliance Platform	High	UCF core engineering team will determine the optimal approach for loading (API, service, …)
Load Citations into the Unified Compliance Platform	High	Same as above
Load Glossaries into the Unified Compliance Platform	High	Same as above
Human Validation via a simple front-end. Since this is all back-end pipeline work with no customer interaction (for this release), the user experience needs to be good enough for us as “dog food”.
Allow human experts to view the proposed Citations along with relationships between Citations.	High	The solution needs to show the user the proposed Citations by type and the relationships between them, but the presentation need not be a hierarchical structure. For example, a section in an Authority Document may have an introductory passage that is designated as an informational citation, followed by three (3) citations that are identified as mandates/requirements, followed by a final passage that is also informational.
Allow human experts to change Citations.	High	Changes may include, but are not limited to, the following: splitting suggested citations; combining suggested citations; changing the citation type (mandate, informational, stub …); adding missing citations; removing suggested citations.
Allow human experts to approve or reject the entire Authority Document.	High	At any time during the review process, the expert needs to be able to approve the document for further steps, OR reject the suggested Authority Document and Citations with reasons such as: not an Authority Document; too many citations are improperly identified; …
Allow human experts to change Glossaries.	High	Changes may include, but are not limited to, the following: changing the title of the Glossary; splitting suggested term-definition pairs; combining suggested term-definition pairs; adding missing term-definition pairs; removing suggested term-definition pairs.
Allow human experts to approve or reject the entire Glossary.	High	At any time during the review process, the expert needs to be able to approve the Glossary for further steps, OR reject the suggested Glossary and term-definition pairs with reasons such as: not a Glossary; too many term-definition pairs are improperly identified; …
Automation and Monitoring
Produce logs for each step in the end-to-end process.	High
In the content capture step, capture the following statistical information: Complete count of documents. Count of identified Authority Documents. Count of identified Glossaries. List of Authority Documents including name, location, change date, and metadata change value (same, new, updated, or deprecated). List of Glossaries including name, location, change date, and metadata change value (same, new, updated, or deprecated). Count of ADs per metadata change value (same, new, updated, or deprecated). Count of Glossaries per metadata change value (same, new, updated, or deprecated).	High
In the citation extraction step, capture the following statistical information per AD: Total count of citations extracted. Count of citations per type: requirement/mandate, stub, informational, and information gathering.	High
In the human validation step, capture the following statistical information per AD: Total count of ADs approved and rejected. Total count of suggested Citations per AD. Total count of Citations changed and by change type (split, merged, removed …).	Critical	This is critical information to capture since the measurement of the success of the project hinges on the correctness of suggested citations.
In the Glossary extraction step, capture the following statistical information per AD or Glossary: Total count of Glossaries extracted (assumed to be 1). Total count of Glossaries approved or rejected. Total count of term-definition pairs extracted. Total count of term-definition pairs changed and by change type (split, merged, removed …).	Critical	This is critical information to capture since the measurement of the success of the project hinges on the correctness of suggested glossaries and term-definition pairs.
In the transformation step, capture the following statistical information: Total count of ADs, Citations, Glossaries, and term-definition pairs entering the process. Total count of ADs completing the process by status (e.g., success, failure, …).	High
In the load step, capture the following statistical information: Total count of ADs, Citations, Glossaries, and term-definition pairs entering the process. Total count of ADs completing the process by status (e.g., success, failure, …).	High
Allow administrators to monitor, start, and stop each step in the end-to-end process, including content capture, extraction, transformation, and load.	High
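
The metadata change detection in the source pipelines and the metadata change values logged in the content capture step (same, new, updated, or deprecated) could be sketched as follows. This is only an illustration under stated assumptions: the fingerprinting scheme, the function names, and the rule that a document missing from the current run counts as deprecated are all hypothetical, and the handling of deprecated documents is still an open question in the requirements.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def fingerprint(metadata: Dict[str, str]) -> str:
    """Hash a document's metadata in a stable key order (assumed scheme)."""
    canonical = "\n".join(f"{key}={metadata[key]}" for key in sorted(metadata))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify_changes(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, str]:
    """Map each document location to same, new, updated, or deprecated.

    Both arguments map a document location to its metadata fingerprint.
    Assumption: a location seen in the prior run but absent now is deprecated.
    """
    statuses: Dict[str, str] = {}
    for location, digest in current.items():
        if location not in previous:
            statuses[location] = "new"
        elif previous[location] != digest:
            statuses[location] = "updated"
        else:
            statuses[location] = "same"
    for location in previous:
        if location not in current:
            statuses[location] = "deprecated"
    return statuses


def documents_to_promote(statuses: Dict[str, str]) -> List[str]:
    """Only new or updated documents move further down the pipeline."""
    return sorted(loc for loc, status in statuses.items() if status in ("new", "updated"))


if __name__ == "__main__":
    # Hypothetical locations and metadata, used only to exercise the sketch.
    prior = {"a.xml": fingerprint({"version": "1"}), "b.xml": fingerprint({"version": "1"})}
    now = {"a.xml": fingerprint({"version": "1"}), "b.xml": fingerprint({"version": "2"})}
    result = classify_changes(prior, now)
    print(result)
    print(Counter(result.values()))  # counts per metadata change value for the log
    print(documents_to_promote(result))
```

The per-status counts from `Counter` correspond to the "count per metadata change value" figures listed for the content capture log.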

Automatic Citation and Glossary Extraction from Authority Documents	Citations are passages in the Authority Document that contain Mandates (requirements) OR related contextual information such as stubs, informational passages, and information gathering.
Human-in-the-loop Training for Content Extraction AI Models	Specifically for the Citation and Glossary extraction
Automated Compliance Content Ingestion into the Common Data Format	Initially targeted for STIGs, NIST 800-53, FedRAMP, and eCFR
Monitoring and Logging
Metadata and Data Change Detection
Statistical Data Capture and Reporting	For each step in the ingestion pipeline with critical data capture of AI model accuracy.
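
To illustrate how the captured human validation counts could be turned into the AI accuracy figures named in the pipeline goals (at least 80% for Citations and 95% for term-definition pairs), here is a minimal sketch. It assumes accuracy is one minus the share of suggested items that reviewers had to change, and that each change is counted once against the suggested total; the class, field, and constant names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

CITATION_TARGET = 0.80   # goal stated for Citation extraction
GLOSSARY_TARGET = 0.95   # goal stated for term-definition pair extraction


@dataclass
class ValidationStats:
    """Counts captured in the human validation step for one AD or Glossary."""
    suggested: int                                           # items proposed by the model
    changes: Dict[str, int] = field(default_factory=dict)    # change type -> count

    @property
    def reworked(self) -> int:
        """Total number of changes across all change types."""
        return sum(self.changes.values())

    @property
    def accuracy(self) -> float:
        """Share of suggested items left unchanged (assumed definition)."""
        if self.suggested <= 0:
            raise ValueError("accuracy is undefined without suggested items")
        return max(0.0, 1.0 - self.reworked / self.suggested)


def meets_target(stats: ValidationStats, target: float) -> bool:
    """True when the measured accuracy reaches the given goal."""
    return stats.accuracy >= target


if __name__ == "__main__":
    # Hypothetical counts, used only to exercise the sketch.
    citations = ValidationStats(suggested=200, changes={"split": 10, "merged": 15, "removed": 5})
    print(citations.reworked, meets_target(citations, CITATION_TARGET))
```

How reviewer-added items that the model missed should count toward the rework total is not specified in the requirements, so the sketch leaves that decision to whoever fills in the `changes` mapping.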
