This PRD outlines the development and deployment strategy for an AI-driven Compliance Term Creation Tool designed to help Compliance Analysts create new compliance terms within an organization. The tool leverages cutting-edge AI technologies to address the inefficiencies inherent in manual term creation, offering a fast and reliable solution focused on compliance-specific applications.

Background:

Our existing Dictionary solution facilitates the addition of terms to an organization's dictionary but does not help in creating these terms. This gap has been identified as a critical area for enhancement.

...

Purpose

The objective is to develop an AI-powered Term Creation tool that streamlines the creation of compliance terms. Current manual methods are time-intensive, inconsistent in quality, and hindered by the sheer volume of sources that can be analyzed. This tool addresses these challenges by providing a fast, accurate, and reference-rich solution for generating compliance terms.

...

The tool will be robust and knowledgeable enough to cover a wide range of industries, with the ability to process sources in diverse digital formats.

In the initial phase, its primary function is to allow users to input a term and receive a comprehensive, detailed definition along with pertinent references.

Future versions could integrate as a plugin for document editors such as Microsoft Word and Google Docs, extending the tool to analyze documents and automatically identify and define terms not yet in the organization's compliance dictionary.

...

The generator is specifically designed for larger enterprises with established compliance departments. These organizations typically face greater complexity in their compliance requirements and will benefit most from the tool. Terms can be created by compliance officers, legal advisors, or business subject-matter experts (SMEs). While the primary users of this tool are Compliance Officers, the tool is designed to produce definitions that are easily understandable by non-compliance professionals as well.

Reach:

The tool will also cater to a broader audience by providing easily understandable definitions for non-compliance professionals. This accessibility broadens its utility across different organizational departments, making it a valuable tool for anyone needing clarity on compliance terms.

Confidence:

The need was validated internally with all mappers, and with AT&T.

...

In its initial version, the tool will offer a user-friendly interface where users can input a compliance term and promptly receive a well-defined, accurate definition and references. The focus is on simplicity and efficiency to ensure ease of use. Looking ahead, the integration into platforms like Google Docs will enable users to retrieve definitions directly within their working documents, enhancing workflow efficiency.

Technology

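The development of our solution is driven by the challenges posed by manual term creation methods and by recent advancements in AI technologies, particularly Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG). These technologies offer unprecedented capabilities in processing large volumes of text and generating accurate, contextually relevant information.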

Future enhancements will include seamless integration with document editing platforms for direct in-document support.

Benefits

Speed

Our (professional) mappers spend fifteen to thirty minutes on a single definition; our solution takes seconds.

Accuracy

The tool is meant to support human experts. Current results from the proof-of-concept are encouragingly accurate. A production version will further benefit from benchmarks and human-in-the-loop reinforcement learning.

Quality of references

The tool will consistently reference only high-relevance sources (NIST, eCFR, PCI, …), and we control the quality of this input.

Current sources in our dictionary are often missing or of lower quality (wikipedia.com, techsmith.com, …).

Technology

Philosophy:

AI is advancing rapidly, and the progress in tools is unprecedented. We want to get the most out of existing tools before investing in research to build new ones.

Our approach leverages existing advancements in AI, particularly Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG), to process extensive text volumes and generate contextually accurate information.


...

Loading: ingesting data from where it lives (PDF files, text files, websites, databases, or APIs) into our pipeline.
Indexing & storing: creating a data structure that allows the data to be queried. In our proof-of-concept we created vector embeddings to accurately find contextually relevant strings. The index is then stored along with other metadata.
Query/answer: the user submits a query, and the LLM generates a response based on the context retrieved from the index.
Evaluation: objectively measuring how accurate, faithful, and fast the responses are.
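The loading, indexing & storing, and query stages above can be illustrated with a small, self-contained Python sketch. It is not the POC's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model (an assumption), `TermIndex`, `Chunk`, and `build_prompt` are hypothetical names, and the LLM call itself is left out, so the sketch stops at the prompt the LLM would receive.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    """Toy bag-of-words vector. Assumption: a real pipeline calls an embedding model here."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Chunk:
    text: str
    source: str      # metadata stored next to the vector; later used as the reference
    vector: Counter


class TermIndex:
    """Hypothetical in-memory index covering the loading and indexing & storing stages."""

    def __init__(self) -> None:
        self.chunks: list[Chunk] = []

    def load(self, text: str, source: str) -> None:
        # Loading + indexing: split the document into sentences and store one vector per sentence.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sentence:
                self.chunks.append(Chunk(sentence, source, embed(sentence)))

    def retrieve(self, query: str, k: int = 3) -> list[tuple[float, Chunk]]:
        # Query stage, part 1: rank stored chunks by similarity to the query.
        q = embed(query)
        scored = [(cosine(q, chunk.vector), chunk) for chunk in self.chunks]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def build_prompt(term: str, hits: list[tuple[float, Chunk]]) -> str:
    """Query stage, part 2: the grounded prompt that would be sent to the LLM."""
    context = "\n".join(f"[{chunk.source}] {chunk.text}" for _, chunk in hits)
    return (
        f"Using only the context below, define the compliance term '{term}' "
        f"and cite the bracketed sources.\n\nContext:\n{context}"
    )
```

Keeping `source` on every chunk is what allows the generated definition to carry references back to the original documents.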

Proof-of-concept

The technology and concept have been validated through a proof-of-concept. Feedback from the entire mapping team on the proof-of-concept was unanimously positive, and results are surprisingly accurate despite limited sources and training.

Technical architecture for the POC

...

Live link:

The POC can be accessed here.

Cost:

The cost of the POC was 2,000 USD for development, plus some time from Prod/Design.

Development Roadmap

Next steps would entail:

...

Over a period of 4-6 months: Going from a proof-of-concept to a production-ready platform:

Proposed way forward:

Refinement of Retrieval Mechanism: Enhance the AI's ability to accurately source and retrieve relevant data.
- Revisit the architecture
- Evaluate storage options (vector store or knowledge graph)
- Set up semantic retrieval
Enrichment of Sources: Expand and diversify the data sources to improve the tool's comprehensiveness.
Fine-Tuning for Quality: Optimize the AI algorithms to ensure high-quality, accurate compliance term generation.

...

- Set up evaluation with the RAG Triad and a set of evaluation questions
- Iterate with:
  - Reranking
  - Sentence-window retrieval
  - Auto-merging retrieval
- Self-evaluation by the LLM
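A RAG Triad evaluation scores each evaluation question on three feedbacks: context relevance (is the retrieved context relevant to the question?), groundedness (is the answer supported by that context?), and answer relevance (does the answer address the question?). The sketch below shows one way to structure such a harness. It assumes the pipeline can be called as a function returning `(context, answer)`; the judge functions, which in practice would be LLM prompts (for example via an evaluation library such as TruLens), are hypothetical placeholders.

```python
from statistics import mean
from typing import Callable, Iterable

# Hypothetical interfaces: the pipeline maps a question to (retrieved context, answer);
# each judge maps (question, context, answer) to a score in [0, 1].
Pipeline = Callable[[str], tuple[str, str]]
Judge = Callable[[str, str, str], float]

TRIAD = ("context_relevance", "groundedness", "answer_relevance")


def run_triad_eval(questions: Iterable[str], pipeline: Pipeline,
                   judges: dict[str, Judge]) -> dict[str, float]:
    """Average each triad score over a fixed set of evaluation questions."""
    missing = [name for name in TRIAD if name not in judges]
    if missing:
        raise ValueError(f"missing judges: {missing}")
    scores: dict[str, list[float]] = {name: [] for name in TRIAD}
    for question in questions:
        context, answer = pipeline(question)
        for name in TRIAD:
            scores[name].append(judges[name](question, context, answer))
    if not scores[TRIAD[0]]:
        raise ValueError("no evaluation questions given")
    return {name: mean(values) for name, values in scores.items()}
```

Re-running the same question set after each iteration (reranking, sentence-window, auto-merging) keeps the variants directly comparable.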

Milestone after 3 months: Internal Beta Roll-out:

POC to Production in AWS: Transition the Proof of Concept (POC) into a full-scale production environment on AWS.
Integration with Mapper Team: Provide the tool to the mapper team for human validation, ensuring accuracy and reliability.

...

Milestone after 4-5 months: External Beta Roll-out:

Following successful internal use and quality benchmarks, release the tool externally as a feature of the Dictionary solution.

Competition

GenAI is a very fast-moving and intensely competitive field. Today, competition for our solution would come from general genAI tools such as Bard, ChatGPT, and others. We focus on developing a niche, compliance-specific solution to stay ahead in this competitive space.

We see niche solutions sprouting up for academic research, legal, and other fields. In compliance, there are fewer new startups so far, but they are bound to catch up: it is undoubtedly only a matter of time before someone brings out an LLM trained on compliance terms.

Unique Selling Proposition (USP)

...

Niche Expertise Development: Continuously enhancing our AI models with the latest compliance and regulatory knowledge will keep our tool at the forefront of this niche.
Strategic Partnerships: Collaborating with regulatory bodies and compliance experts can improve our tool's capabilities and credibility.
Focused Marketing and Branding: Emphasizing our specialization in compliance term creation in marketing efforts will help distinguish our tool from broad genAI solutions.

GTM (Go-To-Market)

In-app promotion

The Term Generator will be embedded inside Unified Compliance’s platform and available to all customers. A free trial would be available to customers.

Product page

The Term Generator can be embedded on the Dictionary product page and offer new visitors a limited number of free terms to create.

SEO Lead generation

For many compliance terms, there are very few or no valid search results on Google. We can create a limited number of pages, each featuring a compliance term, which could generate inbound traffic with limited effort.

Financials
The AI-based Term Creation tool is projected to be cost-neutral, delivering significant internal cost savings and serving as a valuable solution for customers.

Cost Savings

...

(validated)

UCF Mapping Team Efficiency: The team has created 1,351 new terms year-to-date, spending 20-30 minutes per term, totaling approximately 563 hours (at an average of 25 minutes per term).
Hourly Cost Savings: At an internal cost of $60/hour, the tool offers an annual cost saving of approximately $33,775.
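The savings figure can be reproduced with a short calculation. The 563 hours correspond to the midpoint of the 20-30 minute range (25 minutes per term); that midpoint is an assumption, as is treating the full manual time as saved. The function name is illustrative.

```python
def manual_term_cost(terms: int, minutes_per_term: float, cost_per_hour: float) -> tuple[float, float]:
    """Return (hours spent, labor cost in USD) for creating `terms` definitions by hand."""
    hours = terms * minutes_per_term / 60
    # Multiply before dividing so whole-dollar results stay exact.
    cost = terms * minutes_per_term * cost_per_hour / 60
    return hours, cost


# 1,351 terms year-to-date at an assumed 25-minute average and $60/hour
hours, cost = manual_term_cost(1_351, 25, 60)
print(f"{hours:.0f} hours, ${cost:,.0f}")  # 563 hours, $33,775
```

At the ends of the range, 20 and 30 minutes per term, the same calculation gives $27,020 and $40,530 respectively.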

Revenue Projections

...

Pricing Strategy: The tool will be priced at $5/user/month, with additional charges for token credits in case of over-use.
Market Penetration Assumptions:
Existing Customer Base Penetration: Estimated at 50%, with an average of 2 users per account.
Annual Recurring Revenue (ARR): Potential ARR is projected to be $240,000.
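The ARR projection follows from multiplying adopting accounts by users per account, the monthly price, and 12 months. The size of the existing customer base is not stated here, so the sketch works backwards: under the stated assumptions, $240,000 ARR implies a base of about 4,000 accounts (2,000 adopting accounts with 4,000 paying users). That base is a derived figure, not a reported one, and token over-use charges are not modeled.

```python
def projected_arr(accounts: float, penetration: float, users_per_account: float,
                  price_per_user_month: float) -> float:
    """ARR from seat pricing only (token over-use charges are not modeled)."""
    return accounts * penetration * users_per_account * price_per_user_month * 12


def implied_accounts(target_arr: float, penetration: float, users_per_account: float,
                     price_per_user_month: float) -> float:
    """Invert projected_arr: how many existing accounts a target ARR requires."""
    return target_arr / (penetration * users_per_account * price_per_user_month * 12)


base = implied_accounts(240_000, penetration=0.5, users_per_account=2, price_per_user_month=5)
print(base)  # 4000.0
```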
...
Development and Operational Costs:
NLU Developer: $15,000-$25,000 for enhancing Natural Language Understanding capabilities.
AWS Migration and UI Refinement: $7,500 for development costs associated with AWS migration and user interface improvements.
Quality Assurance: $2,500 for QA processes.
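Summing the listed development and operational costs gives the budget range. The sketch covers only the three items shown; anything not listed here is not included.

```python
# (low, high) estimates in USD for the items listed above
COSTS = {
    "NLU developer": (15_000, 25_000),
    "AWS migration and UI refinement": (7_500, 7_500),
    "Quality assurance": (2_500, 2_500),
}


def total_cost_range(costs: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low and high ends of each cost estimate separately."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


print(total_cost_range(COSTS))  # (25000, 35000)
```

For comparison, this range brackets the estimated annual internal cost saving of $33,775.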
...
This PRD outlines the development of an AI-based Term Creation tool tailored for Compliance Analysts, addressing the inefficiencies of manual compliance term creation.
Our tool differentiates itself by focusing on compliance-specific term generation, leveraging advanced genAI and NLP technologies. Targeted at larger enterprises, it promises enhanced accuracy, efficiency, and industry-wide applicability.
Moving forward, our focus should be on continuous technological refinement and market responsiveness, to maintain a competitive edge against broader genAI solutions and ensure our tool remains a specialized, valuable asset in compliance.

Strategically, this relatively small genAI solution is a first step: a beachhead in our potential approach to meeting the growing demand for genAI solutions in the compliance industry.
Preliminary RICE evaluation
Reach & Impact
From small reach with medium impact to large reach with small impact
Confidence:
Validated customer and internal need
Effort:
Small to Medium

Appendix:

Proof-of-concept

...

The POC could ingest both PDF documents and websites as source information for its compliance knowledge.

Detail on Development Roadmap

Refinement of Retrieval Mechanism

The main evaluation parameters for RAG:

...

Precision

RAG performance hinges on ensuring that the retrieved context is precisely related to the query.

Hallucinations

RAG is an effective way to combat ‘hallucinations’ of an LLM. However, even with RAG, problems can occur.

When the LLM does not find relevant information in the retrieved context, it still tries to produce an answer and falls back on knowledge acquired during its pre-training phase.

So when context relevance is low, LLMs tend to ‘fill in’ gaps with ‘general’ knowledge from the training phase. Such an answer has low groundedness, even though it may look like a good answer and seem relevant to the question.
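To make "low groundedness" concrete, the sketch below scores an answer by the share of its sentences whose words mostly appear in the retrieved context. This is a crude lexical proxy chosen only for illustration (the 0.6 overlap threshold is an arbitrary assumption); LLM-based evaluators judge groundedness with a model instead. A sentence the LLM filled in from pre-training knowledge typically shares few words with the context and lowers the score.

```python
import re


def _sentences(text: str) -> list[str]:
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def groundedness_proxy(answer: str, context: str, min_overlap: float = 0.6) -> float:
    """Fraction of answer sentences with at least `min_overlap` of their tokens found in the context."""
    context_tokens = _tokens(context)
    sentences = _sentences(answer)
    if not sentences:
        return 0.0
    supported = 0
    for sentence in sentences:
        tokens = _tokens(sentence)
        if tokens and len(tokens & context_tokens) / len(tokens) >= min_overlap:
            supported += 1
    return supported / len(sentences)
```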

Precision: Some underlying problems:

Chunk size

...

Fragmentation

After vector retrieval, we feed a set of fragmented chunks of information into the LLM context window; the smaller the chunk size, the worse the fragmentation.
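The effect of chunk size on fragmentation can be shown with a simple word-based chunker. `chunk_words` and `count_fragments` are hypothetical helpers; real pipelines usually split on tokens and often add overlap between chunks, which the `overlap` parameter imitates. Counting chunks that end mid-sentence gives a rough measure of fragmentation.

```python
def chunk_words(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most `chunk_size` words; consecutive chunks share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    # Stop once the remaining words are already covered by the previous chunk's overlap.
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, last_start, step)]


def count_fragments(chunks: list[str]) -> int:
    """Number of chunks that end mid-sentence (no closing '.', '!' or '?')."""
    return sum(1 for chunk in chunks if not chunk.rstrip().endswith((".", "!", "?")))


policy = "Data must be encrypted at rest. Keys must be rotated yearly. Access must be logged."
for size in (3, 8):
    chunks = chunk_words(policy, size)
    print(size, len(chunks), count_fragments(chunks))
# 3 5 3
# 8 2 1
```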

...

Evaluation

Additional evaluation parameters:

...

Solution approaches

Precision

Reranking

After obtaining a first set of results with dense retrieval, we re-sort that first set in an additional pass using traditional semantic search.
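Reranking is a two-stage pattern: a fast first-stage scorer selects the top `k` candidates from the whole collection, and a second, more precise scorer re-sorts only those candidates. In the sketch below both scorers are pluggable. The demo scorers (query-term recall for the first stage, plus an exact-phrase bonus for the reranker) are hypothetical stand-ins, not the POC's scorers; in production, the second stage is often a cross-encoder model.

```python
import re
from typing import Callable, Sequence

Scorer = Callable[[str, str], float]  # (query, document) -> relevance score


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def term_recall(query: str, doc: str) -> float:
    """First-stage stand-in: share of query terms present in the document."""
    q = _tokens(query)
    return len(q & _tokens(doc)) / len(q) if q else 0.0


def phrase_aware(query: str, doc: str) -> float:
    """Reranker stand-in: term recall plus a bonus when the exact query phrase occurs."""
    return term_recall(query, doc) + (1.0 if query.lower() in doc.lower() else 0.0)


def retrieve_and_rerank(query: str, docs: Sequence[str], first_stage: Scorer,
                        reranker: Scorer, k: int = 10, n: int = 3) -> list[str]:
    # Stage 1: cheap scoring over the whole collection; keep the top k candidates.
    candidates = sorted(docs, key=lambda d: first_stage(query, d), reverse=True)[:k]
    # Stage 2: re-sort only those k candidates with the more precise scorer; return the top n.
    return sorted(candidates, key=lambda d: reranker(query, d), reverse=True)[:n]
```

Because the reranker only sees `k` candidates, it can afford to be much slower per document than the first stage.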

Sentence-window retrieval

Retrieve not only the sentence found in the embedding lookup, but also the sentences immediately before and after it.
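Sentence-window retrieval can be sketched as: match individual sentences, then hand the LLM the matched sentence together with its neighbours. `best_sentence` uses simple word overlap as a hypothetical stand-in for the embedding lookup; `window=1` gives exactly the sentence before and after.

```python
import re


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_sentence(query: str, sentences: list[str]) -> int:
    """Index of the sentence sharing the most words with the query (stand-in for embedding lookup)."""
    q = _tokens(query)
    return max(range(len(sentences)), key=lambda i: len(q & _tokens(sentences[i])))


def sentence_window(sentences: list[str], hit: int, window: int = 1) -> str:
    """The matched sentence plus `window` sentences on each side, clipped at document edges."""
    if not 0 <= hit < len(sentences):
        raise IndexError("hit index out of range")
    start = max(0, hit - window)
    end = min(len(sentences), hit + window + 1)
    return " ".join(sentences[start:end])
```

The small sentence keeps the lookup precise, while the window restores the surrounding context the LLM needs.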

...

LLMs work better with larger chunks, but vector-based retrieval delivers smaller chunks.

...

After the embedding lookup and before sending the chunks to the LLM, we rerank them.

Auto-merging retrieval

We create a hierarchy of larger parent nodes with smaller child nodes.

For results in the embedding lookup, if the share of a parent's child nodes that were retrieved exceeds a threshold, those child nodes are merged into (replaced by) their parent node.
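A minimal sketch of the merge step, assuming a single level of parent and child nodes and interpreting the threshold as the fraction of a parent's children that were retrieved (LlamaIndex's `AutoMergingRetriever` uses a comparable ratio threshold). The function and parameter names are hypothetical.

```python
from collections import defaultdict


def auto_merge(retrieved: list[str], parent_of: dict[str, str],
               children_of: dict[str, list[str]], threshold: float = 0.5) -> list[str]:
    """Replace retrieved child nodes by their parent when more than `threshold` of its children were hit."""
    grouped: dict[str, list[str]] = defaultdict(list)
    result: list[str] = []
    for node in retrieved:
        parent = parent_of.get(node)
        if parent is None:
            result.append(node)            # top-level node: keep as-is
        elif node not in grouped[parent]:
            grouped[parent].append(node)   # collect unique children per parent
    for parent, hits in grouped.items():
        if len(hits) / len(children_of[parent]) > threshold:
            result.append(parent)          # enough siblings retrieved: send the larger parent chunk
        else:
            result.extend(hits)            # too few: keep the small child chunks
    return result
```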

...

“Self-evaluation” by LLM

...

Examples of LLM evaluations

...

Result exploration

Evaluation results can be visualized; below is an example of how this can be done with a tool such as TruLens.

...

Evaluation makes it possible to drill down into individual results.

...

It also shows the LLM's feedback on each result, which gives insight into how changing parameters influences results.
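Comparing parameter settings comes down to aggregating per-question feedback scores by configuration. The sketch below ranks configurations (for example, different chunk sizes or retrieval strategies) by their mean score across metrics. It is a hypothetical helper illustrating the idea behind an evaluation leaderboard, not TruLens's API, and weighting all metrics equally is an assumption.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable

Record = tuple[str, dict[str, float]]  # (configuration name, {metric name: score})


def leaderboard(records: Iterable[Record], metrics: list[str]) -> list[tuple[str, dict[str, float], float]]:
    """Return (config, per-metric means, overall mean) rows, best configuration first."""
    collected: dict[str, dict[str, list[float]]] = defaultdict(lambda: defaultdict(list))
    for config, scores in records:
        for metric in metrics:
            collected[config][metric].append(scores[metric])
    rows = []
    for config, by_metric in collected.items():
        means = {metric: mean(by_metric[metric]) for metric in metrics}
        rows.append((config, means, mean(means.values())))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows
```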

...

Potential metrics

Accuracy Rate:
- Definition Correctness: Percentage of terms where the generated definition accurately reflects the intended meaning.
- Reference Relevance: Proportion of references that are contextually relevant to the generated terms.
Error Rate:
- Misinterpretation Frequency: Track the frequency of incorrect interpretations or irrelevant definitions generated.
- Inconsistency Detection: Measure instances where the tool provides varying quality across similar requests.
Response Time:
- Generation Speed: Monitor the average time to generate a term and its definition, ensuring it meets efficiency standards.
Usage Metrics:
- Adoption Rate: Track the number of active users and frequency of use, indicating the tool's perceived value.
- Repeat Usage: Measure how often users return to the tool, indicating reliance and satisfaction.
Benchmarking:
- Comparison with Manual Processes: Compare the quality of terms generated by the tool against those created manually.
- Competitor Comparison: Regularly compare the tool's output quality against similar offerings in the market.
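Several of these metrics can be computed directly from a log of generated terms once reviewers have labelled them. The record fields below (a reviewer's correctness judgement, reference counts, generation time) are assumptions about what such a log would contain; usage and benchmarking metrics would need additional data.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Generation:
    definition_correct: bool   # reviewer judgement, e.g. from the mapper team
    references_total: int
    references_relevant: int
    seconds: float             # time to generate the term and its definition


def summarize(gens: list[Generation]) -> dict[str, float]:
    """Accuracy, error, and response-time metrics over a batch of reviewed generations."""
    if not gens:
        raise ValueError("no generations to summarize")
    correct = sum(g.definition_correct for g in gens)
    total_refs = sum(g.references_total for g in gens)
    relevant_refs = sum(g.references_relevant for g in gens)
    return {
        "definition_correctness": correct / len(gens),
        "misinterpretation_rate": 1 - correct / len(gens),
        "reference_relevance": relevant_refs / total_refs if total_refs else 0.0,
        "avg_generation_seconds": mean(g.seconds for g in gens),
    }
```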

User feedback:
