This PRD outlines the development and deployment strategy for an AI-driven Compliance Term Creation Tool that transforms how organizations generate compliance-related terms. The tool uses current AI technologies to address the inefficiencies of manual term creation, offering a fast and reliable solution focused on compliance-specific applications.
Background:
Our existing dictionary solution lets users add terms to an organization's dictionary but lacks capabilities to help create those terms. This gap has been identified as a critical area for enhancement.
Purpose
The objective is to develop an AI-powered tool that streamlines the creation of compliance terms, overcoming the limitations of current manual methodologies, which are time-intensive, inconsistent, and hindered by the volume of data sources available. This tool addresses these challenges by providing a fast, accurate, and reference-rich solution for generating compliance terms.
Scope
The tool will be robust and knowledgeable enough to cover a wide range of industries, and able to process sources in diverse digital formats.
The initial release will focus on enabling users to input a term and receive a detailed definition along with pertinent references.
Future versions may include integration as a plugin for document editors such as Microsoft Word and Google Docs, extending the tool to analyze documents and automatically identify and define terms not yet in the organization's compliance dictionary.
Target users
The generator is specifically designed for larger enterprises with established compliance departments. Terms can be created by compliance officers, legal advisors, or business subject-matter experts (SMEs).
The tool will also cater to a broader audience by providing easily understandable definitions for non-compliance professionals. This accessibility broadens its utility across different organizational departments, making it a valuable tool for anyone needing clarity on compliance terms.
The need was validated internally with all mappers, and with AT&T.
Initial Concept and User Interaction
In its initial version, the tool will offer a user-friendly interface where users can input a compliance term and promptly receive a well-defined, accurate definition and references. The focus is on simplicity and efficiency to ensure ease of use.
Future enhancements will include seamless integration with document editing platforms for direct in-document support.
Benefits
Speed
Our (professional) mappers spend fifteen to thirty minutes on a single definition. Our solution takes seconds.
Accuracy
The tool is meant to be a support for humans. Current results with the proof-of-concept are encouragingly accurate. A production version will further benefit from benchmarks and human-in-the-loop reinforcement learning.
Quality of references
The tool will reference only high-relevance sources (NIST, eCFR, PCI, …), and we control the quality of this input.
Current sources in our dictionary are often missing or of lower quality (wikipedia.com, techsmith.com, …).
Technology
Philosophy:
AI is advancing rapidly, and the progress in tools is unprecedented. We want to make the most of existing tools before investing in research to create new ones.
Our approach leverages existing advancements in AI, particularly Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG), to process extensive text volumes and generate contextually accurate information.
RAG (Retrieval-Augmented Generation)
Loading: Bringing data into our pipeline from wherever it lives, whether that is PDFs, text files, a website, a database, or an API.
Indexing & storing: Creating a data structure that allows the data to be queried. In our proof-of-concept we created vector embeddings to accurately find contextually relevant strings. The index is then stored along with other metadata.
Query/answer: The user submits a query. The LLM then creates a response based on the context retrieved from the index.
Evaluation: Objective measurement of how accurate, faithful, and fast the responses are.
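The four stages above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the POC's implementation: the bag-of-words `embed` function is a toy stand-in for real vector embeddings, the two documents are made-up placeholder texts, and the LLM call is replaced by building the prompt that would be sent to it. All names are hypothetical.

```python
import math
import re
import time
from collections import Counter

# Loading: in a real pipeline these would come from PDFs, websites, databases or APIs.
# The two snippets are made-up placeholder texts, not real source excerpts.
DOCUMENTS = {
    "source-a": "Access control restricts who can view or use resources in a system.",
    "source-b": "An audit log records events so that activity can be reviewed later.",
}

def embed(text: str) -> Counter:
    """Toy stand-in for a vector embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Indexing & storing: keep each vector together with its metadata (the source id).
INDEX = [{"source": s, "text": t, "vector": embed(t)} for s, t in DOCUMENTS.items()]

def retrieve(query: str, top_k: int = 1) -> list:
    """Query step, part 1: find the most similar indexed entries."""
    q = embed(query)
    return sorted(INDEX, key=lambda e: cosine(q, e["vector"]), reverse=True)[:top_k]

def build_prompt(term: str):
    """Query step, part 2: assemble the context-grounded prompt an LLM would receive."""
    hits = retrieve(term)
    context = "\n".join(h["text"] for h in hits)
    prompt = f"Using only this context:\n{context}\n\nDefine the term: {term}"
    return prompt, [h["source"] for h in hits]

# Evaluation: here only speed is measured; accuracy and faithfulness need labeled data.
start = time.perf_counter()
prompt, references = build_prompt("audit log")
elapsed_seconds = time.perf_counter() - start
print(references, f"{elapsed_seconds:.6f}s")
```

Keeping the source id next to each vector is what lets the answer carry references back to the user.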
Proof-of-concept
The concept has been validated through a proof-of-concept, receiving positive feedback from the entire mapping team, despite limited sources and training.
Technical architecture for the POC
Live link:
The POC can be accessed here.
Cost:
The cost of the POC was 2,000 USD for development, plus some time from Prod/Design.
Development Roadmap
Next steps would entail:
Q1/Q2 2024 Going from a proof-of-concept to a production-ready platform:
Proposed way forward:
Refinement of Retrieval Mechanism: Enhance the AI's ability to accurately source and retrieve relevant data.
Re-visit Architecture
Evaluate storage options (vector store vs. knowledge graph)
Set up Semantic Retrieval
Enrichment of Sources: Expand and diversify the data sources to improve the tool's comprehensiveness.
Fine-Tuning for Quality: Optimize the AI algorithms to ensure high-quality, accurate compliance term generation.
Set up RAG Triad evaluation with a set of evaluation questions
Iterate with:
Re-rank,
Sentence-window and
Auto-merging Retrieval
Self-evaluation by LLM
Q1 2024 Internal Beta Roll-out:
POC to Production in AWS: Transition the Proof of Concept (POC) into a full-scale production environment on AWS.
Integration with Mapper Team: Provide the tool to the mapper team for human validation, ensuring accuracy and reliability.
Q2/Q3 2024 External Beta Roll-out:
Following successful internal use and quality benchmarks, release the tool externally as a feature of the Dictionary solution.
Competition
The market is rapidly evolving with general genAI tools like Bard and ChatGPT. We focus on developing a niche, compliance-specific solution to stay ahead in this competitive space.
We see niche solutions sprouting up for academic research, legal, etc. In compliance, the number of new startups is smaller, but it is bound to catch up.
Unique Selling Proposition (USP)
Compared with the current manual process, the tool's USP lies in its:
speed,
accuracy, and
the comprehensive nature of the references it provides.
Leveraging advanced genAI and NLP techniques, it promises a significant improvement over traditional methods, offering quick and reliable compliance term definitions. This tool is a time-saver and a step towards more consistent and universally understandable compliance practices.
Compared with more general-purpose tools, our solution’s USP lies in:
Niche Expertise Development: Continuously enhancing our AI models with the latest compliance and regulatory knowledge will keep our tool at the forefront of this niche.
Strategic Partnerships: Collaborating with regulatory bodies and compliance experts can improve our tool's capabilities and credibility.
Focused Marketing and Branding: Emphasizing our specialization in compliance term creation in marketing efforts will help distinguish our tool from broad genAI solutions.
GTM (Go-To-Market)
In-app promotion
The Term Generator will be embedded inside Unified Compliance’s platform, and available to all customers. A free test/trial would be available to customers.
Product page
The Term Generator can be embedded on the Dictionary product page and offer new visitors a limited number of free terms to create.
SEO Lead generation
For many compliance terms, there are very few or no valid search results on Google. We can create a limited number of pages, each featuring a compliance term. This could generate inbound traffic with limited effort.
Financials
The AI-based Term Creation tool is projected to be cost-neutral, delivering significant internal cost savings and serving as a valuable solution for customers.
Cost Savings (validated)
UCF Mapping Team Efficiency: The team has created 1,351 new terms year-to-date, spending 20-30 minutes per term, totaling approximately 563 hours.
Hourly Cost Savings: With an internal cost of $60/hour, the tool offers an annual cost saving of $33,775.
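The arithmetic behind these figures can be checked with a short sketch. It assumes the 563 hours and $33,775 were computed at the midpoint of the 20-30 minute range (25 minutes per term); that assumption reproduces both numbers, and the low and high ends of the range are shown for context.

```python
TERMS_CREATED = 1_351        # new terms year-to-date (from the PRD)
HOURLY_COST_USD = 60         # internal cost per hour (from the PRD)

def hours_spent(minutes_per_term: float) -> float:
    """Total mapper hours for all terms at a given effort per term."""
    return TERMS_CREATED * minutes_per_term / 60

def cost_saving_usd(minutes_per_term: float) -> float:
    """Mapper cost avoided if the tool replaces this manual effort."""
    return TERMS_CREATED * minutes_per_term * HOURLY_COST_USD / 60

# 25 minutes is the assumed midpoint of the stated 20-30 minute range.
for minutes in (20, 25, 30):
    print(f"{minutes} min/term: {hours_spent(minutes):.0f} h, ${cost_saving_usd(minutes):,.0f}")
```

At 20 and 30 minutes per term the same formula gives roughly 450 and 676 hours, so the saving depends noticeably on which end of the range is typical.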
Revenue Projections
Pricing Strategy: The tool will be priced at $5/user/month, with additional charges for token credits in case of over-use.
Market Penetration Assumptions:
Existing Customer Base Penetration: Estimated at 50%, with an average of 2 users per account.
Annual Recurring Revenue (ARR): Potential ARR is projected to be $240,000
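The ARR projection follows from price, penetration, and users per account. The total number of customer accounts is not stated in this PRD, so the sketch below back-solves the account count implied by the other figures; treat that number as a derived illustration, not a confirmed input.

```python
PRICE_PER_USER_PER_MONTH_USD = 5   # from the PRD
PENETRATION = 0.5                  # share of existing accounts adopting (from the PRD)
USERS_PER_ACCOUNT = 2              # average users per adopting account (from the PRD)
PROJECTED_ARR_USD = 240_000        # from the PRD

def arr_usd(total_accounts: float) -> float:
    """Annual recurring revenue for a given size of the existing customer base."""
    paying_users = total_accounts * PENETRATION * USERS_PER_ACCOUNT
    return paying_users * PRICE_PER_USER_PER_MONTH_USD * 12

# Derived, not stated in the PRD: the customer base the projection implies.
implied_accounts = PROJECTED_ARR_USD / arr_usd(1)
implied_paying_users = implied_accounts * PENETRATION * USERS_PER_ACCOUNT
print(implied_accounts, implied_paying_users)
```

Token-credit overage charges are excluded, since the PRD gives no figures for them.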
Cost Projections:
Development and Operational Costs:
NLU Developer: $15,000 for enhancing Natural Language Understanding capabilities.
AWS Migration and UI Refinement: $7,500 for development costs associated with AWS migration and user interface improvements.
Quality Assurance: $2,500 for QA processes.
This financial overview underscores the tool's potential for cost-effectiveness and revenue generation, aligning with our strategic goals of efficiency and market competitiveness.
Conclusion
This PRD outlines the development of an AI-based Term Creation tool tailored for Compliance Analysts, addressing the inefficiencies of manual compliance term creation.
Our tool differentiates itself by focusing on compliance-specific term generation, leveraging advanced genAI and NLP technologies. Targeted at larger enterprises, it promises enhanced accuracy, efficiency, and industry-wide applicability.
Moving forward, our focus can be on continuous technological refinement and market responsiveness to maintain a competitive edge against broader genAI solutions, ensuring our tool remains a specialized, valuable asset in compliance.
Strategically, this small genAI solution is a first step, a beachhead in our potential approach to meeting the growing demand for genAI solutions in the compliance industry.
Appendix:
Proof-of-concept
A screenshot of results of the live POC made in October 23.
The POC provided the following:
a short definition
a slightly more complete definition
three questions which the user can ask that help explain the term
references upon which the answer is based
The POC could ingest both PDF documents and websites as source material for its compliance knowledge.
Detail on Development Roadmap
Refinement of Retrieval Mechanism
The main evaluation parameters for RAG:
Precision
RAG performance depends heavily on retrieving context that is precisely related to the query.
Hallucinations
RAG is an effective way to combat ‘hallucinations’ of an LLM. However, even with RAG, problems can occur.
When the LLM does not find relevant information in the retrieved context, it still tries to produce an answer and falls back on the knowledge from its pre-training phase.
So when context relevance is low, LLMs tend to ‘fill in’ gaps with their ‘general’ knowledge from the training phase. Such an answer will have low groundedness, even though it might look like a good answer and be relevant to the question.
Precision: Some underlying problems:
Chunk size
Fragmentation
After vector retrieval, we feed a set of fragmented chunks of information into the LLM context window, and the smaller the chunk size, the worse the fragmentation.
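The effect of chunk size on fragmentation can be illustrated with a small sketch. It assumes naive fixed-size character chunking, which is only one of several chunking strategies and not necessarily the one used in the POC; the sample text is a made-up placeholder and the function names are hypothetical.

```python
# Made-up placeholder text, not a real source excerpt.
TEXT = "A control is a safeguard. A policy states intent. A procedure lists steps."

def chunk(text: str, size: int) -> list:
    """Naive fixed-size chunking by character count."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def mid_sentence_cuts(chunks: list) -> int:
    """Count non-final chunks that end in the middle of a sentence."""
    return sum(1 for c in chunks[:-1] if not c.rstrip().endswith((".", "!", "?")))

small_chunks = chunk(TEXT, 20)
large_chunks = chunk(TEXT, 40)
print(len(small_chunks), mid_sentence_cuts(small_chunks))
print(len(large_chunks), mid_sentence_cuts(large_chunks))
```

Smaller chunks produce more pieces and more cuts through the middle of a sentence, which is the fragmentation the LLM then has to work around.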
Evaluation
Additional evaluation parameters:
Solution approaches
Precision
Reranking
After the first-pass semantic search with dense retrieval, we apply an additional sorting pass to that initial result set, using a traditional semantic search.
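A two-stage retrieve-then-rerank flow can be sketched as follows. Both scoring functions are toy stand-ins chosen for illustration: word-count cosine similarity plays the role of dense retrieval, and a phrase-match score plays the role of the second-pass ranker (in practice this is often a dedicated re-ranking model). The chunks are made-up placeholder texts and all names are hypothetical.

```python
import math
import re
from collections import Counter

# Made-up placeholder chunks, not real source excerpts.
CHUNKS = [
    "Risk owners report risk trends to the key risk committee.",
    "A key risk indicator is a metric that signals rising risk exposure in a process.",
    "Encryption key rotation replaces old keys on a schedule.",
]

def tokens(text: str) -> list:
    return re.findall(r"[a-z]+", text.lower())

def first_pass_score(query: str, chunk: str) -> float:
    """Toy stand-in for dense retrieval: cosine similarity of word counts."""
    q, c = Counter(tokens(query)), Counter(tokens(chunk))
    dot = sum(q[t] * c[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in c.values()))
    return dot / norm if norm else 0.0

def rerank_score(query: str, chunk: str) -> float:
    """Toy second-pass scorer: query-term coverage plus a bonus for the exact phrase."""
    q_terms = set(tokens(query))
    coverage = len(q_terms & set(tokens(chunk))) / len(q_terms)
    phrase_bonus = 1.0 if query.lower() in chunk.lower() else 0.0
    return coverage + phrase_bonus

def retrieve(query: str, first_k: int = 2) -> list:
    """Stage 1: keep the top candidates from the cheap first-pass score."""
    return sorted(CHUNKS, key=lambda c: first_pass_score(query, c), reverse=True)[:first_k]

def rerank(query: str, candidates: list) -> list:
    """Stage 2: re-sort only the candidates with the second scorer."""
    return sorted(candidates, key=lambda c: rerank_score(query, c), reverse=True)

candidates = retrieve("key risk indicator")
reranked = rerank("key risk indicator", candidates)
print(reranked[0])
```

In this example the first pass favors the chunk that repeats "risk" most often, and the second pass moves the chunk that actually defines the term to the top.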
Sentence-window retrieval
Retrieve not only the sentence found in embedding lookup but also the sentence before and after.
LLMs work better with larger chunks, but vector-based retrieval delivers smaller chunks.
After the embedding lookup and before sending the chunks to the LLM, we re-rank the chunks.
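Sentence-window retrieval can be sketched as below. The matching step uses simple word overlap as a toy stand-in for the embedding lookup, the window is one sentence on each side, and the re-ranking step is omitted for brevity. The document text is a made-up placeholder and the function names are hypothetical.

```python
import re

# Made-up placeholder text, not a real source excerpt.
DOCUMENT = (
    "Vendors must be assessed before onboarding. "
    "A risk assessment identifies threats and rates their likelihood. "
    "Results are recorded in the risk register. "
    "The register is reviewed every quarter."
)

def split_sentences(text: str) -> list:
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def overlap(query: str, sentence: str) -> int:
    """Toy stand-in for embedding similarity: number of shared words."""
    words = lambda t: set(re.findall(r"[a-z]+", t.lower()))
    return len(words(query) & words(sentence))

def sentence_window(query: str, sentences: list, window: int = 1) -> str:
    """Match on a single sentence, then return it with its neighbors as context."""
    best = max(range(len(sentences)), key=lambda i: overlap(query, sentences[i]))
    start, end = max(0, best - window), min(len(sentences), best + window + 1)
    return " ".join(sentences[start:end])

context = sentence_window("risk assessment", split_sentences(DOCUMENT))
print(context)
```

The lookup stays precise because it matches on one sentence, while the LLM receives the wider window it needs to produce a coherent definition.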
Auto-merging retrieval
We create a hierarchy of larger parent nodes with smaller child nodes.
When the embedding lookup returns child nodes, they are merged into their parent node if a threshold is exceeded (for example, when enough of that parent's children are retrieved).
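The merge rule can be sketched as follows. It assumes the threshold is the share of a parent's children that appear in the retrieved results; the node ids, the two-level hierarchy, and the 0.5 default are hypothetical values chosen for illustration.

```python
# Hypothetical two-level hierarchy: each parent chunk is split into three child chunks.
PARENTS = {
    "p1": ["c1", "c2", "c3"],
    "p2": ["c4", "c5", "c6"],
}

def auto_merge(retrieved_children: list, parents: dict, threshold: float = 0.5) -> list:
    """Replace retrieved children with their parent when enough siblings were retrieved."""
    child_to_parent = {c: p for p, children in parents.items() for c in children}

    # Group the retrieved children by parent, preserving retrieval order.
    hits = {}
    for child in retrieved_children:
        hits.setdefault(child_to_parent[child], []).append(child)

    merged = []
    for parent, children in hits.items():
        if len(children) / len(parents[parent]) > threshold:
            merged.append(parent)      # enough children retrieved: send the parent
        else:
            merged.extend(children)    # otherwise keep the individual children
    return merged

result = auto_merge(["c1", "c2", "c4"], PARENTS)
print(result)
```

Two of the three children of `p1` were retrieved, so the whole parent is sent to the LLM, while the single hit under `p2` stays a small chunk.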
“Self-evaluation” by LLM
Examples of LLM evaluations
Result exploration
Evaluation results can be visualized; below is an example of how this can be done with, e.g., TruLens.
Evaluation allows us to drill down into individual results
and to show the feedback from the LLM on the results. This gives insight into how changing parameters influences results.
Potential metrics
Accuracy Rate:
Definition Correctness: Percentage of terms where the generated definition accurately reflects the intended meaning.
Reference Relevance: Proportion of references that are contextually relevant to the generated terms.
Error Rate:
Misinterpretation Frequency: Track the frequency of incorrect interpretations or irrelevant definitions generated.
Inconsistency Detection: Measure instances where the tool provides varying quality across similar requests.
Response Time:
Generation Speed: Monitor the average time to generate a term and its definition, ensuring it meets efficiency standards.
Usage Metrics:
Adoption Rate: Track the number of active users and frequency of use, indicating the tool's perceived value.
Repeat Usage: Measure how often users return to the tool, indicating reliance and satisfaction.
Benchmarking:
Comparison with Manual Processes: Compare the quality of terms generated by the tool against those created manually.
Competitor Comparison: Regularly compare the tool's output quality against similar offerings in the market.
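Several of the metrics above can be computed from a simple log of reviewed results. The sketch below is illustrative only: the record fields are a hypothetical schema, and the three records are made-up placeholder values, not measured results.

```python
from statistics import mean

# Hypothetical review log; all values are made-up placeholders, not measurements.
RECORDS = [
    {"term": "term-1", "definition_correct": True, "references_relevant": 3, "references_total": 4, "seconds": 2.0},
    {"term": "term-2", "definition_correct": False, "references_relevant": 1, "references_total": 2, "seconds": 4.0},
    {"term": "term-3", "definition_correct": True, "references_relevant": 2, "references_total": 2, "seconds": 3.0},
]

def compute_metrics(records: list) -> dict:
    """Aggregate accuracy, error, and response-time metrics over reviewed results."""
    correct = sum(1 for r in records if r["definition_correct"])
    relevant = sum(r["references_relevant"] for r in records)
    total_refs = sum(r["references_total"] for r in records)
    return {
        "definition_correctness": correct / len(records),
        "reference_relevance": relevant / total_refs,
        "misinterpretation_frequency": (len(records) - correct) / len(records),
        "avg_generation_seconds": mean(r["seconds"] for r in records),
    }

metrics = compute_metrics(RECORDS)
print(metrics)
```

The `definition_correct` flag would come from human review by the mapper team, which ties these metrics to the human-validation step in the roadmap.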