RFP Answers Program
Automated RFP answering system built in 2016 using NLP and Latent Semantic Indexing to match new questions with historical answers, cutting RFP response time from 8-12 hours to 2-3 hours.
Project Overview
An intelligent RFP (Request for Proposal) answering system that leverages Natural Language Processing and machine learning to automate the tedious process of responding to repetitive RFP questions. Built while working at Oracle Marketing Cloud in 2016, this system analyzed thousands of previously answered questions to suggest relevant answers for new RFPs.
The Problem
The sales engineering team I was on was spending 10+ hours per week per person answering RFPs, often retyping similar answers to slightly different questions. A single RFP could contain 50-200 questions, with 60-80% being variations of previously answered ones. My goal was to cut the time spent answering questions while ensuring we reused only answers that were genuinely relevant.
The Solution
Using Latent Semantic Indexing (LSI) and TF-IDF vectorization, the system:
- Indexed historical Q&A pairs from past RFPs
- Processed new questions through NLP pipelines
- Found the 3 most semantically similar historical questions
- Suggested answers with confidence scores (0-1 similarity)
Key Features
- Semantic Matching: Matched on latent topics rather than exact keywords, so reworded versions of old questions still surfaced their answers
- Batch Processing: Processed entire RFPs (100+ questions) in seconds
- Confidence Scoring: Provided similarity scores to indicate match quality
- Web Interface: Flask-based UI for uploading RFPs and reviewing suggestions
Technology Deep Dive
NLP Pipeline
- Text Normalization: URL removal, phrase detection, punctuation handling
- Tokenization: Stopword removal, stemming (Snowball), lemmatization (WordNet)
- Phrase Detection: Bigram/trigram extraction for compound terms
- Vector Space Model: Dictionary creation and corpus representation
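The original pipeline used NLTK's Snowball stemmer and WordNet lemmatizer plus Gensim phrase models; this stdlib-only sketch approximates the same stages with a hand-picked stopword list, a crude suffix stemmer, and adjacent-pair bigrams, so its exact outputs differ from the real system's.

```python
import re

# Tiny illustrative stopword list (the real pipeline used NLTK's).
STOPWORDS = {"a", "an", "the", "is", "are", "can", "we", "do", "does", "you", "with", "to", "of"}

def normalize(text):
    """Text normalization: URL removal, lowercasing, punctuation handling."""
    text = re.sub(r"https?://\S+", " ", text)          # strip URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())   # drop punctuation
    return text

def stem(token):
    """Crude suffix stripping standing in for the Snowball stemmer."""
    for suffix in ("ing", "tion", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Normalize, drop stopwords, stem, then append bigrams for compound terms."""
    tokens = [stem(t) for t in normalize(text).split() if t not in STOPWORDS]
    bigrams = ["_".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams

tokens = preprocess("Can we perform A/B testing? See https://example.com")
# → ['perform', 'b', 'test', 'see', 'perform_b', 'b_test', 'test_see']
```

The bigram step is what lets compound terms like "email marketing" or "lead scoring" act as single vocabulary entries in the vector space model that follows.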
Machine Learning
- TF-IDF: Weighted terms by importance (rare terms weighted more heavily)
- LSI: Created 30 latent topics to capture semantic relationships
- Cosine Similarity: Measured document similarity in LSI space
- Gensim: Core similarity modeling and indexing
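The production system relied on Gensim for all of the above, but the underlying math is compact enough to show directly: TF-IDF weighting, a truncated SVD that projects documents into a low-rank "topic" space (LSI), and cosine similarity in that space. This NumPy sketch uses a toy 5-term, 4-document matrix and 2 topics in place of the real system's vocabulary and 30 topics.

```python
import numpy as np

# Term-document count matrix (terms x documents); toy data.
counts = np.array([
    [2, 0, 1, 0],   # "email"
    [1, 0, 2, 0],   # "campaign"
    [0, 3, 0, 1],   # "api"
    [0, 1, 0, 2],   # "security"
    [1, 1, 1, 1],   # "platform"
], dtype=float)

# TF-IDF: terms appearing in fewer documents get boosted.
df = np.count_nonzero(counts, axis=1)
idf = np.log(counts.shape[1] / df)
tfidf = counts * idf[:, None]

# Truncated SVD = LSI: keep the top-k singular directions as latent topics.
U, S, Vt = np.linalg.svd(tfidf, full_matrices=False)
k = 2
doc_topics = (S[:k, None] * Vt[:k]).T   # each row: one document in topic space

def to_topic_space(term_vector):
    """Fold a TF-IDF-weighted query vector into the k-dimensional topic space."""
    return term_vector @ U[:, :k]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.zeros(5)
query[0] = 1.0                           # a query mentioning only "email"
q_topics = to_topic_space(query * idf)
sims = [cosine(q_topics, d) for d in doc_topics]
best = int(np.argmax(sims))              # most similar historical document
```

The cosine score in topic space is exactly the 0-1 confidence value the system attached to each suggested answer.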
Architecture
- Backend: Python 2.7, Flask, SQLAlchemy
- NLP: NLTK, Gensim
- Frontend: Bootstrap 3, jQuery, React (partial)
- Database: SQLite
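A minimal sketch of how the Flask layer could expose the matcher to the UI; the `/suggest` route, JSON shape, and stubbed `suggest` function here are hypothetical illustrations, not the original application code.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def suggest(question, top_n=3):
    """Stub for the LSI lookup; the real version queried the similarity index."""
    return [("Yes. Oracle Eloqua supports multivariate testing...", 0.876)][:top_n]

@app.route("/suggest", methods=["POST"])
def suggest_route():
    """Accept a question, return suggested answers with confidence scores."""
    question = request.get_json(force=True).get("question", "")
    matches = suggest(question)
    return jsonify([{"answer": a, "score": s} for a, s in matches])

# Exercise the route without running a server:
client = app.test_client()
resp = client.post("/suggest", json={"question": "Can we perform A/B testing?"})
```

Batch processing an uploaded RFP is just this lookup applied per question, which is why a 100-question document completes in seconds.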
Training Data
Trained on 1,000+ Q&A pairs from Oracle Eloqua Marketing Cloud RFPs, covering:
- Email marketing and automation
- CRM integration
- Campaign management
- Analytics and reporting
- API capabilities
- Security and compliance
Performance
- Accuracy: One of the top-3 suggestions was usable roughly 60% of the time
- Speed: 100 questions processed in ~10 seconds
- Time Savings: Reduced RFP response time from 8-12 hours to 2-3 hours
Example Results
Question: "Can we perform A/B Testing?"
Suggestion 1: "Yes. Oracle Eloqua supports multivariate testing..." (0.876)
Suggestion 2: "Yes. Our platform includes split testing..." (0.743)
Suggestion 3: "A/B testing is available through..." (0.621)
Impact
This system enabled the Oracle Marketing Cloud sales engineering team to:
- Respond to RFPs 4x faster
- Maintain consistency across responses
- Capture institutional knowledge
- Onboard new team members more quickly
Lessons Learned
- Domain-specific training data is crucial for accuracy
- Hybrid approaches (LSI + supervised learning) could further improve results
- User feedback loops are essential for continuous improvement
- Short questions needed special handling for good matches