npx skills add https://github.com/adaptationio/skrillz --skill bedrock-knowledge-bases
Amazon Bedrock Knowledge Bases
Amazon Bedrock Knowledge Bases is a fully managed RAG (Retrieval-Augmented Generation) solution that handles data ingestion, embedding generation, vector storage, retrieval with reranking, source attribution, and session context management.
Overview
What It Does
Amazon Bedrock Knowledge Bases provides:
- Data Ingestion: Automatically process documents from S3, web, Confluence, SharePoint, Salesforce
- Embedding Generation: Convert text to vectors using foundation models
- Vector Storage: Store embeddings in multiple vector database options
- Retrieval: Semantic and hybrid search with metadata filtering
- Generation: RAG workflows with source attribution
- Session Management: Multi-turn conversations with context
- Chunking Strategies: Fixed, semantic, hierarchical, and custom chunking
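To make the fixed-size strategy concrete, here is a minimal sketch of chunking with overlap. It is an illustration only, not the service's internal implementation: the function name `fixed_size_chunks` is hypothetical, and splitting on whitespace stands in for the real tokenizer, which this document does not describe.

```python
from typing import List, Sequence


def fixed_size_chunks(
    tokens: Sequence[str],
    max_tokens: int = 512,
    overlap_percentage: int = 20,
) -> List[List[str]]:
    """Split tokens into windows of at most max_tokens that share an overlap.

    Hypothetical helper: it mirrors the idea behind the maxTokens and
    overlapPercentage settings, not the service's actual algorithm.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap_percentage < 100:
        raise ValueError("overlap_percentage must be in [0, 100)")

    # Number of tokens repeated at the start of each following chunk.
    overlap = max_tokens * overlap_percentage // 100
    step = max_tokens - overlap  # always >= 1 because overlap < max_tokens

    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


# Assumption: whitespace splitting approximates tokenization for the demo.
words = "one two three four five six seven eight nine ten".split()
demo_chunks = fixed_size_chunks(words, max_tokens=5, overlap_percentage=20)
```

Larger overlap keeps more context across chunk boundaries at the cost of storing and embedding more duplicated text.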
When to Use This Skill
Use this skill when you need to:
- Build RAG applications for document Q&A
- Implement semantic search over enterprise knowledge
- Create chatbots with knowledge bases
- Integrate retrieval with Bedrock Agents
- Configure optimal chunking strategies
- Query documents with source attribution
- Manage multi-turn conversations with context
- Optimize RAG performance and cost
Key Capabilities
- Multiple Vector Store Options: OpenSearch, S3 Vectors, Neptune, Pinecone, MongoDB, Redis
- Flexible Data Sources: S3, web crawlers, Confluence, SharePoint, Salesforce
- Advanced Chunking: Fixed-size, semantic, hierarchical, custom Lambda
- Hybrid Search: Combine semantic (vector) and keyword search
- Session Management: Built-in conversation context tracking
- GraphRAG: Relationship-aware retrieval with Neptune Analytics
- Cost Optimization: S3 Vectors for up to 90% storage savings
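Hybrid search, metadata filtering, and session tracking all surface as fields of a single `retrieve_and_generate` request. The sketch below only builds that request as a plain dictionary, so it can be inspected without AWS credentials. The helper name `build_rag_request` and the `department` metadata key are hypothetical, and whether `HYBRID` is accepted depends on the vector store behind the knowledge base.

```python
from typing import Any, Dict, Optional


def build_rag_request(
    question: str,
    knowledge_base_id: str,
    model_arn: str,
    *,
    top_k: int = 5,
    metadata_filter: Optional[Dict[str, Any]] = None,
    session_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for bedrock-agent-runtime retrieve_and_generate."""
    vector_search: Dict[str, Any] = {
        "numberOfResults": top_k,
        # Combine semantic (vector) and keyword matching.
        "overrideSearchType": "HYBRID",
    }
    if metadata_filter is not None:
        # Restrict retrieval to chunks whose metadata matches the filter.
        vector_search["filter"] = metadata_filter

    request: Dict[str, Any] = {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": vector_search
                },
            },
        },
    }
    if session_id is not None:
        # Reuse the sessionId from an earlier response to continue a conversation.
        request["sessionId"] = session_id
    return request


# Hypothetical example: only search documents tagged department=hr.
example_request = build_rag_request(
    "What is our vacation policy?",
    "KB12345678",
    "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
    metadata_filter={"equals": {"key": "department", "value": "hr"}},
)
```

With a `bedrock-agent-runtime` client, the dictionary can be passed as `client.retrieve_and_generate(**example_request)`. The response carries a `sessionId` that can be supplied as `session_id` on the follow-up question to keep conversation context.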
Quick Start
Basic RAG Workflow
import boto3
import json
# Initialize clients
bedrock_agent = boto3.client('bedrock-agent', region_name='us-east-1')
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')
# 1. Create Knowledge Base
kb_response = bedrock_agent.create_knowledge_base(
    name='enterprise-docs-kb',
    description='Company documentation knowledge base',
    roleArn='arn:aws:iam::123456789012:role/BedrockKBRole',
    knowledgeBaseConfiguration={
        'type': 'VECTOR',
        'vectorKnowledgeBaseConfiguration': {
            'embeddingModelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0'
        }
    },
    storageConfiguration={
        'type': 'OPENSEARCH_SERVERLESS',
        'opensearchServerlessConfiguration': {
            'collectionArn': 'arn:aws:aoss:us-east-1:123456789012:collection/kb-collection',
            'vectorIndexName': 'bedrock-knowledge-base-index',
            'fieldMapping': {
                'vectorField': 'bedrock-knowledge-base-default-vector',
                'textField': 'AMAZON_BEDROCK_TEXT_CHUNK',
                'metadataField': 'AMAZON_BEDROCK_METADATA'
            }
        }
    }
)
knowledge_base_id = kb_response['knowledgeBase']['knowledgeBaseId']
print(f"Knowledge Base ID: {knowledge_base_id}")
# 2. Add S3 Data Source
ds_response = bedrock_agent.create_data_source(
    knowledgeBaseId=knowledge_base_id,
    name='s3-documents',
    description='Company documents from S3',
    dataSourceConfiguration={
        'type': 'S3',
        's3Configuration': {
            'bucketArn': 'arn:aws:s3:::my-docs-bucket',
            'inclusionPrefixes': ['documents/']
        }
    },
    vectorIngestionConfiguration={
        'chunkingConfiguration': {
            'chunkingStrategy': 'FIXED_SIZE',
            'fixedSizeChunkingConfiguration': {
                'maxTokens': 512,
                'overlapPercentage': 20
            }
        }
    }
)
data_source_id = ds_response['dataSource']['dataSourceId']
# 3. Start Ingestion
ingestion_response = bedrock_agent.start_ingestion_job(
    knowledgeBaseId=knowledge_base_id,
    dataSourceId=data_source_id,
    description='Initial document ingestion'
)
print(f"Ingestion Job ID: {ingestion_response['ingestionJob']['ingestionJobId']}")
# 4. Query with Retrieve and Generate
# Ingestion runs asynchronously; query once the job has reached COMPLETE.
response = bedrock_agent_runtime.retrieve_and_generate(
    input={
        'text': 'What is our vacation policy?'
    },
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': knowledge_base_id,
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0',
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {
                    'numberOfResults': 5,
                    'overrideSearchType': 'HYBRID'
                }
            }
        }
    }
)
print(f"Answer: {response['output']['text']}")
print("\nSources:")
for citation in response['citations']:
    for reference in citation['retrievedReferences']:
        ...
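The workflow above starts an ingestion job and then queries straight away, but ingestion is asynchronous, so an immediate query may find nothing indexed yet. One way to bridge the gap is to poll `get_ingestion_job` until the job reaches a terminal status. This is a minimal sketch, not part of the original skill: the helper name `wait_for_ingestion`, the polling interval, and the timeout are all assumptions. The client is passed in as a parameter so the function can be exercised without AWS access.

```python
import time
from typing import Any, Callable, Dict

# Statuses after which the ingestion job will not change any further.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def wait_for_ingestion(
    client: Any,
    knowledge_base_id: str,
    data_source_id: str,
    ingestion_job_id: str,
    poll_seconds: float = 10.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a bedrock-agent client until the ingestion job finishes.

    Returns the final ingestionJob dictionary; raises TimeoutError if the job
    is still running after timeout_seconds (an assumed default).
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=ingestion_job_id,
        )["ingestionJob"]
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Ingestion job {ingestion_job_id} still {job['status']} "
                f"after {timeout_seconds} seconds"
            )
        sleep(poll_seconds)
```

In the Quick Start this would sit between steps 3 and 4, for example `wait_for_ingestion(bedrock_agent, knowledge_base_id, data_source_id, ingestion_response['ingestionJob']['ingestionJobId'])`, followed by a check that the returned status is `COMPLETE` before querying.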