Amazon Bedrock Knowledge Bases

Amazon Bedrock Knowledge Bases is a fully managed retrieval-augmented generation (RAG) solution that handles data ingestion, embedding generation, vector storage, retrieval (with optional reranking), source attribution, and session context management.

Overview

What It Does

Amazon Bedrock Knowledge Bases provides:

Data Ingestion: Automatically process documents from Amazon S3, web crawlers, Confluence, SharePoint, and Salesforce
Embedding Generation: Convert text to vectors using foundation models
Vector Storage: Store embeddings in your choice of supported vector stores
Retrieval: Semantic and hybrid search with metadata filtering
Generation: RAG workflows with source attribution
Session Management: Multi-turn conversations with context
Chunking Strategies: Fixed, semantic, hierarchical, and custom chunking
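Chunking strategies are chosen per data source through the `vectorIngestionConfiguration` argument of `create_data_source`. The helper below is a minimal sketch that returns that dictionary for the built-in strategies, plus `NONE`, which treats each file as a single chunk. The function name `chunking_configuration` is hypothetical, and the token counts and thresholds are illustrative starting values (assumptions, not tuned recommendations). Custom chunking with a Lambda function is configured separately and is not shown here.

```python
from typing import Any, Dict


def chunking_configuration(strategy: str) -> Dict[str, Any]:
    """Return a vectorIngestionConfiguration dict for create_data_source.

    Hypothetical helper; all numeric values are illustrative, not tuned.
    """
    if strategy == "FIXED_SIZE":
        chunking = {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {
                "maxTokens": 512,  # same values as the Quick Start below
                "overlapPercentage": 20,
            },
        }
    elif strategy == "SEMANTIC":
        chunking = {
            "chunkingStrategy": "SEMANTIC",
            "semanticChunkingConfiguration": {
                "maxTokens": 300,  # upper bound on tokens per chunk
                "bufferSize": 1,  # 1 = embed each sentence with its neighbours on either side
                "breakpointPercentileThreshold": 95,  # higher -> fewer, larger chunks
            },
        }
    elif strategy == "HIERARCHICAL":
        chunking = {
            "chunkingStrategy": "HIERARCHICAL",
            "hierarchicalChunkingConfiguration": {
                # Two levels: parent chunks first, then the smaller child chunks.
                "levelConfigurations": [{"maxTokens": 1500}, {"maxTokens": 300}],
                "overlapTokens": 60,
            },
        }
    elif strategy == "NONE":
        # Each file becomes one chunk (useful when documents are already split).
        chunking = {"chunkingStrategy": "NONE"}
    else:
        raise ValueError(f"Unsupported chunking strategy: {strategy}")
    return {"chunkingConfiguration": chunking}
```

Pass the result as `vectorIngestionConfiguration=chunking_configuration("SEMANTIC")` when creating a data source. The Bedrock API reference notes that the chunking configuration cannot be changed after the data source is created, so it is worth choosing deliberately.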

When to Use This Skill

Use this skill when you need to:

Build RAG applications for document Q&A
Implement semantic search over enterprise knowledge
Create chatbots grounded in a knowledge base
Integrate retrieval with Amazon Bedrock Agents
Choose a chunking strategy suited to your documents
Query documents with source attribution
Manage multi-turn conversations with context
Optimize RAG performance and cost

Key Capabilities

Multiple Vector Store Options: Amazon OpenSearch, Amazon S3 Vectors, Amazon Aurora PostgreSQL, Amazon Neptune Analytics, Pinecone, MongoDB Atlas, Redis Enterprise Cloud
Flexible Data Sources: S3, web crawlers, Confluence, SharePoint, Salesforce
Advanced Chunking: Fixed-size, semantic, hierarchical, or custom chunking through a Lambda function
Hybrid Search: Combine semantic (vector) and keyword search
Session Management: Built-in conversation context tracking
GraphRAG: Relationship-aware retrieval with Neptune Analytics
Cost Optimization: Amazon S3 Vectors, which AWS says can reduce the cost of uploading, storing, and querying vectors by up to 90%
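Hybrid search and metadata filtering are both set in the `vectorSearchConfiguration` block, the same block the Quick Start below passes to `retrieve_and_generate`. This sketch builds that block and calls the Retrieve API through an injected client, so it can be exercised without AWS access. The helper names and the metadata keys `department` and `year` are hypothetical; filters only match if your documents carry that metadata (for S3 sources, typically via `.metadata.json` sidecar files). Only some vector stores support hybrid search; if yours does not, pass `search_type="SEMANTIC"`.

```python
from typing import Any, Dict, List, Optional


def build_retrieval_configuration(
    number_of_results: int = 5,
    search_type: str = "HYBRID",
    department: Optional[str] = None,
    min_year: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a retrievalConfiguration dict for retrieve / retrieve_and_generate.

    `department` and `year` are hypothetical metadata keys used for illustration.
    """
    vector_search: Dict[str, Any] = {
        "numberOfResults": number_of_results,
        "overrideSearchType": search_type,  # "HYBRID" or "SEMANTIC"
    }
    conditions: List[Dict[str, Any]] = []
    if department is not None:
        conditions.append({"equals": {"key": "department", "value": department}})
    if min_year is not None:
        conditions.append({"greaterThanOrEquals": {"key": "year", "value": min_year}})
    if len(conditions) == 1:
        vector_search["filter"] = conditions[0]
    elif conditions:
        # andAll expects two or more conditions, so wrap only when needed.
        vector_search["filter"] = {"andAll": conditions}
    return {"vectorSearchConfiguration": vector_search}


def retrieve_chunks(
    client: Any,
    knowledge_base_id: str,
    query: str,
    retrieval_configuration: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Call Retrieve and return the text, score, and location of each chunk."""
    response = client.retrieve(
        knowledgeBaseId=knowledge_base_id,
        retrievalQuery={"text": query},
        retrievalConfiguration=retrieval_configuration,
    )
    return [
        {
            "text": result["content"]["text"],
            "score": result.get("score"),
            "location": result.get("location", {}),
        }
        for result in response.get("retrievalResults", [])
    ]
```

With a real client, pass `boto3.client('bedrock-agent-runtime', region_name='us-east-1')` as `client`. Unlike `retrieve_and_generate`, `retrieve` returns the raw chunks without generating an answer, which is useful for checking what the model would be given.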

Quick Start

Basic RAG Workflow

import boto3
import json

# Initialize clients
bedrock_agent = boto3.client('bedrock-agent', region_name='us-east-1')
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

# 1. Create Knowledge Base
kb_response = bedrock_agent.create_knowledge_base(
    name='enterprise-docs-kb',
    description='Company documentation knowledge base',
    roleArn='arn:aws:iam::123456789012:role/BedrockKBRole',
    knowledgeBaseConfiguration={
        'type': 'VECTOR',
        'vectorKnowledgeBaseConfiguration': {
            'embeddingModelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0'
        }
    },
    storageConfiguration={
        'type': 'OPENSEARCH_SERVERLESS',
        'opensearchServerlessConfiguration': {
            'collectionArn': 'arn:aws:aoss:us-east-1:123456789012:collection/kb-collection',
            'vectorIndexName': 'bedrock-knowledge-base-index',
            'fieldMapping': {
                'vectorField': 'bedrock-knowledge-base-default-vector',
                'textField': 'AMAZON_BEDROCK_TEXT_CHUNK',
                'metadataField': 'AMAZON_BEDROCK_METADATA'
            }
        }
    }
)

knowledge_base_id = kb_response['knowledgeBase']['knowledgeBaseId']
print(f"Knowledge Base ID: {knowledge_base_id}")

# 2. Add S3 Data Source
ds_response = bedrock_agent.create_data_source(
    knowledgeBaseId=knowledge_base_id,
    name='s3-documents',
    description='Company documents from S3',
    dataSourceConfiguration={
        'type': 'S3',
        's3Configuration': {
            'bucketArn': 'arn:aws:s3:::my-docs-bucket',
            'inclusionPrefixes': ['documents/']
        }
    },
    vectorIngestionConfiguration={
        'chunkingConfiguration': {
            'chunkingStrategy': 'FIXED_SIZE',
            'fixedSizeChunkingConfiguration': {
                'maxTokens': 512,
                'overlapPercentage': 20
            }
        }
    }
)

data_source_id = ds_response['dataSource']['dataSourceId']

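# 3. Start Ingestion (runs asynchronously)
ingestion_response = bedrock_agent.start_ingestion_job(
    knowledgeBaseId=knowledge_base_id,
    dataSourceId=data_source_id,
    description='Initial document ingestion'
)

ingestion_job_id = ingestion_response['ingestionJob']['ingestionJobId']
print(f"Ingestion Job ID: {ingestion_job_id}")

# Wait for ingestion to finish before querying; otherwise retrieval may find nothing
import time

while True:
    job = bedrock_agent.get_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
        ingestionJobId=ingestion_job_id
    )['ingestionJob']
    if job['status'] in ('COMPLETE', 'FAILED', 'STOPPED'):
        break
    time.sleep(10)

print(f"Ingestion status: {job['status']}")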

# 4. Query with Retrieve and Generate
response = bedrock_agent_runtime.retrieve_and_generate(
    input={
        'text': 'What is our vacation policy?'
    },
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': knowledge_base_id,
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0',
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {
                    'numberOfResults': 5,
                    'overrideSearchType': 'HYBRID'
                }
            }
        }
    }
)

print(f"Answer: {response['output']['text']}")
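print("\nSources:")
for citation in response['citations']:
    for reference in citation['retrievedReferences']:
        # S3 data sources report the source document as an s3Location URI
        print(f"  - {reference['location']['s3Location']['uri']}")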

...
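For multi-turn conversations, `retrieve_and_generate` returns a `sessionId` that can be passed back on the next call, so follow-up questions are answered with the earlier turns as context. The sketch below wraps step 4 of the workflow in a hypothetical `ask` helper that also collects de-duplicated S3 source URIs from the citations. The client is passed in so the function can be tested with a stub; the helper name and return shape are illustrative assumptions.

```python
from typing import Any, Dict, List, Optional, Tuple


def ask(
    client: Any,
    knowledge_base_id: str,
    model_arn: str,
    question: str,
    session_id: Optional[str] = None,
) -> Tuple[str, str, List[str]]:
    """Ask one question; return (answer, session_id, source URIs).

    Reusing the returned session_id on the next call keeps conversation context.
    """
    request: Dict[str, Any] = {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    }
    if session_id is not None:
        request["sessionId"] = session_id  # continue an existing conversation
    response = client.retrieve_and_generate(**request)

    # Collect unique S3 URIs cited in the answer, preserving first-seen order.
    sources: List[str] = []
    for citation in response.get("citations", []):
        for reference in citation.get("retrievedReferences", []):
            uri = reference.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in sources:
                sources.append(uri)
    return response["output"]["text"], response["sessionId"], sources
```

A typical exchange calls `ask(runtime, knowledge_base_id, model_arn, 'What is our vacation policy?')` and then passes the returned session ID into a follow-up such as `ask(runtime, knowledge_base_id, model_arn, 'Does unused time roll over?', session_id=session_id)`, where `runtime` is a `bedrock-agent-runtime` boto3 client.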
