Skip to Content

Knowledge Base Setup

Overview

Django CFG Knowbase follows the “zero-configuration” philosophy - enable one setting and get a complete AI-powered knowledge management system. This guide covers configuration options, setup requirements, and deployment considerations.

Philosophy: “Configuration over convention, but smart defaults everywhere” - Minimal required configuration with extensive customization options for advanced use cases.

TAGS: configuration, setup, deployment, django-cfg, zero-config, production-ready


Modules

@django_cfg.core.config/KnowbaseConfig

Purpose: Configuration management for Knowbase module with type-safe settings and intelligent defaults.

Dependencies:

  • django_cfg.core.config.DjangoConfig - base configuration
  • pydantic - type validation and settings management
  • Environment variable resolution
  • Constance dynamic settings integration

Exports:

  • enable_knowbase - main feature flag
  • openai_api_key - AI service authentication
  • Similarity threshold settings
  • Processing configuration options
  • Cache and performance settings

Used in:

  • Django settings generation
  • Service initialization
  • Background task configuration
  • Admin interface setup

Tags: configuration, type-safety, environment-variables, dynamic-settings


Environment Variables

# .env file OPENAI_API_KEY=sk-your-openai-api-key-here # Optional: Custom database for knowledge data KNOWLEDGE_DATABASE_URL=postgresql://user:pass@localhost/knowledge_db # Optional: Redis for caching and tasks REDIS_URL=redis://localhost:6379/0 # Optional: Custom similarity thresholds KNOWBASE_DOCUMENT_THRESHOLD=0.7 KNOWBASE_ARCHIVE_THRESHOLD=0.6 KNOWBASE_EXTERNAL_THRESHOLD=0.5

Advanced Configuration

class ProductionConfig(DjangoConfig): # Core Settings enable_knowbase: bool = True openai_api_key: str = env.openai_api_key # From YAML config # AI Model Configuration openai_model: str = "gpt-4" # Default: gpt-3.5-turbo embedding_model: str = "text-embedding-ada-002" # Default # Similarity Thresholds (0.0-1.0) knowbase_document_threshold: float = 0.75 # Documents knowbase_archive_threshold: float = 0.65 # Code archives knowbase_external_threshold: float = 0.55 # External data # Processing Configuration knowbase_chunk_size: int = 1000 # Text chunk size knowbase_overlap_size: int = 200 # Chunk overlap knowbase_batch_size: int = 50 # Embedding batch size # Performance Settings knowbase_max_context_chunks: int = 10 # Max context per query knowbase_max_tokens_per_query: int = 1000 # Response length limit knowbase_cache_ttl: int = 3600 # Cache timeout (seconds) # Security Settings knowbase_require_auth: bool = True # Require authentication knowbase_allow_public_search: bool = False # Public search access knowbase_rate_limit_per_minute: int = 60 # API rate limiting # Database Configuration knowbase_use_separate_db: bool = True # Dedicated knowledge DB knowbase_db_name: str = "knowledge" # Database name # Background Processing knowbase_worker_concurrency: int = 4 # Dramatiq workers knowbase_task_timeout: int = 300 # Task timeout (seconds) knowbase_retry_attempts: int = 3 # Failed task retries

Development Configuration

class DevelopmentConfig(DjangoConfig): enable_knowbase: bool = True openai_api_key: str = env.openai_api_key # From YAML config # Development-friendly settings knowbase_debug_mode: bool = True # Verbose logging knowbase_cache_ttl: int = 60 # Short cache for testing knowbase_require_auth: bool = False # Easy testing knowbase_allow_public_search: bool = True # Open access # Faster processing for development knowbase_chunk_size: int = 500 # Smaller chunks knowbase_batch_size: int = 10 # Smaller batches knowbase_worker_concurrency: int = 1 # Single worker

Infrastructure Setup

PostgreSQL with pgvector

# Install pgvector extension sudo apt-get install postgresql-14-pgvector # Enable extension in your database psql -d your_database -c "CREATE EXTENSION IF NOT EXISTS vector;" # Verify installation psql -d your_database -c "SELECT * FROM pg_extension WHERE extname = 'vector';"

Redis Configuration

# Install Redis sudo apt-get install redis-server # Start Redis service sudo systemctl start redis-server sudo systemctl enable redis-server # Test connection redis-cli ping # Should return PONG

Background Workers

# Start ReArq workers for background processing rearq # Production: Use supervisor or systemd # /etc/supervisor/conf.d/knowbase-workers.conf [program:knowbase-workers] command=/path/to/venv/bin/rearq directory=/path/to/project user=www-data autostart=true autorestart=true numprocs=4

Database Migrations

# Apply Knowbase migrations python manage.py migrate # Create superuser for admin access python manage.py createsuperuser # Optional: Load sample data python manage.py loaddata knowbase_sample_data.json

%%END%%

--- ## Data Models (Pydantic 2 & TypeScript) ### Pydantic 2 Models (Backend) ```python from pydantic import BaseModel, Field, field_validator, ValidationInfo from typing import Optional, Dict, Any, List from enum import Enum class KnowbaseConfig(BaseModel): """Complete Knowbase configuration model""" # Core Settings enable_knowbase: bool = True openai_api_key: str = Field(..., min_length=1) # AI Model Configuration openai_model: str = Field("gpt-3.5-turbo", pattern=r"^gpt-") embedding_model: str = "text-embedding-ada-002" # Similarity Thresholds knowbase_document_threshold: float = Field(0.7, ge=0.0, le=1.0) knowbase_archive_threshold: float = Field(0.6, ge=0.0, le=1.0) knowbase_external_threshold: float = Field(0.5, ge=0.0, le=1.0) # Processing Configuration knowbase_chunk_size: int = Field(1000, ge=100, le=4000) knowbase_overlap_size: int = Field(200, ge=0, le=1000) knowbase_batch_size: int = Field(50, ge=1, le=200) # Performance Settings knowbase_max_context_chunks: int = Field(10, ge=1, le=50) knowbase_max_tokens_per_query: int = Field(1000, ge=100, le=4000) knowbase_cache_ttl: int = Field(3600, ge=60, le=86400) # Security Settings knowbase_require_auth: bool = True knowbase_allow_public_search: bool = False knowbase_rate_limit_per_minute: int = Field(60, ge=1, le=1000) # Database Configuration knowbase_use_separate_db: bool = False knowbase_db_name: str = "knowledge" # Background Processing knowbase_worker_concurrency: int = Field(4, ge=1, le=20) knowbase_task_timeout: int = Field(300, ge=30, le=3600) knowbase_retry_attempts: int = Field(3, ge=1, le=10) @field_validator('knowbase_overlap_size') @classmethod def overlap_must_be_less_than_chunk_size(cls, v, info: ValidationInfo): if 'knowbase_chunk_size' in info.data and v >= info.data['knowbase_chunk_size']: raise ValueError('overlap_size must be less than chunk_size') return v class DatabaseConfig(BaseModel): """Database configuration for Knowbase""" host: str = "localhost" port: int = Field(5432, ge=1, le=65535) name: str = "knowledge" user: str password: str options: Dict[str, Any] = Field(default_factory=dict) @property def url(self) -> str: return f"postgresql://{self.user}:{self.password}@{self.host}:{self.port}/{self.name}" class RedisConfig(BaseModel): """Redis configuration for caching and tasks""" host: str = "localhost" port: int = Field(6379, ge=1, le=65535) db: int = Field(0, ge=0, le=15) password: Optional[str] = None @property def url(self) -> str: auth = f":{self.password}@" if self.password else "" return f"redis://{auth}{self.host}:{self.port}/{self.db}" ``` ### TypeScript Interfaces (Frontend) ```typescript export interface KnowbaseConfig { // Core Settings enable_knowbase: boolean; openai_api_key: string; // AI Model Configuration openai_model: string; embedding_model: string; // Similarity Thresholds knowbase_document_threshold: number; knowbase_archive_threshold: number; knowbase_external_threshold: number; // Processing Configuration knowbase_chunk_size: number; knowbase_overlap_size: number; knowbase_batch_size: number; // Performance Settings knowbase_max_context_chunks: number; knowbase_max_tokens_per_query: number; knowbase_cache_ttl: number; // Security Settings knowbase_require_auth: boolean; knowbase_allow_public_search: boolean; knowbase_rate_limit_per_minute: number; // Database Configuration knowbase_use_separate_db: boolean; knowbase_db_name: string; // Background Processing knowbase_worker_concurrency: number; knowbase_task_timeout: number; knowbase_retry_attempts: number; } export interface DatabaseConfig { host: string; port: number; name: string; user: string; password: string; options: Record<string, any>; } export interface RedisConfig { host: string; port: number; db: number; password?: string; } // Configuration validation export interface ConfigValidationResult { valid: boolean; errors: string[]; warnings: string[]; recommendations: string[]; } ``` --- ## 🔁 Flows ### Configuration Loading Flow 1. **Environment Loading** → Load environment variables and .env files 2. **Config Validation** → Validate configuration with Pydantic models 3. **Default Application** → Apply intelligent defaults for missing values 4. **Service Registration** → Register Knowbase services with Django 5. **Database Setup** → Configure database routing and connections 6. **Cache Configuration** → Set up Redis caching and sessions 7. **Background Tasks** → Initialize ReArq task queues 8. **URL Registration** → Register API endpoints and admin interfaces **Modules**: - `django_cfg.core.config` - configuration loading - `django_cfg.apps.knowbase.apps` - service registration - Django settings generation --- ### Production Deployment Flow 1. **Environment Preparation** → Set up production environment variables 2. **Infrastructure Setup** → Configure PostgreSQL, Redis, and workers 3. **Security Configuration** → Set authentication and rate limiting 4. **Performance Tuning** → Optimize thresholds and batch sizes 5. **Monitoring Setup** → Configure logging and error tracking 6. **Health Checks** → Implement service health monitoring 7. **Scaling Configuration** → Set up horizontal scaling options **Modules**: - Production configuration classes - Infrastructure setup scripts - Monitoring and logging configuration --- ## Configuration Best Practices ### Environment-Specific Settings ```python # Base configuration class BaseKnowbaseConfig(DjangoConfig): enable_knowbase: bool = True openai_api_key: str = env.openai_api_key # From YAML config # Development class DevelopmentConfig(BaseKnowbaseConfig): knowbase_debug_mode: bool = True knowbase_require_auth: bool = False knowbase_cache_ttl: int = 60 # Staging class StagingConfig(BaseKnowbaseConfig): knowbase_debug_mode: bool = True knowbase_require_auth: bool = True knowbase_rate_limit_per_minute: int = 100 # Production class ProductionConfig(BaseKnowbaseConfig): knowbase_debug_mode: bool = False knowbase_require_auth: bool = True knowbase_rate_limit_per_minute: int = 60 knowbase_use_separate_db: bool = True ``` ### Security Hardening ```python class SecureKnowbaseConfig(DjangoConfig): enable_knowbase: bool = True openai_api_key: str = env.openai_api_key # From YAML config # Security settings knowbase_require_auth: bool = True knowbase_allow_public_search: bool = False knowbase_rate_limit_per_minute: int = 30 # API key rotation knowbase_api_key_rotation_days: int = 90 # Content filtering knowbase_enable_content_filter: bool = True knowbase_blocked_file_types: List[str] = ['.exe', '.bat', '.sh'] # Audit logging knowbase_enable_audit_log: bool = True knowbase_log_all_queries: bool = True ``` ### Performance Optimization ```python class HighPerformanceConfig(DjangoConfig): enable_knowbase: bool = True openai_api_key: str = env.openai_api_key # From YAML config # Optimized processing knowbase_batch_size: int = 100 knowbase_worker_concurrency: int = 8 knowbase_chunk_size: int = 1500 # Aggressive caching knowbase_cache_ttl: int = 7200 knowbase_enable_query_cache: bool = True knowbase_enable_embedding_cache: bool = True # Database optimization knowbase_use_separate_db: bool = True knowbase_db_pool_size: int = 20 knowbase_db_max_overflow: int = 30 ``` --- ## ⚠️ Anti-patterns to Avoid ### ❌ Hardcoded API Keys **Don't do this**: ```python class BadConfig(DjangoConfig): openai_api_key: str = "sk-hardcoded-key-in-source-code" # Security risk! ``` **Do this instead**: ```python class GoodConfig(DjangoConfig): openai_api_key: str = env.openai_api_key # From YAML config # Environment variable ``` ### ❌ Ignoring Resource Limits **Don't do this**: ```python class ResourceHungryConfig(DjangoConfig): knowbase_batch_size: int = 1000 # Too large knowbase_worker_concurrency: int = 50 # Too many workers knowbase_max_context_chunks: int = 100 # Expensive queries ``` **Do this instead**: ```python class OptimizedConfig(DjangoConfig): knowbase_batch_size: int = 50 # Reasonable batch size knowbase_worker_concurrency: int = 4 # Match CPU cores knowbase_max_context_chunks: int = 10 # Cost-effective ``` ### ❌ Same Settings for All Environments **Don't do this**: ```python # Using production settings in development class OneConfigForAll(DjangoConfig): knowbase_require_auth: bool = True # Slows development knowbase_cache_ttl: int = 3600 # Hard to test changes knowbase_debug_mode: bool = False # No debugging info ``` **Do this instead**: ```python # Environment-specific configurations class DevelopmentConfig(DjangoConfig): knowbase_require_auth: bool = False # Easy testing knowbase_cache_ttl: int = 60 # Quick cache refresh knowbase_debug_mode: bool = True # Verbose logging ``` --- ## Version Tracking - `ADDED_IN: v1.0` - Basic configuration with enable_knowbase flag - `ADDED_IN: v1.1` - Similarity threshold configuration - `ADDED_IN: v1.2` - Performance and security settings - `ADDED_IN: v1.3` - Environment-specific configuration classes - `CHANGED_IN: v1.4` - Improved validation and type safety - `ADDED_IN: v1.5` - Production deployment configuration --- ## Configuration Checklist ### Development Setup - [ ] Set `enable_knowbase: bool = True` - [ ] Configure `OPENAI_API_KEY` environment variable - [ ] Install PostgreSQL with pgvector extension - [ ] Set up Redis server - [ ] Run `python manage.py migrate` - [ ] Start background workers with `rearq` ### Production Deployment - [ ] Use environment-specific configuration class - [ ] Set up dedicated knowledge database - [ ] Configure Redis clustering for high availability - [ ] Set up multiple background workers - [ ] Enable authentication and rate limiting - [ ] Configure monitoring and logging - [ ] Set up health checks and alerts - [ ] Implement backup and recovery procedures ### Security Hardening - [ ] Enable authentication (`knowbase_require_auth: bool = True`) - [ ] Disable public search in production - [ ] Set appropriate rate limits - [ ] Use strong database passwords - [ ] Enable audit logging - [ ] Regular API key rotation - [ ] Content filtering for uploads ### Performance Optimization - [ ] Tune similarity thresholds for your content - [ ] Optimize chunk sizes based on content type - [ ] Configure appropriate batch sizes - [ ] Set up caching with reasonable TTL - [ ] Monitor token usage and costs - [ ] Scale workers based on load - [ ] Use database connection pooling --- **DEPENDS_ON**: [django-cfg, PostgreSQL, pgvector, Redis, OpenAI API] **USED_BY**: [All Knowbase components, Django settings, Background workers] **TAGS**: `configuration, setup, deployment, type-safety, environment-variables` --- ## Quick Navigation This directory contains comprehensive user documentation for the Django CFG Knowbase module, following the `@DOCS_MODULE.md` methodology for LLM-optimized documentation. --- ## Documentation Structure ### 🏠 [configuration](../knowbase-configuration) **Main overview and philosophy** - Complete module overview and philosophy - Zero-configuration AI integration approach - Quick start checklist and verification steps - Core modules and their relationships - Production-ready architecture insights **Key Topics**: Overview, Philosophy, Quick Start, Architecture, Anti-patterns --- ### [data-integration](../knowbase-data-integration) **Auto-AI integration for Django models** - ExternalDataMixin usage and configuration - Real-time vectorization and sync patterns - Advanced integration patterns and examples - Performance optimization for model integration **Key Topics**: ExternalDataMixin, Auto-sync, Model Integration, Real-time Updates --- ### [chat-search](../knowbase-chat-search) **Conversational AI and semantic search** - ChatService and SearchService usage - Context retrieval and conversation management - Multi-content type semantic search - Performance optimization and cost management **Key Topics**: AI Chat, Semantic Search, Context Retrieval, RAG Pipeline --- ### Configuration Setup **Zero-config setup and deployment** - Configuration options and environment setup - Production deployment best practices - Security hardening and performance tuning - Infrastructure requirements and scaling **Key Topics**: Configuration, Setup, Deployment, Security, Performance --- ## Getting Started Path ### For New Users 1. **Start here**: [configuration](../knowbase-configuration) - Understand the philosophy and architecture 2. **Setup**: Basic Configuration - Get your system running 3. **Integration**: [data-integration](../knowbase-data-integration) - Add AI to your models 4. **Usage**: [chat-search](../knowbase-chat-search) - Use AI chat and search features ### For Experienced Users - **Quick Reference**: Each file contains API examples and configuration options - **Advanced Patterns**: Look for "Advanced Features" sections in each guide - **Production**: Focus on "Production" and "Performance" sections - **Troubleshooting**: Check "Anti-patterns to Avoid" sections --- ## Documentation Features ### LLM-Optimized Format - **Token-efficient**: Concise but comprehensive content - **Structured**: Consistent headings and organization - **Searchable**: Tagged with relevant keywords - **Executable**: All code examples are working and tested ### Type-Safe Examples - **Pydantic 2**: Backend models with full validation - **TypeScript**: Frontend interfaces and types - **Django**: Production-ready Django patterns - **API**: Complete API usage examples ### Production-Ready - **Security**: Authentication, rate limiting, content filtering - **Performance**: Caching, batching, optimization techniques - **Monitoring**: Logging, metrics, health checks - **Scaling**: Horizontal scaling and load balancing --- ## Content Statistics | Guide | Lines | Topics | Code Examples | API Methods | |-------|-------|--------|---------------|-------------| | configuration | ~400 | 8 | 15+ | 10+ | | data-integration | ~500 | 10 | 20+ | 15+ | | chat-search | ~600 | 12 | 25+ | 20+ | | Basic Configuration | ~550 | 11 | 18+ | 12+ | | **Total** | **~2050** | **41** | **78+** | **57+** | --- ## 🏷️ Tags and Keywords **Core Tags**: `django-cfg`, `knowbase`, `ai`, `rag`, `semantic-search`, `chat-ai`, `documentation` **Technical Tags**: `pydantic2`, `typescript`, `postgresql`, `pgvector`, `redis`, `dramatiq`, `openai` **Feature Tags**: `zero-config`, `auto-integration`, `real-time-sync`, `production-ready`, `type-safe` **Use Case Tags**: `document-management`, `customer-support`, `knowledge-base`, `code-search`, `api-docs` --- ## Version Information - **Documentation Version**: v1.0 - **Knowbase Module Version**: v1.5+ - **Django CFG Version**: v1.1.82+ - **Last Updated**: September 2024 --- ## Contributing This documentation follows the `@DOCS_MODULE.md` methodology: ### Guidelines - **Maximum 1000 lines per file** (enforced) - **Token-efficient content** - every line adds value - **Working code examples** - all examples must be executable - **Type safety** - 100% typed examples (Pydantic 2 / TypeScript) - **No duplication** - DRY principle for documentation ### Structure Requirements - **Overview** section with philosophy and tags - **Modules** section with dependencies and exports - **APIs** section with function documentation - **Data Models** with Pydantic 2 and TypeScript - **Flows** section with process descriptions - **Anti-patterns** section with what to avoid - **Version Tracking** with change history --- ## Quick Reference ### Essential Commands ```bash # Enable Knowbase enable_knowbase: bool = True # Start system python manage.py migrate rearq # Test integration python manage.py shell >>> from django_cfg.apps.knowbase.services import ChatService >>> chat = ChatService(user=user) >>> session = chat.create_session(title="Test") >>> response = chat.process_query(session.id, "Hello AI!") ``` ### Key URLs - **Admin**: `/admin/` (Knowledge Base section) - **API**: `/cfg/knowbase/api/` - **Chat**: `/cfg/knowbase/chat/` - **Docs**: `/cfg/knowbase/api/docs/` ### Support Resources - **Technical Documentation**: `/src/django_cfg/features/built-in-apps/knowbase/@docs/` - **Examples**: `/src/django_cfg/features/built-in-apps/knowbase/guides/` - **Tests**: `/src/django_cfg/features/built-in-apps/knowbase/tests/` --- **DEPENDS_ON**: [Django CFG, PostgreSQL, pgvector, Redis, OpenAI API] **USED_BY**: [Django developers, AI integrators, Knowledge management systems] **TAGS**: `documentation, user-guide, django-cfg, knowbase, ai-integration`