# AWS Deployment

{% hint style="danger" %}
TODO: proofread + prune. I just copypasted this section from Claude as is. \
high chance of hallucination + too much verbosity
{% endhint %}

This guide covers the AskGov-specific requirements and best practices for deploying on AWS. It assumes you are already familiar with the AWS services involved.

**Deployment Duration**: 4-6 hours (excluding DNS propagation)\
**Complexity Level**: Intermediate to Advanced\
**Prerequisites**: AWS account with appropriate permissions, familiarity with AWS services

{% hint style="info" %}
**Note**: This guide focuses on AskGov-specific configurations. For general AWS setup instructions, refer to AWS documentation.
{% endhint %}

### Pre-Deployment Checklist

#### Required AWS Services

* [ ] VPC with public/private subnets
* [ ] Application Load Balancer
* [ ] ECS Fargate or EC2 for compute
* [ ] RDS PostgreSQL or EC2 for CockroachDB
* [ ] ElastiCache for Redis
* [ ] S3 for file storage
* [ ] Secrets Manager for credentials
* [ ] CloudWatch for monitoring

#### Prerequisites

* [ ] AWS CLI configured
* [ ] Domain name registered
* [ ] SSL certificate in ACM
* [ ] Docker image repository (ECR)

### Architecture Overview

```mermaid
graph TB
    subgraph "Route 53"
        DNS[DNS Records]
    end
    
    subgraph "CloudFront"
        CF[CDN Distribution]
    end
    
    subgraph "VPC"
        subgraph "Public Subnets"
            ALB[Application Load Balancer]
            NAT[NAT Gateways]
        end
        
        subgraph "Private Subnets"
            ECS[ECS Fargate/EC2]
            RDS[(RDS/CockroachDB)]
            REDIS[(ElastiCache)]
            WEAVIATE[Weaviate on EC2]
        end
    end
    
    subgraph "Storage"
        S3[S3 Buckets]
        SM[Secrets Manager]
    end
```

### Step 1: Network Infrastructure

#### Key Considerations for AskGov

* **Multi-AZ deployment** for high availability
* **Private subnets** for application and database tiers
* **Public subnets** only for load balancer and NAT gateways
* **Strict security groups** limiting traffic between tiers

#### Security Groups Required

1. **ALB Security Group**
   * Inbound: 80, 443 from internet
   * Outbound: 8080 to App Security Group
2. **App Security Group**
   * Inbound: 8080 from ALB Security Group
   * Outbound: 5432/26257 to DB, 6379 to Redis, 443 to internet
3. **Database Security Group**
   * Inbound: 5432/26257 from App Security Group only
   * No outbound rules needed (security groups are stateful, so replies to allowed inbound traffic are permitted automatically)
4. **Redis Security Group**
   * Inbound: 6379 from App Security Group only
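The tier-to-tier rules above can be written down as data and checked before they become real security groups. The following is a minimal Python sketch, assuming illustrative tier labels (`internet`, `alb`, `app`, `db`, `redis`) that are not AWS identifiers. It models only the inbound rules listed above and makes no AWS API calls.

```python
# Inbound rules from the list above, keyed by destination tier.
# Tier names are illustrative labels, not AWS identifiers.
INBOUND_RULES = {
    "alb": {("internet", 80), ("internet", 443)},
    "app": {("alb", 8080)},
    "db": {("app", 5432), ("app", 26257)},
    "redis": {("app", 6379)},
}


def is_allowed(source: str, destination: str, port: int) -> bool:
    """Return True if the inbound rules let `source` reach `destination` on `port`."""
    return (source, port) in INBOUND_RULES.get(destination, set())


def exposed_to_internet() -> list[str]:
    """List the tiers that accept traffic directly from the internet."""
    return sorted(
        tier
        for tier, rules in INBOUND_RULES.items()
        if any(src == "internet" for src, _ in rules)
    )
```

A check such as `exposed_to_internet() == ["alb"]` is one way to confirm that only the load balancer is reachable from outside the VPC.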

### Step 2: Database Setup

#### Option A: RDS PostgreSQL (Simpler)

**Specifications for AskGov:**

* Engine: PostgreSQL 14+
* Instance: db.t3.medium minimum (adjust based on load)
* Storage: 100GB GP3 SSD, encrypted
* Multi-AZ: Required for production
* Automated backups: 30-day retention

**AskGov-specific configurations:**

```sql
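-- RDS does not permit ALTER SYSTEM. Set these values in a custom DB parameter
-- group attached to the instance (static parameters need a reboot), then verify:
SHOW max_connections;       -- target: 200
SHOW shared_buffers;        -- target: 256MB
SHOW effective_cache_size;  -- target: 1GB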
```
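On RDS, engine settings like these are normally applied through a custom DB parameter group, where PostgreSQL memory parameters are expressed in 8 kB pages rather than as strings like `256MB`. The sketch below converts the values above into page counts. The helper name `to_pages` is hypothetical, and the unit of each parameter should be confirmed in the RDS console before applying.

```python
PAGE_SIZE_KB = 8  # memory parameters in RDS PostgreSQL parameter groups use 8 kB pages

_UNITS_KB = {"kB": 1, "MB": 1024, "GB": 1024 * 1024}


def to_pages(value: str) -> int:
    """Convert a size such as '256MB' into a count of 8 kB pages."""
    for unit, factor in _UNITS_KB.items():
        if value.endswith(unit):
            return int(value[: -len(unit)]) * factor // PAGE_SIZE_KB
    raise ValueError(f"unsupported size: {value!r}")


settings = {
    "max_connections": "200",                      # plain integer, no conversion
    "shared_buffers": str(to_pages("256MB")),      # 32768 pages
    "effective_cache_size": str(to_pages("1GB")),  # 131072 pages
}
```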

#### Option B: CockroachDB on EC2 (Full Compatibility)

**Why CockroachDB for AskGov:**

* Full compatibility with AskGov's Prisma schemas
* Better horizontal scaling for large deployments
* Built-in geo-replication capabilities

**Minimum 3-node cluster:**

* Instance type: m5.xlarge
* Storage: 500GB SSD per node
* Placement: Different AZs

### Step 3: Cache Layer (Redis)

#### ElastiCache Configuration for AskGov

**Cache strategy:**

* Session storage
* Search results caching (5-minute TTL)
* Rate limiting counters
* Popular questions cache

**Recommended setup:**

* Node type: cache.t3.micro for < 100k users
* Node type: cache.r6g.large for > 100k users
* Parameter group: Custom with `maxmemory-policy allkeys-lru`
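The search-results entry in the cache strategy follows a cache-aside pattern with a 5-minute TTL. The sketch below illustrates that pattern in Python, using an in-memory class as a stand-in for Redis. `TTLCache`, `cached_search`, and the `search:` key prefix are hypothetical names, not part of AskGov.

```python
import time
from typing import Any, Callable, Optional

SEARCH_TTL_SECONDS = 5 * 60  # search results TTL from the list above


class TTLCache:
    """In-memory stand-in for a Redis SET with expiry, used only to show the pattern."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._entries[key] = (self._clock() + ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: behave like a cache miss
            return None
        return value


def cached_search(cache: TTLCache, query: str, run_search: Callable[[str], Any]) -> Any:
    """Cache-aside lookup: return a cached result, or run the search and store it."""
    key = f"search:{query.strip().lower()}"
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = run_search(query)
    cache.set(key, result, SEARCH_TTL_SECONDS)
    return result
```

Normalizing the query before building the key means that trivially different inputs share one cache entry. Whether that is desirable depends on how the search endpoint treats case and whitespace.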

### Step 4: Search Engine (Weaviate)

#### Weaviate Deployment Considerations

**Important:** AWS has no native managed Weaviate service, so this guide runs Weaviate on a dedicated EC2 instance.

**Instance requirements:**

* t3.large minimum (8 GiB RAM, of which about 4 GB is needed for vectorization)
* Persistent EBS volume for data
* Private subnet deployment

**AskGov-specific Weaviate configuration:**

```yaml
# Environment variables for Weaviate
AUTHENTICATION_APIKEY_ENABLED: 'true'
AUTHENTICATION_APIKEY_ALLOWED_KEYS: <generate-strong-key>
AUTHENTICATION_APIKEY_USERS: <user-identity>  # one identity per allowed key
# Default vectorizer: text2vec-openai for production,
# or text2vec-transformers for self-hosted vectorization
DEFAULT_VECTORIZER_MODULE: text2vec-openai
ENABLE_MODULES: text2vec-openai,text2vec-transformers
```

### Step 5: Application Deployment

#### Container Configuration

**Docker image considerations:**

* Multi-stage build to minimize size
* Non-root user for security
* Health check endpoint included

#### ECS Task Definition Key Settings

```json
{
  "cpu": "1024",
  "memory": "2048",
  "containerDefinitions": [{
    "name": "askgov",
    "essential": true,
    "portMappings": [{"containerPort": 8080}],
    "healthCheck": {
      "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
      "interval": 30,
      "timeout": 5,
      "retries": 3
    }
  }]
}
```

#### Environment Variables in Secrets Manager

Store these securely:

* `DATABASE_URL` - Full connection string
* `SESSION_SECRET` - 32+ character random string
* `REDIS_URL` - ElastiCache endpoint
* `WEAVIATE_API_KEY` - Weaviate authentication
* `VECTORIZER_API_KEY` - OpenAI/Cohere key if using
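Before these values go into Secrets Manager, they have to be generated and assembled. The following is a minimal sketch using only the Python standard library. The user, host, and database names are placeholders rather than AskGov defaults, and the URL layout assumes a PostgreSQL-style connection string.

```python
import secrets
from urllib.parse import quote


def random_secret(num_bytes: int = 32) -> str:
    """Return a URL-safe random string; 32 bytes yields 43 characters."""
    return secrets.token_urlsafe(num_bytes)


def database_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a PostgreSQL-style URL, percent-encoding the credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


session_secret = random_secret()    # candidate value for SESSION_SECRET
weaviate_api_key = random_secret()  # candidate value for WEAVIATE_API_KEY

# Host, user, and database names below are placeholders, not AskGov defaults.
url = database_url("askgov_app", "p@ss/word", "db.example.internal", 5432, "askgov")
```

Percent-encoding matters because characters such as `@` or `/` in a password would otherwise be read as URL delimiters.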

### Step 6: Load Balancer Configuration

#### ALB Settings for AskGov

**Target Group Configuration:**

* Health check path: `/health` or `/`
* Health check interval: 30 seconds
* Deregistration delay: 30 seconds (for graceful shutdown)

**Listener Rules:**

* HTTP → HTTPS redirect
* Host-based routing if multiple agencies
* Path-based routing for API vs frontend
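The listener rules above map onto the action and condition structures of the Elastic Load Balancing v2 API. The sketch below builds those structures as plain Python dictionaries and leaves out the API calls, so it runs without credentials. The function names, the `/api/*` pattern, and the target group ARN in the usage note are illustrative assumptions.

```python
from typing import Optional


def https_redirect_action() -> dict:
    """Default action for the port 80 listener: permanent redirect to HTTPS."""
    return {
        "Type": "redirect",
        "RedirectConfig": {"Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"},
    }


def forward_rule(
    priority: int,
    target_group_arn: str,
    *,
    host: Optional[str] = None,
    path: Optional[str] = None,
) -> dict:
    """Build a forwarding rule matched on host header and/or path pattern."""
    conditions = []
    if host:
        conditions.append({"Field": "host-header", "Values": [host]})
    if path:
        conditions.append({"Field": "path-pattern", "Values": [path]})
    if not conditions:
        raise ValueError("a rule needs at least one host or path condition")
    return {
        "Priority": priority,
        "Conditions": conditions,
        "Actions": [{"Type": "forward", "TargetGroupArn": target_group_arn}],
    }
```

For example, `forward_rule(10, "<api-target-group-arn>", path="/api/*")` would describe a rule that sends API traffic to its own target group, assuming the API is served under `/api/`.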

### Step 7: Storage Configuration

#### S3 Buckets Required

1. **askgov-uploads** - User file uploads
   * Versioning: Enabled
   * Encryption: SSE-S3
   * Lifecycle: Archive after 90 days
2. **askgov-exports** - Data exports
   * Encryption: SSE-KMS
   * Access: Restricted to app role
3. **askgov-backups** - Database backups
   * Encryption: SSE-KMS
   * Lifecycle: Delete after retention period
   * Cross-region replication recommended
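The "archive after 90 days" rule for the uploads bucket corresponds to an S3 lifecycle configuration. The sketch below builds one as JSON. The `GLACIER` storage class and the rule ID are assumptions, since the list above only says "Archive".

```python
import json


def archive_rule(days: int = 90, storage_class: str = "GLACIER") -> dict:
    """Lifecycle configuration that transitions every object after `days`."""
    return {
        "Rules": [
            {
                "ID": f"archive-after-{days}-days",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix applies the rule to the whole bucket
                "Transitions": [{"Days": days, "StorageClass": storage_class}],
            }
        ]
    }


lifecycle_json = json.dumps(archive_rule(), indent=2)
```

The resulting JSON can be saved to a file and applied with `aws s3api put-bucket-lifecycle-configuration --bucket askgov-uploads --lifecycle-configuration file://lifecycle.json`. Because versioning is enabled on that bucket, noncurrent versions would need their own transition rule.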

### Step 8: Post-Deployment Tasks

#### 8.1 Database Initialization

```bash
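# Open a shell in the running task (ECS Exec must be enabled on the service)
aws ecs execute-command --cluster askgov --task <task-id> --container askgov --interactive --command "/bin/sh"

# Inside the container shell: apply pending migrations
npx prisma migrate deploy

# Inside the container shell: seed initial data (if needed)
npx prisma db seed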
```

#### 8.2 Weaviate Search Initialization

```bash
# Create search class for questions
curl -X POST https://your-domain/hybrid/classes \
  -H "Authorization: Bearer $SUPERADMIN_TOKEN" \
  -F "className=Question_v1"
```

#### 8.3 Create First Admin User

```bash
# Via application API or database
curl -X POST https://your-domain/api/v1/superadmin/users \
  -H "Authorization: Bearer $SUPERADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"email": "admin@yourgov.com", "role": "superadmin"}'
```

### Step 9: Monitoring Setup

#### CloudWatch Dashboards

Create dashboards for:

* Application metrics (CPU, memory, request count)
* Database performance (connections, query time)
* Search latency (Weaviate response times)
* User activity (questions created, searches performed)

#### Key Alarms to Configure

1. **High CPU Usage** (> 70% for 5 minutes)
2. **Memory Pressure** (> 85%)
3. **Database Connection Pool** (> 80% utilized)
4. **4XX/5XX Error Rate** (> 1% of requests)
5. **Search Latency** (> 1 second p99)
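As an illustration of the first alarm, the sketch below builds the parameters for a CloudWatch alarm on ECS service CPU, using the field names of the `PutMetricAlarm` API. The call itself (for example `put_metric_alarm(**params)` on a boto3 CloudWatch client) is left out so the sketch runs without credentials. The alarm name and the SNS topic are assumptions.

```python
def cpu_alarm_params(cluster: str, service: str, sns_topic_arn: str) -> dict:
    """Alarm when average CPU stays above 70% for five consecutive 1-minute periods."""
    return {
        "AlarmName": f"{service}-high-cpu",
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "Statistic": "Average",
        "Period": 60,             # seconds per datapoint
        "EvaluationPeriods": 5,   # 5 x 60 s = the 5-minute window above
        "Threshold": 70.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # assumed: notifications go to an SNS topic
    }
```

The other alarms follow the same shape with a different namespace, metric, and threshold.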

#### Logging Strategy

* Application logs → CloudWatch Logs
* Access logs → S3 with analysis via Athena
* Security events → CloudWatch Logs with alerts
* Audit logs → Separate encrypted log group

### Step 10: Backup and Disaster Recovery

#### Backup Components

1. **Database**: Automated RDS backups + manual snapshots
2. **Application State**: S3 versioning for uploads
3. **Configuration**: Infrastructure as Code in Git
4. **Search Index**: Periodic Weaviate exports

#### Recovery Time Objectives

* **RTO**: 4 hours (full recovery)
* **RPO**: 1 hour (maximum data loss)

### Performance Optimization

#### Scaling Triggers

Configure auto-scaling based on:

* CPU utilization > 70%
* Memory utilization > 80%
* Request count > threshold
* Queue depth (if using SQS)
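For an ECS service, the CPU and memory triggers can be expressed as target-tracking policies in Application Auto Scaling. The sketch below builds the policy configuration dictionaries only. The cooldown values are assumptions, and treating the 70% and 80% thresholds as tracking targets is one possible interpretation of the list above.

```python
def target_tracking_config(metric_type: str, target: float) -> dict:
    """Target-tracking configuration for one predefined ECS service metric."""
    return {
        "TargetValue": target,
        "PredefinedMetricSpecification": {"PredefinedMetricType": metric_type},
        "ScaleOutCooldown": 60,   # assumed value, in seconds
        "ScaleInCooldown": 300,   # assumed value, in seconds
    }


policies = {
    "cpu": target_tracking_config("ECSServiceAverageCPUUtilization", 70.0),
    "memory": target_tracking_config("ECSServiceAverageMemoryUtilization", 80.0),
}
```

A longer scale-in cooldown than scale-out cooldown is a common way to avoid removing capacity too eagerly after a traffic spike.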

#### Caching Strategy

1. **CloudFront**: Static assets, 30-day cache
2. **Redis**: Session data, search results (5 min), popular questions (1 hour)
3. **Application**: In-memory cache for frequently accessed data

### Security Hardening

#### AWS-Specific Security

1. **Enable AWS WAF** with managed rule sets
2. **Configure AWS Shield** for DDoS protection
3. **Use AWS Systems Manager** for patch management
4. **Enable VPC Flow Logs** for network monitoring
5. **Configure AWS GuardDuty** for threat detection

#### Compliance Features

* **AWS CloudTrail**: API audit logging
* **AWS Config**: Compliance monitoring
* **AWS Security Hub**: Centralized security view
* **AWS Macie**: Sensitive data discovery (if needed)

### Cost Optimization

#### Cost Reduction Strategies

1. **Use Spot Instances** for non-critical workloads
2. **Reserved Instances** for predictable workloads (up to 72% savings)
3. **S3 Intelligent-Tiering** for automatic cost optimization
4. **Scheduled scaling** for non-production environments
5. **Right-sizing** based on CloudWatch metrics

#### Estimated Monthly Costs

| Deployment Size | Users   | Estimated Cost |
| --------------- | ------- | -------------- |
| Small           | < 100k  | $500-800       |
| Medium          | 100k-1M | $1,500-2,500   |
| Large           | > 1M    | $4,000-8,000   |

*Note: Costs vary by region and usage patterns*

### Troubleshooting

#### Common Issues

**ECS Tasks Not Starting**

* Check task role permissions
* Verify secrets/environment variables
* Review CloudWatch logs
* Ensure health checks pass

**Database Connection Issues**

* Verify security group rules
* Check RDS parameter group settings
* Confirm password in Secrets Manager
* Test from bastion host

**Search Not Working**

* Verify Weaviate is running
* Check API key configuration
* Ensure vectorization module is loaded
* Review Weaviate logs

**High Memory Usage**

* Review Node.js heap settings
* Check for memory leaks
* Scale horizontally
* Optimize database queries
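For the Node.js heap setting, one common approach is to cap the V8 old-space heap below the container's memory limit so that the process is not killed for exceeding it. The sketch below derives a `NODE_OPTIONS` value from the task memory. The 75% fraction is an assumption, not an AskGov recommendation.

```python
def node_options(task_memory_mib: int, heap_fraction: float = 0.75) -> str:
    """Return a NODE_OPTIONS value that caps the V8 heap at a fraction of task memory."""
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    heap_mib = int(task_memory_mib * heap_fraction)
    return f"--max-old-space-size={heap_mib}"


# With the 2048 MiB task size from the task definition in Step 5:
options = node_options(2048)
```

The remaining memory is left for buffers, native modules, and other non-heap allocations.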

### Migration Checklist

#### Before Going Live

* [ ] All services deployed and healthy
* [ ] Database migrations completed
* [ ] Search index populated
* [ ] SSL certificates active
* [ ] DNS configured and propagated
* [ ] Monitoring alerts configured
* [ ] Backup strategy tested
* [ ] Load testing completed
* [ ] Security scan performed
* [ ] Admin users created
* [ ] Documentation updated

***

{% hint style="success" %}
**Deployment Complete!** Your AskGov instance is now running on AWS.

**Next Steps:**

* Configure monitoring dashboards
* Set up automated backups
* Review Security Guide for additional hardening
* Customize branding (Component Customization)
{% endhint %}


