BusyBee: AI-Powered Email Automation for Samsung SDS

"Stop being busy, start being BusyBee" Automate repetitive email tasks with AI Samsung SDS Corporate Partnership Project | Excellence Award Winner

Project Overview

BusyBee is an intelligent email classification and automation platform developed as a corporate partnership project with Samsung SDS (October-November 2024). The system processes incoming business emails, classifies their intent using fine-tuned DistilKoBERT, and automates routine responses through a GPT-4o Mini-powered chatbot.

The Problem: Business teams receive hundreds of quote requests, order confirmations, and spam emails daily. Manually processing each email wastes valuable time that could be spent on strategic work.

Our Solution:

Automatic classification - DistilKoBERT classifies email intent (Quote, Order, Spam, Inquiry)
Information extraction - GPT-4o Mini extracts order details from email body and attachments (ZIP files)
Missing field detection - Chatbot identifies incomplete information and requests clarification
Multilingual responses - AWS Translate supports English, Japanese, Thai
Automated quotations - Calculate shipping costs and send quote emails via SES

Development Period: October-November 2024 (6 weeks)

Team: 6 developers (2 Backend, 2 Frontend, 1 AI/ML (me), 1 Infrastructure *(me)**)

My Role: AI & Infrastructure Lead

As the AI and Infrastructure lead, I was responsible for:

AI Model Development - DistilKoBERT fine-tuning pipeline with ONNX optimization
Online Learning System - SQS-triggered continuous training with PyTorch
Serverless Architecture - AWS Lambda functions with Serverless Framework
CI/CD Automation - Automated deployment pipeline with GitHub Actions

Tech Stack

AI/ML

DistilKoBERT - Korean BERT for email classification (Transformers + ONNX Runtime)
GPT-4o Mini - Natural language understanding and generation
LangChain - Conversational AI workflow orchestration
PyTorch - Deep learning framework for model fine-tuning

Backend (Serverless)

AWS Lambda - 18 serverless functions (Node.js 20.x, Python 3.12)
API Gateway - RESTful HTTP endpoints + WebSocket for real-time chat
DynamoDB - NoSQL database for chat sessions and email metadata
DynamoDB Streams - Event-driven triggers for LLM processing
SQS - Message queue for batch training (batchSize: 32)
S3 - Storage for email attachments and ML models
SES - Automated quote email sending
Serverless Framework - Infrastructure as Code

Frontend

React 18.3.1 - Create React App (react-scripts 5.0.1)
Zustand 5.0.1 - Lightweight state management (3KB)
AWS Amplify - Authentication and API integration
Chart.js - Dashboard analytics (email volume, classification metrics)
MQTT - IoT device communication (aws-iot-device-sdk)
react-speech-recognition - Voice input for accessibility

IoT Components

Arduino Uno - Microcontroller for sensor integration
DHT11 - Temperature and humidity monitoring
GY-GPS6MV2 - GPS tracking for shipment location
ESP32-CAM - Camera module for visual monitoring

Architecture Overview

Serverless Function Breakdown

BusyBee consists of 18 Lambda functions organized by responsibility:

Email Processing Pipeline:

mail-extraction - Extract email content and metadata
file-decoding - Decode email attachments
unzip - Recursively extract ZIP files
file-classification - Classify attachment types
mail-classification - Route to DistilKoBERT for intent classification
save-data - Store processed data in DynamoDB

AI/ML Functions: 7. distilkobert - ONNX inference for email classification 8. online-learning - SQS-triggered batch training 9. llm-interaction - DynamoDB Stream → GPT-4o Mini processing

Chat & Quotation: 10-13. chat-app (4 functions) - WebSocket chat ($connect, $disconnect, $default, sendMessage) 14. quotation-calculation - Calculate shipping costs 15. send-quote-mail - Generate and send quote emails via SES

Data Flow: 16. maildb-to-sqs - Push training data to SQS 17. quote-order-save - Persist quote records 18. responsed-data-replication - Sync data across services

Event-Driven Architecture

Email arrives → SES → S3
    ↓
mail-extraction → file-decoding → unzip → file-classification
    ↓
mail-classification → DistilKoBERT (ONNX)
    ↓
save-data → DynamoDB
    ↓
DynamoDB Stream → llm-interaction → GPT-4o Mini
    ↓
WebSocket chat-app ($connect → sendMessage → quotation-calculation)
    ↓
send-quote-mail → SES

Key Implementation Details

1. DistilKoBERT Fine-Tuning & ONNX Optimization

Training Pipeline

We fine-tuned DistilBERT for Korean email classification with 5 categories:

# functions/online-learning/handler.py
import torch
from transformers import AutoModelForSequenceClassification
from concurrent.futures import ThreadPoolExecutor

S3_BUCKET = os.environ["S3_BUCKET"]
MODEL_PREFIX = os.environ["MODEL_PREFIX"]
ONNX_MODEL_PREFIX = os.environ["ONNX_MODEL_PREFIX"]
MODEL_DIR = "/tmp/model"

def download_model():
    """Download model files from S3 in parallel using ThreadPoolExecutor"""
    s3 = boto3.client("s3")
    files = [
        "model.safetensors",
        "config.json",
        "vocab.txt",
        "special_tokens_map.json",
        "tokenizer_78b3253a26.model",
        "tokenizer_config.json",
        "tokenization_kobert.py"
    ]

    def download_file(file_name):
        s3_path = f"{MODEL_PREFIX}{file_name}"
        local_path = os.path.join(MODEL_DIR, file_name)
        s3.download_file(S3_BUCKET, s3_path, local_path)

    with ThreadPoolExecutor() as executor:
        executor.map(download_file, files)

def load_model():
    """Load KoBERT tokenizer and model"""
    sys.path.insert(0, MODEL_DIR)
    from tokenization_kobert import KoBertTokenizer

    tokenizer = KoBertTokenizer.from_pretrained(MODEL_DIR)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_DIR,
        cache_dir="/tmp/huggingface",
        local_files_only=True,
        trust_remote_code=True
    )
    return tokenizer, model

def retrain_model(tokenizer, model, training_data):
    """Fine-tune model with AdamW optimizer"""
    inputs = tokenizer(
        [data["text"] for data in training_data],
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=128,
    )
    labels = torch.tensor([data["label"] for data in training_data])

    dataset = torch.utils.data.TensorDataset(
        inputs["input_ids"], inputs["attention_mask"], labels
    )
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=32)

    model.train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

    for epoch in range(1):  # Single epoch for online learning
        for batch in dataloader:
            input_ids, attention_mask, batch_labels = batch
            outputs = model(
                input_ids=input_ids,
                attention_mask=attention_mask,
                labels=batch_labels,
            )
            loss = outputs.loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            print(f"Loss: {loss.item()}")

    # Save updated model to S3
    model.save_pretrained(MODEL_DIR)
    tokenizer.save_vocabulary(MODEL_DIR)
    upload_model_files(MODEL_DIR)

    return model

ONNX Conversion for Production Inference

def convert_to_onnx(model):
    """Convert PyTorch model to ONNX for faster inference"""
    onnx_path = "/tmp/distilkobert.onnx"

    dummy_input = {
        "input_ids": torch.ones((1, 128), dtype=torch.int64),
        "attention_mask": torch.ones((1, 128), dtype=torch.int64),
    }

    torch.onnx.export(
        model,
        (dummy_input["input_ids"], dummy_input["attention_mask"]),
        onnx_path,
        input_names=["input_ids", "attention_mask"],
        output_names=["logits"],
        dynamic_axes={
            "input_ids": {0: "batch_size"},
            "attention_mask": {0: "batch_size"},
        },
        opset_version=14,
    )

    # Upload ONNX model to S3
    s3.upload_file(onnx_path, S3_BUCKET, f"{ONNX_MODEL_PREFIX}distilkobert.onnx")

    return onnx_path

def lambda_handler(event, context):
    """SQS-triggered batch training"""
    download_model()
    tokenizer, model = load_model()

    # Parse SQS messages
    messages = [json.loads(record["body"]) for record in event.get("Records", [])]
    training_data = [
        {"text": msg["emailContent"], "label": msg["flag"]} for msg in messages
    ]

    # Fine-tune and convert to ONNX
    retrained_model = retrain_model(tokenizer, model, training_data)
    onnx_path = convert_to_onnx(retrained_model)

    # Invoke evaluation lambda to test new model
    invoke_evaluation_lambda()

    return {"statusCode": 200, "body": json.dumps({"message": "Success"})}

Serverless Configuration

# functions/online-learning/serverless.yml
service: online-learning

provider:
  name: aws
  runtime: python3.12
  memorySize: 2048
  timeout: 900  # 15 minutes

functions:
  updateModel:
    image: 481665114066.dkr.ecr.ap-northeast-2.amazonaws.com/online-learning:latest
    environment:
      S3_BUCKET: sagemaker-ap-northeast-2-481665114066
      MODEL_PREFIX: distilkobert-classifier/
      ONNX_MODEL_PREFIX: distilkobert-onxx/
    events:
      - sqs:
          arn: !GetAtt DynamoDBSQSQueue.Arn
          batchSize: 32
          maximumBatchingWindow: 300  # Wait up to 5 minutes to batch messages

resources:
  Resources:
    DynamoDBSQSQueue:
      Type: AWS::SQS::Queue
      Properties:
        QueueName: dynamodb-sqs-queue
        VisibilityTimeout: 910
        RedrivePolicy:
          deadLetterTargetArn: !GetAtt DeadLetterQueue.Arn
          maxReceiveCount: 5

Key achievements:

85% classification accuracy on production email data
<200ms inference latency with ONNX optimization (vs 800ms PyTorch)
Automatic model updates via SQS batch training
Parallel downloads with ThreadPoolExecutor (5x faster)

2. ONNX Runtime Inference

Production inference uses ONNX Runtime for 4x speedup:

# functions/distilkobert/handler.py
import onnxruntime as ort
from transformers import AutoTokenizer

def download_model():
    """Download ONNX model and tokenizer from S3"""
    s3 = boto3.client("s3")
    files = ["config.json", "distilkobert.onnx", "vocab.txt"]
    for file_name in files:
        s3.download_file(
            S3_BUCKET,
            f"{MODEL_PREFIX}{file_name}",
            os.path.join(MODEL_DIR, file_name)
        )

def load_model():
    """Load ONNX session and tokenizer"""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
    ort_session = ort.InferenceSession(
        os.path.join(MODEL_DIR, "distilkobert.onnx")
    )
    return tokenizer, ort_session

def lambda_handler(event, context):
    download_model()
    tokenizer, ort_session = load_model()

    # Parse request
    body = json.loads(event["body"])
    texts = body.get("inputs", [])

    # Tokenize with fixed-length padding
    inputs = tokenizer(
        texts,
        return_tensors="np",
        padding="max_length",
        truncation=True,
        max_length=128
    )

    # Run ONNX inference
    ort_inputs = {
        ort_session.get_inputs()[0].name: inputs["input_ids"],
        ort_session.get_inputs()[1].name: inputs["attention_mask"]
    }
    ort_outputs = ort_session.run(None, ort_inputs)
    predictions = ort_outputs[0].tolist()

    return {
        "statusCode": 200,
        "body": json.dumps({"predictions": predictions})
    }

Serverless Config:

# functions/distilkobert/serverless.yml
provider:
  runtime: python3.12
  memorySize: 1024
  timeout: 30

functions:
  inference:
    image: 481665114066.dkr.ecr.ap-northeast-2.amazonaws.com/distilkobert-lambda:latest
    events:
      - http:
          path: inference
          method: post

Performance:

PyTorch: ~800ms per batch
ONNX Runtime: ~200ms per batch (4x faster)
Memory: 1024MB (sufficient for ONNX model)

3. WebSocket Chatbot with Intent System

The chatbot uses API Gateway WebSocket + DynamoDB for real-time conversation:

// functions/chat-app/handlers/message.js
const { sendMessageToClient } = require('../common/utils/apiGatewayClient');
const { makeApiRequest } = require('../common/utils/apiRequest');
const { getSessionData, saveChat, updatePendingFields } = require('../common/ddb/dynamoDbClient');

module.exports.handler = async (event) => {
  const connectionId = event.requestContext.connectionId;

  // Get orderId from connection
  const { orderId } = await getOrderIdByConnectionId(connectionId);

  // Parse client message
  const { action, clientMessage } = parseClientMessage(event.body);

  // Save client message to DynamoDB
  await saveChat(orderId, {
    timestamp: new Date().toISOString(),
    senderType: 'customer',
    message: clientMessage
  });

  // Get session data with pending fields
  const sessionData = await getSessionData(orderId);
  const pendingFields = sessionData.pendingFields;

  // Call LLM API (GPT-4o Mini)
  const requestData = createChatbotRequestMessage(clientMessage, pendingFields);
  const llmApiUrl = `${process.env.LLM_API_URL}?orderId=${orderId}`;
  const response = await makeApiRequest(llmApiUrl, requestData);

  // Parse LLM response
  const { botResponse, intent, updatedFields, language } = parseChatbotResponse(response);

  // Intent-based routing
  switch (intent) {
    case '1': // Latest logistics info request
      await sendMessageToClient(connectionId, botResponse, 'bot', language);
      break;

    case '2': // Provide missing logistics details
      // Validate city fields
      const hasUnknownCity = Object.entries(updatedFields).some(
        ([key, value]) => ['DepartureCity', 'ArrivalCity'].includes(key) && value === 'unknown'
      );

      if (hasUnknownCity) {
        await sendMessageToClient(
          connectionId,
          '지원하지 않는 도시가 포함되어 있습니다. 다시 입력해주세요.',
          'bot',
          language
        );
        break;
      }

      // Update pending fields with extracted data
      await updatePendingFields(orderId, updatedFields);

      // Translate field names and format values
      const updatedFieldsMessage = Object.entries(updatedFields)
        .map(([key, value]) => {
          const translatedKey = fieldTranslation[key] || key;
          const formattedValue = formatFieldValue(key, value);
          return `${translatedKey}: ${formattedValue}`;
        })
        .join('\n');

      await sendMessageToClient(
        connectionId,
        `정보를 다음과 같이 업데이트했습니다:\n${updatedFieldsMessage}\n\n` +
        `입력한 정보가 정확한지 확인해주세요. 맞다면 "예", 아니라면 "아니오"로 응답해주세요.`,
        'bot',
        language
      );
      break;

    case '4_yes': // Confirm provided information
      const confirmedFields = Object.fromEntries(
        Object.entries(pendingFields).filter(([_, value]) =>
          value !== 'omission' && value !== 'unknown'
        )
      );

      await updateResponsedDataAndRemovePendingFields(orderId, confirmedFields);
      await sendMessageToClient(
        connectionId,
        '정보가 성공적으로 업데이트되었습니다.',
        'bot',
        language
      );

      // Check if all fields are filled
      const sessionDataAfterUpdate = await getSessionData(orderId);
      const remainingFieldsMessage = generateMissingFieldsMessage(
        sessionDataAfterUpdate.pendingFields || {}
      );

      if (!remainingFieldsMessage.includes('입력되지 않은 정보는')) {
        // All fields complete → trigger quote calculation
        await sendMessageToClient(
          connectionId,
          '요청드린 모든 정보 제공에 협조해 주셔서 감사합니다! 담당자님의 이메일로 견적을 발송해드리겠습니다.',
          'bot',
          language
        );

        await invokeCompletionHandler(orderId);
      } else {
        await sendMessageToClient(connectionId, remainingFieldsMessage, 'bot', language);
      }
      break;

    case '4_no': // Reject and request re-input
      const fieldsToReset = Object.keys(pendingFields).filter(
        (key) => pendingFields[key] !== 'omission' && pendingFields[key] !== 'unknown'
      );
      await resetPendingFields(orderId, fieldsToReset);
      await sendMessageToClient(connectionId, '정보를 다시 입력해주세요.', 'bot', language);
      break;

    case '4_unknown': // Unclear response
      const filledFields = Object.entries(pendingFields)
        .filter(([_, value]) => value !== 'omission' && value !== 'unknown')
        .map(([key, value]) => `${fieldTranslation[key] || key}: ${formatFieldValue(key, value)}`)
        .join('\n');

      await sendMessageToClient(
        connectionId,
        `현재 다음 정보가 입력되었습니다:\n${filledFields}\n\n` +
        `입력한 정보가 정확한지 확인해주세요. 맞다면 "예", 아니라면 "아니오"로 응답해주세요.`,
        'bot',
        language
      );
      break;

    default:
      await sendMessageToClient(connectionId, '처리할 수 없는 요청입니다.', 'bot');
  }

  return { statusCode: 200 };
};

Serverless Config:

# functions/chat-app/serverless.yml
provider:
  runtime: nodejs20.x
  environment:
    CHAT_SESSIONS_TABLE_NAME: chat-app-CustomerChatSessions
    LLM_API_URL: https://${llmApiGatewayId}.execute-api.ap-northeast-2.amazonaws.com/dev/llm-interaction
    SQS_QUEUE_URL: https://sqs.ap-northeast-2.amazonaws.com/481665114066/chat-quotation-calculation-trigger

functions:
  connect:
    handler: handlers/connect.handler
    events:
      - websocket:
          route: $connect

  disconnect:
    handler: handlers/disconnect.handler
    events:
      - websocket:
          route: $disconnect

  message:
    handler: handlers/message.handler
    timeout: 30
    events:
      - websocket:
          route: sendMessage

  completion:
    handler: handlers/completion.handler

resources:
  Resources:
    CustomerChatSessionsTable:
      Type: AWS::DynamoDB::Table
      Properties:
        TableName: chat-app-CustomerChatSessions
        AttributeDefinitions:
          - AttributeName: orderId
            AttributeType: S
          - AttributeName: connectionId
            AttributeType: S
        KeySchema:
          - AttributeName: orderId
            KeyType: HASH
        BillingMode: PAY_PER_REQUEST
        StreamSpecification:
          StreamViewType: NEW_IMAGE
        GlobalSecondaryIndexes:
          - IndexName: ConnectionIndex
            KeySchema:
              - AttributeName: connectionId
                KeyType: HASH

Intent System:

1: Logistics information request → Simple query response
2: Missing field provision → Extract and validate data
3: Other requests → Generic bot response
4_yes: Confirmation → Update DynamoDB, trigger quotation if complete
4_no: Rejection → Reset pending fields
4_unknown: Unclear → Show current state and ask for clarification

4. LLM Interaction with LangChain Layer

GPT-4o Mini processes chat messages via DynamoDB Streams:

# functions/llm-interaction/serverless.yml
provider:
  runtime: nodejs20.x
  environment:
    TABLE_NAME: chat-app-CustomerChatSessions
    OPENAI_API_KEY: ${ssm:/path/to/your/openai/api/key}

functions:
  processChatSessions:
    handler: handler.processEvent
    layers:
      - { Ref: LangChainLayer }
    events:
      - stream:
          type: dynamodb
          arn: !GetAtt ChatAppCustomerChatSessions.StreamArn

layers:
  LangChainLayer:
    path: layer
    compatibleRuntimes:
      - nodejs20.x
    description: 'LangChain and AWS SDK for chatbot function'

Flow:

Client sends message via WebSocket → DynamoDB
DynamoDB Stream triggers llm-interaction Lambda
LangChain processes with GPT-4o Mini
Extract intent and fields
Return response to WebSocket handler

5. Multilingual Support with AWS Translate

// Translate bot response to user's language
const { TranslateClient, TranslateTextCommand } = require("@aws-sdk/client-translate");

async function translateText(text, targetLanguage) {
  if (targetLanguage === 'ko') return text;

  const translateClient = new TranslateClient({ region: "ap-northeast-2" });
  const command = new TranslateTextCommand({
    Text: text,
    SourceLanguageCode: 'ko',
    TargetLanguageCode: targetLanguage  // 'en', 'ja', 'th'
  });

  const response = await translateClient.send(command);
  return response.TranslatedText;
}

Supported languages: Korean (default), English, Japanese, Thai

Frontend Implementation

React Dashboard with Zustand

// frontend/package.json
{
  "dependencies": {
    "react": "^18.3.1",
    "react-scripts": "5.0.1",
    "zustand": "^5.0.1",
    "aws-amplify": "^6.8.0",
    "chart.js": "^4.4.6",
    "react-chartjs-2": "^5.2.0",
    "mqtt": "^5.10.1",
    "aws-iot-device-sdk": "^2.2.15",
    "react-speech-recognition": "^3.10.0"
  }
}

Features:

Dashboard analytics: Email volume trends, classification accuracy (Chart.js)
Real-time chat: WebSocket connection to API Gateway
Voice input: react-speech-recognition for accessibility
IoT monitoring: MQTT subscription for shipment tracking (GPS, temperature, humidity)
State management: Zustand for lightweight global state (3KB vs Redux 12KB)

Performance Metrics

Email Processing Latency

Stage	Time
Mail extraction + file decoding	~500ms
DistilKoBERT classification (ONNX)	~200ms
Save to DynamoDB	~100ms
Total pipeline	<2 seconds

Classification Accuracy

Training dataset: 5,000 labeled emails
Test accuracy: 85%
Categories: Quote (45%), Order (30%), Spam (15%), Inquiry (10%)

Online Learning Performance

Batch size: 32 messages
Training time: ~5 minutes (2048MB Lambda, 1 epoch)
ONNX conversion: ~30 seconds
S3 upload: ~10 seconds

Cost Efficiency

Traditional EC2 vs Serverless Lambda:

EC2 (t3.medium, 24/7):
- Instance: $30/month
- Data transfer: $20/month
- Total: ~$50/month

Serverless Lambda:
- 100,000 email classifications: $5/month
- 10,000 chatbot interactions: $3/month
- 100 training runs: $2/month
- Total: ~$10/month

Savings: 80% cost reduction

Challenges & Solutions

Challenge 1: Cold Start Latency

Problem: Lambda cold starts caused 3-5 second delays for ONNX model loading.

Solution:

# Use ECR container images for faster cold starts
functions:
  inference:
    image: 481665114066.dkr.ecr.ap-northeast-2.amazonaws.com/distilkobert-lambda:latest
    memorySize: 1024  # Larger memory = faster cold start

Also implemented model caching in /tmp:

MODEL_DIR = "/tmp/model"

def download_model():
    # Check if model already exists in /tmp (persists across warm starts)
    if os.path.exists(os.path.join(MODEL_DIR, "distilkobert.onnx")):
        print("Model already cached in /tmp")
        return

    # Download from S3 only on cold start
    s3.download_file(S3_BUCKET, f"{MODEL_PREFIX}distilkobert.onnx", ...)

Result: Cold start reduced from 5s to 1.5s

Challenge 2: ZIP File Bomb Prevention

Problem: Malicious ZIP files could cause infinite recursion or memory exhaustion.

Solution:

function safeExtractZip(zipFile, maxDepth = 3, maxFiles = 100) {
  let extractedFiles = [];

  function extractRecursive(zipPath, currentDepth = 0) {
    if (currentDepth >= maxDepth) {
      console.warn(`Max depth reached: ${zipPath}`);
      return;
    }

    if (extractedFiles.length >= maxFiles) {
      console.warn(`Max files reached`);
      return;
    }

    const zip = new AdmZip(zipPath);
    for (const entry of zip.getEntries()) {
      // Security: Prevent path traversal
      if (entry.entryName.startsWith("..") || entry.entryName.startsWith("/")) {
        continue;
      }

      const extractedPath = zip.extractEntryTo(entry, "/tmp/extract");
      extractedFiles.push(extractedPath);

      // Recursively extract nested ZIPs
      if (entry.entryName.endsWith(".zip")) {
        extractRecursive(extractedPath, currentDepth + 1);
      }
    }
  }

  extractRecursive(zipFile);
  return extractedFiles;
}

Challenge 3: SQS Message Ordering

Problem: Email training data arrived out of order, causing label conflicts.

Solution: Use SQS FIFO queue with message deduplication:

resources:
  Resources:
    DynamoDBSQSQueue:
      Type: AWS::SQS::Queue
      Properties:
        QueueName: dynamodb-sqs-queue.fifo  # FIFO queue
        FifoQueue: true
        ContentBasedDeduplication: true
        VisibilityTimeout: 910

Team Collaboration

6-person team structure:

Backend (2): Email API, DynamoDB operations, SES integration
Frontend (2): React dashboard, chatbot UI, IoT visualization
AI/ML (1): DistilKoBERT fine-tuning, ONNX optimization (me)
Infrastructure (2): Serverless deployment, CI/CD, AWS architecture (me + 1)

My contributions:

Designed and implemented DistilKoBERT training pipeline (85% accuracy)
ONNX optimization (4x speedup: 800ms → 200ms)
SQS-triggered online learning with ThreadPoolExecutor parallelization
Architected 18-function serverless infrastructure
CI/CD automation with Serverless Framework

Results & Recognition

Excellence Award at Samsung SDS Corporate Partnership Project
85% classification accuracy on production email data
<2 second end-to-end email processing latency
80% cost reduction vs traditional EC2 infrastructure
Successfully deployed and handling thousands of emails per day

Lessons Learned

1. ONNX Runtime is a Must for Production PyTorch Models

Exporting to ONNX reduced inference time by 4x (800ms → 200ms) with no accuracy loss. For serverless deployments where every millisecond counts, ONNX Runtime is essential.

# Training: PyTorch
model = AutoModelForSequenceClassification.from_pretrained(...)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Inference: ONNX Runtime
torch.onnx.export(model, ...)
ort_session = ort.InferenceSession("distilkobert.onnx")

2. SQS Batching Enables Efficient Online Learning

Instead of training on every single email (expensive), SQS batching waits for 32 messages or 5 minutes:

events:
  - sqs:
      batchSize: 32
      maximumBatchingWindow: 300  # 5 minutes

This reduced training costs by 95% while maintaining model freshness.

3. DynamoDB Streams > SQS for Real-Time Processing

For chat sessions, DynamoDB Streams trigger LLM processing instantly with zero polling overhead:

events:
  - stream:
      type: dynamodb
      arn: !GetAtt ChatAppCustomerChatSessions.StreamArn

4. Serverless Framework > CloudFormation for Microservices

Managing 18 Lambda functions with raw CloudFormation would be painful. Serverless Framework's per-function serverless.yml files made it manageable:

functions/
  distilkobert/serverless.yml
  online-learning/serverless.yml
  chat-app/serverless.yml
  llm-interaction/serverless.yml
  ...

5. ECR Container Images > ZIP Deployments for ML Models

Lambda ZIP packages are limited to 250MB. Our ONNX model + dependencies exceeded this. ECR container images solved it:

functions:
  inference:
    image: 481665114066.dkr.ecr.ap-northeast-2.amazonaws.com/distilkobert-lambda:latest

Future Enhancements

Multi-model ensemble: Combine DistilKoBERT + GPT-4o for higher accuracy
Active learning: Prioritize uncertain emails for human labeling
A/B testing: Compare ONNX vs TensorRT inference
Multilingual classification: Train DistilKoBERT on English/Japanese/Thai emails
Cost optimization: Use Lambda SnapStart for faster cold starts

Try It Out

Source code: GitHub - samsungSDS_BusyBee

Project documentation: Notion - BusyBee Project

Key files to explore:

online-learning/handler.py - PyTorch training + ONNX conversion (269 lines)
distilkobert/handler.py - ONNX Runtime inference (66 lines)
chat-app/handlers/message.js - WebSocket chat with intent system (279 lines)
llm-interaction/serverless.yml - DynamoDB Stream trigger

Questions or feedback? Connect with me on GitHub