BusyBee: AI-Powered Email Automation for Samsung SDS

"Stop being busy, start being BusyBee" Automate repetitive email tasks with AI Samsung SDS Corporate Partnership Project | Excellence Award Winner

Project Overview

BusyBee is an intelligent email classification and automation platform developed as a corporate partnership project with Samsung SDS (October-November 2024). The system processes incoming business emails, classifies their intent using fine-tuned DistilKoBERT, and automates routine responses through a GPT-4o Mini-powered chatbot.

The Problem: Business teams receive hundreds of quote requests, order confirmations, and spam emails daily. Manually processing each email wastes valuable time that could be spent on strategic work.

Our Solution:

  1. Automatic classification - DistilKoBERT classifies email intent (Quote, Order, Spam, Inquiry)
  2. Information extraction - GPT-4o Mini extracts order details from email body and attachments (ZIP files)
  3. Missing field detection - Chatbot identifies incomplete information and requests clarification
  4. Multilingual responses - AWS Translate supports English, Japanese, Thai
  5. Automated quotations - Calculate shipping costs and send quote emails via SES

Development Period: October-November 2024 (6 weeks)

Team: 6 developers (2 Backend, 2 Frontend, 1 AI/ML (me), 1 Infrastructure *(me)**)

My Role: AI & Infrastructure Lead

As the AI and Infrastructure lead, I was responsible for:

  1. AI Model Development - DistilKoBERT fine-tuning pipeline with ONNX optimization
  2. Online Learning System - SQS-triggered continuous training with PyTorch
  3. Serverless Architecture - AWS Lambda functions with Serverless Framework
  4. CI/CD Automation - Automated deployment pipeline with GitHub Actions

Tech Stack

AI/ML

  • DistilKoBERT - Korean BERT for email classification (Transformers + ONNX Runtime)
  • GPT-4o Mini - Natural language understanding and generation
  • LangChain - Conversational AI workflow orchestration
  • PyTorch - Deep learning framework for model fine-tuning

Backend (Serverless)

  • AWS Lambda - 18 serverless functions (Node.js 20.x, Python 3.12)
  • API Gateway - RESTful HTTP endpoints + WebSocket for real-time chat
  • DynamoDB - NoSQL database for chat sessions and email metadata
  • DynamoDB Streams - Event-driven triggers for LLM processing
  • SQS - Message queue for batch training (batchSize: 32)
  • S3 - Storage for email attachments and ML models
  • SES - Automated quote email sending
  • Serverless Framework - Infrastructure as Code

Frontend

  • React 18.3.1 - Create React App (react-scripts 5.0.1)
  • Zustand 5.0.1 - Lightweight state management (3KB)
  • AWS Amplify - Authentication and API integration
  • Chart.js - Dashboard analytics (email volume, classification metrics)
  • MQTT - IoT device communication (aws-iot-device-sdk)
  • react-speech-recognition - Voice input for accessibility

IoT Components

  • Arduino Uno - Microcontroller for sensor integration
  • DHT11 - Temperature and humidity monitoring
  • GY-GPS6MV2 - GPS tracking for shipment location
  • ESP32-CAM - Camera module for visual monitoring

Architecture Overview

Serverless Function Breakdown

BusyBee consists of 18 Lambda functions organized by responsibility:

Email Processing Pipeline:

  1. mail-extraction - Extract email content and metadata
  2. file-decoding - Decode email attachments
  3. unzip - Recursively extract ZIP files
  4. file-classification - Classify attachment types
  5. mail-classification - Route to DistilKoBERT for intent classification
  6. save-data - Store processed data in DynamoDB

AI/ML Functions: 7. distilkobert - ONNX inference for email classification 8. online-learning - SQS-triggered batch training 9. llm-interaction - DynamoDB Stream → GPT-4o Mini processing

Chat & Quotation: 10-13. chat-app (4 functions) - WebSocket chat ($connect, $disconnect, $default, sendMessage) 14. quotation-calculation - Calculate shipping costs 15. send-quote-mail - Generate and send quote emails via SES

Data Flow: 16. maildb-to-sqs - Push training data to SQS 17. quote-order-save - Persist quote records 18. responsed-data-replication - Sync data across services

Event-Driven Architecture

Email arrives → SES → S3
    ↓
mail-extraction → file-decoding → unzip → file-classification
    ↓
mail-classification → DistilKoBERT (ONNX)
    ↓
save-data → DynamoDB
    ↓
DynamoDB Stream → llm-interaction → GPT-4o Mini
    ↓
WebSocket chat-app ($connect → sendMessage → quotation-calculation)
    ↓
send-quote-mail → SES

Key Implementation Details

1. DistilKoBERT Fine-Tuning & ONNX Optimization

Training Pipeline

We fine-tuned DistilBERT for Korean email classification with 5 categories:

# functions/online-learning/handler.py
import torch
from transformers import AutoModelForSequenceClassification
from concurrent.futures import ThreadPoolExecutor

S3_BUCKET = os.environ["S3_BUCKET"]
MODEL_PREFIX = os.environ["MODEL_PREFIX"]
ONNX_MODEL_PREFIX = os.environ["ONNX_MODEL_PREFIX"]
MODEL_DIR = "/tmp/model"

def download_model():
    """Download model files from S3 in parallel using ThreadPoolExecutor"""
    s3 = boto3.client("s3")
    files = [
        "model.safetensors",
        "config.json",
        "vocab.txt",
        "special_tokens_map.json",
        "tokenizer_78b3253a26.model",
        "tokenizer_config.json",
        "tokenization_kobert.py"
    ]

    def download_file(file_name):
        s3_path = f"{MODEL_PREFIX}{file_name}"
        local_path = os.path.join(MODEL_DIR, file_name)
        s3.download_file(S3_BUCKET, s3_path, local_path)

    with ThreadPoolExecutor() as executor:
        executor.map(download_file, files)

def load_model():
    """Load KoBERT tokenizer and model"""
    sys.path.insert(0, MODEL_DIR)
    from tokenization_kobert import KoBertTokenizer

    tokenizer = KoBertTokenizer.from_pretrained(MODEL_DIR)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_DIR,
        cache_dir="/tmp/huggingface",
        local_files_only=True,
        trust_remote_code=True
    )
    return tokenizer, model

def retrain_model(tokenizer, model, training_data):
    """Fine-tune model with AdamW optimizer"""
    inputs = tokenizer(
        [data["text"] for data in training_data],
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=128,
    )
    labels = torch.tensor([data["label"] for data in training_data])

    dataset = torch.utils.data.TensorDataset(
        inputs["input_ids"], inputs["attention_mask"], labels
    )
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=32)

    model.train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

    for epoch in range(1):  # Single epoch for online learning
        for batch in dataloader:
            input_ids, attention_mask, batch_labels = batch
            outputs = model(
                input_ids=input_ids,
                attention_mask=attention_mask,
                labels=batch_labels,
            )
            loss = outputs.loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            print(f"Loss: {loss.item()}")

    # Save updated model to S3
    model.save_pretrained(MODEL_DIR)
    tokenizer.save_vocabulary(MODEL_DIR)
    upload_model_files(MODEL_DIR)

    return model

ONNX Conversion for Production Inference

def convert_to_onnx(model):
    """Convert PyTorch model to ONNX for faster inference"""
    onnx_path = "/tmp/distilkobert.onnx"

    dummy_input = {
        "input_ids": torch.ones((1, 128), dtype=torch.int64),
        "attention_mask": torch.ones((1, 128), dtype=torch.int64),
    }

    torch.onnx.export(
        model,
        (dummy_input["input_ids"], dummy_input["attention_mask"]),
        onnx_path,
        input_names=["input_ids", "attention_mask"],
        output_names=["logits"],
        dynamic_axes={
            "input_ids": {0: "batch_size"},
            "attention_mask": {0: "batch_size"},
        },
        opset_version=14,
    )

    # Upload ONNX model to S3
    s3.upload_file(onnx_path, S3_BUCKET, f"{ONNX_MODEL_PREFIX}distilkobert.onnx")

    return onnx_path

def lambda_handler(event, context):
    """SQS-triggered batch training"""
    download_model()
    tokenizer, model = load_model()

    # Parse SQS messages
    messages = [json.loads(record["body"]) for record in event.get("Records", [])]
    training_data = [
        {"text": msg["emailContent"], "label": msg["flag"]} for msg in messages
    ]

    # Fine-tune and convert to ONNX
    retrained_model = retrain_model(tokenizer, model, training_data)
    onnx_path = convert_to_onnx(retrained_model)

    # Invoke evaluation lambda to test new model
    invoke_evaluation_lambda()

    return {"statusCode": 200, "body": json.dumps({"message": "Success"})}

Serverless Configuration

# functions/online-learning/serverless.yml
service: online-learning

provider:
  name: aws
  runtime: python3.12
  memorySize: 2048
  timeout: 900  # 15 minutes

functions:
  updateModel:
    image: 481665114066.dkr.ecr.ap-northeast-2.amazonaws.com/online-learning:latest
    environment:
      S3_BUCKET: sagemaker-ap-northeast-2-481665114066
      MODEL_PREFIX: distilkobert-classifier/
      ONNX_MODEL_PREFIX: distilkobert-onxx/
    events:
      - sqs:
          arn: !GetAtt DynamoDBSQSQueue.Arn
          batchSize: 32
          maximumBatchingWindow: 300  # Wait up to 5 minutes to batch messages

resources:
  Resources:
    DynamoDBSQSQueue:
      Type: AWS::SQS::Queue
      Properties:
        QueueName: dynamodb-sqs-queue
        VisibilityTimeout: 910
        RedrivePolicy:
          deadLetterTargetArn: !GetAtt DeadLetterQueue.Arn
          maxReceiveCount: 5

Key achievements:

  • 85% classification accuracy on production email data
  • <200ms inference latency with ONNX optimization (vs 800ms PyTorch)
  • Automatic model updates via SQS batch training
  • Parallel downloads with ThreadPoolExecutor (5x faster)

2. ONNX Runtime Inference

Production inference uses ONNX Runtime for 4x speedup:

# functions/distilkobert/handler.py
import onnxruntime as ort
from transformers import AutoTokenizer

def download_model():
    """Download ONNX model and tokenizer from S3"""
    s3 = boto3.client("s3")
    files = ["config.json", "distilkobert.onnx", "vocab.txt"]
    for file_name in files:
        s3.download_file(
            S3_BUCKET,
            f"{MODEL_PREFIX}{file_name}",
            os.path.join(MODEL_DIR, file_name)
        )

def load_model():
    """Load ONNX session and tokenizer"""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
    ort_session = ort.InferenceSession(
        os.path.join(MODEL_DIR, "distilkobert.onnx")
    )
    return tokenizer, ort_session

def lambda_handler(event, context):
    download_model()
    tokenizer, ort_session = load_model()

    # Parse request
    body = json.loads(event["body"])
    texts = body.get("inputs", [])

    # Tokenize with fixed-length padding
    inputs = tokenizer(
        texts,
        return_tensors="np",
        padding="max_length",
        truncation=True,
        max_length=128
    )

    # Run ONNX inference
    ort_inputs = {
        ort_session.get_inputs()[0].name: inputs["input_ids"],
        ort_session.get_inputs()[1].name: inputs["attention_mask"]
    }
    ort_outputs = ort_session.run(None, ort_inputs)
    predictions = ort_outputs[0].tolist()

    return {
        "statusCode": 200,
        "body": json.dumps({"predictions": predictions})
    }

Serverless Config:

# functions/distilkobert/serverless.yml
provider:
  runtime: python3.12
  memorySize: 1024
  timeout: 30

functions:
  inference:
    image: 481665114066.dkr.ecr.ap-northeast-2.amazonaws.com/distilkobert-lambda:latest
    events:
      - http:
          path: inference
          method: post

Performance:

  • PyTorch: ~800ms per batch
  • ONNX Runtime: ~200ms per batch (4x faster)
  • Memory: 1024MB (sufficient for ONNX model)

3. WebSocket Chatbot with Intent System

The chatbot uses API Gateway WebSocket + DynamoDB for real-time conversation:

// functions/chat-app/handlers/message.js
const { sendMessageToClient } = require('../common/utils/apiGatewayClient');
const { makeApiRequest } = require('../common/utils/apiRequest');
const { getSessionData, saveChat, updatePendingFields } = require('../common/ddb/dynamoDbClient');

module.exports.handler = async (event) => {
  const connectionId = event.requestContext.connectionId;

  // Get orderId from connection
  const { orderId } = await getOrderIdByConnectionId(connectionId);

  // Parse client message
  const { action, clientMessage } = parseClientMessage(event.body);

  // Save client message to DynamoDB
  await saveChat(orderId, {
    timestamp: new Date().toISOString(),
    senderType: 'customer',
    message: clientMessage
  });

  // Get session data with pending fields
  const sessionData = await getSessionData(orderId);
  const pendingFields = sessionData.pendingFields;

  // Call LLM API (GPT-4o Mini)
  const requestData = createChatbotRequestMessage(clientMessage, pendingFields);
  const llmApiUrl = `${process.env.LLM_API_URL}?orderId=${orderId}`;
  const response = await makeApiRequest(llmApiUrl, requestData);

  // Parse LLM response
  const { botResponse, intent, updatedFields, language } = parseChatbotResponse(response);

  // Intent-based routing
  switch (intent) {
    case '1': // Latest logistics info request
      await sendMessageToClient(connectionId, botResponse, 'bot', language);
      break;

    case '2': // Provide missing logistics details
      // Validate city fields
      const hasUnknownCity = Object.entries(updatedFields).some(
        ([key, value]) => ['DepartureCity', 'ArrivalCity'].includes(key) && value === 'unknown'
      );

      if (hasUnknownCity) {
        await sendMessageToClient(
          connectionId,
          '지원하지 않는 도시가 포함되어 있습니다. 다시 입력해주세요.',
          'bot',
          language
        );
        break;
      }

      // Update pending fields with extracted data
      await updatePendingFields(orderId, updatedFields);

      // Translate field names and format values
      const updatedFieldsMessage = Object.entries(updatedFields)
        .map(([key, value]) => {
          const translatedKey = fieldTranslation[key] || key;
          const formattedValue = formatFieldValue(key, value);
          return `${translatedKey}: ${formattedValue}`;
        })
        .join('\n');

      await sendMessageToClient(
        connectionId,
        `정보를 다음과 같이 업데이트했습니다:\n${updatedFieldsMessage}\n\n` +
        `입력한 정보가 정확한지 확인해주세요. 맞다면 "예", 아니라면 "아니오"로 응답해주세요.`,
        'bot',
        language
      );
      break;

    case '4_yes': // Confirm provided information
      const confirmedFields = Object.fromEntries(
        Object.entries(pendingFields).filter(([_, value]) =>
          value !== 'omission' && value !== 'unknown'
        )
      );

      await updateResponsedDataAndRemovePendingFields(orderId, confirmedFields);
      await sendMessageToClient(
        connectionId,
        '정보가 성공적으로 업데이트되었습니다.',
        'bot',
        language
      );

      // Check if all fields are filled
      const sessionDataAfterUpdate = await getSessionData(orderId);
      const remainingFieldsMessage = generateMissingFieldsMessage(
        sessionDataAfterUpdate.pendingFields || {}
      );

      if (!remainingFieldsMessage.includes('입력되지 않은 정보는')) {
        // All fields complete → trigger quote calculation
        await sendMessageToClient(
          connectionId,
          '요청드린 모든 정보 제공에 협조해 주셔서 감사합니다! 담당자님의 이메일로 견적을 발송해드리겠습니다.',
          'bot',
          language
        );

        await invokeCompletionHandler(orderId);
      } else {
        await sendMessageToClient(connectionId, remainingFieldsMessage, 'bot', language);
      }
      break;

    case '4_no': // Reject and request re-input
      const fieldsToReset = Object.keys(pendingFields).filter(
        (key) => pendingFields[key] !== 'omission' && pendingFields[key] !== 'unknown'
      );
      await resetPendingFields(orderId, fieldsToReset);
      await sendMessageToClient(connectionId, '정보를 다시 입력해주세요.', 'bot', language);
      break;

    case '4_unknown': // Unclear response
      const filledFields = Object.entries(pendingFields)
        .filter(([_, value]) => value !== 'omission' && value !== 'unknown')
        .map(([key, value]) => `${fieldTranslation[key] || key}: ${formatFieldValue(key, value)}`)
        .join('\n');

      await sendMessageToClient(
        connectionId,
        `현재 다음 정보가 입력되었습니다:\n${filledFields}\n\n` +
        `입력한 정보가 정확한지 확인해주세요. 맞다면 "예", 아니라면 "아니오"로 응답해주세요.`,
        'bot',
        language
      );
      break;

    default:
      await sendMessageToClient(connectionId, '처리할 수 없는 요청입니다.', 'bot');
  }

  return { statusCode: 200 };
};

Serverless Config:

# functions/chat-app/serverless.yml
provider:
  runtime: nodejs20.x
  environment:
    CHAT_SESSIONS_TABLE_NAME: chat-app-CustomerChatSessions
    LLM_API_URL: https://${llmApiGatewayId}.execute-api.ap-northeast-2.amazonaws.com/dev/llm-interaction
    SQS_QUEUE_URL: https://sqs.ap-northeast-2.amazonaws.com/481665114066/chat-quotation-calculation-trigger

functions:
  connect:
    handler: handlers/connect.handler
    events:
      - websocket:
          route: $connect

  disconnect:
    handler: handlers/disconnect.handler
    events:
      - websocket:
          route: $disconnect

  message:
    handler: handlers/message.handler
    timeout: 30
    events:
      - websocket:
          route: sendMessage

  completion:
    handler: handlers/completion.handler

resources:
  Resources:
    CustomerChatSessionsTable:
      Type: AWS::DynamoDB::Table
      Properties:
        TableName: chat-app-CustomerChatSessions
        AttributeDefinitions:
          - AttributeName: orderId
            AttributeType: S
          - AttributeName: connectionId
            AttributeType: S
        KeySchema:
          - AttributeName: orderId
            KeyType: HASH
        BillingMode: PAY_PER_REQUEST
        StreamSpecification:
          StreamViewType: NEW_IMAGE
        GlobalSecondaryIndexes:
          - IndexName: ConnectionIndex
            KeySchema:
              - AttributeName: connectionId
                KeyType: HASH

Intent System:

  • 1: Logistics information request → Simple query response
  • 2: Missing field provision → Extract and validate data
  • 3: Other requests → Generic bot response
  • 4_yes: Confirmation → Update DynamoDB, trigger quotation if complete
  • 4_no: Rejection → Reset pending fields
  • 4_unknown: Unclear → Show current state and ask for clarification

4. LLM Interaction with LangChain Layer

GPT-4o Mini processes chat messages via DynamoDB Streams:

# functions/llm-interaction/serverless.yml
provider:
  runtime: nodejs20.x
  environment:
    TABLE_NAME: chat-app-CustomerChatSessions
    OPENAI_API_KEY: ${ssm:/path/to/your/openai/api/key}

functions:
  processChatSessions:
    handler: handler.processEvent
    layers:
      - { Ref: LangChainLayer }
    events:
      - stream:
          type: dynamodb
          arn: !GetAtt ChatAppCustomerChatSessions.StreamArn

layers:
  LangChainLayer:
    path: layer
    compatibleRuntimes:
      - nodejs20.x
    description: 'LangChain and AWS SDK for chatbot function'

Flow:

  1. Client sends message via WebSocket → DynamoDB
  2. DynamoDB Stream triggers llm-interaction Lambda
  3. LangChain processes with GPT-4o Mini
  4. Extract intent and fields
  5. Return response to WebSocket handler

5. Multilingual Support with AWS Translate

// Translate bot response to user's language
const { TranslateClient, TranslateTextCommand } = require("@aws-sdk/client-translate");

async function translateText(text, targetLanguage) {
  if (targetLanguage === 'ko') return text;

  const translateClient = new TranslateClient({ region: "ap-northeast-2" });
  const command = new TranslateTextCommand({
    Text: text,
    SourceLanguageCode: 'ko',
    TargetLanguageCode: targetLanguage  // 'en', 'ja', 'th'
  });

  const response = await translateClient.send(command);
  return response.TranslatedText;
}

Supported languages: Korean (default), English, Japanese, Thai


Frontend Implementation

React Dashboard with Zustand

// frontend/package.json
{
  "dependencies": {
    "react": "^18.3.1",
    "react-scripts": "5.0.1",
    "zustand": "^5.0.1",
    "aws-amplify": "^6.8.0",
    "chart.js": "^4.4.6",
    "react-chartjs-2": "^5.2.0",
    "mqtt": "^5.10.1",
    "aws-iot-device-sdk": "^2.2.15",
    "react-speech-recognition": "^3.10.0"
  }
}

Features:

  • Dashboard analytics: Email volume trends, classification accuracy (Chart.js)
  • Real-time chat: WebSocket connection to API Gateway
  • Voice input: react-speech-recognition for accessibility
  • IoT monitoring: MQTT subscription for shipment tracking (GPS, temperature, humidity)
  • State management: Zustand for lightweight global state (3KB vs Redux 12KB)

Performance Metrics

Email Processing Latency

Stage Time
Mail extraction + file decoding ~500ms
DistilKoBERT classification (ONNX) ~200ms
Save to DynamoDB ~100ms
Total pipeline <2 seconds

Classification Accuracy

  • Training dataset: 5,000 labeled emails
  • Test accuracy: 85%
  • Categories: Quote (45%), Order (30%), Spam (15%), Inquiry (10%)

Online Learning Performance

  • Batch size: 32 messages
  • Training time: ~5 minutes (2048MB Lambda, 1 epoch)
  • ONNX conversion: ~30 seconds
  • S3 upload: ~10 seconds

Cost Efficiency

Traditional EC2 vs Serverless Lambda:

EC2 (t3.medium, 24/7):
- Instance: $30/month
- Data transfer: $20/month
- Total: ~$50/month

Serverless Lambda:
- 100,000 email classifications: $5/month
- 10,000 chatbot interactions: $3/month
- 100 training runs: $2/month
- Total: ~$10/month

Savings: 80% cost reduction

Challenges & Solutions

Challenge 1: Cold Start Latency

Problem: Lambda cold starts caused 3-5 second delays for ONNX model loading.

Solution:

# Use ECR container images for faster cold starts
functions:
  inference:
    image: 481665114066.dkr.ecr.ap-northeast-2.amazonaws.com/distilkobert-lambda:latest
    memorySize: 1024  # Larger memory = faster cold start

Also implemented model caching in /tmp:

MODEL_DIR = "/tmp/model"

def download_model():
    # Check if model already exists in /tmp (persists across warm starts)
    if os.path.exists(os.path.join(MODEL_DIR, "distilkobert.onnx")):
        print("Model already cached in /tmp")
        return

    # Download from S3 only on cold start
    s3.download_file(S3_BUCKET, f"{MODEL_PREFIX}distilkobert.onnx", ...)

Result: Cold start reduced from 5s to 1.5s

Challenge 2: ZIP File Bomb Prevention

Problem: Malicious ZIP files could cause infinite recursion or memory exhaustion.

Solution:

function safeExtractZip(zipFile, maxDepth = 3, maxFiles = 100) {
  let extractedFiles = [];

  function extractRecursive(zipPath, currentDepth = 0) {
    if (currentDepth >= maxDepth) {
      console.warn(`Max depth reached: ${zipPath}`);
      return;
    }

    if (extractedFiles.length >= maxFiles) {
      console.warn(`Max files reached`);
      return;
    }

    const zip = new AdmZip(zipPath);
    for (const entry of zip.getEntries()) {
      // Security: Prevent path traversal
      if (entry.entryName.startsWith("..") || entry.entryName.startsWith("/")) {
        continue;
      }

      const extractedPath = zip.extractEntryTo(entry, "/tmp/extract");
      extractedFiles.push(extractedPath);

      // Recursively extract nested ZIPs
      if (entry.entryName.endsWith(".zip")) {
        extractRecursive(extractedPath, currentDepth + 1);
      }
    }
  }

  extractRecursive(zipFile);
  return extractedFiles;
}

Challenge 3: SQS Message Ordering

Problem: Email training data arrived out of order, causing label conflicts.

Solution: Use SQS FIFO queue with message deduplication:

resources:
  Resources:
    DynamoDBSQSQueue:
      Type: AWS::SQS::Queue
      Properties:
        QueueName: dynamodb-sqs-queue.fifo  # FIFO queue
        FifoQueue: true
        ContentBasedDeduplication: true
        VisibilityTimeout: 910

Team Collaboration

6-person team structure:

  • Backend (2): Email API, DynamoDB operations, SES integration
  • Frontend (2): React dashboard, chatbot UI, IoT visualization
  • AI/ML (1): DistilKoBERT fine-tuning, ONNX optimization (me)
  • Infrastructure (2): Serverless deployment, CI/CD, AWS architecture (me + 1)

My contributions:

  1. Designed and implemented DistilKoBERT training pipeline (85% accuracy)
  2. ONNX optimization (4x speedup: 800ms → 200ms)
  3. SQS-triggered online learning with ThreadPoolExecutor parallelization
  4. Architected 18-function serverless infrastructure
  5. CI/CD automation with Serverless Framework

Results & Recognition

  • Excellence Award at Samsung SDS Corporate Partnership Project
  • 85% classification accuracy on production email data
  • <2 second end-to-end email processing latency
  • 80% cost reduction vs traditional EC2 infrastructure
  • Successfully deployed and handling thousands of emails per day

Lessons Learned

1. ONNX Runtime is a Must for Production PyTorch Models

Exporting to ONNX reduced inference time by 4x (800ms → 200ms) with no accuracy loss. For serverless deployments where every millisecond counts, ONNX Runtime is essential.

# Training: PyTorch
model = AutoModelForSequenceClassification.from_pretrained(...)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Inference: ONNX Runtime
torch.onnx.export(model, ...)
ort_session = ort.InferenceSession("distilkobert.onnx")

2. SQS Batching Enables Efficient Online Learning

Instead of training on every single email (expensive), SQS batching waits for 32 messages or 5 minutes:

events:
  - sqs:
      batchSize: 32
      maximumBatchingWindow: 300  # 5 minutes

This reduced training costs by 95% while maintaining model freshness.

3. DynamoDB Streams > SQS for Real-Time Processing

For chat sessions, DynamoDB Streams trigger LLM processing instantly with zero polling overhead:

events:
  - stream:
      type: dynamodb
      arn: !GetAtt ChatAppCustomerChatSessions.StreamArn

4. Serverless Framework > CloudFormation for Microservices

Managing 18 Lambda functions with raw CloudFormation would be painful. Serverless Framework's per-function serverless.yml files made it manageable:

functions/
  distilkobert/serverless.yml
  online-learning/serverless.yml
  chat-app/serverless.yml
  llm-interaction/serverless.yml
  ...

5. ECR Container Images > ZIP Deployments for ML Models

Lambda ZIP packages are limited to 250MB. Our ONNX model + dependencies exceeded this. ECR container images solved it:

functions:
  inference:
    image: 481665114066.dkr.ecr.ap-northeast-2.amazonaws.com/distilkobert-lambda:latest

Future Enhancements

  • Multi-model ensemble: Combine DistilKoBERT + GPT-4o for higher accuracy
  • Active learning: Prioritize uncertain emails for human labeling
  • A/B testing: Compare ONNX vs TensorRT inference
  • Multilingual classification: Train DistilKoBERT on English/Japanese/Thai emails
  • Cost optimization: Use Lambda SnapStart for faster cold starts

Try It Out

Source code: GitHub - samsungSDS_BusyBee

Project documentation: Notion - BusyBee Project

Key files to explore:


Questions or feedback? Connect with me on GitHub