Insurance Payment Anomaly Detection

A PyTorch neural network, trained on Azure ML, that detects anomalies in the insurance payments companies declare through the DSN.

Role: Data science intern
When: Jul 2023 to Dec 2023
Status: Professional research project

A research-oriented ML project at Klesia to detect anomalies in company DSN declarations and improve the current non-ML anomaly detection algorithm.

From raw DSN to red flags

Company payroll declarations (DSN) go in; anomaly probabilities come out. The diagram below follows the order in which the project came together: data first, layers next, alerts last. (Illustrative only; the real data is confidential.)

[Figure: DSN features → hidden layers (PyTorch) → anomaly alert]
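
The flow in the figure can be sketched in a few lines of PyTorch. This is a minimal illustration, not the internship model: the data is synthetic, and the layer sizes, the label rule, and the class weighting are all assumptions made for the example. The Azure ML experiment setup is left out.

```python
import torch
from torch import nn

torch.manual_seed(0)  # reproducible synthetic data

# Synthetic stand-in for DSN features: 256 declarations, 8 numeric features.
# The label rule (feature 0 above 1.0) is invented for this example only.
n_samples, n_features = 256, 8
X = torch.randn(n_samples, n_features)
y = (X[:, 0] > 1.0).float().unsqueeze(1)


class AnomalyNet(nn.Module):
    """Small feed-forward net that outputs one anomaly logit per declaration."""

    def __init__(self, n_features: int, hidden: int = 16) -> None:
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)


model = AnomalyNet(n_features)

# Anomalies are rare, so weight positives by the negative/positive ratio.
pos_weight = ((y.numel() - y.sum()) / y.sum().clamp(min=1.0)).reshape(1)
loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

initial_loss = loss_fn(model(X), y).item()
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
final_loss = loss_fn(model(X), y).item()

# Turn logits into anomaly probabilities in [0, 1].
model.eval()
with torch.no_grad():
    anomaly_probs = torch.sigmoid(model(X))  # shape (n_samples, 1)
```

Working with logits and `BCEWithLogitsLoss` keeps training numerically stable; the sigmoid is applied only when probabilities are needed for alerting.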

The pipeline

  1. Extract

    Query Klesia's warehouses with SAS Enterprise Guide.

  2. Clean

    Turn messy DSN declarations into modelling datasets (SAS + Python).

  3. Train

    PyTorch neural networks on Azure ML experiments.

  4. Compare

    Benchmark against the existing rule-based detector to find where ML wins.
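
As a rough idea of what the Python side of the Clean step can look like, here is a small standard-library sketch. The field names (`company_id`, `period`, `amount`), the sample rows, and the cleaning rules are hypothetical; the real DSN schema and the SAS part of the pipeline are not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CleanRow:
    company_id: str
    period: str    # "YYYY-MM"
    amount: float  # declared amount, in euros


def parse_amount(raw: str) -> Optional[float]:
    """Parse amounts like '1 234,50' or '980.00'; return None if unparseable."""
    text = raw.strip().replace("\u00a0", "").replace(" ", "").replace(",", ".")
    try:
        return float(text)
    except ValueError:
        return None


def clean(rows: list[dict[str, str]]) -> list[CleanRow]:
    """Drop rows with missing fields or bad amounts, then de-duplicate."""
    seen: set[tuple[str, str]] = set()
    cleaned: list[CleanRow] = []
    for row in rows:
        company_id = row.get("company_id", "").strip()
        period = row.get("period", "").strip()
        amount = parse_amount(row.get("amount", ""))
        if not company_id or not period or amount is None:
            continue
        key = (company_id, period)
        if key in seen:
            continue  # keep the first declaration per company and period
        seen.add(key)
        cleaned.append(CleanRow(company_id, period, amount))
    return cleaned


# Hypothetical raw rows, invented for this example.
raw_rows = [
    {"company_id": "A001", "period": "2023-07", "amount": "1 234,50"},
    {"company_id": "A001", "period": "2023-07", "amount": "1 234,50"},  # duplicate
    {"company_id": "A002", "period": "2023-07", "amount": "n/a"},       # bad amount
    {"company_id": "", "period": "2023-07", "amount": "80"},            # missing id
    {"company_id": "A003", "period": "2023-08", "amount": "980.00"},
]
modelling_rows = clean(raw_rows)
```

Of the five raw rows, only the first `A001` row and the `A003` row survive, which is the kind of filtering a modelling dataset needs before training.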

Features

  • DSN anomaly detection
  • Company insurance payment anomaly prediction
  • SAS/Python preprocessing pipeline
  • PyTorch neural network
  • Azure ML experimentation

What I did

  • Queried large Klesia datasets using SAS Enterprise Guide.
  • Created clean modelling datasets from raw DSN data.
  • Combined SAS Enterprise Guide and Python preprocessing.
  • Built and trained PyTorch neural network models in Azure ML.
  • Compared ML anomaly detection with existing business algorithms.
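
The last point, comparing ML detection with the existing business rules, can be sketched as a precision/recall comparison plus a list of disagreements. Everything here is a toy assumption: the labels, the amounts, the model probabilities, and the fixed-threshold rule are invented and do not reflect Klesia's detector or the internship results.

```python
def precision_recall(flags: list[bool], labels: list[bool]) -> tuple[float, float]:
    """Return (precision, recall) of boolean flags against boolean labels."""
    tp = sum(f and l for f, l in zip(flags, labels))
    fp = sum(f and not l for f, l in zip(flags, labels))
    fn = sum(l and not f for f, l in zip(flags, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data for eight declarations (all values invented).
labels = [True, False, True, False, False, True, False, False]
amounts = [5000.0, 1200.0, 900.0, 1100.0, 4800.0, 4500.0, 1000.0, 950.0]
ml_probs = [0.91, 0.10, 0.75, 0.20, 0.40, 0.45, 0.05, 0.15]

rule_flags = [a > 4000.0 for a in amounts]  # hypothetical fixed-threshold rule
ml_flags = [p >= 0.5 for p in ml_probs]     # hypothetical decision threshold

rule_precision, rule_recall = precision_recall(rule_flags, labels)
ml_precision, ml_recall = precision_recall(ml_flags, labels)

# Indices where the two detectors disagree: the cases worth reviewing by hand.
pairs = list(enumerate(zip(ml_flags, rule_flags)))
ml_only = [i for i, (m, r) in pairs if m and not r]
rule_only = [i for i, (m, r) in pairs if r and not m]
```

Looking at `ml_only` and `rule_only` rather than only at aggregate scores shows where each approach wins, which is usually the more useful output of such a benchmark.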

Project timeline

The internship ran from SAS data extraction, through dataset construction, to PyTorch anomaly-detection experiments on Azure ML.

  1. Jul 2023 Planning / Research

    Research internship started

    Started the Klesia data science internship focused on company insurance payment anomaly detection.

  2. Jul 2023 Planning / Research

    SAS and database discovery

    Learned SAS Enterprise Guide and queried Klesia databases to understand available DSN data.

  3. Aug 2023 Development

    Dataset created from raw data

    Created machine-learning datasets from uncleaned enterprise data using SAS and Python processing.

  4. Sep 2023 Development

    Embedding and feature experiments

    Explored feature engineering and embedding techniques for DSN anomaly detection.

  5. Oct 2023 Development

    PyTorch model on Azure ML

    Developed neural-network experiments with PyTorch and Azure ML to predict or detect payment anomalies.

  6. Dec 2023 Release / Delivery

    Final internship results

    Delivered the internship research work comparing ML-based anomaly detection against the existing non-ML algorithmic approach.
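
The September step mentions embedding techniques without saying which ones were explored. One common option for tabular data is a learned embedding for a categorical column, concatenated with the numeric features. The sketch below assumes that approach; the category column, its cardinality, and the dimensions are hypothetical.

```python
import torch
from torch import nn

torch.manual_seed(0)


class TabularEncoder(nn.Module):
    """Embeds one categorical column and concatenates it with numeric columns."""

    def __init__(self, n_categories: int, embed_dim: int) -> None:
        super().__init__()
        self.embedding = nn.Embedding(n_categories, embed_dim)

    def forward(self, category_ids: torch.Tensor, numeric: torch.Tensor) -> torch.Tensor:
        embedded = self.embedding(category_ids)       # (batch, embed_dim)
        return torch.cat([embedded, numeric], dim=1)  # (batch, embed_dim + n_numeric)


# Hypothetical batch: 4 declarations, a category id in [0, 10), 3 numeric features.
category_ids = torch.tensor([0, 3, 3, 9])
numeric = torch.randn(4, 3)

encoder = TabularEncoder(n_categories=10, embed_dim=4)
features = encoder(category_ids, numeric)  # shape (4, 7)
```

Rows sharing a category id receive the same embedding vector, so the network can learn similarities between categories instead of treating each one as an unrelated one-hot column.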

Built with

  • SAS Enterprise Guide
  • Python
  • PyTorch
  • Azure ML
  • Machine learning
  • Data cleaning
  • Feature engineering