Dec 12, 2020 (Last updated: May 22, 2022)

About Me

Profile

Alternative Military Service Status : on duty (2020/11/27 ~ 2023/09/26)

CV : [PDF] (as of Mar. 2022)

Email kozistr@gmail.com
Github https://github.com/kozistr
Kaggle https://www.kaggle.com/kozistr
Linkedin https://www.linkedin.com/in/kozistr

Interests

  • Real-world challenges such as Kaggle competitions
  • Audio/Speech Domains
    • End-to-End Speaker Diarization
    • Speaker Verification
  • Computer Vision Domains
    • especially the medical domain

Previously, I was also interested in offensive security, such as reverse engineering and Linux kernel exploitation.


Challenges & Awards

Machine Learning

Hacking

  • Boot2Root CTF 2018 :: 2nd place (Demon + alpha)

  • Harekaze CTF 2017 :: 3rd place (SeoulWesterns)

  • WhiteHat League 1 (2017) :: 2nd place (Demon)

    • Awarded by the Korea Information Technology Research Institute (한국정보기술연구원, KITRI); received a prize of $3,000

Work Experience

Company

Data Scientist, Toss core, (2021.12.06 ~ present)

  • Working full-time.
  • Developed a text classification model to categorize users' reviews.
    • Visualized summarized keyword trends that show the main points of users' opinions.
    • Made it easier to analyze users who give information-rich feedback. (may help to improve the NPS score)
  • Developed a robust captcha model to recognize numeric captchas.
    • Lightweight CNN model for real-time inference (about 1,000 TPS for batched transactions, about 50 TPS for a single sample, on CPU)
    • Built domain-specific augmentations to make the model robust.
    • Saves $10,000 ~ $30,000 / year
    • In A/B (online) test, Google Vision OCR vs the new captcha model
      • Accuracy : improved by roughly 50%p (49% to 95%)
      • Latency (p95) : reduced by about 80x (about 1,000 ms to 12 ms)
  • Developed a model to forecast the transaction categories users will purchase in over the next few weeks & the next month.
    • Transformer-based architecture (customized with newly proposed methods).
    • Calibration-aware training.
    • In A/B (online) test, previous ML model vs AdsClassifier (statistically significant, p-value < 0.05)
      • Conversion : soon!
      • CTR : soon!
  • Developed a loan overdue prediction model for BNPL (CSS model)
    • Performed EDA to find useful features correlated with overdue users.
    • Built a robust CV & ensemble strategy with both online and offline performance in mind.
  • Developed a card category classification model.
    • Transformer-based architecture, about 900 TPS on a single GPU.
    • Handled noisy transaction text, noisy labels, and class imbalance.
  • Contributed to the team culture (e.g. collaboration tools, style-guides, etc).

Machine Learning Researcher, Watcha, (2020.06.22 ~ 2021.12.03)

  • Worked full-time.
  • Developed a new sequential recommendation architecture. (named Trans4Rec)
    • Newly proposed transformer architecture to improve the performance in a general manner.
    • Applied proper post-processing logic to the model.
    • In A/B (online) test, FutureFLAT vs Trans4Rec (statistically significant p-value < 0.01)
      • Click Ratio : improved 1.01%
  • Developed a music recommendation system (prototype)
  • Developed a training recipe for the sequential recommendation architecture. (named FutureFLAT)
    • Built a Future module to give the model a better understanding at inference time.
    • Applied augmentations to various features, leading to performance gains & robustness.
    • In A/B (online) test, FLAT vs FutureFLAT (statistically significant p-value < 0.05)
      • Compared to the previous model (FLAT), there was no (statistically significant) improvement.
      • However, it still seemed better on the offline metrics & training stability, so we chose to use it.
    • In A/B (online) test, Div2Vec vs FutureFLAT (statistically significant p-value < 0.05)
      • *Viewing Days (mean) : improved 1.012%
      • *Viewing Minutes (median) : improved 1.015%
  • Developed a model to predict users' expected view time for content.
    • Predicts how many people will watch a piece of content, and for how long, before the content is supplied.
    • Identifies which features affect users' viewing.
  • Developed a pipeline to recognize main actors from the poster and still-cut images.
    • Utilized a SOTA face detector & recognizer.
    • Optimized pre/post-processing routines for low latency.
  • Developed a novel sequential recommendation architecture to recommend what content to watch next. (named FLAT)
    • In A/B (online) test, previous algorithms vs FLAT (statistically significant p-value < 0.05)
      • Paid Conversion : improved 1.39%p+
      • *Viewing Days (mean) : improved 0.25%p+
      • *Viewing Minutes (median) : improved 4.10%p+
      • Click Ratio : improved 4.30%p+
      • Play Ratio : improved 2.32%p+
  • Developed an image super-resolution model to upscale movie & TV posters and still-cut images.
    • Optimized the code for low latency & memory efficiency on CPU.
    • In an internal qualitative evaluation by the designers, it captured details better, handled higher resolutions, and took little time.

% *Viewing Days : how many days a user is active on the app each month.

% *Viewing Minutes : how many minutes a user watched the content.

Machine Learning Engineer, Rainist, (2019.11.11 ~ 2020.06.19)

  • Worked as full-time.
  • Developed card & bank account transaction category classification models, designed to be lightweight for low latency. (now in service)
    • In A/B (online) test (statistically significant p-value < 0.05)
      • *Accuracy : improved about 25 ~ 30%p
  • Developed a RESTful API server to serve (general purpose) machine learning models.
    • Serves about 1M MAU and 500K ~ 1M transactions / day (1 transaction = about 100 samples at the median).
    • Utilized an inference-oriented framework (ONNX) to reduce the latency.
      • median 100 ~ 200ms / transaction.
    • Zero failure rate (no 4xx or 5xx errors)
    • Deployed & managed with Kubernetes, utilizing open-source projects.
  • Developed a classification model to forecast the likelihood of loan overdue.

% *Accuracy : how many people don't update/change their transactions' categories.

Machine Learning Engineer, VoyagerX, (2019.01.07 ~ 2019.10.04)

  • Worked as an intern.
  • Developed speaker verification and diarization models & logic for recognizing arbitrary speakers recorded in noisy (real-world) environments.
  • Developed hair image semantic segmentation / image inpainting / i2i domain transfer models for swapping hair domains naturally.

Penetration Tester, ELCID, (2016.07 ~ 2016.08)

  • Worked part-time.
  • Performed penetration tests on network firewall and anti-virus products.

Outsourcing

  • Developed a Korean university course information web parser (about 40 universities), twice. (2017.07 ~ 2018.03)
  • Developed an AWS CloudTrail log analyzer / formatter. (2019.09 ~ 2019.10)

Lab

HPC Lab, KoreaTech, Undergraduate Researcher, (2018.09 ~ 2018.12)

  • Wrote a paper about an improved TextCNN model to predict movie ratings.

Publications

Paper

[1] Kim et al., CNN Architecture Predicting Movie Rating, 2020.01.

  • Wrote about a CNN architecture that applies a channel-attention method (SE module) to the TextCNN model, which generally brings a performance gain on the task while keeping its latency.
  • Handles un-normalized text with various convolution kernel sizes and spatial dropout
  • Selected as one of the highlight papers for the first half of 2020

Conferences/Workshops

[1] kozistr_team, presentation, NAVER NLP Challenge 2018 SRL Task

  • Took on the SRL task without any domain knowledge. Presented our trials & errors during the competition

Journals

[1] zer0day, Windows Anti-Debugging Techniques (CodeEngn 2016) Sep. 2016. PDF

  • Wrote about many anti-reversing / anti-debugging techniques (A to Z) available for Windows executable binaries

Posts

[1] kozistr (as part of the team Dragonsong), Towards Data Science

  • Wrote about a deep-learning audio classifier, based on the Kaggle challenge we participated in

Personal Projects

Machine/Deep Learning

Generative Models

  • GANs-tensorflow :: Lots of GAN codes :) :: Generative Adversarial Networks

    • ACGAN-tensorflow :: Auxiliary Classifier GAN in tensorflow :: code
    • StarGAN-tensorflow :: Unified GAN for multi-domain :: code
    • LAPGAN-tensorflow :: Laplacian Pyramid GAN in tensorflow :: code
    • BEGAN-tensorflow :: Boundary Equilibrium in tensorflow :: code
    • DCGAN-tensorflow :: Deep Convolutional GAN in tensorflow :: code
    • SRGAN-tensorflow :: Super-Resolution GAN in tensorflow :: code
    • WGAN-GP-tensorflow :: Wasserstein GAN w/ gradient penalty in tensorflow :: code
    • ... lots of GANs (over 20) :)

Super Resolution

  • Single Image Super Resolution :: Single Image Super-Resolution (SISR)

    • rcan-tensorflow :: RCAN implementation in tensorflow :: code
    • ESRGAN-tensorflow :: ESRGAN implementation in tensorflow :: code
    • NatSR-pytorch :: NatSR implementation in pytorch :: code

I2I Translation

  • Improved Content Disentanglement :: tuned version of 'Content Disentanglement' in pytorch :: code

Style Transfer

  • Image-Style-Transfer :: Image Neural Style Transfer

    • style-transfer-tensorflow :: Image Style-Transfer in tensorflow :: code

Text Classification/Generation

  • movie-rate-prediction :: Korean sentences classification in tensorflow :: code
  • KoSpacing-tensorflow :: Automatic Korean sentences spacing in tensorflow :: code
  • text-tagging :: Automatic Korean articles categories classification in tensorflow :: code

Speech Synthesis

  • Tacotron-tensorflow :: Text-To-Speech (TTS)

    • tacotron-tensorflow :: lots of TTS models in tensorflow :: code

Optimizer

  • pytorch-optimizer :: Bunch of optimizer implementations in PyTorch

    • pytorch_optimizer :: Bunch of optimizer implementations in PyTorch with clean code and strict types, also including useful optimization ideas. Most of the implementations are based on the original papers, but I added some tweaks. :: code
  • AdaBound :: Optimizer that trains as fast as Adam and as good as SGD

    • AdaBound-tensorflow :: AdaBound Optimizer implementation in tensorflow :: code
  • RAdam :: On The Variance Of The Adaptive Learning Rate And Beyond in tensorflow

    • RAdam-tensorflow :: RAdam Optimizer implementation in tensorflow :: code

R.L

  • Rosetta Stone :: Hearthstone simulator using C++ with some reinforcement learning :: code

Open Source Contributions

Plug-Ins

IDA-pro plug-in - Golang ELF binary (x86, x86-64), RTTI parser

  • Recovers stripped symbols & information and patches bytes so that the binary can be decompiled with Hex-Rays

Security, Hacking

CTFs, Conferences

  • POC 2016 Conference Staff
  • HackingCamp 15 CTF Staff, Challenge Maker
  • CodeGate 2017 OpenCTF Staff, Challenge Maker
  • HackingCamp 16 CTF Staff, Challenge Maker
  • POX 2017 CTF Staff, Challenge Maker
  • KID 2017 CTF Staff, Challenge Maker
  • Belluminar 2017 CTF Staff
  • HackingCamp 17 CTF Staff, Challenge Maker
  • HackingCamp 18 CTF Staff, Challenge Maker

Teams

Hacking Team, Fl4y. Since 2017.07 ~

Hacking Team, Demon by POC. 2014.02 ~ 2018.08


Education

Senior in Computer Engineering at KUT

Presentations

2018

[2] Artificial Intelligence ZeroToAll, Apr 2018.

[1] Machine Learning ZeroToAll, Mar 2018.

2015

[1] Polymorphic Virus VS AV Detection, Oct 2015.

2014

[1] Network Sniffing & Detection, Oct 2014.