Dec 12, 2020 | 11 min read ☕ (Last updated: Dec 24, 2020)

About Me

Profile

Senior in Computer Engineering at KUT

Alternative Military Service Status : on-going (2020/11/27 ~ 2023/09/26)

CV : [PDF] (as of Dec. 2020)

Email kozistr@gmail.com
Github https://github.com/kozistr
Kaggle https://www.kaggle.com/kozistr
Linkedin https://www.linkedin.com/in/kozistr

Interests

  • Competitions and challenges such as Kaggle
  • (Lightweight) Single Image / Video Super Resolution (SISR)
  • End to End Speaker Diarization (E2E SD)

Previously, I was also interested in offensive security, such as reverse engineering and Linux kernel exploitation.


Challenges & Awards

Machine Learning

Hacking

  • Boot2Root CTF 2018 :: 2nd place (Demon + alpha)
  • Harekaze CTF 2017 :: 3rd place (SeoulWesterns)
  • WhiteHat League 1 (2017) :: 2nd place (Demon)

    • Awarded by the Korea Information Technology Research Institute (한국정보기술연구원); received a prize of $3,000

Work Experience

Company

Machine Learning Researcher, Watcha, (2020.06.22 ~ Present)

  • Working full-time.
  • Developed a pipeline to recognize all TV & movie actors in poster and still-cut images.

    • Utilized SOTA face detector & recognizer.
    • Optimized pre/post-processing routines to reduce inference time.
  • Developed a novel sequential recommendation architecture to recommend what content to watch next.

    • Outperformed previous SOTA models (SASRec, BERT4Rec).
    • In an online A/B test (statistically significant, p-value < 0.05):

      • Paid Conversion : improved 1.39%p
      • Viewing Days : improved 0.25%p
      • Viewing Minutes (median) : improved 4.10%
      • Click Ratio : improved 4.30%p
      • Play Ratio : improved 2.32%p
  • Developed an image super-resolution model to upscale movie & TV poster and still-cut images.

    • Optimized the code for fast inference & memory efficiency on CPU.
    • In an internal evaluation (qualitative evaluation by the designers), it captured details better, handled higher resolutions, and took little time.

Machine Learning Engineer, Rainist, (2019.11.11 ~ 2020.06.19)

  • Worked full-time.
  • Developed card & bank account transaction category classification models, designed to be lightweight for low latency (now in service).

    • In an online A/B test (statistically significant, p-value < 0.05):
    • *accuracy : improved about 25 ~ 30%p
  • Developed a RESTful API server for serving machine learning models (using k8s + an open source project).

    • Zero failure rate (no 40x or 50x errors).
  • Developed a classification model for forecasting the possibility of overdue loans.

% *accuracy : how many people don't update/change their transactions' category.

Machine Learning Engineer, VoyagerX, (2019.01.07 ~ 2019.10.04)

  • Worked as an intern.
  • Developed speaker verification & diarization models and logic for recognizing arbitrary speakers recorded in noisy (real-world) environments.
  • Developed hair image semantic segmentation / image inpainting / i2i domain transfer models for swapping hair domains naturally.

Penetration Tester, ELCID, (2016.07 ~ 2016.08)

  • Worked part-time.
  • Performed penetration tests on network firewall and anti-virus products.

Outsourcing

  • Developed a Korean university course information web parser (about 40 universities), twice. (2017.7 ~ 2018.3)
  • Developed an AWS CloudTrail log analyzer / formatter. (2019.09 ~ 2019.10)

Lab

HPC Lab, KoreaTech, Undergraduate Researcher, (2018.09 ~ 2018.12)

  • Wrote a paper about an improved TextCNN model for predicting movie ratings.

Publications

Paper

[1] Kim et al., CNN Architecture Predicting Movie Rating, 2020.01.

  • Wrote about a CNN architecture that applies a channel-attention method (SE module) to the TextCNN model, which generally brings a performance gain on the task while keeping its latency.
  • Handles un-normalized text w/ various convolution kernel sizes and dropout.

Conferences/Workshops

[1] kozistrteam, [NAVER NLP Challenge 2018 SRL Task](https://github.com/naver/nlp-challenge/raw/master/slides/Naver.NLP.Workshop.SRL.kozistrteam.pdf)

  • Took on the SRL task w/o any domain knowledge. Presented our trials & errors during the competition.

Journals

[1] zer0day, Windows Anti-Debugging Techniques (CodeEngn 2016) Sep. 2016. PDF

  • Wrote about lots of anti-reversing / anti-debugging techniques (A to Z) available for Windows executable binaries.

Posts

[1] kozistr (as part of team Dragonsong), Towards Data Science

  • Wrote about a deep learning audio classifier based on the Kaggle challenge in which we participated.

Personal Projects

Computer Languages

Python
> C/C++
Assembly (x86, x86-64, arm, ...)
> experienced with more than 10 languages

Machine Learning

Generative Models

  • GANs-tensorflow :: Lots of GAN codes :) :: Generative Adversarial Networks

    • ACGAN-tensorflow :: Auxiliary Classifier GAN in tensorflow :: code
    • StarGAN-tensorflow :: Unified GAN for multi-domain :: code
    • LAPGAN-tensorflow :: Laplacian Pyramid GAN in tensorflow :: code
    • BEGAN-tensorflow :: Boundary Equilibrium GAN in tensorflow :: code
    • DCGAN-tensorflow :: Deep Convolutional GAN in tensorflow :: code
    • SRGAN-tensorflow :: Super Resolution GAN in tensorflow :: code
    • WGAN-GP-tensorflow :: Wasserstein GAN w/ gradient penalty in tensorflow :: code
    • ... lots of GANs (over 20) :)

Super Resolution

  • Single Image Super Resolution :: Single Image Super Resolution (SISR)

    • rcan-tensorflow :: RCAN implementation in tensorflow :: code
    • ESRGAN-tensorflow :: ESRGAN implementation in tensorflow :: code
    • NatSR-pytorch :: NatSR implementation in pytorch :: code

I2I Translation

  • Improved Content Disentanglement :: tuned version of 'Content Disentanglement' in pytorch :: code

Style Transfer

  • Image-Style-Transfer :: Image Neural Style Transfer

    • style-transfer-tensorflow :: Image Style-Transfer in tensorflow :: code

Text Classification/Generation

  • movie-rate-prediction :: Korean sentences classification in tensorflow :: code
  • KoSpacing-tensorflow :: Automatic Korean sentences spacing in tensorflow :: code
  • text-tagging :: Automatic Korean articles categories classification in tensorflow :: code

Speech Synthesis

  • Tacotron-tensorflow :: Text To Speech (TTS)

    • tacotron-tensorflow :: lots of TTS models in tensorflow :: code

Speech Recognition

  • [private] :: noisy acoustic speech recognition system in tensorflow :: code

Optimizer

  • AdaBound :: Optimizer that trains as fast as Adam and as good as SGD

    • AdaBound-tensorflow :: AdaBound Optimizer implementation in tensorflow :: code
  • RAdam :: On The Variance Of The Adaptive Learning Rate And Beyond in tensorflow

    • RAdam-tensorflow :: RAdam Optimizer implementation in tensorflow :: code

R.L

  • Rosetta Stone :: Hearthstone simulator using C++ with some reinforcement learning :: code

Plug-Ins

IDA Pro plug-in - Golang ELF binary (x86, x86-64), RTTI parser

  • Recovers stripped symbols & information and patches byte-codes so that Hex-Rays decompilation works

Open Source Contributions

  • syzkaller :: New Generation of Linux Kernel Fuzzer :: Minor contribution #575
  • simpletransformers :: Transformers made simple w/ training, evaluating, and prediction possible w/ one line each. :: Minor contribution #290

Security, Hacking

CTFs, Conferences

  • POC 2016 Conference Staff
  • HackingCamp 15 CTF Staff, Challenge Maker
  • CodeGate 2017 OpenCTF Staff, Challenge Maker
  • HackingCamp 16 CTF Staff, Challenge Maker
  • POX 2017 CTF Staff, Challenge Maker
  • KID 2017 CTF Staff, Challenge Maker
  • Belluminar 2017 CTF Staff
  • HackingCamp 17 CTF Staff, Challenge Maker
  • HackingCamp 18 CTF Staff, Challenge Maker

Teams

Hacking Team, Fl4y. Since 2017.07 ~

Hacking Team, Demon by POC. 2014.02 ~ 2018.08


Presentations

2018

[2] Artificial Intelligence ZeroToAll, Apr 2018.

[1] Machine Learning ZeroToAll, Mar 2018.

2015

[1] Polymorphic Virus VS AV Detection, Oct 2015.

2014

[1] Network Sniffing & Detection, Oct 2014.