
Sanghyuk Chun

I am a lead research scientist on the ML Research team at NAVER AI Lab, where I focus on machine learning, multi-modal learning (e.g., vision-language, language-audio, and audio-visual), and computer vision. At NAVER, my primary research goal is to develop generalizable machine learning models for challenging yet practical scenarios. Prior to joining NAVER, I worked as a research engineer at KAKAO Corp from 2016 to 2018, focusing on recommendation systems and machine learning applications.

NOTICE! NAVER AI Lab is hiring research scientists (full-time) and internship students (maximum 6 months). Please check the job description in the ML Research introduction page for more details. Our other teams (Backbone Research, Generation Research, Language Research and HCI Research) are also hiring!

News

  • 12/2023 : Giving a talk at Dankook University (topic: "Probabilistic Image-Text Representations") [slide]
  • 12/2023 : We finally released the SynthTriplets18 dataset!
  • 11/2023 : Being nominated as a NeurIPS 2023 top reviewer (top 10%).
  • 9/2023 : Giving a talk at HUST AI Summer School on "Modern Machine Learning: Foundations and Applications" (topic: "Probabilistic Image-Text Representations") [slide]
  • 9/2023 : 1 paper [PCME++ short] is accepted at the non-archival track of ICCV 2023 Workshop on Closing The Loop Between Vision And Language (CLVL).
  • 8/2023 : Giving a talk at Yonsei University (topic: "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion") [slide]
  • 7/2023 : 1 paper [SeiT] is accepted at ICCV 2023.
  • 7/2023 : Serving as a TMLR Action Editor.
  • 6/2023 : Being nominated as a TMLR Expert Reviewer.
  • 6/2023 : Giving a talk at Sogang University (topic: "Probabilistic Image-Text Representations") [slide]
  • 5/2023 : Serving as an area chair at NeurIPS 2023 Datasets and Benchmarks Track.
  • 4/2023 : We released "Graphit: A Unified Framework for Diverse Image Editing Tasks" [GitHub] [Graphit]. The technical report will be released soon!
  • 4/2023 : 1 paper [3D-Pseudo-Gts] is accepted at CVPR 2023 Workshop on Computer Vision for Mixed Reality (CV4MR).
  • 1/2023 : 1 paper [FairDRO] is accepted at ICLR 2023.
  • 9/2022 : Giving a talk at Sogang University (topic: "ECCV Caption") [slide]
  • 9/2022 : 1 paper [MSDA theorem] is accepted at NeurIPS 2022.
  • 8/2022 : Starting a new chapter in life with Song Park 🤵❤️👰.
  • 7/2022 : 1 paper [LF-Font journal] is accepted at IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI).
  • 7/2022 : 2 papers [ECCV Caption] [MIRO] are accepted at ECCV 2022.
  • 7/2022 : Giving a talk at UNIST AIGS (topic: "Towards Reliable Machine Learning: Challenges, Examples, Solutions") [slide]
  • 6/2022 : Giving a tutorial on "Shortcut learning in Machine Learning: Challenges, Analysis, Solutions" at FAccT 2022. [ tutorial homepage | slide | video ]
  • 5/2022 : Receiving an outstanding reviewer award at CVPR 2022 [link].
  • 5/2022 : 1 paper [DCC] is accepted at ICML 2022.
  • 4/2022 : 1 paper [WSOL Eval journal] is accepted at IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI).
  • 4/2022 : Organizing ICLR 2022 ML in Korea Social
  • 3/2022 : Giving guest lectures at KAIST and SNU (topic: "Towards Reliable Machine Learning") [slide]
  • 3/2022 : Co-organizing FAccT 2022 Translation/Dialogue Tutorial: "Shortcut learning in Machine Learning: Challenges, Analysis, Solutions" (slides, videos and web pages will be released soon)
  • 3/2022 : 1 paper [CGL] is accepted at CVPR 2022.
  • 2/2022 : Giving a talk at POSTECH AI Research (PAIR) ML Winter Seminar 2022 (topic: "Shortcut learning in Machine Learning: Challenges, Examples, Solutions") [slide]
  • 1/2022 : 2 papers [ViDT] [WCST-ML] are accepted at ICLR 2022.
  • 12/2021 : Co-hosting NeurIPS'21 workshop on ImageNet: Past, Present, and Future with 400+ attendees!
  • 12/2021 : Giving a talk at University of Seoul (topic: "Realistic challenges and limitations of AI") [slide]
  • 11/2021 : Giving a talk at NAVER and NAVER Labs Europe (topic: Mitigating dataset biases in Real-world ML applications) [slide]
  • 11/2021 : Giving a guest lecture at UNIST (topic: Limits and Challenges in Deep Learning Optimizers) [slide]
  • 10/2021 : Releasing a unified few-shot font generation framework! [code]
  • 9/2021 : 2 papers [SWAD] [NHA] are accepted at NeurIPS 2021.
  • 8/2021 : Reaching a research milestone of 1,000 citations at Google Scholar and Semantic Scholar!
  • 7/2021 : Co-organizing the NeurIPS Workshop on ImageNet: Past, Present, and Future! [webpage]
  • 7/2021 : 2 papers [MX-Font] [PiT] are accepted at ICCV 2021.
  • 7/2021 : Giving a talk at Computer Vision Centre (CVC), UAB (topic: PCME and AdamP) [info] [slide]
  • 6/2021 : Giving a talk at KSIAM 2021 (topic: AdamP). [slide]
  • 6/2021 : Giving a guest lecture at Seoul National University (topic: few-shot font generation). [slide]
  • 5/2021 : Receiving an outstanding reviewer award at CVPR 2021 [link].
  • 4/2021 : 1 paper [LF-Font] is accepted at CVPR 2021 workshop (also appeared at AAAI).
  • 3/2021 : 2 papers [PCME] [ReLabel] are accepted at CVPR 2021.
  • 1/2021 : 1 paper [AdamP] is accepted at ICLR 2021.
  • 12/2020 : 1 paper [LF-Font] is accepted at AAAI 2021.
  • 7/2020 : 1 paper [DM-Font] is accepted at ECCV 2020.
  • 6/2020 : Receiving the best paper runner-up award at AICCW CVPR 2020.
  • 6/2020 : Receiving an outstanding reviewer award at CVPR 2020 [link].
  • 6/2020 : Giving a talk at CVPR 2020 NAVER interactive session.
  • 6/2020 : 1 paper [ReBias] is accepted at ICML 2020.
  • 4/2020 : 1 paper [DM-Font short] is accepted at CVPR 2020 workshop.
  • 2/2020 : 1 paper [wsoleval] is accepted at CVPR 2020.
  • 1/2020 : 1 paper [HCNN] is accepted at ICASSP 2020.
  • 10/2019 : 1 paper [HCNN short] is accepted at ISMIR late-breaking demo.
  • 10/2019 : Working at Naver Labs Europe as a visiting researcher (Oct - Dec 2019)
  • 7/2019 : 2 papers [CutMix] [WCT2] are accepted at ICCV 2019 (1 oral presentation).
  • 6/2019 : Giving a talk at ICML 2019 Expo workshop.
  • 5/2019 : 2 papers [MTSA] [RegEval] are accepted at ICML 2019 workshops (1 oral presentation).
  • 5/2019 : Giving a talk at ICLR 2019 Expo talk.
  • 3/2019 : 1 paper [PRM] is accepted at ICLR 2019 workshop.

Research: Scalable and Reliable Machine Learning with Language-guided Representation Learning

Ensuring the real-world applicability of machine learning (ML) models poses a primary challenge, namely, the ability to generalize effectively to unseen scenarios encountered beyond the training phase. There are three prominent scenarios frequently encountered in practical applications: (1) when input data significantly differs from the training data; (2) when the model faces the target behavior beyond the scope of training targets, such as unexplored labels; and (3) when the application needs human opinions or subjective value judgments. Addressing all three scenarios relies on more than just massive large-scale datasets; it demands the inclusion of human knowledge that extends beyond web-crawled content. Yet, the question remains: How can we effectively integrate large-scale training and human knowledge guidance? To answer the question, my research aims to develop large-scale ML models exhibiting greater controllability and interpretability, thereby enabling human intervention to guide model behavior, even beyond the training phase. My work revolves around three main research themes towards this goal: Language-combined Representation Learning, Machine learning reliability and Optimization techniques for large-scale ML.

A more detailed statement can be found in my research statement.


Language-combined Representation Learning. Language is the most natural way to encode human knowledge. If an ML model can comprehend human language alongside the target modality, we can understand the model better by intervening in its representation space with human language. However, because language descriptions are the product of conscious choices about which key concepts of the input to report, language-combined representation learning methods often suffer from multiplicity (the many-to-many problem) between modalities. My recent works address the multiplicity problem with probabilistic representation learning. In this paradigm, an input is mapped to a probability distribution rather than a deterministic vector. This approach enhances the interpretability of datasets and user controllability. As another direction, I have explored adding more information or modalities to existing language-X models, which enables more controllability. Furthermore, I have worked on establishing a robust evaluation framework for vision-language models in terms of their multiplicity and robustness.
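
As a rough illustration of the difference between a deterministic and a probabilistic embedding, the sketch below represents each input as a diagonal Gaussian (a mean vector plus per-dimension variances) and compares two inputs by their expected squared Euclidean distance. For independent Gaussians this equals the squared distance between the means plus the sum of all variances. The class and function names are hypothetical, and this is a generic formulation chosen for illustration, not necessarily the exact objective used in the works mentioned here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GaussianEmbedding:
    """Diagonal Gaussian embedding: one mean and one variance per dimension."""

    mean: Tuple[float, ...]
    var: Tuple[float, ...]

    def __post_init__(self) -> None:
        if len(self.mean) != len(self.var):
            raise ValueError("mean and var must have the same length")
        if any(v < 0 for v in self.var):
            raise ValueError("variances must be non-negative")


def expected_sq_distance(a: GaussianEmbedding, b: GaussianEmbedding) -> float:
    """E||Za - Zb||^2 for independent Za ~ a and Zb ~ b."""
    if len(a.mean) != len(b.mean):
        raise ValueError("embeddings must have the same dimensionality")
    mean_term = sum((x - y) ** 2 for x, y in zip(a.mean, b.mean))
    uncertainty_term = sum(a.var) + sum(b.var)
    return mean_term + uncertainty_term


# A deterministic vector is the special case with zero variance.
image = GaussianEmbedding(mean=(1.0, 0.0), var=(0.0, 0.0))
specific_caption = GaussianEmbedding(mean=(0.0, 0.0), var=(0.0, 0.0))
# Hypothetical ambiguous caption: same mean, but larger uncertainty.
ambiguous_caption = GaussianEmbedding(mean=(0.0, 0.0), var=(0.5, 0.5))

print(expected_sq_distance(image, specific_caption))   # 1.0
print(expected_sq_distance(image, ambiguous_caption))  # 2.0
```

The variance gives a handle that a single vector does not have: two captions with the same mean can still be told apart by how uncertain they are.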

How can we make a model comprehend human language alongside the target modality? To answer this question, I have recently worked on text-conditioned diffusion models. In particular, I am interested in utilizing the power of recent diffusion models for text-conditioned feature transforms or data augmentation. However, we need more versatility and controllability to adapt diffusion models to the desired tasks, e.g., localized conditions provided via region masks. My recent works have focused on the versatility and controllability of diffusion models, and on applying diffusion models to non-generative downstream tasks, such as composed image retrieval (CIR).

Machine learning reliability. Existing machine learning models cannot understand the problem itself [Shortcut learning tutorial]. This causes many realistic problems, such as discrimination by machines and poor generalizability to unseen (or minor) corruptions / environments / groups. Current state-of-the-art machines only "predict", rather than performing "logical thinking based on logical reasoning". As models prefer to learn by shortcuts [WCST-ML], just training models as usual will lead to biased models. One of my research interests is to investigate these phenomena with various tools.

If it is difficult to make machines understand the problem itself, what can we do? Our model should not learn undesirable shortcut features [ReBias] [StyleAugment], and should be robust to unseen corruptions [CutMix] [RegEval] [ReLabel] [PiT] or significant distribution shifts [SWAD] [MIRO]. We also need to make machines that do not discriminate against certain demographic groups [CGL] [FairDRO]. We expect a model to say "I don't know" when it gets unexpected inputs [PCME] [PCME++]. At the very least, we expect a model to explain why it makes a given decision [MTSA] [MTSA WS] [WSOL eval] [WSOL Eval journal], how different model design choices will change model decisions [NetSim], and how it can be fixed (e.g., More data collection? More annotations? Filtering?). My research focuses on expanding machine knowledge from "just prediction" to "logical reasoning". In particular, my recent research has concentrated on tackling various generalization downstream tasks, such as de-biasing, domain generalization, algorithmic fairness and adversarial robustness.

Correct and fair evaluation is crucial for research development. However, existing evaluation protocols and metrics often lack reliability in measuring how machines learn proper knowledge. I have also been actively engaged in addressing this issue by working on fair evaluation benchmarks and metrics.

Optimization techniques for large-scale ML. Last but not least, I have actively worked on developing general optimization techniques for large-scale machine learning models, including data augmentation methods, optimizers, network architectures, and objective functions. My research emphasizes two key objectives: empirical impact and theoretical soundness. In particular, I aim to develop easy-to-use techniques that function seamlessly as plug-and-play solutions.

Lastly, I have also worked on domain-specific optimization techniques that utilize properties of the given data, e.g., the compositionality of Korean/Chinese letters, low- and high-frequency information for better audio understanding, or harmonic information for multi-source audio understanding.


Publications

(C: peer-reviewed conference, W: peer-reviewed workshop, A: arxiv preprint, O: others)
(authors contributed equally)

See also my Google Scholar.


Code and Data

Datasets

  • CUB v2 and OpenImages 30k [C5] [J2]: These datasets were designed and collected for the WSOL evaluation project. We collected new bird images for CUB v2 (200 classes) and reorganized OpenImages V5 into OpenImages 30k (100 classes).
  • Biased MNIST, 9-Class ImageNet and ImageNet cluster labels [C6]: We proposed these datasets to measure how well ML models generalize under bias shift. Biased MNIST is a synthetic dataset based on MNIST in which each image has a background color that is highly correlated with its label (the correlation is controllable). ImageNet-9 contains 9 super-classes (dog, cat, frog, turtle, bird, monkey, fish, crab, insect) and 57k training samples. We also proposed "unbiased accuracy" using cluster labels (K=9), which empirically match Shell, Grass, Close-up, Eye, Human, Sand and Mammal.
  • CUB Caption [C11]: CUB has not been widely used for image-text matching (ITM) retrieval. However, during the PCME project, we proposed using the CUB dataset as an ITM benchmark for measuring the impact of many-to-many correspondence: if an image and a description are in the same class, we treat the pair as "positive", otherwise as "negative". We carefully divided the classes into training-validation (150 classes) and test (50 classes) splits following Xian et al. 2017.
  • ECCV Caption [C21]: Although CUB Caption can evaluate the impact of many-to-many correspondence in ITM, this dataset is still "synthetic". For more practical usage, we proposed the ECCV Caption benchmark by correcting the false negatives (FNs) in the MS-COCO Caption dataset. Using machine and human annotators, we collected ×8.47 positive images and ×3.58 positive captions compared to the original COCO Caption.
  • RoCOCO [W9]: Is an ITM model robust to a malicious manipulation on captions or images? Here, we investigated the impact of altered captions (same concept, different concept, rand voca attack, and dangerous word) and mixed images. Even state-of-the-art ITM models showed substantial performance degradations on our benchmark.
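
Two of the label constructions described in this list (the controllable color-label correlation of Biased MNIST and the class-based positives of CUB Caption) can be illustrated with a minimal Python sketch. This is not the released dataset code: the function names, the `rho` parameter, and the rule that an off-bias color is drawn uniformly from the remaining colors are assumptions made for illustration.

```python
import random
from typing import List, Sequence


def sample_background_color(label: int, num_colors: int, rho: float,
                            rng: random.Random) -> int:
    """Pick a background color index that matches `label` with probability rho.

    Assumption: when the color does not match, it is drawn uniformly from the
    other colors. rho close to 1 gives a strongly biased dataset.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be in [0, 1]")
    biased_color = label % num_colors
    if rng.random() < rho:
        return biased_color
    others = [c for c in range(num_colors) if c != biased_color]
    return rng.choice(others)


def class_based_positives(image_classes: Sequence[int],
                          caption_classes: Sequence[int]) -> List[List[bool]]:
    """Relevance matrix: entry [i][j] is True if image i and caption j share a class."""
    return [[ic == cc for cc in caption_classes] for ic in image_classes]


rng = random.Random(0)
labels = [rng.randrange(10) for _ in range(10_000)]
colors = [sample_background_color(y, 10, rho=0.99, rng=rng) for y in labels]
agreement = sum(c == y for c, y in zip(colors, labels)) / len(labels)
print(f"fraction of images whose color matches the label: {agreement:.3f}")

# Two images (classes 0 and 1) against three captions (classes 0, 0, 1):
# each image has every caption of its own class as a positive.
print(class_based_positives([0, 1], [0, 0, 1]))
```

With class-based positives, one image can match many captions and vice versa, which is the many-to-many setting the benchmark is meant to measure.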

Software

  • Few-shot Font Generation Benchmark.
    • Song Park, Sanghyuk Chun
    • open source. code
  • Graphit: A Unified Framework for Diverse Image Editing Tasks.
    • Geonmo Gu, Sanghyuk Chun, Wonjae Kim, HeeJae Jun, Sangdoo Yun, Yoohoon Kang
    • open source. code | demo

Academic Activities

Professional Service

  • Conference Area Chair:
    • ICLR 2025
    • AISTATS 2025
    • NeurIPS 2024
    • NeurIPS Datasets and Benchmarks (D&B) track 2023-2024
  • Tutorial / Workshop / Social Organizer:
    • FAccT 2022 Translation/Dialogue Tutorial: "Shortcut learning in Machine Learning: Challenges, Analysis, Solutions"
    • NeurIPS 2021 Workshop on ImageNet: Past, Present, and Future
      • Co-organized by Zeynep Akata, Lucas Beyer, Sanghyuk Chun, Almut Sophia Koepke, Diane Larlus, Seong Joon Oh, Rafael Sampaio de Rezende, Sangdoo Yun, Xiaohua Zhai
  • Outstanding reviewer:

Awards

  • Top 10% reviewer, NeurIPS 2023
  • TMLR Expert Reviewer (2023)
  • Outstanding reviewer award, CVPR 2022
  • Outstanding reviewer award, CVPR 2021
  • Outstanding reviewer award, CVPR 2020
  • Best paper runner-up award, AI for Content Creation Workshop at CVPR 2020

Talks

Mentoring and Teaching

Mentees / Short-term post-doctoral collaborators / Internship students

Topics: Reliable ML, Vision-Language, Modality-specific tasks, Generative models, Other topics

  • Sehyun Kwon (Seoul National University, 2024) -- VL representation learning
  • Jaeyoo Park (Seoul National University, 2024) -- VL representation learning
  • Jungin Park (Visiting researcher, 2024) -- VL representation learning
  • Yujin Jeong (Korea University, 2024) [W10] -- AVL representation learning
  • Heesun Bae (KAIST, 2023) -- VL representation learning under noisy environment
  • Jungbeom Lee (Visiting researcher, 2023) [C28] -- VL representation learning
  • Eunji Kim (Seoul National University, 2022) -- XAI + Probabilistic Machine (the internship project is published at ICML 2023 [paper])
  • Jaehui Hwang (Yonsei University, 2022) [C30] -- Adversarial robustness and XAI
  • Chanwoo Park (Seoul National University, 2021-2022) [C22] -- Deep learning theory
  • Gyeongsik Moon (Visiting researcher, 2022) [W7] -- Semi-supervised learning for 3D Human Mesh Estimation
  • Hongsuk Choi (Visiting researcher, 2022) [W7] -- Semi-supervised learning for 3D Human Mesh Estimation
  • Seulki Park (Seoul National University, 2022) [W9] -- VL robustness benchmark
  • Saehyung Lee (Seoul National University, 2021-2022) [C19] -- Data condensation
  • Sangwon Jung (Seoul National University, 2021-2023) [C18] [C19] -- Fairness with not enough group labels, group fairness
  • Luca Scimeca (A short-term post-doctoral collaborator, 2021) [C16] [C14] -- Understanding shortcut learning phenomenon in feature space
  • Michael Poli (KAIST, 2021) [C14] [C16] -- Neural hybrid automata
  • Hyemi Kim (KAIST, 2021) -- Test-time training for robust prediction
  • Jun Seo (KAIST, 2021) -- Self-supervised learning
  • Song Park (Yonsei University, 2020-2021) [C8/W6] [C12] [A4] [J2] -- Few-shot font generation
  • Hyojin Bahng (Korea University, 2019) [C6] -- De-biasing
  • Junsuk Choe (Yonsei University, 2019) [C5] [J1] -- Reliable evaluation for WSOL
  • Naman Goyal (IIT RPR, 2019) -- Robust representation against shift
  • Minz Won (Music Technology Group, Universitat Pompeu Fabra, 2018-2019) [W2] [W4] [A2] [C4] -- Audio representation learning
  • Byungkyu Kang (Yonsei University, 2018) [C2] -- Image-to-image translation and style transfer
  • Jang-Hyun Kim (Seoul National University, 2018) [A1] -- Audio representation learning
  • Jisung Hwang (University of Chicago, 2018) [W1] -- Adversarial robustness
  • Younghoon Kim (Seoul National University, 2018) [W1] -- Adversarial robustness

Guest lectures

  • "Probabilistic Image-Text Representations", Sogang University (2023). [slide]
  • "ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO", Sogang University (2022). [slide]
  • "Towards Reliable Machine Learning", Seoul National University (2022). [slide]
  • "Towards Reliable Machine Learning", KAIST (2022). [slide]
  • "Limits and Challenges in Deep Learning Optimizers", UNIST (2021). [slide]
  • "Towards Few-shot Font Generation", Seoul National University (2021). [slide]
  • "Reliable Machine Learning in NAVER AI", Yonsei University (2020). [slide]

Industry Experience

NAVER AI Lab (2018 ~ Now)

  • Hangul Handwriting Font Generation
    [Figure: DM-Font teaser]

    Distributed on Hangul Day 2019 (한글날), [Full font list]

    • Hangul (Korean alphabet, 한글) originally consists of only 24 sub-letters (ㄱ, ㅋ, ㄴ, ㄷ, ㅌ, ㅁ, ㅂ, ㅍ, ㄹ, ㅅ, ㅈ, ㅊ, ㅇ, ㅎ, ㅡ, ㅣ, ㅗ, ㅏ, ㅜ, ㅓ, ㅛ, ㅑ, ㅠ, ㅕ), but by combining them, there exist 11,172 valid characters in Hangul. For example, "한" is a combination of ㅎ, ㅏ, and ㄴ, and "쐰" is a combination of ㅅ, ㅅ, ㅗ, ㅣ, and ㄴ. This makes generating a new Hangul font very expensive and time-consuming. Meanwhile, since 2008, Naver has distributed Korean fonts for free (named Nanum fonts, 나눔 글꼴).
    • In 2019, we developed a technology for fully personalized Hangul generation with only 152 characters. We opened an event page where users could submit their own handwriting. The full generated font list can be found in [this link]. Details of the generation technique used for the service were presented at Deview 2019 [Link].
    • This work was also extended to few-shot generation based on compositionality. See the papers in AI for Content Creation Workshop (AICCW) at CVPR 2020 (short paper) [Link], ECCV 2020 (full paper) [Link], AAAI 2021 [Link], ICCV 2021 [Link], and the journal extension [Link].
    • [BONUS] You can play with my handwriting here
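
The figure of 11,172 valid characters can be reproduced from the Unicode Hangul Syllables block, which indexes each syllable by an initial consonant, a medial vowel, and an optional final consonant. Unicode counts compound letters (such as ㅆ and ㅚ) as separate entries, so it uses 19 initials, 21 medials, and 27 finals rather than the 24 basic sub-letters. The sketch below is illustrative only and unrelated to the font generation models; the function names are hypothetical.

```python
# Letter tables in Unicode order (compatibility jamo used for readability).
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
# The empty string stands for "no final consonant".
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

HANGUL_BASE = 0xAC00  # code point of the first syllable, "가"
NUM_SYLLABLES = len(INITIALS) * len(MEDIALS) * len(FINALS)


def compose(initial: str, medial: str, final: str = "") -> str:
    """Build one precomposed syllable from its three components."""
    i = INITIALS.index(initial)
    m = MEDIALS.index(medial)
    f = FINALS.index(final)
    return chr(HANGUL_BASE + (i * len(MEDIALS) + m) * len(FINALS) + f)


def decompose(syllable: str) -> tuple:
    """Split one precomposed syllable into (initial, medial, final)."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < NUM_SYLLABLES:
        raise ValueError("not a precomposed Hangul syllable")
    i, rest = divmod(offset, len(MEDIALS) * len(FINALS))
    m, f = divmod(rest, len(FINALS))
    return INITIALS[i], MEDIALS[m], FINALS[f]


print(NUM_SYLLABLES)            # 11172
print(compose("ㅎ", "ㅏ", "ㄴ"))  # 한
print(decompose("쐰"))
```

Because every syllable is a combination of a small set of components, a font model that learns the components can in principle cover all 11,172 characters from a few examples.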
  • Emoji Recommendation (LINE Timeline)
    [Figure: Example emoji from LINE sticker shop.]

    Deployed in Jan. 2019

    • LINE is a major messenger in East Asia (Japan, Taiwan, Thailand, Indonesia, and Korea). In the application, users can buy and use numerous emojis, a.k.a. LINE Stickers.
    • In this project, we recommended emojis to users based on their profile picture (cross-domain recommendation).
    • I developed and researched the entire pipeline of the cross-domain recommendation system and operation tools.

Kakao Advanced Recommendation Technology (ART) team (2016 ~ 2018)

  • Recommender Systems (Kakao services)

    Feb. 2016 - Feb. 2018

    • I developed and maintained a large-scale real-time recommender system (Toros [PyCon Talk] [AI Report]) for various services in Daum and Kakao. I mainly worked on content-based representation modeling (for textual, visual, and musical data), collaborative filtering modeling, user embedding, user clustering, and a ranking system based on multi-armed bandits.
    • Textual domain: Daum News similar article recommendation, Brunch (blog service) similar post recommendation, Daum Cafe (community service) hit item recommendation.
    • Visual domain: Daum Webtoon and Kakao Page similar item recommendation, video recommendation for a news article (cross-domain recommendation).
    • Audio domain: music recommendation for Kakao Mini (smart speaker), Melon and Kakao Music.
    • Online to offline: Kakao Hairshop style recommendation.
  • Personalized Push Notification with User History (Daum, Kakao Page)
    [Figure: System overview.]

    Deployed in 2017

    • Mobile push services (or alert systems) are widely used in mobile applications to attain a high user retention rate. However, frequent push notifications cause user fatigue, resulting in removal of the application. Usually, a push notification system is rule-based and managed by human labor. In this project, we researched and developed a personalized push notification system based on user activity and interests. The system has been applied to the Daum and Kakao Page mobile applications. More details are in our paper.
  • Large-Scale Item Categorization in e-Commerce (Daum Shopping)

    Deployed in 2017

    • Accurate categorization helps users search for desired items in e-Commerce by category, e.g., clothes / shoes / sneakers. However, categorization is usually performed by rule-based systems or human labor, which leads to low coverage of categorized items. Even automatic item categorization is difficult due to the web-scale data size, the highly unbalanced annotation distribution, and noisy labels. I developed a large-scale item categorization system for Daum Shopping based on a deep network, from the operation tool to the categorization API.

Internship

  • Research internship (Naver Labs)

    Aug. 2015 - Dec. 2015

    • During the internship, I implemented batch normalization (BN) for AlexNet, Inception v2 and VGG on ImageNet using Caffe. I also researched batch normalization for sequential models (e.g., RNNs) using Lua Torch.
  • Software engineer (IUM-SOCIUS)

    Jun. 2012 - Jan. 2013

    • I worked as a web developer at IUM-SOCIUS. During the internship, I developed and maintained internal batch services (Java Spring Batch), an internal statistics service (Python Flask, MongoDB), internal admin tools (Python Django, MySQL), and main service systems (Java Spring, Ruby on Rails, MariaDB).

Education and Career

  • M.S. (2014.03 - 2016.02), School of Electrical Engineering, KAIST
  • B.S. (2009.03 - 2014.02), School of Electrical Engineering and School of Management Science (double major), KAIST