Hsin-Ping Huang
I am a Research Engineer at Apple, where I work on multimodal generative modeling to power Apple Intelligence's image generation experiences.
Prior to this, I received my Ph.D. from the University of California, Merced, under the supervision of Prof. Ming-Hsuan Yang in the Vision and Learning Lab. I completed my M.S. in Computer Science at The University of Texas at Austin and my B.S. in Electrical Engineering at National Taiwan University.
My research interests lie in computer vision and machine learning, with a focus on image, video, and 3D generation and manipulation. I was fortunate to intern at Google with Deqing Sun, Yu-Chuan Su, Hexiang Hu, Lu Jiang, Charles Herrmann, Yaojie Liu, and Xinyi Wang, and at Adobe Research with Zhan Xu and Yang Zhou, and to receive advice from Hung-Yu Tseng and Jia-Bin Huang.
I am honored to be a finalist for the Meta PhD Research Fellowship.
Email /
CV /
Google Scholar /
LinkedIn /
GitHub
Education
National Taiwan University, B.S. in EE, 2013 - 2017
University of Texas at Austin, M.S. in CS, 2017 - 2020
University of California, Merced, Ph.D. in EECS, 2020 - 2025
Experience
Amazon, Applied Scientist Intern, May 2020 - Aug. 2020
Google Research / DeepMind, Student Researcher, May 2021 - May 2024
Adobe Research, Research Intern, Jun. 2024 - Nov. 2024
Apple, Research Engineer, Aug. 2025 - Present
Publications
KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities
Hsin-Ping Huang, Xinyi Wang, Yonatan Bitton, Hagai Taitelbaum, Gaurav Singh Tomar, Ming-Wei Chang, Xuhui Jia, Kelvin C.K. Chan, Hexiang Hu, Yu-Chuan Su, Ming-Hsuan Yang
arXiv 2024
KITTEN is a benchmark for evaluating text-to-image models' ability to generate real-world visual entities, highlighting that even advanced models struggle with entity fidelity.
Project Page /
Paper
Generating Long-take Videos via Effective Keyframes and Guidance
Hsin-Ping Huang, Yu-Chuan Su, Ming-Hsuan Yang
WACV 2025
We propose a framework for generating long-take videos with multiple coherent events by decoupling video generation into keyframe generation and frame interpolation.
Paper /
Media (AK)
Unsupervised and Semi-Supervised Few-Shot Acoustic Event Classification
Hsin-Ping Huang, Krishna C. Puvvada, Ming Sun, Chao Wang
ICASSP 2021
We study unsupervised and semi-supervised few-shot acoustic event classification, learning audio representations from large amounts of unlabeled data and using them to classify events from only a few labeled examples.
Paper
Unsupervised Adversarial Domain Adaptation for Implicit Discourse Relation Classification
Hsin-Ping Huang, Junyi Jessy Li
CoNLL 2019
We present an unsupervised adversarial domain adaptation network with a reconstruction component that leverages explicit discourse relations to classify implicit discourse relations.
Paper
Professional Activities
Journal Reviewer: TPAMI, IJCV, CVIU, Computer Graphics Forum
Conference Reviewer (Computer Vision): ECCV'24, ICCV'23, CVPR'23, ECCV'22, CVPR'22, ICCV'21
Conference Reviewer (Artificial Intelligence): IJCAI'24, AAAI'24, IJCAI'23, AAAI'23
This page borrows its design from Jon Barron's website.