Zeyu Jin

Senior Research Scientist

San Francisco

Zeyu is a senior research scientist at Adobe Research in San Francisco. His research focuses on deep generative models for speech, including studio-quality speech enhancement, speech quality assessment, and personalized voice generation. He is also interested in HCI for audio applications and in music generation.

He received a Ph.D. in computer science from Princeton University, advised by Adam Finkelstein, and an M.S. in music technology from Carnegie Mellon University. Between 2015 and 2017, he interned at Adobe three times and presented his primary research project, VoCo, at Adobe MAX Sneaks in 2016.

Publications

Efficient Spoken Language Recognition Via Multilabel Classification

Nieto, O., Jin, Z., Dernoncourt, F., Salamon, J. (Aug. 24, 2023)

Interspeech 2023

Audio Similarity is Unreliable as a Proxy for Audio Quality

Manocha, P., Jin, Z., Finkelstein, A. (Sep. 18, 2022)

Interspeech 2022

HEAR: Holistic Evaluation of Audio Representations

Turian, J., Shier, J., Khan, H., Raj, B., Schuller, B., Steinmetz, C., Malloy, C., Tzanetakis, G., Velarde, G., McNally, K., Henry, M., Pinto, N., Noufi, C., Clough, C., Herremans, D., Fonseca, E., Engel, J., Salamon, J., Esling, P., Manocha, P., Watanabe, S., Jin, Z., Bisk, Y. (Jul. 20, 2022)

NeurIPS 2021

Music Enhancement via Image Translation and Vocoding

Kandpal, N., Nieto, O., Jin, Z. (May 22, 2022)

ICASSP 2022

SQAPP: No-Reference Speech Quality Assessment Via Pairwise Preference

Manocha, P., Jin, Z., Finkelstein, A. (May 22, 2022)

ICASSP 2022

Controllable Speech Representation Learning Via Voice Conversion and AIC Loss

Wang, Y., Su, J., Finkelstein, A., Jin, Z. (May 22, 2022)

ICASSP 2022

Music Enhancement via Image Translation and Vocoding

Kandpal, N., Nieto, O., Jin, Z. (May 8, 2022)

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Controllable deep melody generation via hierarchical music representation

Dai, S., Jin, Z., Gomes, C., Dannenberg, R. (Nov. 8, 2021)

International Society for Music Information Retrieval Conference

HiFi-GAN-2: Studio-quality speech enhancement via generative adversarial networks conditioned on acoustic features

Su, J., Jin, Z., Finkelstein, A. (Oct. 17, 2021)

IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

CDPAM: Contrastive learning for perceptual audio similarity

Manocha, P., Jin, Z., Zhang, R., Finkelstein, A. (Jun. 9, 2021)

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Bandwidth Extension is All You Need

Su, J., Wang, Y., Finkelstein, A., Jin, Z. (Jun. 9, 2021)

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Context-Aware Prosody Correction for Text-Based Speech Editing

Morrison, M., Rencker, L., Jin, Z., Bryan, N., Caceres, J., Pardo, B. (Jun. 6, 2021)

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Controllable Neural Prosody Synthesis

Morrison, M., Jin, Z., Salamon, J., Bryan, N., Mysore, G. (Oct. 26, 2020)

Interspeech 2020

A Differentiable Perceptual Audio Metric Learned from Just Noticeable Differences

Manocha, P., Finkelstein, A., Zhang, R., Bryan, N., Mysore, G., Jin, Z. (Oct. 26, 2020)

Interspeech 2020

Metric Learning vs Classification for Disentangled Music Representation Learning

Lee, J., Bryan, N., Salamon, J., Jin, Z., Nam, J. (Oct. 11, 2020)

International Society for Music Information Retrieval Conference (ISMIR)

Disentangled Multidimensional Metric Learning For Music Similarity

Lee, J., Bryan, N., Salamon, J., Jin, Z., Nam, J. (May 4, 2020)

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Text-based Editing of Talking-head Video

Fried, O., Tewari, A., Zollhöfer, M., Finkelstein, A., Shechtman, E., Goldman, D., Genova, K., Jin, Z., Theobalt, C., Agrawala, M. (Aug. 1, 2019)

ACM Transactions on Graphics (Proc. SIGGRAPH'19)

FFTNet: a Real-Time Speaker-Dependent Neural Vocoder

Jin, Z., Finkelstein, A., Mysore, G., Lu, J. (Apr. 15, 2018)

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

VoCo: text-based insertion and replacement in audio narration

Jin, Z., Mysore, G., DiVerdi, S., Lu, J., Finkelstein, A. (Jul. 31, 2017)

ACM Transactions on Graphics (SIGGRAPH)

CUTE: a Concatenative Method for Voice Conversion Using Exemplar-based Unit Selection

Jin, Z., Finkelstein, A., DiVerdi, S., Lu, J., Mysore, G. (Mar. 1, 2016)

The 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
