Hello world, I am Hao Tan (谭昊). I joined Adobe Research in August 2021. I was a Ph.D. student in the UNC CS department from 2016 to 2021, advised by Mohit Bansal, and my Ph.D. study was supported by the Bloomberg Data Science Ph.D. Fellowship. Before joining UNC, I received a B.S. in CS from Shanghai Jiao Tong University, where I was a member of the ACM Honored Class.

My current research focus is 3D multimodal learning. I am working on the following problems: 1) (text-conditioned) 3D generation, 2) single-image reconstruction, 3) 3D representation learning with text supervision, 4) scalable embodied learning, 5) self-training from simulators, etc. I previously worked extensively on image-and-text understanding and pre-training, and I continue to investigate this direction. I am especially interested in three problems: 1) What is a scalable way to build universal multimodal models? 2) Can we use information from other modalities to help language understanding? 3) Multimodal large language models.

https://www.cs.unc.edu/~airsplay/

Publications

Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models

Van Nguyen, M., Dernoncourt, F., Yoon, D., Deilamsalehy, H., Tan, H., Rossi, R., Tran, Q., Bui, T., Nguyen, T. (Sep. 5, 2024)

Interspeech 2024

Building Vision-Language Models on Solid Foundations with Masked Distillation

Sameni, S., Kafle, K., Tan, H., Jenni, S. (Jun. 17, 2024)

CVPR 2024

Learning Navigational Visual Representations with Semantic Map Supervision

Hong, Y., Zhou, Y., Zhang, R., Dernoncourt, F., Bui, T., Gould, S., Tan, H. (Oct. 6, 2023)

ICCV 2023

Boosting Punctuation Restoration with Data Generation and Reinforcement Learning

Lai, V., Salinas, A., Tan, H., Bui, T., Tran, Q., Yoon, D., Deilamsalehy, H., Dernoncourt, F., Nguyen, T. (Aug. 24, 2023)

Interspeech 2023