I’m interested in Natural Language Processing, particularly representation learning and information extraction. I’m also interested in applications to social media. Most recently I’ve worked on developing robust user representations for author verification and retrieval.
news
Nov 2, 2021
I’ll be a co-lead for the SCALE 2022 workshop at JHU next summer. The workshop will focus on AuthorID-related research.
Aug 25, 2021
Our paper in collaboration with Lawrence Livermore National Lab has been accepted to EMNLP ‘21!
Jun 1, 2021
I’ll be working on dense retrieval methods for Cross-Lingual Information Retrieval at SCALE this summer.
selected publications
Learning Universal Authorship Representations
Rivera-Soto, Rafael A.,
Miano, Olivia Elizabeth,
Ordonez, Juanita,
Chen, Barry Y.,
Khan, Aleem,
Bishop, Marcus,
and Andrews, Nicholas
In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
2021
Determining whether two documents were composed by the same author, also known as authorship verification, has traditionally been tackled using statistical methods. Recently, authorship representations learned using neural networks have been found to outperform alternatives, particularly in large-scale settings involving hundreds of thousands of authors. But do such representations learned in a particular domain transfer to other domains? Or are these representations inherently entangled with domain-specific features? To study these questions, we conduct the first large-scale study of cross-domain transfer for authorship verification considering zero-shot transfers involving three disparate domains: Amazon reviews, fanfiction short stories, and Reddit comments. We find that although a surprising degree of transfer is possible between certain domains, it is not so successful between others. We examine properties of these domains that influence generalization and propose simple but effective methods to improve transfer.
@inproceedings{rivera-soto-etal-2021-learning,
  bibtex_show = {true},
  selected = {true},
  title = {Learning Universal Authorship Representations},
  author = {Rivera-Soto, Rafael A. and Miano, Olivia Elizabeth and Ordonez, Juanita and Chen, Barry Y. and Khan, Aleem and Bishop, Marcus and Andrews, Nicholas},
  booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
  month = nov,
  year = {2021},
  address = {Online and Punta Cana, Dominican Republic},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2021.emnlp-main.70},
  pages = {913--919}
}
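At inference time, representation-based authorship verification reduces to comparing two document embeddings. The sketch below is a minimal illustration of that decision rule, not the paper's actual model: the embeddings are stand-in vectors (a real system would come from a trained neural encoder), and the threshold value is a hypothetical placeholder that would be tuned on held-out data.

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def same_author(emb_a: np.ndarray, emb_b: np.ndarray, threshold: float = 0.8) -> bool:
    """Verify authorship by thresholding similarity of the two representations."""
    return cosine_similarity(emb_a, emb_b) >= threshold

# Toy stand-in embeddings; a trained encoder would produce these from raw text.
doc1 = np.array([0.9, 0.1, 0.3])
doc2 = np.array([0.85, 0.15, 0.35])
print(same_author(doc1, doc2))  # nearby vectors -> predicted same author
```

The cross-domain question the paper studies is whether an encoder trained in one domain (say, Reddit comments) still places same-author documents from another domain (say, fanfiction) close together under this kind of comparison.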
A Deep Metric Learning Approach to Account Linking
Khan, Aleem,
Fleming, Elizabeth,
Schofield, Noah,
Bishop, Marcus,
and Andrews, Nicholas
In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
2021
We consider the task of linking social media accounts that belong to the same author in an automated fashion on the basis of the content and meta-data of the corresponding document streams. We focus on learning an embedding that maps variable-sized samples of user activity, ranging from single posts to entire months of activity, to a vector space, where samples by the same author map to nearby points. Our approach does not require human-annotated data for training purposes, which allows us to leverage large amounts of social media content. The proposed model outperforms several competitive baselines under a novel evaluation framework modeled after established recognition benchmarks in other domains. Our method achieves high linking accuracy, even with small samples from accounts not seen at training time, a prerequisite for practical applications of the proposed linking framework.
@inproceedings{khan-etal-2021-deep,
  bibtex_show = {true},
  selected = {true},
  title = {A Deep Metric Learning Approach to Account Linking},
  author = {Khan, Aleem and Fleming, Elizabeth and Schofield, Noah and Bishop, Marcus and Andrews, Nicholas},
  booktitle = {Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  month = jun,
  year = {2021},
  address = {Online},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2021.naacl-main.415},
  doi = {10.18653/v1/2021.naacl-main.415},
  pages = {5275--5287}
}
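Deep metric learning objectives of the kind used for account linking train the embedding so that same-author samples land closer together than different-author samples. A standard example of such an objective (a plain Euclidean triplet loss, shown here as an illustration rather than the specific loss used in the paper) looks like this:

```python
import numpy as np

def triplet_loss(anchor: np.ndarray, positive: np.ndarray,
                 negative: np.ndarray, margin: float = 1.0) -> float:
    """Hinge-style triplet loss on embeddings.

    Encourages the anchor to be at least `margin` closer to a same-author
    sample (positive) than to a different-author sample (negative).
    """
    d_pos = np.linalg.norm(anchor - positive)  # distance to same-author sample
    d_neg = np.linalg.norm(anchor - negative)  # distance to different-author sample
    return max(0.0, float(d_pos - d_neg + margin))

# Toy embeddings: well-separated triplet incurs zero loss,
# a violating triplet incurs a positive penalty.
anchor = np.array([0.0, 0.0])
good_pos, good_neg = np.array([0.1, 0.0]), np.array([2.0, 0.0])
print(triplet_loss(anchor, good_pos, good_neg))  # 0.0 (margin satisfied)
```

During training, gradients of this loss with respect to the encoder parameters pull same-author activity samples together in the embedding space, which is what makes nearest-neighbor account linking possible at test time.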