Learning to Bridge Colloquial and Formal Language Applied to Linking and Search of E-Commerce Data
We study the problem of linking information between different idiomatic usages of the same language, for example, colloquial and formal language. We propose a novel probabilistic topic model called multi-idiomatic LDA (MiLDA). Its modeling principles follow the intuition that certain words are shared between two idioms of the same language, while other words are non-shared. We demonstrate the ability of our model to learn relations between cross-idiomatic topics in a dataset containing product descriptions and reviews. We present the utility of the new MiLDA topic model in a recently proposed information retrieval task of linking Pinterest pins to online webshops. We show that our multi-idiomatic model outperforms the standard monolingual LDA model and the pure bilingual LDA model both in terms of perplexity and MAP scores in the IR task.
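For readers unfamiliar with the MAP (mean average precision) score mentioned above, the following is a minimal sketch of the standard textbook definition. It is not taken from the paper: the function names, the shop identifiers, and the toy rankings are hypothetical, and the paper's exact evaluation protocol may differ.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision of one ranked result list against its relevant set."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant item occurs.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of the per-query average precision over all queries."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Hypothetical toy data: each query (a pin) has a ranked list of shops
# and the set of shops considered relevant for it.
toy_runs = [
    (["shop_a", "shop_b", "shop_c"], {"shop_a", "shop_c"}),
    (["shop_b", "shop_a"], {"shop_a"}),
]
map_score = mean_average_precision(toy_runs)
```

In this sketch each query stands for one pin and each ranked item for a candidate webshop. A higher score means relevant shops tend to appear nearer the top of the ranking.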
Check out our paper
Learning to Bridge Colloquial and Formal Language Applied to Linking and Search of E-Commerce Data
ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR ’14)
Ivan Vulic, Susana Zoghbi and Sien Moens
