The manifold hypothesis (real-world data concentrates near low-dimensional manifolds) is suggested as the principle behind the effectiveness of machine learning algorithms in very high-dimensional problems that are common in domains such as vision and speech. Multiple methods have been proposed to explicitly incorporate the manifold hypothesis as a prior in modern Deep Neural Networks (DNNs), with varying success. In this paper, we propose a new method, Distance Learner, to incorporate this prior for DNN-based classifiers. Distance Learner is trained to predict the distance of a point from the underlying manifold of each class, rather than the class label. For classification, Distance Learner then chooses the class corresponding to the closest predicted class manifold. Distance Learner can also identify points as being out of distribution (belonging to neither class), if the distance to the closest manifold is higher than a threshold. We evaluate our method on multiple synthetic datasets and show that Distance Learner learns much more meaningful classification boundaries compared to a standard classifier. We also evaluate our method on the task of adversarial robustness and find that it not only outperforms standard classifiers by a large margin but also performs at par with classifiers trained via well-accepted standard adversarial training.
2023
Preprint
Accurate Differential Operators for Hybrid Neural Fields
Aditya Chetan, Guandao Yang, Zichen Wang, Steve Marschner, and Bharath Hariharan
With the rise in popularity of social media platforms like Twitter, having higher influence on these platforms has a greater value attached to it, since it has the power to influence many decisions in the form of brand promotions and shaping opinions. However, blackmarket services that allow users to inorganically gain influence are a threat to the credibility of these social networking platforms. Twitter users can gain inorganic appraisals in the form of likes, retweets, and follows through these blackmarket services either by paying for them or by joining syndicates wherein they gain such appraisals by providing similar appraisals to other users. These customers tend to exhibit a mix of organic and inorganic retweeting behavior, making it tougher to detect them.In this article, we investigate these blackmarket customers engaged in collusive retweeting activities. We collect and annotate a novel dataset containing various types of information about blackmarket customers and use these sources of information to construct multiple user representations. We adopt Weighted Generalized Canonical Correlation Analysis (WGCCA) to combine these individual representations to derive user embeddings that allow us to effectively classify users as: genuine users, bots, promotional customers, and normal customers. Our method significantly outperforms state-of-the-art approaches (32.95% better macro F1-score than the best baseline).
NLP4MusA
Did You “Read” the Next Episode? Using Textual Cues for Predicting Podcast Popularity
Brihi Joshi, Shravika Mittal, and Aditya Chetan
In Proceedings of the 1st Workshop on NLP for Music and Audio (NLP4MusA), Oct 2020
Twitter’s popularity has fostered the emergence of various illegal user activities - one such activity is to artificially bolster visibility of tweets by gaining large number of retweets within a short time span. The natural way to gain visibility is time-consuming. Therefore, users who want their tweets to get quick visibility try to explore shortcuts - one such shortcut is to approach the blackmarket services, and gain retweets for their own tweets by retweeting other customers’ tweets. Thus the users intrinsically become a part of a collusive ecosystem controlled by these services. In this paper, we propose CoReRank, an unsupervised framework to detect collusive users (who are involved in producing artificial retweets), and suspicious tweets (which are submitted to the blackmarket services) simultaneously. CoReRank leverages the retweeting (or quoting) patterns of users, and measures two scores - the ’credibility’ of a user and the ’merit’ of a tweet. We propose a set of axioms to derive the interdependency between these two scores, and update them in a recursive manner. The formulation is further extended to handle the cold start problem. CoReRank is guaranteed to converge in a finite number of iterations and has linear time complexity. We also propose a semi-supervised version of CoReRank (called CoReRank+) which leverages a partial ground-truth labeling of users and tweets. Extensive experiments are conducted to show the superiority of CoReRank compared to six baselines on a novel dataset we collected and annotated. CoReRank beats the best unsupervised baseline method by 269% (20%) (relative) average precision and 300% (22.22%) (relative) average recall in detecting collusive (genuine) users. CoReRank+ beats the best supervised baseline method by 33.18% AUC. CoReRank also detects suspicious tweets with 0.85 (0.60) average precision (recall). To our knowledge, CoReRank is the first unsupervised method to detect collusive users and suspicious tweets simultaneously with theoretical guarantees.
2018
ASONAM
Retweet Us, We will Retweet You: Spotting Collusive Retweeters Involved in Blackmarket Services
Hridoy Sankar Dutta, Aditya Chetan, Brihi Joshi, and Tanmoy Chakraborty
In 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Oct 2018