Database Exploration Laboratory

The Database Exploration Lab (DBXLab) investigates fundamental research issues arising in Big Data. Our research encompasses diverse areas such as

  • data mining
  • information retrieval
  • data uncertainty and probabilistic methods
  • approximate query processing
  • data summarization
  • data analytics and data exploration of hidden web databases
  • social and collaborative media.

Personnel

  • Gautam Das, Ph.D., Director

Current Students

  • Farhadur Rahman (Ph.D.)
  • Sona Hasani (Ph.D.)
  • Jees Augustine (Ph.D.)
  • Sadia Ahmed (Ph.D.)
  • Md. Abdus Salam (Ph.D.)
  • Mary Koone (Ph.D.)
  • Shohedul Hasan (Ph.D.)
  • Suraj Suresh Shetiya (Ph.D.)
  • Aditya Mone (Ph.D.)
  • Yeshwanth D. Gunasekaran (M.S.)

Alumni

  • Abolfazl Asudeh, Ph.D., 2017 (University of Michigan)
  • Habibur Rahman, Ph.D., 2017 (eCommerce group at Walmart Labs)
  • Azade Nazi, Ph.D., 2016 (Data Management, Exploration and Mining group at Microsoft Research)
  • Rajeshkumar Kannapalli, M.S., 2016 (Cisco Jasper IoT)
  • Saravanan Thirumuruganathan, Ph.D., 2015 (Qatar Computing Research Institute)
  • Mahashweta Das, Ph.D., 2013 (HP Labs)
  • Shrikant Desai, M.S., 2012 (eBay, Inc.)
  • Senjuti Basu Roy, Ph.D., 2011 (University of Washington at Tacoma)
  • Arjun Dasgupta, Ph.D., 2010 (Cross Commerce Media)
  • Muhammed Miah, Ph.D., 2009 (UNC Chapel Hill)
  • Haidong Wang, M.S., 2008 (Microsoft)
  • Amrita Tamrakar, M.S., 2007 (Verizon)
  • Zubin Joseph, M.S., 2006 (Yahoo)
  • Bhushan Chaudhari, M.S., 2006 (Microsoft)
  • Sushruth Puttaswamy, M.S. (Cisco Systems)

Current Research Projects

Projects in P2P MarketPlaces

  • Skyline over Categorical Domains

    Platforms such as AirBnB, Zillow, Yelp, and related sites have transformed the way we search for accommodation, restaurants, etc. The underlying datasets in such applications have numerous attributes that are mostly Boolean or Categorical. Discovering the skyline of such datasets over a subset of attributes would identify entries that stand out while enabling numerous applications.

  • Assisting Service Providers

    Peer-to-peer marketplaces enable transactional exchange of services directly between people. In such platforms, those providing a service are faced with various choices. For example, in travel peer-to-peer marketplaces, although some amenities (attributes) in a property are fixed, others are relatively flexible and can be provided without significant effort. Providing an attribute is usually associated with a cost. Naturally, different sets of attributes may have different “gains” (monetary or otherwise) for a service provider. Consequently, given a limited budget, deciding which attributes to offer is challenging. In this project, we propose techniques that help service providers in decision making.
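
To make the skyline notion from the first project above concrete: a tuple belongs to the skyline if no other tuple dominates it, that is, no other tuple is at least as good on every chosen attribute and strictly better on at least one. The following is a minimal sketch in Python, assuming Boolean attributes where True is preferred. The listings and attribute names are hypothetical, and the quadratic scan is only an illustration, not the algorithm developed in the project.

```python
from typing import Any, Dict, List, Sequence

Listing = Dict[str, Any]


def dominates(a: Listing, b: Listing, attrs: Sequence[str]) -> bool:
    """True if a is at least as good as b on every attribute and strictly better on one."""
    at_least_as_good = all(a[x] >= b[x] for x in attrs)
    strictly_better = any(a[x] > b[x] for x in attrs)
    return at_least_as_good and strictly_better


def subspace_skyline(rows: List[Listing], attrs: Sequence[str]) -> List[Listing]:
    """Naive O(n^2) skyline restricted to the chosen subset of attributes."""
    return [r for r in rows if not any(dominates(o, r, attrs) for o in rows)]


# Hypothetical listings; the attribute names are made up for illustration.
listings = [
    {"id": 1, "wifi": True, "parking": False, "kitchen": True},
    {"id": 2, "wifi": True, "parking": True, "kitchen": False},
    {"id": 3, "wifi": False, "parking": False, "kitchen": True},
    {"id": 4, "wifi": True, "parking": True, "kitchen": True},
]

# Skyline over the attribute subset {wifi, parking}.
skyline_ids = [r["id"] for r in subspace_skyline(listings, ["wifi", "parking"])]
```

Changing the attribute subset changes the skyline, which is why the subset is passed explicitly rather than fixed in advance.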

Projects In Location-Based Services

  • Aggregate Estimations over Location Based Services

    Location-based services have become very popular in recent years. They range from map services (e.g., Google Maps) that store geographic locations of points of interest, to online social networks (e.g., WeChat, Sina Weibo, FourSquare) that leverage user geographic locations to enable various recommendation functions. The public query interfaces of these services may be abstractly modeled as a kNN interface over a database of two-dimensional points on a plane: given an arbitrary query point, the system returns the k points in the database that are nearest to the query point. In this project, we consider the problem of obtaining approximate estimates of SUM and COUNT aggregates by only querying such databases via their restrictive public interfaces.

  • Density-based Clustering over Location Based Services

    An LBS provides a public (often web-based) search interface over its backend database (of tuples with 2D geolocations), taking as input a 2D query point and returning k tuples in the database that are closest to the query point, where k is usually a small constant such as 20 or 50. In this project, we consider a novel problem of enabling density based clustering over the backend database of an LBS using nothing but limited access to the kNN interface provided by the LBS. In order to address the various types of restrictions imposed by the LBS, our goal here is to mine from the LBS a cluster assignment function f(·), such that for any tuple t in the database (which may or may not have been accessed), f(·) can produce the cluster assignment of t with high accuracy.
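
The kNN interface abstraction used by both projects above can be sketched as follows. This is only a toy model in Python: the class name, the query counter, and the sample points are assumptions made for illustration, and a real LBS would be reached over the web rather than held in memory.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


class KnnInterface:
    """Hypothetical stand-in for the public search interface of an LBS."""

    def __init__(self, points: List[Point], k: int) -> None:
        self._points = list(points)  # backend database, hidden from the client
        self.k = k
        self.queries_issued = 0      # assumption: access is limited, so we count queries

    def query(self, q: Point) -> List[Point]:
        """Return the k database points closest to q (Euclidean distance)."""
        self.queries_issued += 1
        return sorted(self._points, key=lambda p: math.dist(p, q))[: self.k]


# Made-up database of 2D points and a single query.
lbs = KnnInterface(
    [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0), (9.0, 9.0)], k=2
)
nearest = lbs.query((0.4, 0.0))
```

A client sees only the returned k points per query, so any aggregate estimate or cluster assignment has to be inferred from a limited number of such answers.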

Projects in Top-k Representatives

  • Regret-ratio Minimizing Set

    Finding the maxima of a database based on a user preference, especially when the ranking function is a linear combination of the attributes, has been the subject of recent research. A critical observation is that the convex hull is the subset of tuples that can be used to find the maxima of any linear function. However, in real world applications the convex hull can be a significant portion of the database, and thus its performance is greatly reduced. Thus, computing a subset limited to r tuples that minimizes the regret ratio (a measure of the user’s dissatisfaction with the result from the limited set versus the one from the entire database) is of interest. We make several fundamental theoretical as well as practical advances in developing such a compact set.
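
As a small illustration of the regret ratio described above, the sketch below evaluates it for a given subset and a handful of linear ranking functions. It assumes non-negative attribute values and weights with a positive best score; the function names and the sample data are hypothetical, and sampling a few weight vectors only approximates the maximum over all linear functions.

```python
from typing import Iterable, Sequence

Vector = Sequence[float]


def score(weights: Vector, t: Vector) -> float:
    """Linear ranking function: weighted sum of the attributes."""
    return sum(w * a for w, a in zip(weights, t))


def regret_ratio(db: Sequence[Vector], subset: Sequence[Vector], weights: Vector) -> float:
    """Relative loss from answering with the subset's best tuple instead of the database's."""
    best_all = max(score(weights, t) for t in db)
    best_sub = max(score(weights, t) for t in subset)
    return (best_all - best_sub) / best_all


def max_regret_ratio(
    db: Sequence[Vector], subset: Sequence[Vector], weight_vectors: Iterable[Vector]
) -> float:
    """Worst regret ratio over the supplied weight vectors."""
    return max(regret_ratio(db, subset, w) for w in weight_vectors)


# Made-up two-attribute database and a one-tuple representative set.
db = [(1.0, 0.0), (0.0, 1.0), (0.8, 0.8)]
subset = [(0.8, 0.8)]
worst = max_regret_ratio(db, subset, [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
```

Here the single tuple (0.8, 0.8) is optimal for equal weights and loses about 20% of the best score when only one attribute matters, so `worst` comes out at roughly 0.2.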

Projects In Hidden Web Databases

  • Privacy Implications of Database Ranking

    In recent years, there has been much research on the adoption of the ranked retrieval model (in addition to the Boolean retrieval model) in structured databases, especially those in a client-server environment (e.g., web databases). With this model, a search query returns top-k tuples according to not just exact matches of selection conditions, but a suitable ranking function. While much research has gone into the design of ranking functions and the efficient processing of top-k queries, this project studies a novel problem on the privacy implications of database ranking. The motivation is a novel yet serious privacy leakage we found on real-world web databases, which is caused by the ranking function design.

  • Query Reranking over Hidden Web Databases

    The ranked retrieval model has rapidly become the de facto way for search query processing in client-server databases, especially those on the web. Despite the extensive efforts in the database community on designing better ranking functions/mechanisms, many such databases in practice still fail to address the diverse and sometimes contradicting preferences of users on tuple ranking, perhaps (at least partially) due to the lack of expertise and/or motivation for the database owner to design truly effective ranking functions. This project takes a different route to addressing the issue by defining a novel query reranking problem.

  • Rank Analytics over Hidden Databases

    Structured hidden databases are widely prevalent on the Web. They provide restricted form-like search interfaces that allow users to execute search queries by specifying desired attribute values of the sought-after tuples, and the system responds by returning a few (e.g., top-k) tuples that satisfy the selection conditions, sorted by a suitable ranking function. The top-k output constraint prevents many interesting third-party (e.g., mashup) services from being developed over real-world web databases. This research involves developing effective techniques for retrieving more than top-k tuples for any query and supporting additional rank-based analytics, such as estimating the rank of a tuple or comparing the ranks of two arbitrary tuples to determine which of them is ranked higher. Our techniques access the hidden structured databases via their public interfaces and operate without any knowledge of the underlying static ranking function.

  • Suppressing Sensitive Aggregates over Hidden Web Databases

    The objective of this project is to understand, evaluate, and contribute towards the suppression of sensitive aggregates over hidden databases. While owners of hidden databases would like to allow individual search queries, many also want to maintain a certain level of privacy for aggregates over their hidden databases. This has implications in the commercial domain (e.g., to prevent competitors from gaining strategic advantages) as well as in homeland-security related applications (e.g., to prevent potential terrorists from learning flight occupancy distributions). This project investigates techniques to suppress the sensitive aggregates while maintaining the usability of hidden databases for bona fide search users.

  • Data Analytics over Hidden Web Databases

    Structured hidden databases are widely prevalent on the Web. They provide restricted form-like search interfaces that allow users to execute search queries by specifying desired attribute values of the sought-after tuples, and the system responds by returning a few (e.g., top-k) tuples that satisfy the selection conditions, sorted by a suitable ranking function. Although search interfaces for hidden databases are designed with focused search queries in mind, for certain applications it may be advantageous to infer more aggregated views of the data from the returned results of search queries. This research involves developing effective techniques for performing data analytics, especially sampling, over hidden structured databases via their public interfaces. The outcomes include efficient algorithms for sampling hidden databases with a heterogeneous mix of data types, achievability results for sampling different types of search interfaces, and a prototypical toolset which demonstrates the sampling of real-world hidden databases.
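
The form-like top-k interface that the projects above work against can be modeled with a few lines of Python. This is a sketch under stated assumptions: the class name, the equality-only selection conditions, the ranking by price, and the sample rows are all hypothetical and stand in for whatever a real hidden database exposes.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class TopKInterface:
    """Hypothetical model of a hidden database's restricted search form."""

    def __init__(self, rows: List[Row], k: int, rank_key: Callable[[Row], Any]) -> None:
        self._rows = list(rows)    # backend data, not directly visible to clients
        self.k = k
        self._rank_key = rank_key  # ranking function, unknown to clients

    def search(self, **conditions: Any) -> List[Row]:
        """Return at most k tuples matching all attribute=value conditions, in rank order."""
        matches = [
            r for r in self._rows
            if all(r.get(attr) == value for attr, value in conditions.items())
        ]
        return sorted(matches, key=self._rank_key)[: self.k]


# Made-up listings; here the site ranks by ascending price.
cars = [
    {"make": "Honda", "body": "sedan", "price": 9000},
    {"make": "Honda", "body": "suv", "price": 15000},
    {"make": "Honda", "body": "sedan", "price": 7000},
    {"make": "Ford", "body": "suv", "price": 12000},
]
site = TopKInterface(cars, k=2, rank_key=lambda r: r["price"])
top_hondas = site.search(make="Honda")
```

Three rows match the query but only two are returned, which is the top-k constraint that sampling, aggregate estimation, and rank analytics have to work around.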

Projects In Social Computing

  • Collaborative Social Content Mining

    The widespread use and growing popularity of online collaborative content sites today has created rich resources for users to consult in order to make purchasing decisions on various items such as movies, restaurants, e-commerce products, etc. It has also created new opportunities for content producers of such web items to design new improved items, compose eye-catching advertisement snippets, etc. in order to improve business. This project concerns developing data mining and exploration algorithms for performing aggregate analytics over user feedback (ratings, tags, likes, visits, etc.) available from collaborative content sites in order to benefit experience and decision making of both content producers and consumers. The key challenges exist in the form of information explosion and overload, besides user-item interaction intractability.

  • Group Recommendation

    The ever-expanding volume and increasing complexity of information on the web has made recommendation systems essential tools for users in a variety of information seeking or e-commerce activities. Moreover, new research suggests that every digital comment made by users anywhere - a product review, social book-marking, tweets, blogs, activities on a social network site, e-mails - can be mined for hints as to emotions and other thoughts. In this body of work, we intend to design novel query answering models considering the paradigm of recommendation. Our previous and ongoing works in that space consider novel recommendation problems, such as recommending items to a group of users, recommending composite items to a user, and so on. In the modeling of these problems, we tap into the latent social information sources and leverage that in a principled way to enhance query-answering tasks, and analyze that information for future learning and opportunities.

Projects In Crowdsourcing

  • Knowledge Intensive Crowdsourcing

    Crowdsourcing systems have gained popularity in a variety of domains. The next generation crowdsourcing systems will be collaborative and knowledge-intensive in nature. They need to treat the crowdsourcing problem not in optimization silos, but as an adaptive optimization problem by seamlessly handling the three main crowdsourcing processes (worker skill estimation, task assignment, task evaluation) and incorporating the uncertainty stemming from human factors. The main thrust behind this project is to develop algorithms for such an adaptive, knowledge-intensive crowdsourcing scenario by quantifying and incorporating the human factors into the three major crowdsourcing processes.

Selected Publications

2017

  • Md. Farhad Rahman, Abolfazl Asudeh, Nick Koudas, Gautam Das: Efficient Computation of Subspace Skyline over Categorical Domains. Proceedings of CIKM, 2017.
  • Abolfazl Asudeh, Azade Nazi, Nan Zhang, Gautam Das: Efficient Computation of Regret-Ratio Minimizing Set: A Compact Maxima Representative. Proceedings of the ACM SIGMOD, 2017.
  • M. F. Rahman, W. Liu, S. Bin Suhaim, S. Thirumuruganathan, N. Zhang and G. Das: HDBSCAN: Density based Clustering over Location Based Services. Proceedings of the IEEE ICDE, 2017.
  • S. Bin Suhaim, N. Zhang, G. Das and A. Jaoua: HDBExpDetector: Aggregate Sudden-Change Detector over Dynamic Web Databases. Demo paper, IEEE ICDE, 2017.
  • (Keynote Lecture) Gautam Das: Deep Web Mining. In IEEE ICCA, 2017.
  • (Invited Paper) Azade Nazi, Abolfazl Asudeh, Gautam Das, Nan Zhang, Ali Jaoua: Mobiface: Mobile App for Faceted Search over Hidden Web Databases. Invited book chapter in Recent Trends in Computer Applications, Springer Verlag, 2018.
  • Habibur Rahman, Senjuti Basu Roy, Gautam Das: A Probabilistic Framework for Estimating Pairwise Distances Through Crowdsourcing. Proceedings of EDBT, 2017.

2016

  • (Invited Paper) Gautam Das: Aggregate Tracking over Dynamic Deep Web Databases. BIG-PUBSUB 2016 Workshop held in conjunction with ACM DEBS 2016.
  • Davide Mottin, Alice Marascu, Senjuti Basu Roy, Themis Palpanas, Yannis Velegrakis, Gautam Das: A Holistic and Principled Approach for the Empty-Answer Problem, In VLDB Journal 2016.
  • Abolfazl Asudeh, Nan Zhang, Gautam Das: Query Reranking As A Service. In PVLDB, 2016.
  • Abolfazl Asudeh, Saravanan Thirumuruganathan, Nan Zhang, Gautam Das: Discovering the Skyline of Web Databases. In PVLDB, 2016.
  • Rajeshkumar Kannapalli, Azade Nazi, Mahashweta Das, Gautam Das: AD-WIRE: Add-on for Web Item Reviewing System. Demo paper, in PVLDB, 2016.
  • Kosetsu Ikeda, Atsuyuki Morishima, Habibur Rahman, Senjuti Basu Roy, Saravanan Thirumuruganathan, Sihem Amer-Yahia, and Gautam Das: Collaborative Crowdsourcing with Crowd4U. Demo paper, in PVLDB, 2016.
  • M. Farhad Rahman, Saad Bin Suhaim, Weimo Liu, Saravanan Thirumuruganathan, Nan Zhang, Gautam Das: ANALOC: Efficient ANAlytics over LOCation Based Services. Demo paper, in IEEE ICDE 2016.
  • Zhuojie Zhou, Nan Zhang, Zhiguo Gong, Gautam Das: Faster Random Walks By Rewiring Online Social Networks On-The-Fly. In ACM Transactions on Database Systems (TODS), 2015.

2015

  • (Invited Paper) Yachao Lu, Saravanan Thirumuruganathan, Nan Zhang, Gautam Das: Hidden Database Research and Analytics (HYDRA) System. In IEEE Data Engineering Bulletin, 38(3), 2015.
  • Habibur Rahman, Senjuti Basu Roy, Saravanan Thirumuruganathan, Sihem Amer-Yahia, Gautam Das: Task Assignment Optimization in Collaborative Crowdsourcing. To appear in ICDM 2015.
  • (Invited Paper) Azade Nazi, Saravanan Thirumuruganathan, Vagelis Hristidis, Nan Zhang, Gautam Das. Querying Hidden Attributes in an Online Community Network with Social Sensing Applications. In Proc. SocialSens 2015, held in conjunction with IEEE MASS 2015.
  • (Keynote Lecture) Gautam Das: Principled Optimization Frameworks for Query Reformulation of Database Queries. In ExploreDB 2015, held in conjunction with SIGMOD 2015.
  • Weimo Liu, M. Farhad Rahman, Saravanan Thirumuruganathan, Nan Zhang, and Gautam Das: Aggregate Estimations over Location Based Services. In PVLDB 2015.
  • Mahashweta Das and Gautam Das: Structured Analytics in Social Media. Tutorial, in PVLDB 2015.
  • Habibur Rahman, Saravanan Thirumuruganathan, Senjuti Basu Roy, Sihem Amer-Yahia, and Gautam Das: Worker Skill Estimation in Team-Based Tasks. In PVLDB 2015.
  • Zhuojie Zhou, Nan Zhang, and Gautam Das: Leveraging History for Faster Sampling of Online Social Networks. In PVLDB 2015.
  • M. Farhad Rahman, Weimo Liu, Saravanan Thirumuruganathan, Nan Zhang, and Gautam Das: Privacy Implications of Database Ranking. In PVLDB 2015.
  • Saravanan Thirumuruganathan, Habibur Rahman, Sofiane Abbar, and Gautam Das: Beyond Itemsets: Mining Frequent Featuresets over Structured Items. In PVLDB 2015.
  • Azade Nazi, Zhuojie Zhou, Saravanan Thirumuruganathan, Nan Zhang, and Gautam Das: Walk, Not Wait: Faster Sampling Over Online Social Networks. In PVLDB 2015.
  • Azade Nazi, Mahashweta Das, and Gautam Das: The TagAdvisor: Luring the Lurkers to Review Web Item. In SIGMOD 2015.
  • Azade Nazi, Saravanan Thirumuruganathan, Vagelis Hristidis, Nan Zhang, and Gautam Das: Answering Complex Queries in an Online Community Network. Poster paper, in ICWSM 2015.
  • Senjuti Basu Roy, Saravanan Thirumuruganathan, Sihem Amer-Yahia, Gautam Das: Task-Assignment Optimization in Knowledge Intensive Crowdsourcing. In VLDB Journal, 2015.
  • (Invited Short Course) Gautam Das: Mining Deep Web Repositories. In BigDat 2015.
  • Alexios Kotsifakos, Alexandra Stefan, Vassilis Athitsos, Gautam Das, and Panagiotis Papapetrou: DRESS: Dimensionality Reduction for Efficient Sequence Search. In the Data Mining and Knowledge Discovery Journal (DAMI) (also presented at PKDD 2015).

2014

  • Azade Nazi, Saravanan Thirumuruganathan, Vagelis Hristidis, Nan Zhang, Khaled Shaban, and Gautam Das: Query Hidden Attributes in Social Networks. In IDP 2014, held in conjunction with ICDM 2014.
  • Naeemul Hassan, Huadong Feng, Ramesh Venkataraman, Gautam Das, Chengkai Li, Nan Zhang: Anything You Can Do, I Can Do Better: Finding Expert Teams by CrewScout. Demo paper, in CIKM 2014.
  • Milad Eftekhar, Saravanan Thirumuruganathan, Gautam Das, and Nick Koudas: Price Trade-offs in Social Media Advertising. In Proceedings of ACM Conference on Online Social Networks (COSN) 2014.
  • Gautam Das: Exploration and Mining of Web Repositories. Tutorial, at COMAD 2014.
  • (Keynote Lecture) Gautam Das: Data Exploration and Analytics in Social Media and the Deep Web. In IEEE APWC 2014.
  • Weimo Liu, Saravanan Thirumuruganathan, Nan Zhang, and Gautam Das: Aggregate Estimation Over Dynamic Hidden Web Databases. In PVLDB 2014.
  • Weimo Liu, Saad Bin Suhaim, Saravanan Thirumuruganathan, Nan Zhang, Gautam Das, and Ali Jaoua: HDBTracker: Aggregate Tracking and Monitoring Over Dynamic Web Databases. Demo paper, in PVLDB 2014.
  • Saravanan Thirumuruganathan, Nan Zhang, Vagelis Hristidis, and Gautam Das: Aggregate Estimation Over a Microblog Platform. In Proc. of SIGMOD 2014.
  • Davide Mottin, Alice Marascu, Senjuti Basu Roy, Gautam Das, Themis Palpanas, Yannis Velegrakis: IQR: An Interactive Query Relaxation System for the Empty-Answer Problem. Demo paper, in SIGMOD 2014.
  • Nan Zhang and Gautam Das. Exploration and Mining of Web Repositories. Tutorial at WSDM 2014.
  • (Invited Paper) Gautam Das: Mining and Analytics of Deep Web Repositories. In ICAA 2014.
  • Senjuti Basu Roy, Saravanan T., Sihem Amer-Yahia, Gautam Das, and Cong Yu. Exploiting Group Recommendation Functions for Flexible Preferences. In Proc. of ICDE 2014.
  • Sofiane Abbar, Habibur Rahman, Saravanan T., Carlos Castillo, and Gautam Das. Ranking Item Features by Mining Online User-Item Interactions. In Proc. of ICDE 2014.
  • Mahashweta Das, Saravanan T., Sihem Amer-Yahia, Gautam Das and Cong Yu. An Expressive Framework and Efficient Algorithms for the Analysis of Collaborative Tagging. In VLDB Journal Special Issue 2014 on Best of VLDB 2012.

2013

  • Davide Mottin, Alice Marascu, Senjuti Basu Roy, Gautam Das, Themis Palpanas, and Yannis Velegrakis: A Probabilistic Optimization Framework for the Empty-Answer Problem. In PVLDB 2013 (to be presented at VLDB 2014).
  • Saravanan Thirumuruganathan, Nan Zhang, and Gautam Das: Rank Discovery From Web Databases. In PVLDB 2013 (to be presented at VLDB 2014).
  • (Conference Best Student Paper) Mingyang Zhang, Nan Zhang, and Gautam Das: Mining a Search Engine's Corpus Without a Query Pool. Full paper, CIKM 2013.
  • Mahashweta Das, Habibur Rahman, Vagelis Hristidis, and Gautam Das: Generating Informative Snippet to Maximize Item Visibility. Short paper, CIKM 2013
  • Senjuti Basu Roy, Ioanna Lykourentzou, Saravanan Thirumuruganathan, Sihem Amer-Yahia and Gautam Das: Crowds, not Drones: Modeling Human Factors in Interactive Crowdsourcing. DBCrowd 2013 held in conjunction with VLDB 2013.
  • Saravanan Thirumuruganathan, Nan Zhang, Gautam Das: Breaking the Top-k Barrier of Hidden Web Databases. In Proc. of ICDE 2013.
  • Zhuojie Zhou, Nan Zhang, Zhiguo Gong, Gautam Das: Faster Random Walks By Rewiring Online Social Networks On-The-Fly. In Proc. of ICDE 2013.
  • Chengkai Li, Nan Zhang, Naeemul Hassan, Sundaresan Rajasekaran, and Gautam Das: On Skyline Groups. To appear in IEEE Transactions on Knowledge and Data Engineering (TKDE), 2013.
  • Manos Papagelis, Gautam Das, Nick Koudas: Sampling Online Social Networks. To appear in IEEE Transactions on Knowledge and Data Engineering (TKDE), 2012.
  • Alexandra Stefan, Vassilis Athitsos, and Gautam Das: The Move-Split-Merge Metric for Time Series. To appear in IEEE Transactions on Knowledge and Data Engineering (TKDE), 2012.

2012

  • Mahashweta Das, Saravanan Thirumuruganathan, Sihem Amer-Yahia, Gautam Das and Cong Yu: Who Tags What? An Analysis Framework. In PVLDB 2012. (Also in VLDBJ special issue on Best of VLDB 2012)
  • Mahashweta Das, Saravanan Thirumuruganathan, Sihem Amer-Yahia, Gautam Das and Cong Yu: MapRat: Meaningful Explanation, Interactive Exploration and Geographic Visualization of Collaborative Ratings. Demo paper, in PVLDB 2012.
  • (Keynote Lecture) Gautam Das: Analytics over Deep Web Repositories. In WebDB 2012, held in conjunction with SIGMOD 2012.
  • Chengkai Li, Nan Zhang, Naeemul Hassan, Sundaresan Rajasekaran, and Gautam Das: On Skyline Groups. In Proc. of CIKM 2012.
  • Mingyang Zhang, Nan Zhang, and Gautam Das: Aggregate Suppression for Enterprise Search Engines, Full paper, in Proc. of ACM SIGMOD 2012.
  • (Keynote Lecture) Gautam Das: Ranking and Top-k Problems in Collaborative Media and Deep Web Databases. In DBRank 2012, held in conjunction with VLDB 2012.
  • Nan Zhang and Gautam Das: Mining Deep Web Repositories. Tutorial, in ECML-PKDD 2012.
  • N. Zhang, L.O'Neill, G. Das, X. Cheng, and H. Heng: No Silver Bullet: Identifying Security Vulnerabilities In Anonymization Protocols for Hospital Databases. To appear in International Journal of Healthcare Information Systems and Informatics.
  • Foto Afrati, Gautam Das, Aristides Gionis, Heikki Mannila, Taneli Mielikainen, and Panayiotis Tsaparas: Mining Chains of Relations. Book Chapter, Data Mining: Foundations and Intelligent Paradigms, Chapter 11, ISRL 24, Volume 2, 2012.
  • Senjuti Basu Roy, Gautam Das, Sajal Das: Algorithms for Computing Best Coverage Path in the Presence of Obstacles in a Sensor Field. To appear in Journal of Discrete Algorithms (Elsevier), 2012.

2011

  • Mahashweta Das, Gautam Das, and Vagelis Hristidis: Leveraging Collaborative Tagging for Web Item Design. Full paper, in Proc. of ACM SIGKDD 2011 (Acceptance rate 7.8%)
  • Nan Zhang and Gautam Das: Exploration of Deep Web Repositories, Tutorial, In PVLDB 2011.
  • Xin Jin, Aditya Mone, Nan Zhang, and Gautam Das: Randomized Generalization for Aggregate Suppression Over Hidden Web Databases. In PVLDB 2011.
  • Mahashweta Das, Sihem Amer-Yahia, Gautam Das, and Cong Yu: MRI: Meaningful Interpretations of Collaborative Ratings. In PVLDB 2011.
  • Xin Jin, Nan Zhang, and Gautam Das: Attribute Domain Discovery for Hidden Web Databases, In Proc. of SIGMOD 2011.
  • Mingyang Zhang, Nan Zhang, and Gautam Das: Mining a Search Engine Corpus: Efficient Yet Unbiased Sampling and Aggregate Estimation, In Proc. of SIGMOD 2011.
  • Xin Jin, Aditya Mone, Nan Zhang, and Gautam Das: MOBIES: Mobile-Interface Enhancement Service for Hidden Web Databases, Demo paper, in Proc. of SIGMOD 2011.
  • Senjuti Basu Roy, Sihem Amer-Yahia, Gautam Das and Cong Yu: Interactive Itinerary Planning, In Proc. of ICDE 2011 (Acceptance rate 19.8%).
  • H. Howie Huang, Nan Zhang, Wei Wang, Gautam Das, and Alex Szalay: Just-In-Time Analytics on Large File Systems, In Proc. of USENIX Conference on File and Storage Technologies, FAST 2011.
  • Xin Jin, Nan Zhang, and Gautam Das: ASAP: Eliminating Algorithm-based Disclosure in Privacy-Preserving Data Publishing, to appear in Information Systems (Elsevier), 2011

2010

  • Senjuti Basu Roy, Sihem Amer-Yahia, Ashish Chawla, Gautam Das and Cong Yu. Space Efficiency in Group Recommendations, In VLDB Journal Special Issue on Data Management and Mining for Social Networks and Social Media, 2010.
  • Ning Yan, Chengkai Li, Senjuti Basu Roy, Rakesh Ramegowda, Gautam Das, Facetedpedia: Enabling Query-Dependent Faceted Search for Wikipedia, Demo paper, in Proc. of CIKM 2010.
  • Xin Jin, Mingyang Zhang, Nan Zhang, Gautam Das: Versatile Publishing for Privacy Preservation. Full paper, in Proc. ACM SIGKDD 2010 (Acceptance rate 13.3%).
  • Sihem Amer-Yahia, Senjuti Basu Roy, Ashish Chawla, Gautam Das and Cong Yu: Constructing and Exploring Composite Items. In Proc. SIGMOD 2010.
  • Feng Zhao, Gautam Das, Kian-Lee Tan, Anthony K. H. Tung: Call to Order: A Hierarchical Browsing Approach to Eliciting Users' Preference. In Proc. SIGMOD 2010.
  • Arjun Dasgupta, Xin Jin, Bradley Jewell, Nan Zhang, and Gautam Das: Unbiased estimation of size and other aggregates over hidden web databases. In Proc. SIGMOD 2010.
  • Chengkai Li, Ning Yan, Senjuti Basu Roy, Lekhendro Lisham, and Gautam Das: Facetedpedia: Dynamic Generation of Query-Dependent Faceted Interfaces for Wikipedia. In Proc. WWW 2010 (Acceptance rate 14%)
  • Arjun Dasgupta, Nan Zhang, and Gautam Das: Turbo-Charging Hidden Database Samplers with Overflowing Queries and Skew Reduction. In Proc. EDBT 2010 (Acceptance rate 18%)
  • Xin Jin, Nan Zhang, and Gautam Das: Algorithm-safe Privacy Preserving Data Publishing, In Proc. EDBT 2010 (Acceptance rate 18%)
  • Benjamin Arai, Gautam Das, Dimitrios Gunopulos, Vagelis Hristidis, Nick Koudas: An Access Cost Aware Approach for Object Retrieval over Multiple Sources. In PVLDB 2010.

2009

  • Sihem Amer-Yahia, Senjuti Basu Roy, Ashish Chawla, Gautam Das, Cong Yu: Group Recommendation: Semantics and Efficiency. Full paper, in VLDB 2009. (Acceptance rate 17.9%)
  • Nikos Sarkas, Nilesh Bansal, Gautam Das, Nick Koudas: Measure Driven Keyword Query Expansion. Full paper, in VLDB 2009. (Acceptance rate 16.7%)
  • Surajit Chaudhuri, Gautam Das: Keyword Querying and Ranking in Databases. Tutorial at VLDB 2009.
  • Nikos Sarkas, Gautam Das, Nick Koudas: Improved Search for Socially Annotated Data, In PVLDB 2009.
  • Senjuti Basu Roy, Gautam Das: Top-k Implementation Techniques of Minimum Effort Driven Faceted Search for Databases. In Proc. of COMAD 2009. (Acceptance rate 38%)
  • Muhammed Miah, Gautam Das, Vagelis Hristidis, Heikki Mannila: Determining Attributes to Maximize Visibility of Objects. To appear in IEEE Transactions on Knowledge and Data Engineering (TKDE), 2009.
  • Arjun Dasgupta, Nan Zhang, Gautam Das, Surajit Chaudhuri: Privacy Preservation of Aggregates in Hidden Databases: Why and How? Full Paper, in Proc. of SIGMOD 2009. (Acceptance rate 15.9%)
  • Anirban Maiti, Arjun Dasgupta, Nan Zhang, Gautam Das: HDSampler: Revealing Data behind Web Form Interfaces. Demo paper, in Proc. of SIGMOD 2009. (Acceptance rate 37%)
  • Benjamin Arai, Gautam Das, Dimitrios Gunopulos, Nick Koudas: Anytime Measures for Top-k Algorithms on Exact and Fuzzy Data Sets. Invited Paper, VLDB Journal 18(2): 407-427 (2009) on special issue of Best Papers of VLDB 2007.
  • Gautam Das: Top-k Algorithms and Applications. Tutorial at DASFAA 2009.
  • Gautam Das, Nan Zhang: Privacy Risks in Health Databases from Aggregate Disclosure. In Proc PSPAE/PETRA 2009.
  • Gautam Das, Nan Zhang: Aggregates Disclosure in Hidden Web Databases: An Urgent Challenge, Position paper, NSF Workshop on Data and Applications Security, 2009.
  • Arjun Dasgupta, Nan Zhang, Gautam Das: Leveraging COUNT Information in Sampling Hidden Databases. Full paper, in Proc. ICDE 2009. (Acceptance rate 17%)
  • Senjuti Basu Roy, Haidong Wang, Ullas Nambiar, Gautam Das, Mukesh Mohania: DynaCet: Building Dynamic Faceted Search Systems over Databases. Demo paper, in Proc. ICDE 2009. (Acceptance rate 28%)
  • Albert Angel, Surajit Chaudhuri, Gautam Das, Nick Koudas: Ranking Objects Based on Relationships and Fixed Associations. In Proc. EDBT 2009. (Acceptance rate 33%)

2008

  • P. Miettinen, T. Mielikainen, A. Gionis, G. Das, H. Mannila: The Discrete Basis Problem. In IEEE Transactions on Knowledge and Data Engineering (TKDE), 2008, pp. 1348-1362.
  • Senjuti Basu Roy, Haidong Wang, Ullas Nambiar, Gautam Das and Mukesh Mohania: Minimum-Effort Driven Dynamic Faceted Search in Structured Databases. In Proc. CIKM 2008. (Acceptance rate 17%)
  • G. Das, N. Sarkas, N. Koudas: Categorical Skylines for Streaming Data. In Proc. of SIGMOD 2008. (Acceptance rate 18%)
  • Muhammed Miah, Gautam Das, Vagelis Hristidis, Heikki Mannila: Standing Out in a Crowd: Selecting Attributes for Maximum Visibility. In Proc. ICDE 2008 (Acceptance rate 19%)
  • Song Lin, Benjamin Arai, Dimitrios Gunopulos, Gautam Das: Energy Efficient Adaptive Region Sampling in Sensor Networks. In Proc. ICDE 2008. (Acceptance rate 19%)
  • Gautam Das, Nick Koudas, Manos Papagelis, Sushruth Puttaswamy: Efficient Sampling of Information in Social Networks. In Proc. CIKM/SSM 2008
  • Z. Joseph, G. Das, L. Fegaras: Distinct Value Estimation in Unstructured P2P Databases. In Proc. PETRA 2008

2007

  • Benjamin Arai, Gautam Das, Dimitrios Gunopulos, Nick Koudas: Anytime Measures for Top-k Algorithms. VLDB 2007: 914-925 (Acceptance rate 17%)
  • Gautam Das, Dimitrios Gunopulos, Nick Koudas, Nikos Sarkas: Ad-hoc Top-k Query Answering for Data Streams. VLDB 2007: 183-194 (Acceptance rate 17%)
  • Arjun Dasgupta, Gautam Das, Heikki Mannila: A random walk approach to sampling hidden databases. SIGMOD 2007: 629-640 (Acceptance rate 14.6%)
  • Nishant Kapoor, Gautam Das, Vagelis Hristidis, S. Sudarshan, Gerhard Weikum: STAR: A System for Tuple and Attribute Ranking of Query Answers. Demo paper at ICDE 2007: 1483-1484
  • Benjamin Arai, Gautam Das, Dimitrios Gunopulos, Vana Kalogeraki: Efficient Approximate Query Processing in Peer-to-Peer Networks. IEEE Trans. Knowl. Data Eng. (TKDE) 19(7): 919-933 (2007)
  • Senjuti Basu Roy, Gautam Das, Sajal Das: Computing Best Coverage Path in the Presence of Obstacles in a Sensor Field. WADS 2007: 577-588
  • Surajit Chaudhuri, Gautam Das, Vivek Narasayya. Optimized Stratified Sampling for Approximate Query Processing. ACM Transactions on Database Systems (TODS), 32(2): 9 (2007)
  • Gautam Das, Dimitrios Gunopulos, Nick Koudas, Dimitris Tsirogiannis. Answering Top-k Queries Using Views. HDMS 2007
  • G. Das: Random Sampling from Databases and Applications. Invited Tutorial at the Intl. Conference on Information Technology ICIT 2007.

2006

  • Surajit Chaudhuri, Gautam Das, Vagelis Hristidis, Gerhard Weikum. Probabilistic Information Retrieval Approach for Ranking of Database Query Results. ACM Transactions on Database Systems (TODS) 31(3): 1134-1168 (2006)
  • (Conference Best Paper) Pauli Miettinen, Taneli Mielikainen, Aristides Gionis, Gautam Das, Heikki Mannila. The Discrete Basis Problem. PKDD 2006. (Acceptance rate 9%)
  • Gautam Das, Vagelis Hristidis, Nishant Kapoor and S. Sudarshan. Ordering the Attributes of Query Results. SIGMOD 2006. (Acceptance rate 13%)
  • Gautam Das, Dimitrios Gunopulos, Nick Koudas, Dimitris Tsirogiannis. Answering Top-k Queries Using Views. VLDB 2006. (Acceptance rate 13%)
  • Benjamin Arai, Gautam Das, Dimitrios Gunopulos and Vana Kalogeraki. Approximating Aggregation Queries in Peer-to-Peer Networks. ICDE 2006. (Acceptance rate 19.5%)
  • L. Fegaras, W. He, G. Das, and D. Levine. XML Query Routing in Structured P2P Systems. DBISP2P 2006 workshop in conjunction with VLDB 2006.
  • Benjamin Arai, Gautam Das, Dimitrios Gunopulos and Vana Kalogeraki. Approximating Aggregations in Peer-to-Peer Databases. HDMS 2006.

2005

  • Foto Afrati, Gautam Das, Aris Gionis, Heikki Mannila, Taneli Mielikainen, Panayiotis Tsaparas: Mining Chains of Relations. ICDM 2005. (Acceptance rate 28%)
  • Chotirat Ann Ratanamahatana, Jessica Lin, Dimitrios Gunopulos, Eamonn Keogh, Michail Vlachos, and Gautam Das. Mining Time Series Data. In O. Maimon and L. Rokach (eds.), Data Mining and Knowledge Discovery Handbook: A Complete Guide for Practitioners and Researchers, Kluwer Academic Publishers. 2005.
  • Gautam Das: Approximate Query Processing. Tutorial, SBBD 2005.
  • Gautam Das: Sampling Methods in Approximate Query Answering Systems. Invited Book Chapter, Encyclopedia of Data Warehousing and Mining. Editor John Wang, Information Science Publishing, 2005.
  • Gautam Das: Approximate Query Processing Techniques. Invited Tutorial at the 11th International Conference on Management of Data COMAD 2005.

2004

  • Yi-Min Wang, Lili Qiu, Chad Verbowski, Dimitris Achlioptas, Gautam Das, Paul Larson: Summary-based Routing for Content-based Event Distribution Networks. Computer Communication Review (CCR) Oct. 2004.
  • Michail Vlachos, Dimitrios Gunopulos, Gautam Das: Rotation Invariant Measures for Trajectories. KDD 2004. (Acceptance rate 29%)
  • Surajit Chaudhuri, Gautam Das, Vagelis Hristidis, Gerhard Weikum: Probabilistic Ranking of Database Query Results. VLDB 2004. (Acceptance rate 16%)
  • Surajit Chaudhuri, Gautam Das, Utkarsh Srivastava: Effective Use of Block-Level Sampling in Statistics Estimation. SIGMOD 2004. (Acceptance rate 16%)

2003

  • Gautam Das: Survey of Approximate Query Processing Techniques. Invited Tutorial, SSDBM 2003.
  • Brian Babcock, Surajit Chaudhuri, Gautam Das: Dynamic Sample Selection for Approximate Query Processing. SIGMOD 2003. (Acceptance rate 15%)
  • Sanjay Agrawal, Surajit Chaudhuri, Gautam Das, Aristides Gionis: Automated Ranking of Database Query Results. CIDR 2003.
  • Michail Vlachos, Dimitrios Gunopulos, Gautam Das: Indexing Time-Series Under Conditions of Noise. Invited Chapter in Data Mining in Time Series Databases, World Scientific Publishing, 2003.
  • Gautam Das, Dimitrios Gunopulos: Time Series Similarity and Indexing. Invited Chapter in Handbook on Data Mining, Lawrence Erlbaum Associates, 2003.

2002

  • Yi-Min Wang, Lili Qiu, Dimitris Achlioptas, Gautam Das, Paul Larson, Helen J. Wang. Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks. 16th International Symposium on DIStributed Computing DISC 2002. (Acceptance rate 32%)
  • Sanjay Agrawal, Surajit Chaudhuri, Gautam Das: DBXplorer: A System For Keyword-Based Search Over Relational Databases. ICDE 2002. (Acceptance rate 19%)
  • Sanjay Agrawal, Surajit Chaudhuri, Gautam Das: DBXplorer: Enabling Keyword Search over Relational Databases. Demo paper, SIGMOD 2002: 627.
  • Binay K. Bhattacharya, Gautam Das, Asish Mukhopadhyay, Giri Narasimhan: Optimally Computing a Shortest Weakly Visible Line Segment Inside a Simple Polygon. Computational Geometry 23(1): 1-29 (2002).

2001

  • Surajit Chaudhuri, Gautam Das, Vivek Narasayya: A Robust, Optimization-Based Approach for Approximate Answering of Aggregate Queries. SIGMOD 2001. (Acceptance rate 15%)
  • Surajit Chaudhuri, Gautam Das, Mayur Datar, Rajeev Motwani, Vivek Narasayya: Overcoming Limitations of Sampling for Aggregation Queries. ICDE 2001. (Acceptance rate 17%)
  • Bela Bollobas, Gautam Das, Dimitrios Gunopulos, Heikki Mannila: Time-Series Similarity Problems and Well-Separated Geometric Sets. Nordic Journal of Computing, 8(4):409-423, 2001.
  • Dimitrios Gunopulos, Gautam Das: Time Series Similarity Measures and Time Series Indexing. Tutorial, SIGMOD 2001.
  • Danny Z. Chen, Gautam Das, Michiel H. M. Smid: Lower Bounds for Computing Geometric Spanners and Approximate Shortest Paths. Discrete Applied Mathematics 110(2-3): 151-167 (2001).

2000

  • Dimitrios Gunopulos, Gautam Das: Time Series Similarity Measures. Tutorial, KDD 2000.
  • Gautam Das, Heikki Mannila: Context-Based Similarity Measures for Categorical Databases. PKDD 2000: 201-210. (Acceptance rate 18%)
  • Gautam Das, Michiel H. Smid: A Lower Bound for Approximating the Geometric Minimum Weight Matching. Information Processing Letters 74(5-6): 253-255 (2000).

1998

  • (Conference Best Paper Runner-up) G. Das, K.-I. Lin, H. Mannila, G. Renganathan and P. Smyth: Rule Discovery from Time Series Database. In Proc. of the 4th Intl. Conference on Knowledge Discovery and Data Mining KDD 1998, pp. 16-22. (Acceptance rate 7%)
  • G. Das, H. Mannila and P. Ronkainen: Similarity of Attributes by External Probes. In Proc. of the 4th Intl. Conference on Knowledge Discovery and Data Mining KDD 1998, pp. 23-29. (Acceptance rate 7%)
  • S. Tara, G. Das and K.-I. Lin: Data Mining In Commercial Applications. In 4th Intl. Conference on Knowledge Discovery and Data Mining KDD 1998 Workshop on Keys to Commercial Success in Data Mining, 1998.

1997

  • G. Das, D. Gunopulos and H. Mannila: Finding Similar Time Series. In Lecture Notes in Computer Science - Proc. of Symp. on Principles of Knowledge Discovery and Data Mining PKDD 1997, pp. 88-100.
  • G. Das, R. Fleischer, L. Gasieniec, D. Gunopulos and J. Karkkainen: Episode Matching. In Lecture Notes in Computer Science - Proc. of the 8th Symp. on Combinatorial Pattern Matching CPM 1997, pp. 12-27.
  • B. Bollobas, G. Das, D. Gunopulos and H. Mannila: Time Series Similarity Problems and Well-Separated Geometric Sets. In Proc. of the ACM Symp. on Computational Geometry SOCG 1997. (Acceptance rate 38%)
  • G. Das: The Visibility Graph Contains a Bounded-Degree Spanner. In Proc. of the 9th Canadian Conference on Computational Geometry CCCG 1997, pp. 70-75.
  • Gautam Das, G. Narasimhan: A Fast Algorithm for Constructing Sparse Euclidean Spanners. Int. Journal of Computational Geometry and Applications, 7(4): 297-315 (1997)
  • G. Das, S. Kapoor and M. Smid: On the Complexity of Approximating Traveling Salesman Tours and Minimum Spanning Trees. Algorithmica, Vol. 19, 1997, pp. 447-460.
  • G. Das, P. J. Heffernan and G. Narasimhan: LR-Visibility in Polygons. Invited Paper, in special issue of Computational Geometry: Theory and Applications, Vol. 7, 1997, pp. 37-57.
  • G. Das and M. Goodrich: On the Complexity of Optimization Problems for Three-Dimensional Convex Polyhedra and Decision Trees. Invited Paper, in special issue of Computational Geometry: Theory and Applications, Vol. 8, 1997, pp. 123-137.

1996

  • S. Arikati, D. Chen, L. P. Chew, G. Das, M. Smid and C. D. Zaroliagis: Planar Spanners and Approximate Shortest Path Queries among Obstacles in the Plane. In Lecture Notes in Computer Science - Proc. of the 4th European Symposium on Algorithms, ESA 1996, pp. 514-528.
  • G. Das, S. Kapoor and M. Smid: On the Complexity of Approximating Traveling Salesman Tours and Minimum Spanning Trees. In Lecture Notes in Computer Science - Proc. of the Conf. on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 1996.
  • Danny Z. Chen, G. Das and M. Smid: Lower Bounds for Computing Geometric Spanners and Approximate Shortest Paths. In Proc. of the 8th Canadian Conference on Computational Geometry, CCCG 1996, pp. 155-160.
  • G. Das and P. J. Heffernan: Constructing Degree-3 Spanners with other Sparseness Properties. Invited Paper, in special issue of Intl. Journal of Foundations of Computer Science, Vol. 7, No. 2, 1996, pp. 121-135.

1995

  • G. Das and M. Goodrich: On the Complexity of Approximating and Illuminating Three-Dimensional Convex Polyhedra. In Lecture Notes in Computer Science, Proc. of WADS 1995, pp. 74-85.
  • S. Arya, G. Das, D. Mount, J. S. Salowe and M. Smid: Euclidean Spanners: Short, Thin and Lanky. In Proc. of the 27th ACM Symposium on Theory of Computing, STOC 1995, pp. 489-494.
  • G. Das, G. Narasimhan and J. S. Salowe: A New Way to Weigh Malnourished Euclidean Graphs. In Proc. of the 6th SIAM-ACM Symposium on Discrete Algorithms, SODA 1995, pp. 215-222.
  • G. Das and G. Narasimhan: Short Cuts in Higher Dimensional Space. In Proc. of the 7th Canadian Conference on Computational Geometry, CCCG 1995, pp. 103-108.
  • B. Chandra, G. Das, G. Narasimhan and J. Soares: New Sparseness Results on Graph Spanners. Invited Paper, in special issue of Intl. Journal of Computational Geometry and Applications, Vol. 5, Nos 1 & 2, 1995, pp. 125-144.

1994

  • G. Das, P. J. Heffernan and G. Narasimhan: Finding all Weakly-Visible Chords of a Polygon in Linear Time. In Nordic Journal of Computing, 1(4), 1994, pp. 433-457.
  • G. Das and G. Narasimhan: Optimal Linear-Time Algorithm for the Shortest Illuminating Line Segment in a Polygon. In Proc. of the 10th ACM Symp. on Computational Geometry, SOCG 1994, pp. 259-268.
  • G. Das, P. J. Heffernan and G. Narasimhan: Finding all Weakly-Visible Chords of a Polygon in Linear Time. In Lecture Notes in Computer Science - Proc. of SWAT 1994.

1993

  • G. Das, P. J. Heffernan and G. Narasimhan: Optimally Sparse Spanners of Euclidean Graphs in 3-Dimensional Space. In Proc. of 9th ACM Symp. on Computational Geometry, SOCG 1993, pp. 53-62.
  • G. Das and P. J. Heffernan: Constructing Degree-3 Spanners with other Sparseness Properties. In Lecture Notes in Computer Science, Proc. of the Intl. Symposium on Algorithms and Computation, ISAAC 1993.
  • G. Das, P. J. Heffernan and G. Narasimhan: LR-Visibility in Polygons. In Proc. of the 5th Canadian Conference on Computational Geometry, CCCG 1993, pp. 303-308.
  • I. Althofer, G. Das, D. P. Dobkin, D. A. Joseph and J. Soares: On Sparse Spanners of Weighted Graphs. In Discrete and Computational Geometry, Vol. 9, 1993, pp. 81-100.

1992

  • B. Chandra, G. Das, G. Narasimhan and J. Soares: New Sparseness Results on Graph Spanners. In Proc. of the ACM Symp. on Computational Geometry, SOCG 1992.
  • G. Das: Approximating Graphs and Polyhedra. In IMACS Intl. Symposium on Mathematical Modeling and Scientific Computation, 1992.
  • G. Das and D. A. Joseph: Minimum Vertex Hulls for Polyhedral Domains. Invited Paper, in special issue of Theoretical Computer Science, Vol. 103, 1992, pp. 107-136.

1991

  • G. Das and G. Narasimhan: Geometric Searching and Link Distance. In Lecture Notes in Computer Science - Proc. of WADS 1991, pp. 261-272.

1990

  • I. Althofer, G. Das, D. P. Dobkin, and D. A. Joseph: Generating Sparse Spanners of Weighted Graphs. In Lecture Notes in Computer Science, Proc. of SWAT 1990, pp. 26-37.
  • G. Das and D. A. Joseph: Minimum Vertex Hulls for Polyhedral Domains. In Lecture Notes in Computer Science, Proc. of the Symposium on Theoretical Aspects of Computer Science, STACS 1990.
  • G. Das and D. P. Dobkin: Generating Small Planar Graphs that Approximate Complete Graphs. In 1st Great Lakes Computer Science Conference, 1990.
  • G. Das and D. A. Joseph: The Complexity of Minimum Convex Nested Polyhedra. In Proc. of the 2nd Canadian Conference on Computational Geometry, CCCG 1990, pp. 296-301.

1989

  • G. Das and D. A. Joseph: Which Triangulations Approximate the Complete Graph? In Lecture Notes in Computer Science - Proc. of the 2nd Intl. Symp. on Optimal Algorithms, ISOA 1989, pp. 168-192.