14. KDD 2008: Las Vegas, Nevada, USA

Ying Li, Bing Liu, Sunita Sarawagi (Eds.): Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada, USA, August 24-27, 2008. ACM 2008, ISBN 978-1-60558-193-4

Benjamin Edelman, Michael Schwarz:
Internet advertising and optimal auction design. 1
Thore Graepel, Ralf Herbrich:
Large scale data analysis and modelling in online services and advertising. 2
Trevor Hastie, Jerome Friedman, Robert Tibshirani:
Regularization paths and coordinate descent. 3
Jitendra Malik:
The future of image search. 4
Udo Miletzki:
Genesis of postal address reading, current state and future prospects: thirty years of pattern recognition on duty of postal services. 5-6

Research papers

Aris Anagnostopoulos, Ravi Kumar, Mohammad Mahdian:
Influence and correlation in social networks. 7-15
Luca Becchetti, Paolo Boldi, Carlos Castillo, Aristides Gionis:
Efficient semi-streaming algorithms for local triangle counting in massive graphs. 16-24
Indrajit Bhattacharya, Shantanu Godbole, Sachindra Joshi:
Structured entity identification and document categorization: two tasks with one joint model. 25-33
Albert Bifet, Ricard Gavaldà:
Mining adaptively frequent closed unlabeled rooted trees in data streams. 34-42
Mustafa Bilgic, Lise Getoor:
Effective label acquisition for collective classification. 43-51
Francesco Bonchi, Carlos Castillo, Debora Donato, Aristides Gionis:
Topical query decomposition. 52-60
Christos Boutsidis, Michael W. Mahoney, Petros Drineas:
Unsupervised feature selection for principal components analysis. 61-69
Justin Brickell, Vitaly Shmatikov:
The cost of privacy: destruction of data-mining utility in anonymized data publishing. 70-78
Deepayan Chakrabarti, Ravi Kumar, Kunal Punera:
Generating succinct titles for web URLs. 79-87
Soumen Chakrabarti, Rajiv Khanna, Uma Sawant, Chiru Bhattacharyya:
Structured learning for non-smooth ranking losses. 88-96
Ming-wei Chang, Wen-tau Yih, Christopher Meek:
Partitioned logistic regression for spam filtering. 97-105
Jianhui Chen, Shuiwang Ji, Betul Ceran, Qi Li, Mingrui Wu, Jieping Ye:
Learning subspace kernels for classification. 106-114
WenYen Chen, Dong Zhang, Edward Y. Chang:
Combinational collaborative filtering for personalized community recommendation. 115-123
Xue-wen Chen, Michael Wasikowski:
FAST: a ROC-based feature selection metric for small samples and imbalanced data classification problems. 124-132
Haibin Cheng, Pang-Ning Tan:
Semi-supervised learning with data calibration for long-term time series forecasting. 133-141
Yong Ju Cho, Naren Ramakrishnan, Yang Cao:
Reconstructing chemical reaction networks: data mining meets system identification. 142-150
Peter Christen:
Automatic record linkage using seeded nearest neighbour and support vector machine classification. 151-159
David J. Crandall, Dan Cosley, Daniel P. Huttenlocher, Jon M. Kleinberg, Siddharth Suri:
Feedback effects between similarity and social influence in online communities. 160-168
Kaustav Das, Jeff G. Schneider, Daniel B. Neill:
Anomaly pattern detection in categorical datasets. 169-176
Atish Das Sarma, Sreenivas Gollapudi, Samuel Ieong:
Bypass rates: reducing query abandonment using negative inferences. 177-185
Anirban Dasgupta, Ravi Kumar, Amit Sasturkar:
De-duping URLs via rewrite rules. 186-194
Jason V. Davis, Inderjit S. Dhillon:
Structured metric learning for high dimensional problems. 195-203
Luc De Raedt, Tias Guns, Siegfried Nijssen:
Constraint programming for itemset mining. 204-212
Charles Elkan, Keith Noto:
Learning classifiers from only positive and unlabeled data. 213-220
Kave Eshghi, Shyamsundar Rajaram:
Locality sensitive hash functions based on concomitant rank order statistics. 221-229
Wei Fan, Kun Zhang, Hong Cheng, Jing Gao, Xifeng Yan, Jiawei Han, Philip S. Yu, Olivier Verscheure:
Direct mining of discriminative and essential frequent patterns via model-based search tree. 230-238
George Forman, Shyamsundar Rajaram:
Scaling up text classification for large file systems. 239-246
Yasuhiro Fujiwara, Yasushi Sakurai, Masashi Yamamuro:
SPIRAL: efficient and exact model identification for hidden Markov models. 247-255
Brian Gallagher, Hanghang Tong, Tina Eliassi-Rad, Christos Faloutsos:
Using ghost edges for classification in sparsely labeled networks. 256-264
Srivatsava Ranjit Ganta, Shiva Prasad Kasiviswanathan, Adam Smith:
Composition attacks and auxiliary information in data privacy. 265-273
Venkatesh Ganti, Arnd Christian König, Rares Vernica:
Entity categorization over large document collections. 274-282
Jing Gao, Wei Fan, Jing Jiang, Jiawei Han:
Knowledge transfer via multiple model local structure mapping. 283-291
Gemma C. Garriga, Esa Junttila, Heikki Mannila:
Banded structure in binary matrices. 292-300
Rohit Gupta, Gang Fang, Blayne Field, Michael Steinbach, Vipin Kumar:
Quantitative evaluation of approximate frequent pattern mining algorithms. 301-309
Robert Hall, Charles A. Sutton, Andrew McCallum:
Unsupervised deduplication using cross-field dependencies. 310-317
Meng Hu, Jiong Yang, Wei Su:
Permu-pattern: discovery of mutable permutation patterns with proximity constraint. 318-326
Heng Huang, Chris H. Q. Ding, Dijun Luo, Tao Li:
Simultaneous tensor subspace selection and clustering: the equivalence of high order SVD and k-means clustering. 327-335
Woochang Hwang, Taehyong Kim, Murali Ramanathan, Aidong Zhang:
Bridging centrality: graph mining from element level to group level. 336-344
Saara Hyvönen, Pauli Miettinen, Evimaria Terzi:
Interpretable nonnegative matrix decompositions. 345-353
Georgiana Ifrim, Gökhan H. Bakir, Gerhard Weikum:
Fast logistic regression for text categorization with variable-length n-grams. 354-362
Tomoharu Iwata, Takeshi Yamada, Naonori Ueda:
Probabilistic latent semantic visualization: topic model for visualizing documents. 363-371
David D. Jensen, Andrew S. Fast, Brian J. Taylor, Marc E. Maier:
Automatic identification of quasi-experimental designs for discovering causal knowledge. 372-380
Shuiwang Ji, Lei Tang, Shipeng Yu, Jieping Ye:
Extracting shared subspace for multi-label classification. 381-389
Bin Jiang, Jian Pei, Xuemin Lin, David W. Cheung, Jiawei Han:
Mining preferences from superior and inferior examples. 390-398
Ruoming Jin, Muad Abu-Ata, Yang Xiang, Ning Ruan:
Effective and efficient itemset pattern summarization: regression-based approaches. 399-407
S. Sathiya Keerthi, S. Sundararajan, Kai-Wei Chang, Cho-Jui Hsieh, Chih-Jen Lin:
A sequential dual method for large scale multi-class linear SVMs. 408-416
Jerry Kiernan, Evimaria Terzi:
Constructing comprehensive summaries of large event sequences. 417-425
Yehuda Koren:
Factorization meets the neighborhood: a multifaceted collaborative filtering model. 426-434
Gueorgi Kossinets, Jon M. Kleinberg, Duncan J. Watts:
The structure of information pathways in a social communication network. 435-443
Hans-Peter Kriegel, Matthias Schubert, Arthur Zimek:
Angle-based outlier detection in high-dimensional data. 444-452
Srivatsan Laxman, Vikram Tankasali, Ryen W. White:
Stream prediction using a generative model based on frequent episodes in event sequences. 453-461
Jure Leskovec, Lars Backstrom, Ravi Kumar, Andrew Tomkins:
Microscopic evolution of social networks. 462-470
Lei Li, Wenjie Fu, Fan Guo, Todd C. Mowry, Christos Faloutsos:
Cut-and-stitch: efficient parallel learning of linear dynamical systems on SMPs. 471-479
Charles X. Ling, Jun Du:
Active learning with direct query construction. 480-487
Xiao Ling, Wenyuan Dai, Gui-Rong Xue, Qiang Yang, Yong Yu:
Spectral domain-transfer learning. 488-496
Xu Ling, Qiaozhu Mei, ChengXiang Zhai, Bruce R. Schatz:
Mining multi-faceted overviews of arbitrary topics in a text collection. 497-505
Aurelie C. Lozano, Naoki Abe:
Multi-class cost-sensitive boosting with p-norm loss functions. 506-514
Omid Madani, Jian Huang:
On updates that constrain the features' connections during learning. 515-523
Mary McGlohon, Leman Akoglu, Christos Faloutsos:
Weighted graphs and disconnected components: patterns and a generator. 524-532
Gabriela Moise, Jörg Sander:
Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projected and subspace clustering. 533-541
Ramesh Nallapati, Amr Ahmed, Eric P. Xing, William W. Cohen:
Joint latent topic models for text and citations. 542-550
Nam Nguyen, Rich Caruana:
Classification with partial labels. 551-559
Dino Pedreschi, Salvatore Ruggieri, Franco Turini:
Discrimination-aware data mining. 560-568
Ian Porteous, David Newman, Alexander T. Ihler, Arthur Asuncion, Padhraic Smyth, Max Welling:
Fast collapsed Gibbs sampling for latent Dirichlet allocation. 569-577
Hiroto Saigo, Nicole Krämer, Koji Tsuda:
Partial least squares regression for graph mining. 578-586
Issei Sato, Minoru Yoshida, Hiroshi Nakagawa:
Knowledge discovery of semantic relationships between words using nonparametric Bayesian graph model. 587-595
Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec:
Mobile call graphs: beyond power-law and lognormal distributions. 596-604
Qihong Shao, Yi Chen, Shu Tao, Xifeng Yan, Nikos Anerousis:
Efficient ticket routing by resolution sequence mining. 605-613
Victor S. Sheng, Foster J. Provost, Panagiotis G. Ipeirotis:
Get another label? Improving data quality and data mining using multiple, noisy labelers. 614-622
Jin Shieh, Eamonn J. Keogh:
iSAX: indexing and mining terabyte sized time series. 623-631
Ka Cheung Sia, Junghoo Cho, Yun Chi, Belle L. Tseng:
Efficient computation of personal aggregate queries on blogs. 632-640
György J. Simon, Vipin Kumar, Zhi-Li Zhang:
Semi-supervised approach to rapid and reliable labeling of large data sets. 641-649
Ajit Paul Singh, Geoffrey J. Gordon:
Relational learning via collective matrix factorization. 650-658
Xiuyao Song, Chris Jermaine, Sanjay Ranka, John Gums:
A Bayesian mixture model with linear regression mixing proportions. 659-667
Liang Sun, Shuiwang Ji, Jieping Ye:
Hypergraph spectral learning for multi-label classification. 668-676
Lei Tang, Huan Liu, Jianping Zhang, Zohreh Nazeri:
Community evolution in dynamic multi-mode networks. 677-685
Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip S. Yu, Christos Faloutsos:
Colibri: fast mining of large static and dynamic graphs. 686-694
Pedro O. S. Vaz de Melo, Virgílio A. F. Almeida, Antonio Alfredo Ferreira Loureiro:
Can complex network metrics predict the behavior of NBA teams? 695-703
Daniel David Walker, Eric K. Ringger:
Model-based document clustering with a collapsed Gibbs sampler. 704-712
Pu Wang, Carlotta Domeniconi:
Building semantic kernels for text classification using Wikipedia. 713-721
Michael L. Wick, Khashayar Rohanimanesh, Karl Schultz, Andrew McCallum:
A unified approach for schema matching, coreference and canonicalization. 722-730
Fei Wu, Raphael Hoffmann, Daniel S. Weld:
Information extraction from Wikipedia: moving down the long tail. 731-739
Junjie Wu, Hui Xiong, Jian Chen:
SAIL: summation-based incremental learning for information-theoretic clustering. 740-748
Shan-Hung Wu, Keng-Pei Lin, Chung-Min Chen, Ming-Syan Chen:
Asymmetric support vector machines: low false-positive learning under the user tolerance. 749-757
Yang Xiang, Ruoming Jin, David Fuhry, Feodor F. Dragan:
Succinct summarization of transactional databases: an overlapped hyperrectangle scheme. 758-766
Yabo Xu, Ke Wang, Ada Wai-Chee Fu, Philip S. Yu:
Anonymizing transaction databases for publication. 767-775
Jian Yang, Ning Zhong, Yiyu Yao, Jue Wang:
Local peculiarity factor and its application in outlier detection. 776-784
Luh Yen, Marco Saerens, Amin Mantrach, Masashi Shimbo:
A family of dissimilarity measures between nodes generalizing both the shortest-path and the commute-time distances. 785-793
Chun-Nam John Yu, Thorsten Joachims:
Training structural SVMs with kernels using sampled cuts. 794-802
Lei Yu, Chris H. Q. Ding, Steven Loscalzo:
Stable feature selection via dense feature groups. 803-811
Peng Zhang, Xingquan Zhu, Yong Shi:
Categorizing and mining concept drifting data streams. 812-820
Xiang Zhang, Fei Zou, Wei Wang:
FastANOVA: an efficient algorithm for genome-wide association study. 821-829
Bin Zhao, Fei Wang, Changshui Zhang:
CutS3VM: a fast semi-supervised SVM algorithm. 830-838
Zheng Zhao, Jiangxin Wang, Huan Liu, Jieping Ye, Yung Chang:
Identifying biologically relevant genes via multiple heterogeneous data sources. 839-847
Wenjun Zhou, Hui Xiong:
Volatile correlation computation: a checkpoint view. 848-856

Industrial papers

Shyam Boriah, Vipin Kumar, Michael Steinbach, Christopher Potter, Steven A. Klooster:
Land cover change detection: a case study. 857-865
Mohamed Bouguessa, Benoît Dumoulin, Shengrui Wang:
Identifying authoritative actors in question-answering forums: the case of Yahoo! Answers. 866-874
Huanhuan Cao, Daxin Jiang, Jian Pei, Qi He, Zhen Liao, Enhong Chen, Hang Li:
Context-aware query suggestion by mining click-through and session data. 875-883
Christine H. Chih, Douglass S. Parker:
The persuasive phase of visualization. 884-892
Richard Chow, Philippe Golle, Jessica Staddon:
Detecting privacy leaks using corpus-based association rules. 893-901
Ying Cui, Jennifer G. Dy, Gregory C. Sharp, Brian M. Alexander, Steve B. Jiang:
Learning methods for lung tumor markerless gating in image-guided radiotherapy. 902-910
Shantanu Godbole, Shourya Roy:
Text classification, business intelligence, and interactivity: automating C-Sat analysis for services industry. 911-919
Robert L. Grossman, Yunhong Gu:
Data mining using high performance data clouds: experimental studies using Sector and Sphere. 920-927
Shen-Shyang Ho, Ashit Talukder:
Automated cyclone discovery and tracking using knowledge sharing in multiple heterogeneous satellite data. 928-936
Noam Koenigstein, Yuval Shavitt, Tomer Tankel:
Spotting out emerging artists using geo-aware analysis of P2P query strings. 937-945
Prem Melville, Saharon Rosset, Richard D. Lawrence:
Customer targeting models using actively-selected web content. 946-953
Fabian Mörchen, Mathäus Dejori, Dmitriy Fradkin, Julien Etienne, Bernd Wachmann, Markus Bundschus:
Anticipating annotations and emerging trends in biomedical literature. 954-962
G. Niklas Norén, Andrew Bate, Johan Hopstadius, Kristina Star, I. Ralph Edwards:
Temporal pattern discovery for trends and transient effects: its application to patient records. 963-971
Nish Parikh, Neel Sundaresan:
Scalable and near real-time burst detection from eCommerce queries. 972-980
Renuka Sindhgatta:
Identifying domain expertise of developers from source code. 981-989
Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, Zhong Su:
ArnetMiner: extraction and mining of academic social networks. 990-998
Leonardo Weiss Ferreira Chaves, Erik Buchmann, Klemens Böhm:
Tagmark: reliable estimations of RFID tags for business processes. 999-1007
Gang Wu, Brendan Kitts:
Experimental comparison of scalable online ad serving. 1008-1015
Xintian Yang, Sitaram Asur, Srinivasan Parthasarathy, Sameep Mehta:
A visual-analytic toolkit for dynamic interaction graphs. 1016-1024
Jieping Ye, Kewei Chen, Teresa Wu, Jing Li, Zheng Zhao, Rinkal Patel, Min Bae, Ravi Janardan, Huan Liu, Gene Alexander, Eric Reiman:
Heterogeneous data fusion for Alzheimer's disease study. 1025-1033
Shipeng Yu, Glenn Fung, Rómer Rosales, Sriram Krishnan, R. Bharat Rao, Cary Dehing-Oberije, Philippe Lambin:
Privacy-preserving Cox regression for survival analysis. 1034-1042
Sai Zeng, Prem Melville, Christian A. Lang, Ioana M. Boier-Martin, Conrad Murphy:
Using predictive analysis to improve invoice-to-cash collection. 1043-1050
Yi Zhang, Arun C. Surendran, John C. Platt, Mukund Narasimhan:
Learning from multi-topic web documents for contextual advertisement. 1051-1059

Panel

Ravi Kumar, Alexander Tuzhilin, Christos Faloutsos, David Jensen, Gueorgi Kossinets, Jure Leskovec, Andrew Tomkins:
Social networks: looking ahead. 1060

Demonstrations

Hendrik Blockeel, Toon Calders, Élisa Fromont, Bart Goethals, Adriana Prado, Céline Robardet:
An inductive database prototype based on virtual mining views. 1061-1064
Peter Christen:
Febrl: an open source data cleaning, deduplication and record linkage system with a graphical user interface. 1065-1068
Luigi Di Caro, K. Selçuk Candan, Maria Luisa Sapino:
Using tagFlake for condensing navigable tag hierarchies from tag clouds. 1069-1072
Shantanu Godbole, Shourya Roy:
An integrated system for automatic customer satisfaction analysis in the services industry. 1073-1076
Ming Hua, Jian Pei:
DiMaC: a disguised missing data cleaning tool. 1077-1080
Evangelos E. Kotsifakos, Irene Ntoutsi, Yannis Vrahoritis, Yannis Theodoridis:
Pattern-Miner: integrated management and mining over data mining models. 1081-1084
Hongyan Liu, Hui Yang, Wenbo Li, Wei Wei, Jun He, Xiaoyong Du:
CRO: a system for online review structurization. 1085-1088
Emmanuel Müller, Ira Assent, Ralph Krieger, Timm Jansen, Thomas Seidl:
Morpheus: interactive exploration of subspace clustering. 1089-1092
Hill Nguyen, Nish Parikh, Neel Sundaresan:
A software system for buzz-based recommendations. 1093-1096
Shuyi Zheng, Matthew R. Scott, Ruihua Song, Ji-Rong Wen:
Pictor: an interactive system for importing data from a website. 1097-1100