How To Write an IR Paper
Hsin-Hsi Chen and Pu-Jen Cheng
Department of Computer Science and Information Engineering
National Taiwan University, Taipei, Taiwan
Related Conferences in IR
From Professor Chengxiang Zhai
SIGIR in ACM Digital Library
• SIGIR is the major international forum for the presentation of new research results and for the demonstration of new systems and techniques in the broad field of information retrieval (IR).
• SIGIR welcomes contributions related to any aspect of IR theory and foundation, techniques, and applications.
• Relevant topics include, but are not limited to: Document Representation and Content Analysis; Queries and Query Analysis; Users and Interactive IR; Retrieval Models and Ranking; Search Engine Architectures and Scalability; Filtering and Recommending; Evaluation; Web IR and Social Media Search; IR and Structured Data; Multimedia IR; and Other Applications (each area is detailed in the subject-area slides that follow).
http://dl.acm.org/event.cfm?id=RE160&tab=pubs
SIGIR publication statistics
(Note: as of Feb 28, 2012)
SIGIR Acceptance Rates
http://portal.acm.org/citation.cfm?id=1835449&picked=source
Authors in SIGIR Community
(Note: as of Feb 28, 2012)
Research Topics
• Relevant topics include, but are not limited to, the subject areas listed on the following slides.
IR Subject Areas
Trends in IR Research and the Topology of Its Community
• What have SIGIR authors been writing about, and when?
• Where do SIGIR authors come from?
• … and together with whom did they write?
ACM SIGIR Forum, Vol. 41, No. 2, 2007
http://www.sigir.org/forum/2007D/2007d_sigirforum_hiemstra.pdf
Document Representation and Content Analysis
• text representation
• document structure
• linguistic analysis
• non-English IR
• cross-lingual IR
• information extraction
• sentiment analysis
• clustering
• classification
• topic models
• facets
Queries and Query Analysis
• query representation
• query intent
• query log analysis
• question answering
• query suggestion
• query reformulation
Users and Interactive IR
• user models
• user studies
• user feedback
• search interface
• summarization
• task models
• personalized search
Retrieval Models and Ranking
• IR theory
• language models
• probabilistic retrieval models
• non-probabilistic models
• feature-based models
• learning to rank
• combining searches
• diversity
Search Engine Architectures and Scalability
• indexing
• compression
• MapReduce
• distributed IR
• P2P IR
• mobile devices
Filtering and Recommending
• content-based filtering
• collaborative filtering
• recommender systems
• profiles
Evaluation
• test collections
• effectiveness measures
• experimental design
Web IR and Social Media Search
• link analysis
• query logs
• social tagging
• social network analysis
• advertising and search
• blog search
• forum search
• community-based QA (CQA)
• adversarial IR
IR and Structured Data
• XML search
• ranking in databases
• desktop search
• entity search
Multimedia IR
• image search
• video search
• speech/audio search
• music IR
Other Applications
• digital libraries
• enterprise search
• vertical search
• genomics IR
• legal IR
• patent search
• text reuse
How to write a good IR paper
• Content Guidelines for Full Papers in SIGIR
(http://www.sigir.org/sigir2012/paper-guidelines.php)
• ECIR Draft Guidelines
(http://irsg.bcs.org/proceedings/ECIR_Draft_Guidelines.pdf)
– Develop a set of guidelines for authors and reviewers of ECIR papers
Content Guidelines for Full Papers
• Novelty
– A submission must represent new, original research
• Format
– Submitted papers should be in the ACM Conference style
– Papers must not exceed 10 pages in 9 point font and must be submitted as PDF files.
– Papers exceeding the limits will be rejected without review.
• Anonymity
– SIGIR review is double-blind.
– All submissions must contain no information identifying the author(s) or their organization(s).
– Do not put the author(s)' names or affiliation(s) at the start of the paper, anonymize citations to and mentions of your own prior work in the paper, and do not include funding or other acknowledgments in papers submitted for review.
Guidelines for Experimental Evaluation
• A good SIGIR paper should generally contain
some theory, accompanied by solid experimental evidence.
• Acceptable SIGIR papers can also result from a heuristic combination of methods with solid
experimental evidence if there is sufficient
novelty and analysis of results to provide insights.
• A paper that provides new theoretical results without experimental validation will rarely be acceptable to SIGIR.
Test Collections
• If publicly available test collections are used, experimental results should include a comparison to the best known
baselines for these collections. If no such comparison is made, the choice of a different baseline should be justified.
• If you create your own test collection, you are encouraged to make this collection available to the research community.
SIGIR strongly encourages the creation of new test collections, both for 'classic' retrieval tasks as well as for new types of
tasks.
• The use of proprietary test collections is becoming
increasingly common and may be unavoidable in some cases.
However, experimental results from such collections are more difficult to reproduce. It is important to publish as much detail as possible about the collection (including the queries) and to ensure that the algorithms used can be reproduced.
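To make the baseline comparisons above concrete, the most common summary measure is Mean Average Precision (MAP), computed per query from a ranked run and relevance judgments. The sketch below is illustrative (the function names, document ids, and toy numbers are made up, not from any particular toolkit such as trec_eval):

```python
def average_precision(ranked_docs, relevant):
    """Average precision for one query: mean of precision@k at each
    rank k where a relevant document appears."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_docs, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs, qrels):
    """MAP over all judged queries.
    `runs`: query id -> ranked list of doc ids (the system's output).
    `qrels`: query id -> set of relevant doc ids (the judgments)."""
    return sum(average_precision(runs[q], qrels[q]) for q in qrels) / len(qrels)

# Toy run with two queries (hypothetical ids and judgments).
runs = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d5", "d4"]}
qrels = {"q1": {"d3", "d7"}, "q2": {"d5"}}
score = mean_average_precision(runs, qrels)
```

Reporting the same measure for both your system and the published baseline on the same collection is what makes the comparison meaningful.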
Prior Publication
• Papers containing substantially similar material may not be submitted to other venues concurrently with SIGIR.
• If a duplicate submission is identified during the review process, it will be rejected without review, and authors will not be permitted to submit papers to the SIGIR conference in the following year.
• The Program Committee will not consider the following types of papers to be prior publications:
– Self-publication on an individual's website, in a technical report, or in an unrefereed archive such as arXiv
– Workshop papers that do not appear in the ACM Digital Library
arXiv (http://arxiv.org/)
• The arXiv (pronounced "archive", as if the "X" were the Greek letter chi, χ) is an online archive for electronic preprints of scientific papers in the fields of mathematics, physics, astronomy, computer science, quantitative biology, statistics, and quantitative finance.
• In many fields of mathematics and physics, almost all scientific papers are self-archived on the arXiv.
• On 3 October 2008, arXiv.org passed the half-million article milestone, with roughly five thousand new e-prints added every month.
• The preprint archive turned 20 years old on 14 August 2011.
http://en.wikipedia.org/wiki/ArXiv
Review Criteria
• Relevance to SIGIR
• Originality of work
• Quality of work
• Quality of presentation
• Literature Review / Adequacy of Citations
• Confidence in review
• Overall recommendation
ECIR Draft Paper Guidelines
http://irsg.bcs.org/ecir.php
http://irsg.bcs.org/proceedings/ECIR_Draft_Guidelines.pdf
Goals
• Help authors wishing to submit to ECIR (especially student and first-time authors), and aid reviewers in the refereeing process.
• Provide a guide for the following paper types: theoretical, experimental/comparison, "People in IR", applications, conceptual, and evaluation and performance measures.
Theoretical Papers
• A theoretical paper proposes a theory for Information Retrieval
– present a supposition or system of ideas intended to explain some phenomena within IR
– based on general principles independent of the phenomena to be explained
– provide a set of principles on which the practice of an activity is based (e.g., a theory of information-seeking behaviour)
A Good Theoretical Paper
• Go beyond the existing theory already present in the literature
– link to older theory to provide the context of the paper
– the relationship between the old and the new should be clearly explained
• Provide the necessary contextualisation of the theory within IR
– what is the relevance of this theory to IR?
– the generic application of a machine learning approach, for example, is not relevant
A Good Theoretical Paper
• The clarity of the presentation is very important
– the arguments presented need to be clear and justified
– provide illustrative and practical examples to aid the reader's understanding
• Link theory with practice
– does it work in practice?
– "proof" that a theory holds is not a necessary requirement for a theoretical paper to be acceptable (e.g., the work is in its early stages, or the machinery doesn't yet exist for it to be tested)
When experimental work cannot be provided
• A discussion should be included about the testability of the theory presented:
– whether it can be falsified
– its tractability
– how the theory could be tested in practice
– its relationship with experimentation
– whether it is possible to implement or not
Experimental/Comparisons Paper
• An experimental paper compares one or more competing theories/techniques within
Information Retrieval
• An experimental paper should contain the context of the study, a clear statement of the problem addressed, and present clear research hypotheses
A Good Experimental Paper
• state clearly what exactly is new with respect to previous work
– a good set of references should be included to link to prior work, including those approaches which can be used as a baseline
• use publicly available and (preferably) standard test collections
• using parts of test collections or subsets of topics is generally considered unacceptable, unless accompanied by a reasonable justification
A Good Experimental Paper
• a good paper will ensure that the data is
available to enable replication, verification, and/or reproducibility of the work.
• justify the data collection(s) and analysis methods used
• use more than one test collection (if available) to provide more evidence for the hypotheses presented and show how generalisable the techniques examined are
A Good Experimental Paper
• use appropriate statistical or qualitative methods and report appropriate and standard evaluation measures
• simply performing and reporting significance tests is not sufficient without further explanation of what that significance means
• use appropriate state-of-the-art baseline(s) to convince the reader whether or not the proposed technique is superior
– provides strong retrieval performance
– well-established and robust across different collections
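One common instance of the significance testing mentioned above is a paired randomization (permutation) test over per-query scores, widely used in IR evaluation alongside the paired t-test. This is a minimal sketch with made-up per-query scores, not a prescription:

```python
import random

def paired_permutation_test(sys_scores, base_scores, trials=10000, seed=0):
    """Two-sided paired randomization test on per-query effectiveness scores.

    Under the null hypothesis the two systems are exchangeable, so each
    per-query score difference may have its sign flipped at random.
    Returns the estimated p-value for the observed mean difference.
    """
    rng = random.Random(seed)
    diffs = [s - b for s, b in zip(sys_scores, base_scores)]
    observed = abs(sum(diffs) / len(diffs))
    extreme = 0
    for _ in range(trials):
        permuted = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(permuted) / len(permuted)) >= observed:
            extreme += 1
    return extreme / trials

# Hypothetical per-query average precision for a system and a baseline.
system = [0.42, 0.55, 0.38, 0.61, 0.47, 0.52, 0.44, 0.58]
baseline = [0.40, 0.41, 0.35, 0.44, 0.39, 0.42, 0.37, 0.43]
p = paired_permutation_test(system, baseline)
```

Even then, a small p-value alone is not the story: per the guideline above, a good paper also explains the size and practical meaning of the difference.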
A Good Experimental Paper
• If the experimental paper proposes a technique that is computationally expensive, this inefficiency should be acknowledged and discussed
• indicate the significance of the results and conclusions made with respect to the practice and/or theory of IR
User studies/Interfaces papers
• “People in IR” paper
– involve humans as a major component in the system, experiment or investigative study being described
• Type 1: those based on laboratory IR investigations
– similar to experimental papers but with the involvement of humans
• Type 2: those investigating information seeking and behaviour
User studies/Interfaces papers
• Type 1: Interactive IR
– evaluation of novel interfaces, and interaction work
– user modelling and predictive or adaptive technologies (with people involved in the evaluation or data collection)
• Type 2: Information Seeking (Information Behaviour)
– information needs and search behaviour of individuals or distinct groups of people
Typical “People in IR” Paper
• Describe a novel methodology specifically created for an individual investigation.
• Methodology
– a coherent set of decisions and investigative components
– the reasons behind the methodology will require explanation
A Good “People in IR” Paper
• General Guideline
– a coherent narrative to describe the research questions motivating the research
– the methodology to investigate these questions
• The evaluation and methodology should be
appropriate given the research hypotheses and objectives
• Common criticisms of such papers are that there are not enough participants, user groups, tasks, or baselines.
A Good “People in IR” Paper
• select results rather than overwhelming the reader with as many as possible
• concentrate on a smaller number of related results and research questions, investigating them in more depth
• unexpected or surprising results are worth including, as is qualitative information from any participants
A Good Laboratory based IR Paper
• similar to the guidelines for experimental papers, but the introduction of people within the studies introduces some further issues
• describe the people involved and why their particular characteristics might influence the results obtained
• the paper needs to describe the components of the study
– source of search tasks
– any baseline systems
– instructions given to participants in the study, and how these aspects relate to the research questions and results obtained
A Good Information Seeking Paper
• investigate search phenomena in depth rather than just reporting or describing the study and basic results
• seek to investigate the reasons for the results, and present an analysis of the implications of the findings for IR research
A Poor “People in IR” Paper
• not explaining
– why the research is being carried out
– how the study was constructed
– which individual results were selected for presentation
– how the research links with other research in the area
Applications and System Prototype Papers
• What is an applications paper?
– Positioning papers: detail the motivation and background for an application
– Technical papers: detail the description of the architecture, individual components, algorithms, integration of components, etc.
– Demo papers: describe the system in the paper, usually coupled with a demonstration of the application
– Test and evaluation papers: report empirical results from the testing of an application
Applications and System Prototype Papers
• Report different stages or phases of a research project and an application prototype and its development:
– requirements and design analysis
– prototyping and implementation
– testing and evaluation
– dissemination
A Good Applications Paper
• Include an explicit system description and take the reader through a user experience or scenario, if applicable, providing examples such as a walk-through of the system and the iterations
• Include references to any prior work the contribution builds on
A Good Positioning Paper
• Present a thorough and comprehensive background along with a detailed motivation for the application
• Explain the uniqueness of the solution/application
• Justify how this position was arrived at
A Good Demo Paper
• provide a description of the system and how the system would be experienced by the user
• The system description should include the science and motivation behind the application, to justify why the application is novel and warrants demonstration
• Include a technical specification stating the configuration, hardware, and other requirements of the application
A Good Test and Evaluation Paper
• A clear distinction should be made between the testing of the system (through running experiments, etc.) and the evaluation and analysis of the experiments.
• The test process should include a functionality check to detail what is and is not operational in the application, and to specify any other limitations relevant to the experimentation.
A Good User Studies Paper
• Describe the experimental conditions under which the user testing took place; for instance, point out whether real users were involved or whether it was pilot-tested on colleagues.
A Poor Applications Paper
• presents an idea (new or old) but offers no evidence or explanation of its perceived need or uniqueness
Conceptual papers
• A conceptual paper presents concepts dealing with or relating to IR, where a new perspective is obtained or formulated by combining a group or class of objects.
• It may also identify trends or patterns that occur in IR, where the contribution to knowledge is the definition of the concepts and their relationships to the IR process.
A Good Conceptual Paper
• Formally define all the concepts introduced and their relationships/interactions
• An important part of the contribution made by a conceptual paper is how the proposed
conceptual development changes our
understanding of the theory and practice in IR.
Summary
• Please try to point out why a paper you have just read was accepted.
• Please try to critique a paper you have just read and point out why it might not be accepted if you were a reviewer.