Special Topics on
Information Retrieval
Hsin-Hsi Chen Pu-Jen Cheng
Department of Computer Science & Information Engineering
National Taiwan University
2012/2/21
Information Retrieval, Extraction & Data Mining
[Figure: diagram of the Retrieval System relating the User Space, Query Space, Item Space, and Author Space. Other labels: Searching, Mining/Extraction, User Gap, Semantic Gap, Information Needs, Item Authority]
IR-related Courses
• Information Retrieval and Extraction (by Prof. Hsin-Hsi Chen)
• Web Retrieval and Mining (by Prof. Pu-Jen Cheng)
– Web crawling, retrieval models, link analysis, supervised & unsupervised learning, info. extraction, query reformulation, log analysis, social media, multimedia retrieval, recommender, …
• Natural Language Processing (by Prof. Hsin-Hsi Chen)
– Words, syntax, semantics, pragmatics, application (MT), …
• Multimedia Analysis and Indexing (by Prof. Winston Hsu)
– Multimedia-feature extraction, high-dimensional indexing, …
Previous Course Project …
• Students get hands-on experience in developing prototype systems or tools
• Many limitations
– Time, background knowledge, solutions, performance improvement, paper survey, evaluation, unclear contribution
Goal of the Course
• Students are expected to complete a quality piece of IR-related work
– Originality (important)
– Technical quality (technically sound)
– Convincing experiments (well verified)
– Clarity (good presentation)
(Team work is required in case we have too many students)
Five Stages/Checkpoints & Two Outputs for Each Work
• Stages, in order over time: Choose a topic, Check related work, Propose approaches, Conduct Experiments, Documentation
• Outputs: Proposal, Report
(Each regular class consists of student presentation in each phase)
Five Stages or Checkpoints
• Choose a topic
– The instructors will give some sample topics
– You may choose your own
• Check related work
– Literature review is to identify your position in the research map
• Propose approaches
– Knowledge of probability, machine learning, statistics, linear algebra, algorithms, NLP, social networks, and other areas is a plus
• Conduct Experiments
– Leveraging existing toolkits is allowed
– Contact the instructors for specific resources
– Explanation and discussion are required
• Documentation
– Both presentation file and report
– Demonstration is a plus
Related Areas
[Figure: Information Retrieval and its related areas]
• Related areas: Library & Info Science; Machine Learning, Pattern Recognition, Data Mining; Natural Language Processing; Statistics, Optimization; Databases; Software engineering, Computer systems; Applications (Web, Bioinformatics, …)
• Figure labels: Models, Algorithms, Applications, Systems
From C.Zhai’s slide
Publications/Societies
• Learning/Mining: ICML, NIPS, UAI, ACM SIGKDD, AAAI
• Applications: RECOMB, PSB, ISMB, WWW, WSDM
• Info. Science: ASIS, JCDL
• Info Retrieval: ACM SIGIR, ACM CIKM, TREC
• NLP: ACL, COLING, EMNLP, ANLP, HLT
• Databases: ACM SIGMOD, VLDB, PODS, ICDE
• Statistics: ??
• Software/systems: ??
From C.Zhai’s slide
Prerequisites, Grading & Textbook
• Prerequisites
– IR-related background knowledge and programming skill are pluses
• Grading
– Participation: 10% (show up, discussion, Q&A, …)
– Presentation: 80% (in each regular class)
– Report: 20% (more details, format as a regular paper)
– Individual and team work
• No textbook
– A list of papers as a reference will be given for sample topics