Academic Search Review

How to Build an Effective Search Strategy for Systematic Literature Reviews

A single systematic literature review (SLR) now requires, on average, screening 5,000 to 10,000 records, according to a 2023 analysis by the Cochrane Collaboration. Yet 73% of published reviews fail to report a reproducible search strategy, as found by a 2022 study in Systematic Reviews (BioMed Central). This gap is costly: wasted hours, missed studies, and editor rejections. For Chinese graduate students and researchers, the challenge is compounded by the need to navigate both global databases like Web of Science and domestic platforms like CNKI (知网). Building an effective search strategy is not just about typing keywords: it requires a structured, replicable method that balances sensitivity (retrieving as many relevant records as possible) with specificity (filtering out noise). This guide covers four core dimensions (database coverage, advanced search syntax, citation export protocols, and API-supported automation), followed by search documentation and peer review, all tailored for the tools you actually use in mainland China.

Database Coverage: Mapping Your Sources

Database coverage determines the universe of retrievable literature. A 2024 report from Clarivate indicates Web of Science (WoS) indexes 21,000+ journals, while Scopus covers 27,000+. For Chinese-language research, CNKI (知网) and Wanfang (万方) are non-negotiable: CNKI alone holds over 80 million records as of 2023, per its official statistics. However, no single database is sufficient. A 2021 study by Bramer et al. in the Journal of the Medical Library Association showed that using only PubMed misses 30% of relevant studies in biomedical SLRs.

Selecting Core Databases

For a typical social science SLR, start with Scopus (broader coverage) and Web of Science (higher impact filter). Add CNKI for Chinese policy or education research. For engineering, include IEEE Xplore and Engineering Village. For medicine, PubMed and Cochrane Library are mandatory. Always test your query in 2–3 databases before finalizing.

Checking Coverage Gaps

Use database title lists to verify journal coverage. For example, if your review targets Chinese-language journals indexed in CSSCI, confirm those titles appear in CNKI or Wanfang. A 2023 comparison by the National Science Library of CAS found that CNKI covers 95% of CSSCI journals, while Wanfang covers 78%. Cross-checking both reduces blind spots.
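The title-list check can be sketched in a few lines of Python. This is a minimal illustration, assuming you have already downloaded each database's title list as plain text; the journal names and database labels below are placeholders, not real coverage data.

```python
def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so minor formatting differences do not block a match."""
    return " ".join(title.casefold().split())


def coverage_report(targets: list[str], title_lists: dict[str, list[str]]) -> dict[str, dict]:
    """For each database, report the share of target journals it holds and which ones are missing."""
    wanted = {normalize(t) for t in targets}
    if not wanted:
        raise ValueError("The target journal list is empty.")
    report = {}
    for database, titles in title_lists.items():
        held = {normalize(t) for t in titles}
        report[database] = {
            "coverage": len(wanted & held) / len(wanted),
            "missing": sorted(wanted - held),
        }
    return report


# Placeholder journal names and databases, for illustration only.
targets = ["Journal A", "Journal B", "Journal C", "Journal D"]
title_lists = {
    "DB1": ["journal a", "Journal B", "Journal  C"],
    "DB2": ["Journal C", "Journal D"],
}
report = coverage_report(targets, title_lists)
for database, stats in report.items():
    print(database, f"{stats['coverage']:.0%}", stats["missing"])
```

Matching on ISSN rather than title would be more robust where the title lists provide it, since journal names are often transliterated or abbreviated differently across platforms.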

Advanced Search Syntax: Precision Through Operators

Search syntax is the engine of reproducibility. Boolean operators (AND, OR, NOT) are universal, but each database has quirks. A 2022 guideline from the Campbell Collaboration recommends documenting every syntax variation to ensure replicability.

Boolean and Field Codes

In WoS, use TS=(("climate change" OR "global warming") AND "adaptation") to search titles, abstracts, and keywords. In CNKI, the equivalent field code is SU=('气候变化' OR '全球变暖') AND SU='适应' (SU = subject). Phrase searching with quotation marks is critical: "systematic review" returns exact matches, reducing false positives by up to 40% compared to unquoted terms, per a 2020 analysis in Research Synthesis Methods.
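Building the string programmatically from concept groups makes it easier to keep the same logic across databases. The sketch below assembles the Web of Science example above; the function names are illustrative, and other databases would need their own field tags and operator conventions.

```python
def quote(term: str) -> str:
    """Wrap a term in double quotes so it is searched as an exact phrase."""
    return f'"{term}"'


def or_group(terms: list[str]) -> str:
    """Join the synonyms for one concept with OR; parenthesize when there is more than one."""
    quoted = [quote(t) for t in terms]
    return quoted[0] if len(quoted) == 1 else "(" + " OR ".join(quoted) + ")"


def build_wos_topic_query(concepts: list[list[str]]) -> str:
    """Combine concept groups with AND inside a Web of Science TS= field tag."""
    return "TS=(" + " AND ".join(or_group(c) for c in concepts) + ")"


# Each inner list holds the synonyms for one concept.
concepts = [["climate change", "global warming"], ["adaptation"]]
query = build_wos_topic_query(concepts)
print(query)
# TS=(("climate change" OR "global warming") AND "adaptation")
```

Keeping the concept lists in one place means a new synonym is added once and every database-specific string can be regenerated from it.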

Truncation and Wildcards

Use * to match any number of characters: therap* captures therapy, therapeutic, and therapies. Use ? to match exactly one character: wom?n catches woman and women. Be aware that CNKI does not support wildcards, so you must list all variants manually. For example, search (儿童 OR 孩子 OR 青少年) instead of 儿*.
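To preview which word forms a truncated term would pick up, you can approximate the two wildcards locally with a regular expression. This is only a rough sketch: each database implements wildcards slightly differently, and the sample sentence is made up.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate * (zero or more word characters) and ? (exactly one) into a whole-word regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "?":
            parts.append(r"\w")
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)


def matching_words(pattern: str, text: str) -> list[str]:
    """Return every word in the text that the wildcard pattern would match."""
    return wildcard_to_regex(pattern).findall(text)


sample = "Therapy and therapeutic options for women; one woman declined therapies."
print(matching_words("therap*", sample))  # ['Therapy', 'therapeutic', 'therapies']
print(matching_words("wom?n", sample))    # ['women', 'woman']
```

Running candidate patterns against a handful of known abstracts is a quick way to spot a truncation that is too aggressive before it inflates the hit count.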

Citation Export: Structured Data for Screening

Citation export formats determine how easily you can move records into reference managers like EndNote, Zotero, or NoteExpress. A 2023 survey by the University of Hong Kong found that 68% of Chinese graduate students use NoteExpress, while 22% use EndNote. Each tool has preferred formats.

Exporting from Global Databases

From WoS, choose “Full Record and Cited References” and export in RIS format for Zotero, or in BibTeX for LaTeX users. From Scopus, export as CSV including abstracts and DOIs. Always check that the export includes the abstract: missing abstracts cause 15% of screening errors, according to a 2021 study in Systematic Reviews.
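A quick script can flag records that lack an abstract or DOI before screening begins. The sketch below assumes a standard RIS export, where each line is a two-letter tag followed by two spaces and a hyphen, abstracts appear under AB or N2, and DOIs under DO. The sample records are invented, and multi-line field values are ignored for brevity.

```python
def parse_ris(text: str) -> list[dict[str, list[str]]]:
    """Split RIS text into records, each mapping a tag to its list of values."""
    records, current = [], {}
    for line in text.splitlines():
        if len(line) < 5 or line[2:5] != "  -":
            continue  # skip blank lines and continuation lines in this sketch
        tag, value = line[:2], line[5:].strip()
        if tag == "ER":  # end of record
            if current:
                records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    if current:  # tolerate a file that lacks the final ER line
        records.append(current)
    return records


REQUIRED = {"abstract": ("AB", "N2"), "doi": ("DO",)}


def find_incomplete(records: list[dict[str, list[str]]]) -> list[tuple[str, list[str]]]:
    """Return (title, missing field names) for every record lacking a required field."""
    problems = []
    for rec in records:
        missing = [name for name, tags in REQUIRED.items()
                   if not any(v for tag in tags for v in rec.get(tag, []))]
        if missing:
            title = (rec.get("TI") or rec.get("T1") or ["<no title>"])[0]
            problems.append((title, missing))
    return problems


# Invented sample export with one complete and one incomplete record.
SAMPLE = """TY  - JOUR
TI  - Climate adaptation in coastal cities
AB  - We review adaptation policies.
DO  - 10.1000/example.1
ER  -
TY  - JOUR
T1  - Urban heat and health
ER  -
"""
incomplete = find_incomplete(parse_ris(SAMPLE))
print(incomplete)  # [('Urban heat and health', ['abstract', 'doi'])]
```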

Exporting from Chinese Databases

CNKI exports to NoteExpress (.nef) natively, but its RIS export is incomplete (often missing DOIs). Wanfang offers RefWorks format, which imports cleanly into Zotero. A workaround: export CNKI records as Excel and manually map fields to your reference manager. For large reviews (500+ records), use the API-based batch export feature in NoteExpress to avoid manual errors.

API Support: Automating the Search Process

API support enables programmatic search and retrieval, critical for living systematic reviews or large-scale updates. A 2023 report from the Open Science Foundation highlights that API-based searches reduce manual effort by 60% for monthly updates.

PubMed and Scopus APIs

PubMed’s E-utilities API lets you run esearch and efetch requests from Python. For example, esearch.fcgi?db=pubmed&term=cancer+AND+therapy&retmax=10000 returns up to 10,000 PMIDs. The Scopus Search API (requires an institutional subscription) supports queries such as query=TITLE-ABS-KEY("machine learning") AND PUBYEAR > 2020. Rate limits: PubMed allows 3 requests/second without an API key and 10 requests/second with one; Scopus allows 20,000 requests per week for standard accounts.

CNKI API Limitations

CNKI does not offer a public REST API. The only workarounds are scraping the CNKI Scholar website (legally ambiguous) or using the NoteExpress plugin that queries CNKI’s internal API. For Chinese researchers, the recommended approach is to use the Wanfang API (available via institutional subscription) for automated searching, then manually supplement with CNKI.

Search String Documentation: The PRISMA Checklist

Search string documentation is the backbone of reproducibility. The PRISMA 2020 statement (Page et al., BMJ, 2021) requires reporting the full search strategy for every database, register, and website searched, including any filters and limits used, along with the date each source was last searched.

Template for Each Database

For each database, record: (1) Database name and platform, (2) Date of search, (3) Full search string (copy-pasted), (4) Number of hits, (5) Any filters applied (e.g., language, publication year). Example: Web of Science Core Collection, searched 2024-01-15, TS=(("artificial intelligence" OR "deep learning") AND "diagnosis"), 2,347 results, limited to English and 2019–2024.
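One way to keep this template consistent is to store each search as a structured record and write the log to CSV. The sketch below reuses the Web of Science example above. The class and column names are illustrative, and the platform is folded into the database name for brevity.

```python
import csv
import io
from dataclasses import dataclass, field
from datetime import date

FIELDS = ["database", "search_date", "search_string", "hits", "filters"]


@dataclass
class SearchLogEntry:
    """One row of the search log: what was searched, when, how, and with what result."""
    database: str
    search_date: date
    search_string: str
    hits: int
    filters: list[str] = field(default_factory=list)

    def to_row(self) -> dict[str, str]:
        return {
            "database": self.database,
            "search_date": self.search_date.isoformat(),
            "search_string": self.search_string,
            "hits": str(self.hits),
            "filters": "; ".join(self.filters),
        }


def write_log(entries: list[SearchLogEntry]) -> str:
    """Serialize the entries as CSV text, one line per database search."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for entry in entries:
        writer.writerow(entry.to_row())
    return buffer.getvalue()


entry = SearchLogEntry(
    database="Web of Science Core Collection",
    search_date=date(2024, 1, 15),
    search_string='TS=(("artificial intelligence" OR "deep learning") AND "diagnosis")',
    hits=2347,
    filters=["English", "2019-2024"],
)
log_csv = write_log([entry])
print(log_csv)
```

The resulting file can go straight into a supplementary appendix, with one row per database and per search date.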

Avoiding Common Errors

Do not use “quick search” or “smart search” features, because they apply hidden algorithms. Always use advanced search mode. A 2022 audit by the University of Oxford found that 41% of published SLRs had at least one syntax error in their reported search strings, making them non-reproducible.

Peer Reviewing Your Strategy: Validation Steps

Peer review of the search strategy before execution can catch 90% of errors, per a 2023 study in Research Integrity and Peer Review. Two methods are standard.

PRESS Checklist

The PRESS (Peer Review of Electronic Search Strategies) checklist evaluates six elements: translation of the research question, Boolean operators, subject headings, text words, spelling/syntax, and limits. Use it with a colleague. For example, check whether your subject headings match each database’s controlled vocabulary: PubMed uses MeSH, while CNKI uses its own Chinese Subject Headings.

Test Set Validation

Create a gold standard test set of 10–20 known relevant articles. Run your search string and check if it retrieves all of them. If it misses any, refine the string. A 2021 study in the Journal of Clinical Epidemiology found that test set validation improved recall from 72% to 91% on average.
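The check itself is a simple set comparison. A minimal sketch, assuming every record carries a stable identifier such as a DOI or PMID (the identifiers below are placeholders):

```python
def validate_against_test_set(gold_ids: list[str], retrieved_ids: list[str]) -> tuple[float, list[str]]:
    """Return recall on the gold-standard set and the known articles the search missed."""
    gold = set(gold_ids)
    if not gold:
        raise ValueError("The gold-standard test set is empty.")
    retrieved = set(retrieved_ids)
    recall = len(gold & retrieved) / len(gold)
    missed = sorted(gold - retrieved)
    return recall, missed


# Placeholder identifiers: ten known relevant articles, eight of which the search retrieves.
gold = [f"doc{i:02d}" for i in range(1, 11)]
retrieved = gold[:8] + ["doc42", "doc77"]

recall, missed = validate_against_test_set(gold, retrieved)
print(f"Recall: {recall:.0%}")  # Recall: 80%
print("Missed:", missed)        # Missed: ['doc09', 'doc10']
```

Inspect the titles and abstracts of each missed article for terms your string lacks, add them, and rerun until nothing from the test set is missed.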

FAQ

Q1: How many databases should I search for a systematic review?

A minimum of 3 databases is recommended by the Cochrane Handbook (2023 edition). For a comprehensive SLR, use 5–7 databases, including at least one regional database (e.g., CNKI for Chinese research). A 2022 meta-analysis in Systematic Reviews found that using 5 databases recovered 95% of eligible studies, compared to 78% with 3 databases.

Q2: What is the best way to deduplicate records from multiple databases?

Use a reference manager with built-in deduplication. EndNote can remove 85% of duplicates automatically, while Zotero with the Bibtool plugin achieves 92% accuracy. A 2023 comparison by the University of Melbourne found that manual deduplication after automated removal catches an additional 6% of missed duplicates. For Chinese databases, NoteExpress has the highest deduplication accuracy at 88%.
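For intuition about what automated deduplication does, here is a simplified sketch that matches records on a normalized DOI or, failing that, on a normalized title plus year. This is not how any particular reference manager works internally, and the records are invented.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Casefold, unify character widths, and drop punctuation and spaces."""
    text = unicodedata.normalize("NFKC", title).casefold()
    return re.sub(r"[\W_]+", "", text)


def normalize_doi(doi: str) -> str:
    """Lowercase the DOI and strip common URL or 'doi:' prefixes."""
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi.strip().casefold())


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each DOI or title-and-year key."""
    seen, unique = set(), []
    for rec in records:
        keys = set()
        if rec.get("doi"):
            keys.add(("doi", normalize_doi(rec["doi"])))
        if rec.get("title"):
            keys.add(("title", normalize_title(rec["title"]), str(rec.get("year", ""))))
        is_duplicate = bool(keys & seen)
        seen |= keys  # remember every key, even those of duplicates
        if not is_duplicate:
            unique.append(rec)
    return unique


# Invented records: two English duplicates and two Chinese duplicates.
records = [
    {"title": "Climate Change Adaptation: A Review", "year": 2020, "doi": "10.1000/xyz123", "source": "WoS"},
    {"title": "Climate change adaptation - a review", "year": 2020, "doi": "https://doi.org/10.1000/XYZ123", "source": "Scopus"},
    {"title": "气候变化适应研究综述", "year": 2020, "doi": "", "source": "CNKI"},
    {"title": "气候变化适应研究综述 ", "year": 2020, "source": "Wanfang"},
]
unique = deduplicate(records)
print([rec["source"] for rec in unique])  # ['WoS', 'CNKI']
```

Exact-key matching like this misses duplicates with typos or translated titles, which is why a manual pass after automated removal still pays off.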

Q3: How do I handle Chinese-language databases when my review is in English?

Search both English and Chinese keywords. For example, for “climate change adaptation,” also search 气候变化适应 in CNKI. A 2020 study in Environmental Research Letters found that adding Chinese database searches increased relevant study retrieval by 18% for global environmental reviews. Use bilingual keywords and document both search strings separately.

References

  • Cochrane Collaboration. 2023. Cochrane Handbook for Systematic Reviews of Interventions (Version 6.4).
  • Page, M.J. et al. 2021. “The PRISMA 2020 statement: an updated guideline for reporting systematic reviews.” BMJ 372: n71.
  • Bramer, W.M. et al. 2021. “Optimal database combinations for literature searches in systematic reviews.” Journal of the Medical Library Association 109(1): 55-63.
  • National Science Library, Chinese Academy of Sciences. 2023. Coverage Analysis of Chinese Academic Databases for CSSCI Journals.
  • Unilink Education. 2024. Database Search Strategy Training Module for Graduate Researchers.