SKAN NLP: Empowering Clinical Research Through Advanced Clinical Note Search

March 24, 2026

In healthcare, precision and accuracy are paramount, especially in clinical research. Traditionally, the International Classification of Diseases (ICD) has been the cornerstone for identifying and categorizing medical conditions. While ICD codes offer a standardized way to classify diseases and health-related problems, they can fall short in capturing rare diagnoses and other clinical information that does not fit easily or consistently into an ICD category. Clinical notes, filled with detailed descriptions of symptoms, patient history, and contextual information, hold a wealth of insight that researchers can access using SKAN NLP, a self-service tool.

SKAN allows researchers to freely search clinical notes for keywords to identify a cohort of interest. Searches support the Boolean operators “and”, “or”, and “not”, so researchers can combine multiple terms and queries to refine their results. The aggregate counts and demographic breakdowns returned by a search can be used for feasibility analysis and as inclusion criteria for data requests that require additional data elements from the electronic health record (EHR) and other sources. Researchers can also view notes that have been de-identified using Natural Language Processing (NLP) to confirm or adjust their search criteria.
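To illustrate how Boolean operators narrow a cohort, here is a minimal Python sketch under stated assumptions. It is not SKAN's implementation or query syntax: the `NOTES` data, the `match` helper, and the patient IDs are hypothetical, and matching is a naive case-insensitive substring test with no handling of negation, synonyms, or misspellings. It shows “and”, “or”, and “not” as intersection, union, and difference over the sets of patients whose notes contain each term.

```python
# Hypothetical sketch: Boolean keyword search over clinical notes.
# NOT SKAN's implementation or query syntax; all data below is synthetic.

# Synthetic notes keyed by made-up patient IDs.
NOTES: dict[str, list[str]] = {
    "p1": ["Full skin exam performed today. No suspicious lesions."],
    "p2": ["Follow-up for Rosai-Dorfman disease; lymphadenopathy stable."],
    "p3": ["Skin exam deferred; patient declined."],
    "p4": ["Pathology: oropharyngeal carcinoma, resection specimen."],
}


def match(term: str) -> set[str]:
    """IDs of patients with at least one note containing `term` (case-insensitive substring)."""
    needle = term.lower()
    return {
        pid
        for pid, notes in NOTES.items()
        if any(needle in note.lower() for note in notes)
    }


# Boolean operators map onto set operations on the matching patient IDs:
#   AND -> intersection (&), OR -> union (|), NOT -> difference (-)
skin_and_lesions = match("skin exam") & match("lesions")
rosai_or_oropharyngeal = match("rosai-dorfman") | match("oropharyngeal")
skin_not_declined = match("skin exam") - match("declined")

# Narrowing a hypothetical structured-code cohort with a note search,
# analogous to refining an ICD-based list before manual chart review.
icd_cohort = {"p1", "p2", "p3"}
refined = icd_cohort & match("rosai-dorfman")

print(len(skin_not_declined), sorted(skin_not_declined))  # 1 ['p1']
print(sorted(rosai_or_oropharyngeal))  # ['p2', 'p4']
print(len(icd_cohort), "->", len(refined))  # 3 -> 1
```

Treating each term's result as a set of patients is what makes queries composable in this sketch: any result can feed into another AND, OR, or NOT, and the size of the final set plays the role of an aggregate count.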

Dr. Ryan Hughes, a radiation oncologist specializing in head and neck cancer, has used SKAN to efficiently identify patient cohorts for retrospective research. In one study, SKAN enabled precise identification of patients with Rosai-Dorfman disease by searching clinical notes, narrowing an ICD-based cohort of over 400 patients to 46 relevant cases and significantly reducing manual chart review time. In a separate project, Dr. Hughes leveraged SKAN to search highly templated pathology reports for surgical specimens from oropharyngeal carcinoma, quickly identifying patients who had undergone surgery for this cancer. This SKAN-enabled approach directly supported a peer-reviewed publication, highlighting how the tool can accelerate research from cohort discovery to published results.

Andy Huang, a Dermatology Research Fellow, used SKAN to address gaps in dermatology patient data amid high demand for skin cancer screenings. By identifying patients with documented “skin exam” language in their progress notes and cross-referencing those results with i2b2 data, he was able to isolate first-time screening visits and determine which patients truly required specialty care. This approach helped optimize dermatology resources where structured codes alone were insufficient. In a separate project, Huang used SKAN to support medication-focused research, enabling efficient review of progress notes to track Stelazine prescribing patterns and usage metrics.

Data-driven insights fuel healthcare innovation, and SKAN enhances how we leverage clinical data to advance medical knowledge and improve patient outcomes. By transcending the limitations of traditional coding systems and harnessing the power of NLP, SKAN opens new avenues for exploration and discovery. Clinical note sets from across Advocate’s Southeast region that are currently available for text search include radiology, pathology, and progress notes, with more on the way. Cohorts that researchers identify using SKAN can be used when requesting data extraction from the Office of Informatics. Visit our website or attend one of our twice-weekly open consultation sessions to learn more about SKAN, i2b2, and requesting data to support your research.