AI Classifier - Features & Implementation Guide

A comprehensive guide to the Zotero AI-assisted literature classification plugin

If you are a researcher, you have probably struggled to manage a large collection of academic papers. With hundreds or thousands of articles, sorting them into the right folders by hand becomes a real chore. AI Classifier is a Zotero plugin designed to solve exactly this problem: it uses Large Language Models (LLMs) to classify your literature into folders automatically.

This article provides a detailed overview of AI Classifier’s core features and explains the implementation principles behind each function, helping you understand how the plugin works.


1. Core Features Overview

AI Classifier offers the following core features:

| Feature | Description |
| --- | --- |
| 📂 Intelligent Classification | AI reads paper titles, abstracts, and keywords, and auto-classifies based on a confidence threshold |
| 💾 Save/Restore Structure | Back up and restore your classification hierarchy anytime |
| 📤 Export Data | Export titles, abstracts, or keywords as JSON |
| 📥 Import Hierarchy | Create a folder structure from TXT files |
| 🛑 Stop Classification | Interrupt ongoing classification tasks anytime |
| 🛡️ Privacy-First | API key stored locally and sent only to the LLM provider you configure |

2. Features & Implementation Principles

2.1 Intelligent Classification (Core Feature)

This is the most important feature of AI Classifier. It uses an LLM to understand each paper’s content and automatically files the paper into one of your preset folders.

Implementation Principles

Step 1: Fetch Paper Information

The plugin retrieves the following information from selected papers via Zotero API:

  • Title
  • Abstract
  • Keywords

These form the basis for the LLM to understand the paper’s content.
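As a minimal sketch of this step, the snippet below reads the three fields through `getField` and `getTags`, the same item methods the export pseudo-code in this article uses. The function name `extractPaperInfo` and the stub item are illustrative assumptions, not taken from the plugin's source; the stub only exists so the sketch runs outside Zotero.

```javascript
// Collect the three fields the classifier needs from a Zotero item.
// `item` is expected to expose getField() and getTags() like Zotero.Item does.
function extractPaperInfo(item) {
  return {
    title: item.getField("title") || "",
    abstract: item.getField("abstractNote") || "",
    // getTags() returns objects of the form { tag: "name", ... }
    keywords: item.getTags().map((t) => t.tag),
  };
}

// Stand-in for a real Zotero item (hypothetical data).
const stubItem = {
  fields: { title: "A Survey of Natural Language Processing", abstractNote: "" },
  getField(name) {
    return this.fields[name];
  },
  getTags() {
    return [{ tag: "NLP" }, { tag: "survey" }];
  },
};

const paperInfo = extractPaperInfo(stubItem);
console.log(paperInfo.keywords.join(", "));
```

Falling back to an empty string keeps later prompt building simple when a paper has no abstract.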

Step 2: Build Prompt

The plugin combines the user-configured classification hierarchy with paper information to construct a prompt for the LLM. The core prompt logic:

You are a literature classification assistant. Here is the paper to classify:
Title: {title}
Abstract: {abstract}
Keywords: {keywords}

Available classification hierarchy:
{folder_structure}

Based on the paper content, choose the most appropriate classification path.
Return format: JSON with the following fields:
- "path": The most appropriate classification path
- "confidence": Confidence score (float between 0-1)
- "reason": Brief explanation for the classification
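The prompt above is essentially a template with placeholders. Here is one way the substitution could look; this is a sketch, not the plugin's actual code. The names `buildPrompt` and `listPaths`, the abbreviated template, and the "/" path separator are all assumptions made for illustration.

```javascript
// Abbreviated version of the prompt template shown above.
const DEFAULT_TEMPLATE = [
  "You are a literature classification assistant. Here is the paper to classify:",
  "Title: {title}",
  "Abstract: {abstract}",
  "Keywords: {keywords}",
  "",
  "Available classification hierarchy:",
  "{folder_structure}",
].join("\n");

// Flatten a nested { name, children } tree into one path per entry,
// e.g. "Physics/Relativity". The "/" separator is an assumption.
function listPaths(nodes, prefix = "") {
  return nodes.flatMap((node) => {
    const path = prefix ? `${prefix}/${node.name}` : node.name;
    return [path, ...listPaths(node.children || [], path)];
  });
}

// Substitute {placeholders}; unknown placeholders are left untouched.
function buildPrompt(paper, hierarchy, template = DEFAULT_TEMPLATE) {
  const values = {
    title: paper.title,
    abstract: paper.abstract || "(no abstract)",
    keywords: paper.keywords.join(", "),
    folder_structure: listPaths(hierarchy).join("\n"),
  };
  return template.replace(/\{(\w+)\}/g, (whole, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : whole
  );
}

const sampleHierarchy = [
  { name: "Physics", children: [{ name: "Relativity", children: [] }] },
];
const samplePrompt = buildPrompt(
  { title: "Gravitational Waves", abstract: "", keywords: ["LIGO"] },
  sampleHierarchy
);
console.log(samplePrompt);
```

Listing full paths rather than a nested outline gives the model strings it can copy back verbatim in the "path" field.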

Step 3: Call LLM API

The plugin sends HTTP requests to the user-configured LLM API (such as SiliconFlow, OpenAI, etc.) with the constructed prompt to get classification results.

The API call uses the standard OpenAI-compatible format:

{
  "model": "Qwen/Qwen2.5-7B-Instruct",
  "messages": [
    {"role": "user", "content": "..."}
  ],
  "temperature": 0.1
}
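To make the request concrete, here is a hedged sketch of how such a body could be sent. It assumes the standard `fetch` API and the usual `Authorization: Bearer` header of OpenAI-compatible endpoints; whether the plugin uses `fetch` or one of Zotero's own HTTP helpers is not stated, and the function names are illustrative. The `config` fields follow the zotero_ai_config.json structure described in this article.

```javascript
// Build the arguments for fetch() against an OpenAI-compatible
// chat-completions endpoint (function name is illustrative).
function buildChatRequest(config, prompt) {
  return {
    url: config.apiUrl,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // OpenAI-compatible APIs expect the key as a Bearer token.
        Authorization: `Bearer ${config.apiKey}`,
      },
      body: JSON.stringify({
        model: config.model,
        messages: [{ role: "user", content: prompt }],
        temperature: 0.1,
      }),
    },
  };
}

// Send the request and return the assistant's reply text.
async function callLLM(config, prompt) {
  const { url, options } = buildChatRequest(config, prompt);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`LLM API returned HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}

// Building the request needs no network access, so it is easy to inspect.
const sampleRequest = buildChatRequest(
  {
    apiUrl: "https://api.siliconflow.cn/v1/chat/completions",
    apiKey: "your-api-key-here",
    model: "Qwen/Qwen2.5-7B-Instruct",
  },
  "Classify this paper..."
);
console.log(JSON.parse(sampleRequest.options.body).model);
```

The low temperature (0.1) keeps the output close to deterministic, which suits a task where the same paper should land in the same folder on every run.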

Step 4: Parse Response & Confidence Threshold

After receiving the classification result from the LLM, the plugin parses the JSON response to get the classification path and confidence score.
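Models do not always return clean JSON; they may wrap it in extra prose. The following sketch shows one defensive way to parse the reply. It is an assumption about how parsing could be done, not the plugin's confirmed logic, and `parseClassification` is a hypothetical name.

```javascript
// Extract the outermost {...} object from the reply and validate its fields.
// Returns null when the reply cannot be used.
function parseClassification(replyText) {
  const start = replyText.indexOf("{");
  const end = replyText.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let parsed;
  try {
    parsed = JSON.parse(replyText.slice(start, end + 1));
  } catch (err) {
    return null;
  }

  // Accept a number, or a numeric string such as "0.9".
  const confidence =
    typeof parsed.confidence === "string"
      ? parseFloat(parsed.confidence)
      : parsed.confidence;

  if (typeof parsed.path !== "string" || parsed.path.trim() === "") return null;
  if (typeof confidence !== "number" || !Number.isFinite(confidence)) return null;
  if (confidence < 0 || confidence > 1) return null;

  return {
    path: parsed.path.trim(),
    confidence,
    reason: String(parsed.reason ?? ""),
  };
}

const sampleReply =
  'Sure! {"path": "Physics/Relativity", "confidence": 0.92, "reason": "Discusses spacetime curvature"}';
const parsedReply = parseClassification(sampleReply);
console.log(parsedReply.path, parsedReply.confidence);
```

Returning null instead of throwing lets the caller treat an unusable reply like a low-confidence result: log it and leave the paper where it is.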

Key Design: Confidence Threshold Mechanism

Users can configure a confidence threshold (default: 0.7). The plugin only automatically moves a paper when the LLM’s confidence score is higher than the threshold. Otherwise, the paper stays in its original location, and the classification result is only logged.

This design is crucial:

  • Prevent Misclassification: Low confidence means the model is uncertain; forcing a classification may cause errors
  • User Control: By adjusting the threshold, users can balance between “automation level” and “accuracy”
  • Progressive Optimization: Users can handle high-confidence results first, then review low-confidence ones later
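The threshold rule is simple enough to show in a few lines. This is a sketch under the description above (move only when the score is strictly higher than the threshold, default 0.7); the function name and the action labels are illustrative.

```javascript
// Decide what to do with one classification result.
// The comparison is strict: a score equal to the threshold is not moved.
function decideAction(result, threshold = 0.7) {
  if (result.confidence > threshold) {
    return { action: "move", path: result.path };
  }
  return { action: "log-only", path: result.path };
}

const sampleResults = [
  { title: "Paper A", path: "Physics/Relativity", confidence: 0.92 },
  { title: "Paper B", path: "Chemistry", confidence: 0.55 },
  { title: "Paper C", path: "Physics", confidence: 0.7 },
];
for (const r of sampleResults) {
  console.log(`${r.title}: ${decideAction(r).action}`);
}
```

With the default threshold, only Paper A is moved; Papers B and C are logged so you can review them later or rerun with a lower threshold.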

Step 5: Execute Classification

Based on the classification result, the plugin calls Zotero API to move the paper to the target folder:

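// Move item to target collection (sketch: targetCollectionID is the ID
// resolved from the LLM's path; setCollections replaces current memberships)
item.setCollections([targetCollectionID]);
await item.saveTx();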
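Before an item can be moved, the path string returned by the LLM has to be matched against a real folder. A minimal sketch of that lookup follows; the "/" separator and the `{ id, name, children }` tree shape are assumptions chosen for illustration, and `resolveCollectionId` is a hypothetical helper.

```javascript
// Walk a { id, name, children } tree along a "/"-separated path and
// return the matching node's id, or null when any segment is missing.
function resolveCollectionId(tree, path) {
  let level = tree;
  let found = null;
  for (const segment of path.split("/").map((s) => s.trim())) {
    found = level.find((node) => node.name === segment) || null;
    if (!found) return null;
    level = found.children || [];
  }
  return found ? found.id : null;
}

const collectionTree = [
  {
    id: 1,
    name: "Physics",
    children: [{ id: 2, name: "Relativity", children: [] }],
  },
  { id: 3, name: "Chemistry", children: [] },
];

console.log(resolveCollectionId(collectionTree, "Physics/Relativity"));
console.log(resolveCollectionId(collectionTree, "Physics/Astrology"));
```

A null result covers the case where the model invents a folder that does not exist; treating it as "do not move" keeps the library unchanged.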

Technical Highlights

  • Batch Processing: Supports selecting multiple papers, processes them serially to avoid API rate limits
  • Real-time Progress: Updates UI after each paper, showing real-time progress
  • Error Tolerance: Failure on one paper doesn’t affect others; continues processing remaining papers
  • Async Tasks: Classification runs in background without blocking Zotero’s main interface
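The batch behavior described above (serial processing, per-paper error tolerance, progress updates) can be sketched as a single loop. The classifier is passed in as a function so the example runs without Zotero or network access; all names here are illustrative rather than the plugin's own.

```javascript
// Process papers one at a time; a failure on one paper is recorded
// and the loop continues with the next.
async function classifyBatch(items, classifyOne, onProgress = () => {}) {
  const summary = { succeeded: [], failed: [] };
  for (let i = 0; i < items.length; i++) {
    try {
      // Serial: the next request starts only after this one ends.
      await classifyOne(items[i]);
      summary.succeeded.push(items[i]);
    } catch (err) {
      summary.failed.push({ item: items[i], error: err.message });
    }
    onProgress(i + 1, items.length);
  }
  return summary;
}

// Demo with a fake classifier that fails on one paper.
const demoClassifier = async (title) => {
  if (title === "Paper B") throw new Error("API timeout");
};
const progressLog = [];
const batchDone = classifyBatch(
  ["Paper A", "Paper B", "Paper C"],
  demoClassifier,
  (done, total) => progressLog.push(`${done}/${total}`)
);
batchDone.then((summary) => console.log(summary.failed));
```

Awaiting each paper before starting the next is slower than firing all requests at once, but it keeps the request rate predictable, which is the point of the rate-limit note above.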

2.2 Save & Restore Classification Structure

This feature backs up and restores your classification hierarchy, protecting it against accidental changes.

Implementation Principles

Saving Structure

The plugin traverses all collections in Zotero, recursively extracts the folder hierarchy, and serializes it to a JSON file.

// Pseudo-code: Save structure
function saveStructure() {
  // getByLibrary(libraryID, true) also returns nested subcollections
  const collections = Zotero.Collections.getByLibrary(
    Zotero.Libraries.userLibraryID, true);
  const structure = traverseCollections(collections, null);
  const json = JSON.stringify(structure, null, 2);
  saveToFile(json, backupPath); // saveToFile and backupPath are placeholders
}

// Recursively traverse collections
function traverseCollections(collections, parentId) {
  return collections
    // Top-level collections report a falsy parentID; normalize it to null
    .filter(c => (c.parentID || null) === parentId)
    .map(c => ({
      name: c.name,
      children: traverseCollections(collections, c.id)
    }));
}

Restoring Structure

The plugin reads the backup JSON file and recursively creates collections:

// Pseudo-code: Restore structure
// (the check that skips folders which already exist is omitted for brevity)
async function restoreStructure(json) {
  const structure = JSON.parse(json);
  await createCollections(structure, null);
}

async function createCollections(nodes, parentId) {
  for (const node of nodes) {
    const collection = new Zotero.Collection();
    collection.libraryID = Zotero.Libraries.userLibraryID;
    collection.name = node.name;
    if (parentId) collection.parentID = parentId;
    await collection.saveTx(); // saving assigns collection.id
    if (node.children) {
      await createCollections(node.children, collection.id);
    }
  }
}

Notes

  • Restore Structure: Uses additive mode, only adding folders that exist in the backup but not in the current structure. It won’t delete existing collections or affect already-classified papers.
  • Import from TXT: Will clear all existing collections before rebuilding (papers themselves won’t be deleted).
  • The plugin prompts for confirmation before importing; proceed with caution.
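The additive restore mode described in the notes above boils down to a tree comparison: find what the backup has and the current structure lacks. The sketch below illustrates that idea on plain objects; it is not the plugin's actual code, and `findMissingPaths` plus the "/" path separator are assumptions.

```javascript
// Return the paths present in `backup` but absent from `current`.
// Both arguments are arrays of { name, children } nodes.
function findMissingPaths(backup, current, prefix = "") {
  const missing = [];
  for (const node of backup) {
    const path = prefix ? `${prefix}/${node.name}` : node.name;
    const existing = current.find((c) => c.name === node.name);
    if (!existing) missing.push(path);
    // Descend either way: children of a missing folder are missing too.
    missing.push(
      ...findMissingPaths(
        node.children || [],
        existing ? existing.children || [] : [],
        path
      )
    );
  }
  return missing;
}

const backupTree = [
  {
    name: "Physics",
    children: [
      { name: "Relativity", children: [] },
      { name: "Thermodynamics", children: [] },
    ],
  },
];
const currentTree = [
  { name: "Physics", children: [{ name: "Relativity", children: [] }] },
  { name: "Chemistry", children: [] },
];

console.log(findMissingPaths(backupTree, currentTree)); // [ 'Physics/Thermodynamics' ]
```

Note that "Chemistry" exists only in the current structure and never appears in the result, which matches the promise that restoring does not delete anything.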

2.3 Export Paper Data

This feature exports paper titles, abstracts, or keywords to JSON files for further analysis or migration.

Implementation Principles

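// Pseudo-code: Export data
async function exportData(type) {
  // Top-level items only; notes and attachments are filtered out below
  const items = await Zotero.Items.getAll(Zotero.Libraries.userLibraryID, true);
  const data = items.filter(item => item.isRegularItem()).map(item => {
    switch (type) {
      case 'title': return item.getField('title');
      case 'abstract': return item.getField('abstractNote');
      case 'keywords': return item.getTags().map(t => t.tag);
      default: throw new Error(`Unknown export type: ${type}`);
    }
  });
  // saveToFile and exportPath are placeholders
  saveToFile(JSON.stringify(data, null, 2), exportPath);
}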

Export format example (titles):

[
  "Deep Learning for Scientific Discovery",
  "A Survey of Natural Language Processing",
  "Quantum Computing: Principles and Applications"
]

2.4 Import Hierarchy from TXT

This feature allows batch creation of folder structures from text files, perfect for establishing a complete classification system at once.

Implementation Principles

TXT File Format

The plugin supports a hierarchical, indentation-based format with one folder per line:

1. Physics
  1.1 Quantum Mechanics
  1.2 Thermodynamics
  1.3 Relativity
    1.3.1 General Relativity
    1.3.2 Special Relativity
2. Chemistry
  2.1 Organic Chemistry
  2.2 Inorganic Chemistry
  2.3 Analytical Chemistry

Indentation can use spaces or tabs; number prefixes are used for sorting.
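To see how a single line of this format breaks down, here is a small sketch that splits a line into its nesting level and folder name. It assumes two spaces (or one tab) per level, consistent with the `indent / 2` rule in the parsing logic of this section, and it strips the number prefix from the name; `parseLine` is an illustrative helper, not part of the plugin.

```javascript
// Split one line of the hierarchy file into its nesting level and name.
// Assumes two spaces (or one tab) per level.
function parseLine(line) {
  if (!line.trim()) return null; // blank line
  // indentation, optional number prefix ("1.", "1.3.1"), then the name
  const match = line.match(/^(\s*)(?:\d+(?:\.\d+)*\.?\s+)?(.+)$/);
  if (!match) return null;
  const indent = match[1].replace(/\t/g, "  ").length;
  return { level: Math.floor(indent / 2) + 1, name: match[2].trim() };
}

const sampleLines = [
  "1. Physics",
  "  1.3 Relativity",
  "    1.3.1 General Relativity",
  "\t2.1 Organic Chemistry",
  "3D Vision",
];
for (const line of sampleLines) {
  console.log(parseLine(line));
}
```

Requiring whitespace after the number prefix keeps names such as "3D Vision" intact instead of mistaking the leading digit for a prefix.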

Parsing Logic

// Pseudo-code: Parse TXT
function parseHierarchy(text) {
  const lines = text.split(/\r?\n/);
  const root = [];   // top-level nodes
  const stack = [];  // stack[i] is the most recent node at level i + 1

  lines.forEach(line => {
    if (!line.trim()) return; // skip blank lines
    // indentation, optional number prefix ("1.", "1.3.1"), then the name
    const match = line.match(/^(\s*)(?:\d+(?:\.\d+)*\.?\s+)?(.+)$/);
    if (!match) return;

    const indent = match[1].replace(/\t/g, '  ').length; // one tab = one level
    const name = match[2].trim();
    const level = Math.floor(indent / 2) + 1;

    // Determine parent based on indentation level
    while (stack.length >= level) stack.pop();

    const node = { name, children: [] };
    if (stack.length === 0) {
      root.push(node);
    } else {
      stack[stack.length - 1].children.push(node);
    }
    stack.push(node);
  });

  return root;
}

Create Collections

After parsing, recursively call Zotero API to create collections:

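async function createCollections(nodes, parentId) {
  for (const node of nodes) {
    const collection = new Zotero.Collection();
    collection.libraryID = Zotero.Libraries.userLibraryID;
    collection.name = node.name;
    if (parentId) collection.parentID = parentId;
    await collection.saveTx(); // saving assigns collection.id
    await createCollections(node.children, collection.id);
  }
}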

Notes

  • Importing a hierarchy clears all existing collections before rebuilding (papers in those collections are not deleted, but they become unfiled)
  • The plugin prompts for confirmation before importing; proceed with caution

2.5 Stop Classification Task

When processing large numbers of papers, users may need to stop the classification process midway. This feature safely interrupts ongoing classification tasks.

Implementation Principles

// Use flag to control task
let isStopping = false;

async function classifyPapers(items) {
  isStopping = false; // reset so an earlier stop does not cancel this run
  for (const item of items) {
    if (isStopping) {
      console.log("Classification stopped");
      break;
    }

    await classifyOnePaper(item);
    updateProgress();
  }
}

function stopClassification() {
  isStopping = true;
}

Design Considerations

  • Graceful Stop: Instead of forcefully terminating the task, the plugin sets a flag and lets the loop exit naturally once the paper currently being processed completes
  • State Reset: The flag needs to be reset after stopping so the next classification task can run normally
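Both design points can be combined in one small pattern: check the flag between papers, and reset it in a `finally` block so the reset happens even when a paper fails. This is a sketch of that pattern with illustrative names, using an injected classifier so it runs on its own; it is not claimed to be the plugin's exact code.

```javascript
// Cooperative cancellation: the loop checks a flag between papers,
// and try/finally guarantees the flag is cleared for the next run.
let stopRequested = false;

function requestStop() {
  stopRequested = true;
}

async function runClassification(items, classifyOne) {
  const processed = [];
  try {
    for (const item of items) {
      if (stopRequested) break; // checked before each paper, never mid-request
      await classifyOne(item);
      processed.push(item);
    }
  } finally {
    stopRequested = false; // reset even if classifyOne throws
  }
  return processed;
}

// Demo: request a stop while the second paper is being processed.
const stopDemo = runClassification(["A", "B", "C"], async (item) => {
  if (item === "B") requestStop();
});
stopDemo.then((processed) => console.log(processed)); // [ 'A', 'B' ]
```

Paper "B" still finishes because the flag is only read at the top of the loop, so no paper is ever left half-processed.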

2.6 Privacy-First: Local Storage

AI Classifier is designed with strong privacy considerations, ensuring your API key and literature data remain secure.

Implementation Principles

Config File Location

All configuration (including API Key) is stored in a JSON file within the Zotero data directory:

| OS | Path |
| --- | --- |
| Windows | %APPDATA%\Zotero\zotero_ai_config.json |
| macOS | ~/Library/Application Support/Zotero/zotero_ai_config.json |
| Linux | ~/.config/zotero/zotero_ai_config.json |

Data Flow

User → Plugin → LLM API → Return result
   Local config file (API Key)

Key Points:

  • The API key is stored only in the local config file; it is sent only to the API provider you configure, as request authentication, and never passes through any intermediary server
  • When calling LLM API, the plugin communicates directly with the API provider (e.g., SiliconFlow, OpenAI)
  • The plugin itself doesn’t run any backend services and doesn’t collect any user data
  • Paper content (title, abstract, keywords) is only sent to LLM API during classification; no copies are stored

3. Configuration Guide

3.1 Config File Structure

zotero_ai_config.json contains the following fields:

{
  "apiUrl": "https://api.siliconflow.cn/v1/chat/completions",
  "apiKey": "your-api-key-here",
  "model": "Qwen/Qwen2.5-7B-Instruct",
  "logPath": "/path/to/log/file.log",
  "confidenceThreshold": 0.7,
  "customPrompt": ""
}
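When reading a config file like this, it helps to fill in defaults and validate values before the first API call. The sketch below shows one way to do that; the 0.7 default comes from this article, while the choice of which fields count as required and the name `loadConfig` are assumptions for illustration.

```javascript
// Defaults for optional fields; apiUrl, apiKey and model have no safe default.
const CONFIG_DEFAULTS = { confidenceThreshold: 0.7, customPrompt: "", logPath: "" };
const REQUIRED_FIELDS = ["apiUrl", "apiKey", "model"];

// Parse the config text, fill in defaults, and report problems as a list
// instead of throwing, so a settings dialog could show all of them at once.
function loadConfig(jsonText) {
  let raw;
  try {
    raw = JSON.parse(jsonText);
  } catch (err) {
    return { config: null, errors: [`Invalid JSON: ${err.message}`] };
  }
  const config = { ...CONFIG_DEFAULTS, ...raw };
  const errors = [];
  for (const field of REQUIRED_FIELDS) {
    if (typeof config[field] !== "string" || config[field].trim() === "") {
      errors.push(`Missing required field: ${field}`);
    }
  }
  const t = config.confidenceThreshold;
  if (typeof t !== "number" || t < 0 || t > 1) {
    errors.push("confidenceThreshold must be a number between 0 and 1");
  }
  return { config, errors };
}

const loaded = loadConfig(
  JSON.stringify({
    apiUrl: "https://api.siliconflow.cn/v1/chat/completions",
    apiKey: "your-api-key-here",
    model: "Qwen/Qwen2.5-7B-Instruct",
  })
);
console.log(loaded.errors.length, loaded.config.confidenceThreshold);
```

Collecting every problem in one pass means a user who mistypes two fields sees both messages at once rather than fixing them one restart at a time.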

3.2 Supported LLM Providers

In principle, the plugin works with any provider that offers an OpenAI-compatible API. Common choices:

  • SiliconFlow: Fast access in China, affordable pricing
  • OpenAI: Official service, reliable quality
  • Anthropic: Claude series models (through an OpenAI-compatible endpoint)
  • Local Deployment: Ollama, LM Studio, etc.

3.3 Custom Prompts

Under “Settings” → “Prompt Configuration”, users can customize the prompt sent to the LLM. Advanced users can optimize prompts based on their field to improve classification accuracy.


4. Getting Started

4.1 Installation

  1. Download the latest .xpi file from GitHub Releases
  2. Open Zotero, click Tools → Add-ons
  3. Click the gear icon → Install Add-on From File…
  4. Select the downloaded .xpi file
  5. Restart Zotero

4.2 Initial Configuration

  1. Click Tools → AI Classifier → Set Log File Location to choose where logs are saved
  2. Click Tools → AI Classifier → Settings → API Configuration
  3. Fill in:
    • API URL: e.g., https://api.siliconflow.cn/v1/chat/completions
    • API Key: Your key
    • Model Name: e.g., Qwen/Qwen2.5-7B-Instruct
  4. Click “Test Connection”, then save

4.3 Complete Workflow

Step 1: Create Classification Hierarchy

Create a TXT file with your classification structure:

1. Machine Learning
  1.1 Deep Learning
  1.2 Reinforcement Learning
  1.3 Natural Language Processing
2. Computer Vision
  2.1 Image Classification
  2.2 Object Detection
  2.3 Image Segmentation
3. Physics
  3.1 Quantum Mechanics
  3.2 Fluid Dynamics

💡 Tip: If you’re unsure how to build your classification hierarchy, you can first use the plugin’s “Export” feature to export your library information (titles, abstracts, etc.), then submit it to an AI to get recommendations for a reasonable classification structure. This is more efficient than brainstorming on your own.

Step 2: Import Hierarchy

Click AI Classifier → Import Hierarchy from TXT and select the TXT file.

Step 3: Select Papers and Classify

  1. Select multiple papers to classify in Zotero
  2. Click AI Classifier → LLM Model Classification
  3. Set confidence threshold (e.g., 0.7)
  4. Start classification and watch progress

Step 4: Backup Structure

Before important operations, click Save Current Structure to back up your hierarchy.


5. FAQ

Q1: Classification results are inaccurate. What should I do?

  1. Adjust the confidence threshold: raising it moves fewer papers but with fewer errors; lowering it classifies more papers but may increase errors
  2. Customize prompts to provide more detailed classification guidance
  3. Use a more powerful LLM model

Q2: Classification is too slow. What should I do?

  1. Choose an API provider with faster response (e.g., SiliconFlow)
  2. Use smaller models (e.g., Qwen2.5-3B)
  3. Reduce the number of papers per classification batch

Q3: API calls are failing. What should I do?

  1. Check if API Key is correct
  2. Check network connection
  3. Check logs (Tools → AI Classifier → Log Viewer) to troubleshoot

6. Conclusion

AI Classifier integrates Large Language Models with Zotero’s literature management to classify academic papers automatically. Beyond classification, it covers backup, export, and import of your folder structure, and it keeps your API key and configuration on your own machine.

Understanding the implementation principles behind each feature helps you use the plugin more effectively and optimize it based on your needs. I hope this article helps you leverage AI Classifier to make literature management easier and more efficient.

