AI Classifier - Features & Implementation Guide

Researchers who manage large collections of academic papers know the problem: with hundreds or thousands of articles, sorting each one into the right folder becomes a chore. AI Classifier is a Zotero plugin designed to solve this exact problem. It uses Large Language Models (LLMs) to classify your literature automatically, cutting down the manual sorting in your workflow.

This article provides a detailed overview of AI Classifier’s core features and explains the implementation principles behind each function, helping you understand how the plugin works.

1. Core Features Overview

AI Classifier offers the following core features:

Feature	Description
📂 Intelligent Classification	AI reads paper titles, abstracts, and keywords, auto-classifying based on confidence thresholds
💾 Save/Restore Structure	Backup and restore your classification hierarchy anytime
📤 Export Data	Export titles, abstracts, or keywords as JSON
📥 Import Hierarchy	Create folder structure from TXT files
🛑 Stop Classification	Interrupt ongoing classification tasks anytime
🛡️ Privacy-First	API key stored locally and sent only to the LLM provider you configure

2. Features & Implementation Principles

2.1 Intelligent Classification (Core Feature)

This is the most important feature of AI Classifier. It uses an LLM to understand each paper’s content and automatically files the paper into one of your preset folders.

Implementation Principles

Step 1: Fetch Paper Information

The plugin retrieves the following information from selected papers via Zotero API:

Title
Abstract
Keywords

These form the basis for the LLM to understand the paper’s content.
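
As a rough illustration of this step, the sketch below gathers the three fields from an item. It assumes only the `getField()` and `getTags()` calls that also appear in the export example later in this article; `extractPaperInfo` and `fakeItem` are hypothetical names used for the demo, not part of the plugin.

```javascript
// Sketch: collect the three fields the classifier sends to the LLM.
// `item` is any object exposing getField() and getTags(), as Zotero items do.
function extractPaperInfo(item) {
  const title = (item.getField("title") || "").trim();
  const abstract = (item.getField("abstractNote") || "").trim();
  // Zotero tags are objects of the form { tag: "name" }
  const keywords = item.getTags().map((t) => t.tag);
  return { title, abstract, keywords };
}

// Stand-in item for illustration only; inside Zotero a real item would be used.
const fakeItem = {
  fields: { title: " Deep Learning for Scientific Discovery ", abstractNote: "" },
  getField(name) {
    return this.fields[name];
  },
  getTags() {
    return [{ tag: "deep learning" }, { tag: "science" }];
  },
};

const info = extractPaperInfo(fakeItem);
console.log(info);
```

Papers without an abstract simply yield an empty string here, so the later steps can decide how to present the missing field.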

Step 2: Build Prompt

The plugin combines the user-configured classification hierarchy with paper information to construct a prompt for the LLM. The core prompt logic:

You are a literature classification assistant. Here is the paper to classify:
Title: {title}
Abstract: {abstract}
Keywords: {keywords}

Available classification hierarchy:
{folder_structure}

Based on the paper content, choose the most appropriate classification path.
Return format: JSON with the following fields:
- "path": The most appropriate classification path
- "confidence": Confidence score (float between 0-1)
- "reason": Brief explanation for the classification
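
One way to fill such a template is sketched below. This is an assumption-laden illustration rather than the plugin’s actual code: `renderHierarchy`, `buildPrompt`, the shortened `TEMPLATE`, and the "(none)" fallback for a missing abstract are all hypothetical.

```javascript
// Render a folder tree ({ name, children }) as an indented outline.
function renderHierarchy(nodes, depth = 0) {
  return nodes
    .map((n) => "  ".repeat(depth) + n.name + "\n" + renderHierarchy(n.children || [], depth + 1))
    .join("");
}

// Shortened version of the prompt shown above (output instructions omitted).
const TEMPLATE = [
  "You are a literature classification assistant. Here is the paper to classify:",
  "Title: {title}",
  "Abstract: {abstract}",
  "Keywords: {keywords}",
  "",
  "Available classification hierarchy:",
  "{folder_structure}",
].join("\n");

function buildPrompt(template, paper, hierarchy) {
  const values = {
    title: paper.title,
    abstract: paper.abstract || "(none)", // assumed fallback for a missing abstract
    keywords: paper.keywords.join(", "),
    folder_structure: renderHierarchy(hierarchy).trimEnd(),
  };
  // Replace each {placeholder} in one pass; unknown placeholders are left untouched.
  return template.replace(/\{(\w+)\}/g, (whole, key) => (key in values ? values[key] : whole));
}

const samplePrompt = buildPrompt(
  TEMPLATE,
  { title: "A Survey of Natural Language Processing", abstract: "", keywords: ["NLP", "survey"] },
  [{ name: "Physics", children: [{ name: "Relativity", children: [] }] }]
);
console.log(samplePrompt);
```

Using a replacer function instead of a replacement string means characters such as `$` in a title are inserted literally.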

Step 3: Call LLM API

The plugin sends HTTP requests to the user-configured LLM API (such as SiliconFlow, OpenAI, etc.) with the constructed prompt to get classification results.

The API call uses the standard OpenAI-compatible format:

{
  "model": "Qwen/Qwen2.5-7B-Instruct",
  "messages": [
    {"role": "user", "content": "..."}
  ],
  "temperature": 0.1
}
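
The sketch below shows how such a request could be assembled and how the reply text could be read back. It assumes the usual OpenAI-compatible conventions (a `Bearer` token in the `Authorization` header and the reply at `choices[0].message.content`); `buildChatRequest` and `extractReplyText` are hypothetical helper names, and the config field names follow the config file described later in this article.

```javascript
// Build the URL and fetch() options for one classification request.
function buildChatRequest(config, prompt) {
  return {
    url: config.apiUrl,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${config.apiKey}`,
      },
      body: JSON.stringify({
        model: config.model,
        messages: [{ role: "user", content: prompt }],
        temperature: 0.1, // low temperature keeps the output close to deterministic
      }),
    },
  };
}

// The reply text sits at choices[0].message.content in the OpenAI chat format.
function extractReplyText(responseJson) {
  const choice = responseJson.choices && responseJson.choices[0];
  if (!choice || !choice.message) throw new Error("Unexpected response shape");
  return choice.message.content;
}

// Typical use (not executed here because it needs a network connection):
//   const req = buildChatRequest(config, prompt);
//   const res = await fetch(req.url, req.options);
//   const text = extractReplyText(await res.json());
```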

Step 4: Parse Response & Confidence Threshold

After receiving the LLM’s reply, the plugin parses the JSON response to extract the classification path and confidence score.

Key Design: Confidence Threshold Mechanism

Users can configure a confidence threshold (default: 0.7). The plugin only automatically moves a paper when the LLM’s confidence score is higher than the threshold. Otherwise, the paper stays in its original location, and the classification result is only logged.

This design is crucial:

Prevent Misclassification: Low confidence means AI is uncertain; forcing classification may cause errors
User Control: By adjusting the threshold, users can balance between “automation level” and “accuracy”
Progressive Optimization: Users can handle high-confidence results first, then review low-confidence ones later
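
The parsing and threshold check could look roughly like the sketch below. It is an illustration under assumptions: the handling of replies that wrap the JSON in extra text is a guess at what a robust parser needs, and `parseClassification` and `shouldMove` are hypothetical names.

```javascript
// Parse the LLM reply into { path, confidence, reason }, or null if unusable.
function parseClassification(replyText) {
  // Models sometimes wrap the JSON in prose; take the outermost {...} span.
  const start = replyText.indexOf("{");
  const end = replyText.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let data;
  try {
    data = JSON.parse(replyText.slice(start, end + 1));
  } catch (err) {
    return null;
  }

  const confidence = parseFloat(data.confidence); // accepts 0.82 and "0.82"
  if (typeof data.path !== "string" || !Number.isFinite(confidence)) return null;
  if (confidence < 0 || confidence > 1) return null;
  return { path: data.path, confidence, reason: String(data.reason || "") };
}

// Move only when the confidence is strictly higher than the threshold.
function shouldMove(result, threshold) {
  return result !== null && result.confidence > threshold;
}

const reply =
  'Here is the result: {"path": "Physics/Relativity", "confidence": 0.82, "reason": "Discusses spacetime"} Hope this helps.';
const parsed = parseClassification(reply);
console.log(parsed, shouldMove(parsed, 0.7));
```

Returning `null` for anything malformed lets the caller treat an unparseable reply the same way as a low-confidence one: log it and leave the paper where it is.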

Step 5: Execute Classification

Based on the classification result, the plugin calls Zotero API to move the paper to the target folder:

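// Move the item: setCollections() replaces its current collection memberships.
// targetCollectionID (hypothetical name) is the ID resolved from the returned path.
item.setCollections([targetCollectionID]);
await item.saveTx();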
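
Before a paper can be moved, the path returned by the LLM has to be matched to an existing folder. The sketch below walks a folder tree to do that. The "/" separator, the `{ id, name, children }` node shape, and the `findByPath` name are assumptions for illustration; the article does not specify the plugin’s path format.

```javascript
// Walk the folder tree along a path such as "Physics/Relativity".
// Returns the matching node, or null when any segment does not exist.
function findByPath(tree, path, separator = "/") {
  const parts = path
    .split(separator)
    .map((p) => p.trim())
    .filter(Boolean);

  let level = tree;
  let found = null;
  for (const part of parts) {
    found = level.find((node) => node.name === part) || null;
    if (!found) return null; // the model named a folder that does not exist
    level = found.children || [];
  }
  return found;
}

const folderTree = [
  {
    id: 1,
    name: "Physics",
    children: [{ id: 2, name: "Relativity", children: [] }],
  },
  { id: 3, name: "Chemistry", children: [] },
];

console.log(findByPath(folderTree, "Physics / Relativity")); // the node with id 2
console.log(findByPath(folderTree, "Physics/Optics")); // null
```

Rejecting unknown paths matters because an LLM can return a plausible-looking folder that is not in the hierarchy it was given.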

Technical Highlights

Batch Processing: Supports selecting multiple papers, processes them serially to avoid API rate limits
Real-time Progress: Updates UI after each paper, showing real-time progress
Error Tolerance: Failure on one paper doesn’t affect others; continues processing remaining papers
Async Tasks: Classification runs in background without blocking Zotero’s main interface
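
The serial loop with per-paper error handling can be sketched as follows. `classifyAll`, `classifyOne`, and `onProgress` are hypothetical names; the point is only the shape of the loop, not the plugin’s actual implementation.

```javascript
// Process items one at a time; a failure is recorded and the loop continues.
async function classifyAll(items, classifyOne, onProgress = () => {}) {
  const summary = { done: 0, failed: [] };
  for (const [index, item] of items.entries()) {
    try {
      await classifyOne(item); // one request in flight at a time
      summary.done += 1;
    } catch (err) {
      summary.failed.push({ item, message: err.message });
    }
    onProgress(index + 1, items.length); // update the UI after every paper
  }
  return summary;
}

// Example: the second paper fails, the other two still complete.
const demoRun = classifyAll(
  ["paper-a", "paper-b", "paper-c"],
  async (paper) => {
    if (paper === "paper-b") throw new Error("API timeout");
  },
  (current, total) => console.log(`${current}/${total}`)
);
demoRun.then((summary) => console.log(summary));
```

Awaiting each call inside the loop is what keeps the requests sequential; collecting promises and awaiting them together would send everything at once and make rate-limit errors more likely.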

2.2 Save & Restore Classification Structure

This feature backs up and restores your literature classification system, protecting it against accidental changes.

Implementation Principles

Saving Structure

The plugin traverses all collections in Zotero, recursively extracts the folder hierarchy, and serializes it to a JSON file.

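// Pseudo-code: Save structure
function saveStructure() {
  // Fetch every collection in the user's library, including nested ones
  const libraryID = Zotero.Libraries.userLibraryID;
  const collections = Zotero.Collections.getByLibrary(libraryID, true);
  const structure = traverseCollections(collections, null);
  const json = JSON.stringify(structure, null, 2);
  saveToFile(json, backupPath); // saveToFile and backupPath are placeholders
}

// Recursively build the tree: pick the children of parentId, then descend
function traverseCollections(collections, parentId) {
  return collections
    // Top-level collections have no parent; normalize the falsy parentID to null
    .filter(c => (c.parentID || null) === parentId)
    .map(c => ({
      name: c.name,
      children: traverseCollections(collections, c.id)
    }));
}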

Restoring Structure

Reads the backup JSON file and recursively creates collections:

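// Pseudo-code: Restore structure
// The additive check (skip folders that already exist) is omitted for brevity
async function restoreStructure(json) {
  const structure = JSON.parse(json);
  await createCollections(structure, null);
}

// Create each node, then its children under the newly created collection
async function createCollections(nodes, parentId) {
  for (const node of nodes) {
    // Zotero collections are created by saving a new Zotero.Collection object
    const collection = new Zotero.Collection();
    collection.libraryID = Zotero.Libraries.userLibraryID;
    collection.name = node.name;
    if (parentId) collection.parentID = parentId;
    await collection.saveTx();
    if (node.children) {
      await createCollections(node.children, collection.id);
    }
  }
}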

Notes

Restore Structure: Uses additive mode, only adding folders that exist in the backup but not in the current structure. It won’t delete existing collections or affect already-classified papers.
Import from TXT: Will clear all existing collections before rebuilding (papers themselves won’t be deleted).
The plugin prompts for confirmation before importing; proceed with caution.
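
To make the additive restore mode concrete, the sketch below merges a backup tree into the current tree and reports which folders were added. Matching folders by name among siblings is an assumption, and `mergeAdditive` is a hypothetical helper operating on plain `{ name, children }` objects rather than on Zotero collections.

```javascript
// Merge `backup` into `current` in place, adding only folders that are missing.
// Returns the paths of the folders that were added; nothing is ever removed.
function mergeAdditive(current, backup, prefix = "", added = []) {
  for (const node of backup) {
    const path = prefix ? `${prefix}/${node.name}` : node.name;
    let match = current.find((c) => c.name === node.name);
    if (!match) {
      match = { name: node.name, children: [] };
      current.push(match);
      added.push(path);
    }
    match.children = match.children || [];
    mergeAdditive(match.children, node.children || [], path, added);
  }
  return added;
}

const currentTree = [
  { name: "Physics", children: [{ name: "Relativity", children: [] }] },
  { name: "Reading List", children: [] },
];
const backupTree = [
  {
    name: "Physics",
    children: [
      { name: "Relativity", children: [] },
      { name: "Thermodynamics", children: [] },
    ],
  },
  { name: "Chemistry", children: [] },
];

const addedPaths = mergeAdditive(currentTree, backupTree);
console.log(addedPaths);
```

In this example "Reading List" exists only in the current tree and is left untouched, which is the behavior the note above describes.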

2.3 Export Paper Data

This feature exports paper titles, abstracts, or keywords to JSON files for further analysis or migration.

Implementation Principles

// Pseudo-code: Export data
async function exportData(type) {
  const libraryID = Zotero.Libraries.userLibraryID;
  const items = await Zotero.Items.getAll(libraryID);
  const data = items
    .filter(item => item.isRegularItem()) // skip notes and attachments
    .map(item => {
      switch (type) {
        case 'title': return item.getField('title');
        case 'abstract': return item.getField('abstractNote');
        case 'keywords': return item.getTags().map(t => t.tag);
        default: throw new Error(`Unknown export type: ${type}`);
      }
    });
  saveToFile(JSON.stringify(data, null, 2), exportPath); // placeholders
}

Export format example (titles):

[
  "Deep Learning for Scientific Discovery",
  "A Survey of Natural Language Processing",
  "Quantum Computing: Principles and Applications"
]

2.4 Import Hierarchy from TXT

This feature allows batch creation of folder structures from text files, perfect for establishing a complete classification system at once.

Implementation Principles

TXT File Format

The plugin supports hierarchical indentation format:

1. Physics
  1.1 Quantum Mechanics
  1.2 Thermodynamics
  1.3 Relativity
    1.3.1 General Relativity
    1.3.2 Special Relativity
2. Chemistry
  2.1 Organic Chemistry
  2.2 Inorganic Chemistry
  2.3 Analytical Chemistry

Indentation can use spaces or tabs; number prefixes are used for sorting.

Parsing Logic

// Pseudo-code: Parse TXT
function parseHierarchy(text) {
  const lines = text.split(/\r?\n/);
  const root = [];   // top-level nodes
  const stack = [];  // ancestors of the line being processed

  lines.forEach(line => {
    if (!line.trim()) return; // skip blank lines

    // Leading whitespace, optional number prefix ("1.", "1.3.2"), then the name
    const match = line.match(/^(\s*)(?:\d+(?:\.\d+)*\.?\s+)?(.+)$/);
    if (!match) return;

    // Count a tab as one level (two spaces)
    const indent = match[1].replace(/\t/g, '  ').length;
    const name = match[2].trim();
    const level = Math.floor(indent / 2) + 1;

    // Pop back to the parent of this level
    while (stack.length >= level) stack.pop();

    const node = { name, children: [] };
    if (stack.length === 0) {
      root.push(node);
    } else {
      stack[stack.length - 1].children.push(node);
    }
    stack.push(node);
  });

  return root;
}

Create Collections

After parsing, recursively call Zotero API to create collections:

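async function createCollections(nodes, parentId) {
  for (const node of nodes) {
    // Zotero collections are created by saving a new Zotero.Collection object
    const collection = new Zotero.Collection();
    collection.libraryID = Zotero.Libraries.userLibraryID;
    collection.name = node.name;
    if (parentId) collection.parentID = parentId;
    await collection.saveTx();
    await createCollections(node.children, collection.id);
  }
}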

Notes

Importing hierarchy will clear all existing collections before rebuilding (papers in collections won’t be deleted, but will become unfiled)
The plugin prompts for confirmation before importing; proceed with caution

2.5 Stop Classification Task

When processing large numbers of papers, users may need to stop the classification process midway. This feature safely interrupts ongoing classification tasks.

Implementation Principles

// Use a flag to control the task
let isStopping = false;

async function classifyPapers(items) {
  isStopping = false; // reset so a previous stop request does not cancel this run
  for (const item of items) {
    // Checked between papers, so the paper in progress always finishes
    if (isStopping) {
      console.log("Classification stopped");
      break;
    }

    await classifyOnePaper(item);
    updateProgress();
  }
}

function stopClassification() {
  isStopping = true;
}

Design Considerations

Graceful Stop: Instead of forcibly terminating the task, the plugin sets a flag that is checked between papers, so the loop exits naturally once the current paper finishes
State Reset: The flag needs to be reset after stopping so the next classification task can run normally

2.6 Privacy-First: Local Storage

AI Classifier is designed with strong privacy considerations, ensuring your API key and literature data remain secure.

Implementation Principles

Config File Location

All configuration (including the API key) is stored in a single local JSON file. The location depends on the operating system:

OS	Path
Windows	`%APPDATA%\Zotero\zotero_ai_config.json`
macOS	`~/Library/Application Support/Zotero/zotero_ai_config.json`
Linux	`~/.config/zotero/zotero_ai_config.json`

Data Flow

User → Plugin → LLM API → Return result
         ↓
   Local config file (API Key)

Key Points:

The API key is stored only in the local config file. It is sent only to the LLM provider you configure, as part of each API request, and never passes through any other server
When calling LLM API, the plugin communicates directly with the API provider (e.g., SiliconFlow, OpenAI)
The plugin itself doesn’t run any backend services and doesn’t collect any user data
Paper content (title, abstract, keywords) is only sent to LLM API during classification; no copies are stored

3. Configuration Guide

3.1 Config File Structure

zotero_ai_config.json contains the following fields:

{
  "apiUrl": "https://api.siliconflow.cn/v1/chat/completions",
  "apiKey": "your-api-key-here",
  "model": "Qwen/Qwen2.5-7B-Instruct",
  "logPath": "/path/to/log/file.log",
  "confidenceThreshold": 0.7,
  "customPrompt": ""
}
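
A loader for this file might apply defaults and check the values before use, along the lines of the sketch below. Which fields are required, the default values, and the `loadConfig` name are assumptions for illustration; only the field names come from the structure above.

```javascript
// Assumed defaults for optional fields.
const DEFAULTS = { confidenceThreshold: 0.7, customPrompt: "" };

// Parse the config text, fill in defaults, and collect any problems found.
function loadConfig(jsonText) {
  const config = { ...DEFAULTS, ...JSON.parse(jsonText) };
  const problems = [];

  for (const key of ["apiUrl", "apiKey", "model"]) {
    if (typeof config[key] !== "string" || config[key].trim() === "") {
      problems.push(`${key} is required`);
    }
  }

  const threshold = config.confidenceThreshold;
  if (typeof threshold !== "number" || threshold < 0 || threshold > 1) {
    problems.push("confidenceThreshold must be a number between 0 and 1");
  }
  return { config, problems };
}

const loaded = loadConfig(
  '{"apiUrl": "https://api.siliconflow.cn/v1/chat/completions", "apiKey": "your-api-key-here", "model": "Qwen/Qwen2.5-7B-Instruct"}'
);
console.log(loaded.config.confidenceThreshold, loaded.problems);
```

Collecting all problems instead of throwing on the first one lets a settings dialog show every issue at once.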

3.2 Supported LLM Providers

In principle, the plugin works with any provider that exposes an OpenAI-compatible API. Common choices:

SiliconFlow: Fast access in China, affordable pricing
OpenAI: Official service, reliable quality
Anthropic: Claude series models (requires an OpenAI-compatible endpoint, since the plugin uses the OpenAI request format)
Local Deployment: Ollama, LM Studio, etc.

3.3 Custom Prompts

Under “Settings” → “Prompt Configuration”, users can customize the prompt sent to the LLM. Advanced users can optimize prompts based on their field to improve classification accuracy.

4. Getting Started

4.1 Installation

Download the latest .xpi file from GitHub Releases
Open Zotero, click Tools → Add-ons
Click the gear icon → Install Add-on From File…
Select the downloaded .xpi file
Restart Zotero

4.2 Initial Configuration

Click Tools → AI Classifier → Set Log File Location to choose log save path
Click Tools → AI Classifier → Settings → API Configuration
Fill in:
- API URL: e.g., https://api.siliconflow.cn/v1/chat/completions
- API Key: Your key
- Model Name: e.g., Qwen/Qwen2.5-7B-Instruct
Click “Test Connection”, then save

4.3 Complete Workflow

Step 1: Create Classification Hierarchy

Create a TXT file with your classification structure:

1. Machine Learning
  1.1 Deep Learning
  1.2 Reinforcement Learning
  1.3 Natural Language Processing
2. Computer Vision
  2.1 Image Classification
  2.2 Object Detection
  2.3 Image Segmentation
3. Physics
  3.1 Quantum Mechanics
  3.2 Fluid Dynamics

💡 Tip: If you’re unsure how to build your classification hierarchy, you can first use the plugin’s “Export” feature to export your library information (titles, abstracts, etc.), then submit it to an AI to get recommendations for a reasonable classification structure. This is more efficient than brainstorming on your own.

Step 2: Import Hierarchy

Click AI Classifier → Import Hierarchy from TXT, select the TXT file.

Step 3: Select Papers and Classify

Select multiple papers to classify in Zotero
Click AI Classifier → LLM Model Classification
Set confidence threshold (e.g., 0.7)
Start classification and watch progress

Step 4: Backup Structure

Before important operations, click Save Current Structure to backup.

5. FAQ

Q1: Classification results are inaccurate. What should I do?

Adjust the confidence threshold: raising it moves fewer papers automatically but reduces errors; lowering it classifies more papers but may increase errors
Customize prompts to provide more detailed classification guidance
Use a more powerful LLM model
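
The threshold trade-off can be seen by applying different thresholds to the same results. The confidence scores below are made up for illustration, and `splitByThreshold` is a hypothetical helper.

```javascript
// Split results into papers that would be moved and papers left for review.
function splitByThreshold(results, threshold) {
  const moved = results.filter((r) => r.confidence > threshold);
  const held = results.filter((r) => r.confidence <= threshold);
  return { moved, held };
}

// Illustrative scores only, not real plugin output.
const sampleResults = [0.95, 0.82, 0.71, 0.64, 0.4].map((confidence, i) => ({
  paper: `paper-${i + 1}`,
  confidence,
}));

for (const threshold of [0.5, 0.7, 0.9]) {
  const { moved, held } = splitByThreshold(sampleResults, threshold);
  console.log(`threshold ${threshold}: ${moved.length} moved, ${held.length} left for review`);
}
```

With these sample scores, a threshold of 0.5 moves four of the five papers, 0.7 moves three, and 0.9 moves only one.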

Q2: Classification is too slow. What should I do?

Choose an API provider with faster response (e.g., SiliconFlow)
Use smaller models (e.g., Qwen2.5-3B)
Reduce the number of papers per classification batch

Q3: API calls are failing. What should I do?

Check if API Key is correct
Check network connection
Check logs (Tools → AI Classifier → Log Viewer) to troubleshoot

6. Conclusion

AI Classifier integrates Large Language Models with Zotero’s literature management to classify academic papers automatically. Beyond classification, it covers backup, export, and import of your folder hierarchy, and it keeps configuration and API keys on your own machine.

Understanding the implementation principles behind each feature helps you use the plugin more effectively and optimize it based on your needs. I hope this article helps you leverage AI Classifier to make literature management easier and more efficient.

GitHub Repository: https://github.com/KeqiYe/Zotero-AI-Classifier
Author Profile: https://github.com/KeqiYe
