AI Classifier - Features & Implementation Guide
Have you ever struggled with managing a large collection of academic papers? With hundreds or thousands of articles, categorizing them into appropriate folders efficiently becomes a real challenge. AI Classifier is a Zotero plugin designed to solve exactly this problem: it leverages Large Language Models (LLMs) to classify your literature intelligently, significantly streamlining your workflow.
This article provides a detailed overview of AI Classifier’s core features and explains the implementation principles behind each function, helping you understand how the plugin works.
1. Core Features Overview
AI Classifier offers the following core features:
| Feature | Description |
|---|---|
| 📂 Intelligent Classification | AI reads paper titles, abstracts, and keywords, auto-classifying based on confidence thresholds |
| 💾 Save/Restore Structure | Backup and restore your classification hierarchy anytime |
| 📤 Export Data | Export titles, abstracts, or keywords as JSON |
| 📥 Import Hierarchy | Create folder structure from TXT files |
| 🛑 Stop Classification | Interrupt ongoing classification tasks anytime |
| 🛡️ Privacy-First | API Key stored locally, never uploaded to any server |
2. Features & Implementation Principles
2.1 Intelligent Classification (Core Feature)
This is the most important feature of AI Classifier. It uses an LLM to understand paper content and automatically sort papers into your preset folders.
Implementation Principles
Step 1: Fetch Paper Information
The plugin retrieves the following information from selected papers via Zotero API:
- Title
- Abstract
- Keywords
These form the basis for the LLM to understand the paper’s content.
Step 2: Build Prompt
The plugin combines the user-configured classification hierarchy with the paper information to construct a prompt for the LLM.
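The prompt construction can be sketched roughly as follows. Function and field names here are illustrative, not the plugin's actual code; the real template is user-configurable under Settings → Prompt Configuration.

```javascript
// Illustrative sketch of prompt construction for classification.
function buildPrompt(hierarchyPaths, paper) {
  return [
    "Classify the paper below into exactly one of the following collection",
    "paths. Reply with JSON only, in the form:",
    '  {"path": "<one of the paths>", "confidence": <number between 0 and 1>}',
    "",
    "Available paths:",
    ...hierarchyPaths.map((p) => "  - " + p),
    "",
    "Title: " + paper.title,
    "Abstract: " + (paper.abstract || "(none)"),
    "Keywords: " + (paper.keywords || []).join(", "),
  ].join("\n");
}
```

Listing one path per line keeps the hierarchy unambiguous for the model, and demanding a strict JSON reply is what makes the parsing in Step 4 reliable.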
Step 3: Call LLM API
The plugin sends HTTP requests to the user-configured LLM API (such as SiliconFlow, OpenAI, etc.) with the constructed prompt to get classification results.
The API call uses the standard OpenAI-compatible chat-completions format.
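A minimal sketch of such a request, assuming the standard chat-completions payload (the plugin's actual field handling may differ):

```javascript
// Build an OpenAI-compatible chat-completions request from the local config.
// `config` mirrors the values set under Settings -> API Configuration.
function buildRequest(config, prompt) {
  return {
    url: config.apiUrl, // e.g. https://api.siliconflow.cn/v1/chat/completions
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + config.apiKey, // read from the local config file only
      },
      body: JSON.stringify({
        model: config.model, // e.g. Qwen/Qwen2.5-7B-Instruct
        messages: [{ role: "user", content: prompt }],
        temperature: 0, // deterministic replies make classification reproducible
      }),
    },
  };
}
// The plugin would then send it with e.g. fetch(req.url, req.options).
```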
Step 4: Parse Response & Confidence Threshold
After receiving the classification result from LLM, the plugin parses the JSON response to get the classification path and confidence score.
Key Design: Confidence Threshold Mechanism
Users can configure a confidence threshold (default: 0.7). The plugin only automatically moves a paper when the LLM’s confidence score is higher than the threshold. Otherwise, the paper stays in its original location, and the classification result is only logged.
This design is crucial:
- Prevents misclassification: low confidence means the AI is uncertain, and forcing a move could introduce errors
- Gives users control: by adjusting the threshold, users can trade off automation against accuracy
- Enables progressive review: users can accept high-confidence results first, then review low-confidence ones later
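The parse-and-threshold logic above can be sketched like this (a simplified illustration, not the plugin's exact code):

```javascript
// Parse the LLM reply and decide what to do with the paper.
// Only a confidence strictly above the threshold triggers an automatic move.
function decide(rawReply, threshold = 0.7) {
  let parsed;
  try {
    parsed = JSON.parse(rawReply);
  } catch {
    return { action: "skip", reason: "unparseable reply" };
  }
  if (typeof parsed.confidence !== "number" || typeof parsed.path !== "string") {
    return { action: "skip", reason: "missing fields" };
  }
  return parsed.confidence > threshold
    ? { action: "move", path: parsed.path, confidence: parsed.confidence }
    : { action: "log-only", path: parsed.path, confidence: parsed.confidence };
}
```

Treating an unparseable or incomplete reply as "skip" rather than an error keeps one bad model response from derailing a batch.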
Step 5: Execute Classification
Based on the classification result, the plugin calls the Zotero API to move the paper into the target folder.
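As a sketch, resolving the returned path against the collection tree might look like this; the move itself goes through Zotero's collection API, which is omitted here:

```javascript
// Walk a path like "Machine Learning/NLP" down an in-memory collection tree.
// Returns the matching node, or null if any segment is missing (in which
// case the paper is left where it is and the mismatch is logged).
function resolvePath(root, path) {
  let node = root;
  for (const name of path.split("/")) {
    node = (node.children || []).find((c) => c.name === name);
    if (!node) return null;
  }
  return node;
}
```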
Technical Highlights
- Batch Processing: Supports selecting multiple papers, processes them serially to avoid API rate limits
- Real-time Progress: Updates UI after each paper, showing real-time progress
- Error Tolerance: Failure on one paper doesn’t affect others; continues processing remaining papers
- Async Tasks: Classification runs in background without blocking Zotero’s main interface
2.2 Save & Restore Classification Structure
This feature backs up and restores your literature classification system, protecting it against accidental changes.
Implementation Principles
Saving Structure
The plugin traverses all collections in Zotero, recursively extracts the folder hierarchy, and serializes it to a JSON file.
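A sketch of the serialization step as a plain recursive walk (names are illustrative):

```javascript
// Recursively turn a collection tree into a plain JSON-friendly structure.
function serializeCollection(collection) {
  return {
    name: collection.name,
    children: (collection.children || []).map(serializeCollection),
  };
}
// Saving is then just JSON.stringify(serializeCollection(root), null, 2)
// written to the backup file.
```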
Restoring Structure
Restoring reads the backup JSON file and recursively recreates any missing collections.
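The additive behavior described under Notes can be sketched as follows: only folders missing from the current tree are created, and nothing is ever deleted. In the real plugin, creating a node would call Zotero's collection API.

```javascript
// Merge a backup tree into the current tree, creating only missing folders.
function restoreInto(current, backup) {
  for (const saved of backup.children || []) {
    current.children = current.children || [];
    let match = current.children.find((c) => c.name === saved.name);
    if (!match) {
      match = { name: saved.name, children: [] }; // would create a Zotero collection here
      current.children.push(match);
    }
    restoreInto(match, saved); // recurse into subfolders
  }
  return current;
}
```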
Notes
- Restore Structure: Uses additive mode, only adding folders that exist in the backup but not in the current structure. It won’t delete existing collections or affect already-classified papers.
- Import from TXT: Will clear all existing collections before rebuilding (papers themselves won’t be deleted).
- The plugin prompts for confirmation before importing; proceed with caution.
2.3 Export Paper Data
This feature exports paper titles, abstracts, or keywords to JSON files for further analysis or migration.
Implementation Principles
The plugin iterates over the selected items, collects the requested field (title, abstract, or keywords) from each, and writes the result to a JSON file.
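A minimal sketch of that step (field names are illustrative):

```javascript
// Collect one field (e.g. "title") from each selected item and serialize it.
function exportField(items, field) {
  const rows = items.map((item) => ({ key: item.key, [field]: item[field] }));
  return JSON.stringify(rows, null, 2);
}
```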
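An exported titles file might look like this (the exact keys are illustrative):

```
[
  { "key": "ABCD1234", "title": "Attention Is All You Need" },
  { "key": "EFGH5678", "title": "Deep Residual Learning for Image Recognition" }
]
```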
2.4 Import Hierarchy from TXT
This feature allows batch creation of folder structures from text files, perfect for establishing a complete classification system at once.
Implementation Principles
TXT File Format
The plugin expects a hierarchical indentation format, where indentation depth determines nesting level.
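An illustrative example:

```
1 Machine Learning
    1.1 Natural Language Processing
    1.2 Computer Vision
2 Biology
    2.1 Genomics
```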
Indentation can use spaces or tabs; number prefixes are used for sorting.
Parsing Logic
The parser derives each entry's depth from its leading whitespace and attaches it as a child of the most recent shallower entry.
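A sketch of such a parser, assuming one tab or four spaces per level (the plugin's exact rules may differ):

```javascript
// Parse indented lines into a tree. Depth comes from leading whitespace;
// an optional "1.2 " style number prefix is stripped from the name.
function parseHierarchy(text) {
  const root = { name: "(root)", depth: -1, children: [] };
  const stack = [root]; // ancestors of the current line
  for (const raw of text.split("\n")) {
    if (!raw.trim()) continue; // skip blank lines
    const indent = raw.match(/^[\t ]*/)[0];
    const depth = indent.replace(/\t/g, "    ").length / 4;
    const name = raw.trim().replace(/^[\d.]+\s+/, ""); // drop sort prefix
    const node = { name, depth, children: [] };
    // pop back to the nearest shallower ancestor
    while (stack[stack.length - 1].depth >= depth) stack.pop();
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return root;
}
```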
Create Collections
After parsing, the plugin recursively calls the Zotero API to create the collections.
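Creation is then a straightforward recursive walk. In this sketch the Zotero call is stood in for by a callback, since the exact API details are beyond its scope:

```javascript
// Recursively create collections from a parsed tree. `createFn(name, parentId)`
// stands in for the actual Zotero collection-creation call and must return
// the new collection's id.
function createAll(node, parentId, createFn) {
  for (const child of node.children) {
    const id = createFn(child.name, parentId);
    createAll(child, id, createFn); // recurse into subfolders
  }
}
```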
Notes
- Importing hierarchy will clear all existing collections before rebuilding (papers in collections won’t be deleted, but will become unfiled)
- The plugin prompts for confirmation before importing; proceed with caution
2.5 Stop Classification Task
When processing large numbers of papers, users may need to stop the classification process midway. This feature safely interrupts ongoing classification tasks.
Implementation Principles
The plugin keeps a shared stop flag that the classification loop checks between papers; the Stop command simply sets this flag.
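The mechanism can be sketched with a shared flag checked between papers (simplified; the real plugin runs this asynchronously):

```javascript
// A shared flag lets the Stop command end the loop between papers.
const state = { stopRequested: false };

function classifyAll(items, classifyOne) {
  const processed = [];
  for (const item of items) {
    if (state.stopRequested) break; // graceful exit: never abort mid-item
    processed.push(classifyOne(item));
  }
  state.stopRequested = false; // reset so the next run starts cleanly
  return processed;
}
```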
Design Considerations
- Graceful Stop: instead of forcefully terminating the task, it sets a flag to allow a natural exit after the current batch completes
- State Reset: The flag needs to be reset after stopping so the next classification task can run normally
2.6 Privacy-First: Local Storage
AI Classifier is designed with strong privacy considerations, ensuring your API key and literature data remain secure.
Implementation Principles
Config File Location
All configuration (including API Key) is stored in a JSON file within the Zotero data directory:
| OS | Path |
|---|---|
| Windows | %APPDATA%\Zotero\zotero_ai_config.json |
| macOS | ~/Library/Application Support/Zotero/zotero_ai_config.json |
| Linux | ~/.config/zotero/zotero_ai_config.json |
Data Flow
Zotero library (local) → AI Classifier plugin (local) → LLM provider API (direct HTTPS request). Nothing passes through any intermediary server.
Key Points:
- API Key only exists in local config files, not transmitted through any third-party servers
- When calling LLM API, the plugin communicates directly with the API provider (e.g., SiliconFlow, OpenAI)
- The plugin itself doesn’t run any backend services and doesn’t collect any user data
- Paper content (title, abstract, keywords) is only sent to LLM API during classification; no copies are stored
3. Configuration Guide
3.1 Config File Structure
zotero_ai_config.json stores all plugin settings, including the API URL, API Key, model name, and confidence threshold.
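A plausible shape, based on the settings described in this guide (the exact key names are illustrative and not guaranteed to match the plugin):

```
{
  "apiUrl": "https://api.siliconflow.cn/v1/chat/completions",
  "apiKey": "sk-...",
  "model": "Qwen/Qwen2.5-7B-Instruct",
  "confidenceThreshold": 0.7,
  "customPrompt": "",
  "logFilePath": "/path/to/ai-classifier.log"
}
```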
3.2 Supported LLM Providers
In principle, any OpenAI API-compatible provider is supported. Common choices:
- SiliconFlow: Fast access in China, affordable pricing
- OpenAI: Official service, reliable quality
- Anthropic: Claude series models
- Local Deployment: Ollama, LM Studio, etc.
3.3 Custom Prompts
Under “Settings” → “Prompt Configuration”, users can customize the prompt sent to the LLM. Advanced users can optimize prompts based on their field to improve classification accuracy.
4. Getting Started
4.1 Installation
- Download the latest .xpi file from GitHub Releases
- Open Zotero, click Tools → Add-ons
- Click the gear icon → Install Add-on From File…
- Select the downloaded .xpi file
- Restart Zotero
4.2 Initial Configuration
- Click Tools → AI Classifier → Set Log File Location to choose log save path
- Click Tools → AI Classifier → Settings → API Configuration
- Fill in:
- API URL: e.g.,
https://api.siliconflow.cn/v1/chat/completions - API Key: Your key
- Model Name: e.g.,
Qwen/Qwen2.5-7B-Instruct
- API URL: e.g.,
- Click “Test Connection”, then save
4.3 Complete Workflow
Step 1: Create Classification Hierarchy
Create a TXT file describing your classification structure.
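For instance, an illustrative hierarchy:

```
1 Machine Learning
    1.1 Natural Language Processing
    1.2 Computer Vision
2 Systems
    2.1 Distributed Computing
```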
💡 Tip: If you’re unsure how to build your classification hierarchy, you can first use the plugin’s “Export” feature to export your library information (titles, abstracts, etc.), then submit it to an AI to get recommendations for a reasonable classification structure. This is more efficient than brainstorming on your own.
Step 2: Import Hierarchy
Click AI Classifier → Import Hierarchy from TXT, select the TXT file.
Step 3: Select Papers and Classify
- Select multiple papers to classify in Zotero
- Click AI Classifier → LLM Model Classification
- Set confidence threshold (e.g., 0.7)
- Start classification and watch progress
Step 4: Backup Structure
Before important operations, click Save Current Structure to backup.
5. FAQ
Q1: Classification results are inaccurate. What should I do?
- Adjust the confidence threshold: lowering it classifies more papers but may increase errors
- Customize prompts to provide more detailed classification guidance
- Use a more powerful LLM model
Q2: Classification is too slow. What should I do?
- Choose an API provider with faster response (e.g., SiliconFlow)
- Use smaller models (e.g., Qwen2.5-3B)
- Reduce the number of papers per classification batch
Q3: API calls failing. What should I do?
- Check if API Key is correct
- Check network connection
- Check logs (Tools → AI Classifier → Log Viewer) to troubleshoot
6. Conclusion
AI Classifier deeply integrates Large Language Models with Zotero’s literature management, enabling intelligent classification of academic papers. It pairs a complete feature set with careful attention to user privacy, data security, and user experience.
Understanding the implementation principles behind each feature helps you use the plugin more effectively and optimize it based on your needs. I hope this article helps you leverage AI Classifier to make literature management easier and more efficient.
Related Links
- GitHub Repository: https://github.com/KeqiYe/Zotero-AI-Classifier
- Author Profile: https://github.com/KeqiYe