GSOC Application: OCR prototype for ancient documents #12881

Closed
195 changes: 195 additions & 0 deletions OCR_Implementation_Plan.md
@@ -0,0 +1,195 @@
# OCR Implementation Plan for JabRef

This document outlines the implementation plan for adding OCR (Optical Character Recognition) support to JabRef, focusing on improved handling of ancient documents and scanned PDFs. This plan demonstrates my understanding of JabRef's architecture and how OCR functionality can be integrated in a clean, modular way.

## 1. OCR Service Interface Prototype

### Architecture Overview

Following JabRef's hexagonal architecture, I'll create a clean separation of concerns with:

- **Domain core**: Define OCR operations and models
- **Ports**: Interfaces that define boundaries between components
- **Adapters**: Implementations that connect to specific OCR engines

### Key Components

```
org.jabref.logic.ocr/
├── OcrService.java # Core interface (port) defining OCR operations
├── models/
│ ├── OcrResult.java # Domain model for OCR results
│ ├── OcrLanguage.java # Domain model for OCR language options
│ └── OcrEngineConfig.java # Domain model for engine configuration
├── exception/
│ └── OcrProcessException.java # Domain-specific exceptions
├── engines/ # Package for engine adapters
│ ├── OcrEngineAdapter.java # Abstract base adapter implementing OcrService
│ └── TesseractAdapter.java # Adapter for Tesseract (placeholder)
└── OcrManager.java # Facade coordinating OCR operations
```

### Implementation Approach

I'll follow these principles to match JabRef's architecture:

1. **Interface-first design**: Define clear interfaces before implementation
2. **Adapter pattern**: Wrap OCR engines in adapters that implement a common interface
3. **Dependency inversion**: Core logic depends on abstractions, not concrete implementations
4. **Domain-driven design**: Create proper domain models for OCR concepts
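
A minimal sketch of how principles 2 and 3 could fit together, assuming a trimmed-down port with only two methods. The names `OcrEngine`, `StubEngine`, and `EngineSelector` are hypothetical and exist only for this illustration; they are not part of JabRef or of the package layout above.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical, trimmed-down port: only what engine selection needs. */
interface OcrEngine {
    String getEngineName();

    boolean isAvailable();
}

/** Placeholder adapter standing in for a real engine binding. */
final class StubEngine implements OcrEngine {
    private final String name;
    private final boolean available;

    StubEngine(String name, boolean available) {
        this.name = name;
        this.available = available;
    }

    @Override
    public String getEngineName() {
        return name;
    }

    @Override
    public boolean isAvailable() {
        return available;
    }
}

/** Core logic: depends only on the OcrEngine abstraction, never on a concrete engine. */
final class EngineSelector {
    private final List<OcrEngine> engines;

    EngineSelector(List<OcrEngine> engines) {
        this.engines = List.copyOf(engines);
    }

    /** Returns the first registered engine that reports itself as available. */
    Optional<OcrEngine> firstAvailable() {
        return engines.stream().filter(OcrEngine::isAvailable).findFirst();
    }
}
```

Because `EngineSelector` only sees the interface, a real adapter could later replace `StubEngine` without changes to the selection logic.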

### Integration Points

The OCR service will integrate with JabRef through:

- **Entry processing**: Attach to entry import workflow
- **PDF handling**: Integrate with PDF utilities
- **Search system**: Connect to Lucene indexer

## 2. PDF Text Layer Proof-of-Concept

### Architecture Overview

This component will demonstrate how to add OCR-extracted text as a searchable layer to PDFs, following JabRef's existing patterns for PDF manipulation.

### Key Components

```
org.jabref.logic.ocr.pdf/
├── TextLayerAdder.java # Utility to add text layers to PDFs
├── OcrPdfProcessor.java # Processor for PDF OCR operations
└── SearchableTextLayer.java # Model for searchable text layers
```

### Implementation Approach

The proof-of-concept will:

1. Use PDFBox in a similar way to existing JabRef PDF utilities
2. Follow the same patterns as `XmpUtilWriter` for metadata operations
3. Create a clean API for adding text layers to PDFs
4. Demonstrate how OCR text can be indexed by Lucene

### Integration Points

- Connect with `IndexManager` for search indexing
- Integrate with JabRef's PDF processing pipeline
- Utilize existing PDF utilities where appropriate

## 3. Preference Panel for OCR Configuration

### Architecture Overview

I'll create a preference panel that follows JabRef's existing UI patterns and preference management system.

### Key Components

```
org.jabref.gui.preferences.ocr/
├── OcrTab.java # UI component extending AbstractPreferenceTabView
├── OcrTabViewModel.java # ViewModel for the OCR preferences
└── OcrPreferences.java # Preference model for OCR settings
```

### Implementation Approach

The preference panel will:

1. Follow the MVVM pattern like other JabRef preference tabs
2. Use JavaFX controls with property binding
3. Implement validation for preference values
4. Integrate with JabRef's preference persistence
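
As an illustration of step 3, here is a rough sketch of what validating the Tesseract path could look like. It uses only the JDK; the class name `TesseractPathValidator` and the error messages are assumptions made for this example, and wiring the result into the view model's validation status is not shown.

```java
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical helper: checks that a user-entered Tesseract path is an existing directory. */
final class TesseractPathValidator {

    private TesseractPathValidator() {
    }

    /** Returns an error message, or an empty Optional if the path is acceptable. */
    static Optional<String> validate(String rawPath) {
        if (rawPath == null || rawPath.isBlank()) {
            return Optional.of("Path must not be empty");
        }
        Path path;
        try {
            path = Path.of(rawPath.trim());
        } catch (InvalidPathException e) {
            // The string cannot be converted to a path on this platform
            return Optional.of("Path is not valid: " + e.getReason());
        }
        if (!Files.isDirectory(path)) {
            return Optional.of("Path is not an existing directory");
        }
        return Optional.empty();
    }
}
```

Returning an `Optional<String>` keeps the check free of UI dependencies, so it can be unit-tested without JavaFX.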

### Integration Points

- Register in `PreferencesDialogViewModel`
- Access through `GuiPreferences`
- Coordinate with OCR service implementation

## API Design

The core OCR service interface will define these operations:

```java
public interface OcrService {
    /**
     * Process a PDF file using OCR to extract text
     * @param pdfPath Path to the PDF file
     * @return OCR result containing extracted text and metadata
     */
    OcrResult processPdf(Path pdfPath) throws OcrProcessException;

    /**
     * Process an image file using OCR to extract text
     * @param imagePath Path to the image file
     * @return OCR result containing extracted text and metadata
     */
    OcrResult processImage(Path imagePath) throws OcrProcessException;

    /**
     * Add OCR-extracted text as a searchable layer to a PDF file
     * @param pdfPath Path to the source PDF file
     * @param outputPath Path to save the modified PDF
     * @param ocrResult OCR result containing extracted text to add
     */
    void addTextLayerToPdf(Path pdfPath, Path outputPath, OcrResult ocrResult) throws OcrProcessException;

    /**
     * Set the language for OCR processing
     * @param language OCR language to use
     */
    void setLanguage(OcrLanguage language) throws OcrProcessException;

    /**
     * Get the name of the OCR engine
     * @return Engine name
     */
    String getEngineName();

    /**
     * Check if the OCR engine is available
     * @return true if the engine is ready to use
     */
    boolean isAvailable();
}
```

The adapter base class will provide common functionality:

```java
public abstract class OcrEngineAdapter implements OcrService {
    protected OcrLanguage currentLanguage;
    protected OcrEngineConfig config;

    // Common implementation details for OCR engines
    // Engine-specific subclasses will override key methods
}
```
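
To make the split between shared and engine-specific behavior more concrete, here is one possible shape for a subclass. This is a sketch under assumptions: the stand-in types are reduced to the bare minimum so the example compiles on its own, the `recognize` hook and `PlaceholderAdapter` are hypothetical names, and the `OcrService` interface is trimmed to four methods.

```java
import java.nio.file.Path;

// Minimal stand-ins for the plan's types, declared only so this sketch is self-contained.
final class OcrLanguage {
    final String code;

    OcrLanguage(String code) {
        this.code = code;
    }
}

final class OcrResult {
    final String text;
    final String engineName;

    OcrResult(String text, String engineName) {
        this.text = text;
        this.engineName = engineName;
    }
}

class OcrProcessException extends Exception {
    OcrProcessException(String message) {
        super(message);
    }
}

// Trimmed version of the interface above (assumption: the PDF methods are omitted for brevity).
interface OcrService {
    OcrResult processImage(Path imagePath) throws OcrProcessException;

    void setLanguage(OcrLanguage language) throws OcrProcessException;

    String getEngineName();

    boolean isAvailable();
}

abstract class OcrEngineAdapter implements OcrService {
    protected OcrLanguage currentLanguage = new OcrLanguage("eng");

    @Override
    public void setLanguage(OcrLanguage language) throws OcrProcessException {
        if (language == null) {
            throw new OcrProcessException("Language must not be null");
        }
        this.currentLanguage = language;
    }

    // Shared precondition check; the actual recognition is delegated to the subclass.
    @Override
    public final OcrResult processImage(Path imagePath) throws OcrProcessException {
        if (!isAvailable()) {
            throw new OcrProcessException(getEngineName() + " is not available");
        }
        return new OcrResult(recognize(imagePath), getEngineName());
    }

    /** Hypothetical engine-specific hook. */
    protected abstract String recognize(Path imagePath) throws OcrProcessException;
}

/** Placeholder engine: returns a fake text instead of running real OCR. */
final class PlaceholderAdapter extends OcrEngineAdapter {
    @Override
    public String getEngineName() {
        return "Placeholder";
    }

    @Override
    public boolean isAvailable() {
        return true;
    }

    @Override
    protected String recognize(Path imagePath) {
        return "[" + currentLanguage.code + "] " + imagePath.getFileName();
    }
}
```

With this layout, a real engine adapter would only need to supply `getEngineName`, `isAvailable`, and `recognize`, while language handling and availability checks stay in the base class.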

## Implementation Strategy

I'll implement this project in phases:

1. **Phase 1**: Core interfaces and models
2. **Phase 2**: Basic adapter implementation (placeholder)
3. **Phase 3**: PDF text layer utility (demonstration)
4. **Phase 4**: Preference panel integration

This phased approach allows for early feedback and iterative improvement.

## Coding Standards and Testing

I'll follow JabRef's existing patterns for:

- Code style and organization
- JavaDoc documentation
- Unit testing with JUnit 5
- Separation of concerns

## Next Steps

1. Implement core interfaces and models
2. Create basic adapter implementations
3. Develop PDF text layer proof-of-concept
4. Design and integrate preference panel
5. Document and submit PR for review
119 changes: 119 additions & 0 deletions src/main/java/org/jabref/gui/preferences/ocr/OcrTab.java
@@ -0,0 +1,119 @@
package org.jabref.gui.preferences.ocr;

import javafx.fxml.FXML;
import javafx.scene.control.Button;
import javafx.scene.control.CheckBox;
import javafx.scene.control.ComboBox;
import javafx.scene.control.TextField;
import javafx.stage.DirectoryChooser;

import org.jabref.gui.Globals;
Member

Hi @lareinahu-2023, where did you get this from?

Author

Hi @subhramit

The imports for javafx.scene.control.TextField and javafx.stage.DirectoryChooser are both standard JavaFX classes that are used in the code. TextField is used for the tesseractPathTextField field to display and edit the Tesseract path, while DirectoryChooser is indirectly used through DialogService to implement the directory selection functionality.

Regarding org.jabref.gui.Globals, I apologize for the confusion. I was unable to find where I originally saw this pattern. Looking at the codebase now, I see this approach doesn't match JabRef's current implementation style. Other tabs like LinkedFilesTab and GeneralTab use a different pattern, accessing preferences through view models rather than a static Globals class.

I'll update my implementation to follow the project's current pattern for directory selection, using the DirectoryDialogConfiguration with proper preferences injection, similar to how it's done in other preference tabs.

import org.jabref.gui.actions.ActionFactory;
import org.jabref.gui.actions.StandardActions;
import org.jabref.gui.help.HelpAction;
import org.jabref.gui.preferences.AbstractPreferenceTabView;
import org.jabref.gui.preferences.PreferencesTab;
import org.jabref.gui.util.DirectoryDialogConfiguration;
import org.jabref.gui.util.ViewModelListCellFactory;
import org.jabref.logic.help.HelpFile;
import org.jabref.logic.l10n.Localization;
import org.jabref.logic.ocr.models.OcrEngineConfig;

import com.airhacks.afterburner.views.ViewLoader;
import de.saxsys.mvvmfx.utils.validation.visualization.ControlsFxVisualizer;

/**
* Tab for OCR preferences in JabRef's preferences dialog.
* <p>
* This class demonstrates how to create a preference tab using JabRef's
* UI framework and MVVM pattern.
*/
public class OcrTab extends AbstractPreferenceTabView<OcrTabViewModel> implements PreferencesTab {

@FXML private CheckBox enableOcrCheckBox;
@FXML private ComboBox<String> engineComboBox;
@FXML private ComboBox<String> languageComboBox;
@FXML private CheckBox preprocessImagesCheckBox;
@FXML private ComboBox<OcrEngineConfig.QualityPreset> qualityPresetComboBox;
@FXML private TextField tesseractPathTextField;
@FXML private Button tesseractPathBrowseButton;
@FXML private Button helpButton;

private final ControlsFxVisualizer visualizer = new ControlsFxVisualizer();

/**
* Create a new OCR tab for JabRef preferences.
*/
public OcrTab() {
ViewLoader.view(this)
.root(this)
.load();
}

/**
* Initialize the tab.
*/
@FXML
public void initialize() {
this.viewModel = new OcrTabViewModel(preferences);

// Bind UI components to view model properties
enableOcrCheckBox.selectedProperty().bindBidirectional(viewModel.ocrEnabledProperty());

// Set up combo boxes with models
new ViewModelListCellFactory<String>()
.withText(name -> name)
.install(engineComboBox);
engineComboBox.itemsProperty().bind(viewModel.availableEnginesProperty());
engineComboBox.valueProperty().bindBidirectional(viewModel.defaultOcrEngineProperty());
engineComboBox.disableProperty().bind(enableOcrCheckBox.selectedProperty().not());

new ViewModelListCellFactory<String>()
.withText(name -> name)
.install(languageComboBox);
languageComboBox.itemsProperty().bind(viewModel.availableLanguagesProperty());
languageComboBox.valueProperty().bindBidirectional(viewModel.defaultLanguageProperty());
languageComboBox.disableProperty().bind(enableOcrCheckBox.selectedProperty().not());

preprocessImagesCheckBox.selectedProperty().bindBidirectional(viewModel.preprocessImagesProperty());
preprocessImagesCheckBox.disableProperty().bind(enableOcrCheckBox.selectedProperty().not());

new ViewModelListCellFactory<OcrEngineConfig.QualityPreset>()
.withText(OcrEngineConfig.QualityPreset::getDescription)
.install(qualityPresetComboBox);
qualityPresetComboBox.itemsProperty().bind(viewModel.availableQualityPresetsProperty());
qualityPresetComboBox.valueProperty().bindBidirectional(viewModel.qualityPresetProperty());
qualityPresetComboBox.disableProperty().bind(enableOcrCheckBox.selectedProperty().not());

tesseractPathTextField.textProperty().bindBidirectional(viewModel.tesseractPathProperty());
tesseractPathTextField.disableProperty().bind(enableOcrCheckBox.selectedProperty().not());
tesseractPathBrowseButton.disableProperty().bind(enableOcrCheckBox.selectedProperty().not());

// Setup validation
visualizer.initVisualization(viewModel.getTesseractPathValidationStatus(), tesseractPathTextField);

// Configure help button
ActionFactory actionFactory = new ActionFactory();
actionFactory.configureIconButton(StandardActions.HELP,
new HelpAction(HelpFile.IMPORT_USING_OCR, dialogService, preferences.getExternalApplicationsPreferences()),
helpButton);
}

/**
* Handle the browse button click for Tesseract path.
*/
@FXML
private void onBrowseTesseractPath() {
DirectoryDialogConfiguration dirDialogConfiguration = new DirectoryDialogConfiguration.Builder()
.withInitialDirectory(Globals.prefs.get(Globals.WORKING_DIRECTORY))
.build();

dialogService.showDirectorySelectionDialog(dirDialogConfiguration)
.ifPresent(selectedDirectory -> viewModel.tesseractPathProperty().setValue(selectedDirectory.toString()));
}

@Override
public String getTabName() {
return Localization.lang("OCR");

The string "OCR" should be localized using the proper localization method to ensure consistency across different languages.

}
}