Alfresco Connector for Apache ManifoldCF


I'm proud to announce that the next release of Apache ManifoldCF will include my latest contribution to the project: the Alfresco Repository Connector.

This new connector will be provided in the default package of ManifoldCF together with all the other built-in connectors.

This connector allows repository administrators to manage content indexing for all the Alfresco instances that don't yet support the CMIS protocol (Alfresco 2.x and probably also Alfresco 1.x). It can also be used to define the scope of the content to index using Alfresco-specific features.

 

The Alfresco Repository Connector

This first version of the connector is based on the Alfresco Web Services API, which is the only way to execute Lucene queries against any Alfresco installation without adding customization artifacts.

This means that ManifoldCF 0.4-incubating adds the Alfresco connection type to the crawler web application, where you can select it when adding a new repository connection:

Apache ManifoldCF - Crawler Webapp - Alfresco repository connection settings

 

On the repository connection settings page you can configure the following parameters for the Alfresco session:

  • Username
  • Password
  • Protocol
  • Server
  • Port
  • Context path of the Alfresco Web Services API
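As a rough illustration of how these parameters fit together, the sketch below assembles them into the base URL of the Web Services API. The values used (localhost, 8080, /alfresco/api) are assumptions for a local test installation, and `AlfrescoSession` and `endpoint_url` are hypothetical names, not part of the connector.

```python
from dataclasses import dataclass


@dataclass
class AlfrescoSession:
    """Hypothetical holder for the six connection parameters listed above."""
    username: str
    password: str
    protocol: str
    server: str
    port: int
    path: str  # context path of the Alfresco Web Services API

    def endpoint_url(self) -> str:
        # Normalise the context path so it starts with exactly one slash
        # and carries no trailing slash.
        path = "/" + self.path.strip("/")
        return f"{self.protocol}://{self.server}:{self.port}{path}"


# Assumed values for a local test installation.
session = AlfrescoSession("admin", "admin", "http", "localhost", 8080, "/alfresco/api")
print(session.endpoint_url())
```

If the connection fails, checking that this URL is reachable from the machine running ManifoldCF is a reasonable first step.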

Apache ManifoldCF - Crawler Webapp - Alfresco repository connection settings

 

For each job that uses an Alfresco repository connection you can configure a Lucene query that defines the content to be processed by ManifoldCF:

Apache ManifoldCF - Crawler Webapp - set the Lucene query in the job settings
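
To give an idea of what such a query can look like, here is a small sketch that builds an Alfresco Lucene query restricting the crawl to documents below a given folder. The folder path `/app:company_home/cm:testdata` and the helper `build_lucene_query` are assumptions made for illustration; only the resulting string is what you would enter in the job settings.

```python
def build_lucene_query(folder_path: str, content_type: str = "cm:content") -> str:
    """Build a Lucene query matching nodes of one type below a folder.

    `folder_path` is a prefixed QName path such as
    "/app:company_home/cm:testdata" (an assumed example folder).
    """
    # "//*" selects every descendant of the folder, at any depth.
    path_clause = f'PATH:"{folder_path.rstrip("/")}//*"'
    type_clause = f'TYPE:"{content_type}"'
    return f"{path_clause} AND {type_clause}"


print(build_lucene_query("/app:company_home/cm:testdata"))
```

Narrowing the query by path and type keeps the crawl limited to the content you actually want indexed.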

 

You can review all the Alfresco settings (Alfresco connection and Lucene query) on the job view page:

Apache ManifoldCF - Crawler Webapp - Job view settings with an Alfresco repository connection

 

On the Document Status page of the crawler webapp you will then see all the content processed by the Alfresco repository connection used in this specific job:

Apache ManifoldCF - Crawler Webapp - Alfresco document status 

 

Integration Tests

As I did for the CMIS Connector, I added a dedicated Maven module for integration tests that checks the quality of the implementation of the Alfresco Repository Connector.

To achieve this goal I used another helper module, based on the Maven Alfresco Lifecycle, that builds from scratch the Alfresco web application used by the integration tests to validate the calls that the connector makes against the repository.

The integration tests module executes the following steps:

  1. Builds the alfresco.war from scratch with H2 database support
  2. Deploys a Jetty instance dedicated to the ManifoldCF webapps
  3. Deploys a Jetty instance dedicated to the Alfresco webapp
  4. Creates the test area contents in Alfresco
  5. Executes the test job, configuring ManifoldCF via REST
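
The last step implies that the test has to wait for the crawl to finish before checking its results. A minimal sketch of that waiting logic is shown below; it is not taken from the ManifoldCF test code. The `fetch_status` callable stands in for whatever REST call returns the job status, and the status names are hypothetical.

```python
import time
from typing import Callable


def wait_for_job(fetch_status: Callable[[], str],
                 timeout_s: float = 60.0,
                 interval_s: float = 1.0) -> str:
    """Poll until the job reports a final state or the timeout expires."""
    final_states = {"done", "error"}  # hypothetical status names
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in final_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {status!r} after {timeout_s}s")
        time.sleep(interval_s)


# Usage with a fake status source that finishes on the third poll.
statuses = iter(["running", "running", "done"])
print(wait_for_job(lambda: next(statuses), interval_s=0.0))
```

Passing the status source in as a callable keeps the helper independent of any particular HTTP client, so the same loop can be exercised with a fake in unit tests.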

I would like to thank Carlo Sciolla for his support with the H2 database dependency, which was very important for me during the implementation of the integration tests for this Alfresco connector.

Carlo released the alfresco-h2-support module under an Apache-compatible license, which allowed us to import it into ManifoldCF.

The documentation of the Alfresco Repository Connector is published on the official website of the project.

We are looking forward to receiving your feedback about this new component in Apache ManifoldCF. So please don't hesitate to contact me, post a comment here, or post a comment/issue in the ASF Jira related to the Alfresco connector.

You can also post a message to the ManifoldCF mailing list.

Hope this helps ;)