CMIS Connector for Apache ManifoldCF

Yesterday I released a patch for a new component of the Apache ManifoldCF project: the CMIS Connector.

For those who don't know Apache ManifoldCF, it is an open source framework for managing the indexing of content repositories.

The framework supports source repositories such as SharePoint and Documentum (and now all CMIS-compliant repositories :) ) and lets you configure target output repositories, such as Apache Solr, where the indexes for the content are created.

This means that you can create and schedule jobs that read content from repositories and push it to Apache Solr for indexing. The framework also allows you to configure the mapping of the metadata fields that you want to push to the Solr instance.

The CMIS Connector implementation is based on the Apache Chemistry Client 0.4.0 using the AtomPub binding.

The patch is available in the ASF Jira. If you don't want to wait for the next version of ManifoldCF, you can download the patch files and apply them to the ManifoldCF source code.

The patch adds the CMIS connection type to the ManifoldCF crawler web application, where it appears when you add a new repository connection:

Apache ManifoldCF - Crawler Webapp - Creating a new repository connection

In the repository connection settings page you can configure the following parameters for the CMIS session:

  • Username
  • Password
  • Endpoint
  • Repository ID (optional)

These connector settings are used by ManifoldCF to manage the CMIS sessions against the CMIS-compliant repositories. If the Repository ID parameter is left empty, the session is created against the first CMIS repository exposed by the endpoint:

Apache ManifoldCF - Crawler Webapp - CMIS repository connection settings
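
The four settings above map fairly directly onto the session parameters of the Apache Chemistry client. The following is a minimal sketch of that mapping, not the connector's actual code. The class name `CmisSessionParams`, its method names, and the endpoint URL are made up for illustration. The key strings are, to the best of my knowledge, the values behind the `SessionParameter` constants of the Chemistry client; they are written out literally only so that the sketch compiles without the client on the classpath, and in real code you should use the constants themselves.

```java
import java.util.HashMap;
import java.util.Map;

public class CmisSessionParams {
    // Assumed string values of SessionParameter.USER, PASSWORD, ATOMPUB_URL,
    // BINDING_TYPE and REPOSITORY_ID in the Apache Chemistry client.
    static final String USER = "org.apache.chemistry.opencmis.user";
    static final String PASSWORD = "org.apache.chemistry.opencmis.password";
    static final String ATOMPUB_URL = "org.apache.chemistry.opencmis.binding.atompub.url";
    static final String BINDING_TYPE = "org.apache.chemistry.opencmis.binding.spi.type";
    static final String REPOSITORY_ID = "org.apache.chemistry.opencmis.session.repository.id";

    // Builds the parameter map for an AtomPub session.
    // The repository ID is optional: it is only added when it is not blank.
    static Map<String, String> build(String username, String password,
                                     String endpoint, String repositoryId) {
        Map<String, String> params = new HashMap<>();
        params.put(USER, username);
        params.put(PASSWORD, password);
        params.put(ATOMPUB_URL, endpoint);
        params.put(BINDING_TYPE, "atompub"); // assumed value of BindingType.ATOMPUB
        if (repositoryId != null && !repositoryId.trim().isEmpty()) {
            params.put(REPOSITORY_ID, repositoryId.trim());
        }
        return params;
    }

    // True when no repository ID was given, i.e. the caller should fall back
    // to the first repository exposed by the endpoint.
    static boolean usesFirstRepository(Map<String, String> params) {
        return !params.containsKey(REPOSITORY_ID);
    }

    public static void main(String[] args) {
        // Hypothetical endpoint, for illustration only.
        Map<String, String> params =
            build("admin", "secret", "http://localhost:8080/cmis/atom", "");
        System.out.println("Fall back to first repository: " + usesFirstRepository(params));
    }
}
```

With the Chemistry client available, a map like this would typically be passed to `SessionFactoryImpl.newInstance().createSession(params)` when a repository ID is set, or to `getRepositories(params).get(0).createSession()` when it is not.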

For each job that uses a CMIS repository connection, you can configure a CMIS query that selects the content to be processed by ManifoldCF:

Apache ManifoldCF - Crawler Webapp - set the CMIS query in the job settings
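
To give an idea of what such a query can look like, here is a small sketch that builds a CMIS query string selecting documents by name. It is only an illustration: the class `CmisQueries` and its methods are hypothetical helpers, not part of ManifoldCF or the Chemistry client, and the escaping rule (a backslash in front of quotes and backslashes) is my reading of the CMIS query grammar.

```java
public class CmisQueries {
    // Escapes a value for use inside a single-quoted CMIS query literal:
    // backslashes first, then single quotes.
    static String escapeLiteral(String value) {
        return value.replace("\\", "\\\\").replace("'", "\\'");
    }

    // Selects all documents whose cmis:name matches the given LIKE pattern.
    // The wildcards % and _ in the pattern are passed through unchanged.
    static String documentsNamedLike(String pattern) {
        return "SELECT * FROM cmis:document WHERE cmis:name LIKE '"
            + escapeLiteral(pattern) + "'";
    }

    public static void main(String[] args) {
        // Example: every document whose name starts with "report".
        System.out.println(documentsNamedLike("report%"));
    }
}
```

The resulting string is the kind of value you would paste into the CMIS query field of the job; a plain `SELECT * FROM cmis:document` selects every document the connection can see.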

You can review all the CMIS settings (CMIS connection and CMIS query) on the job view page:

Apache ManifoldCF - Crawler Webapp - Job view settings with a CMIS repository connection

In the Document Status page of the crawler webapp you will then see all the content processed by the CMIS repository connector:

Apache ManifoldCF - Crawler Webapp - CMIS document status

I'm now working with Karl Wright (ManifoldCF committer) to test and fix this new component, and over the coming days I think I can add some improvements to the CMIS Connector.

I'm looking forward to receiving your feedback about this new component in Apache ManifoldCF, so please don't hesitate to contact me, post a comment here, or post a comment/issue in the ASF Jira related to the CMIS Connector component.

You can also post a message to the ManifoldCF mailing list.

Hope this helps ;)