Last week we announced a new version of our AutoClassifier (you can read the announcement here). This release has many new features and capabilities, including the addition of machine learning into the AutoClassifier. There is remarkable power in machine learning, and we’re excited to add this to our products.
The trick has been doing it without sacrificing transparency and control. I’ve been in the industry long enough that this is the fourth auto-classification product I have brought to the market, and I’ve seen many sophisticated techniques fail because they are “black-box” algorithms that you can’t understand or control. What has made our AutoClassifier successful is that you can get going quickly, understand exactly how and why you get the metadata it generates, and control it as precisely as you want. Machine Learning is notoriously opaque, so we are introducing it in a controllable way.
We’ve kept our rules-based classification engine as the heart of the AutoClassifier, and added new modules. In particular, we’ve added:
Our AutoClassifier now has multiple text analytics capabilities that can be combined in many different ways. Its overall function, creating metadata, is the same, but you can now combine rules-based and machine-learning approaches to provide consistent, high-quality metadata. Organizations can use the resulting metadata to improve findability, discover new insights, and ensure compliance.
At BA Insight, we concluded that hybrid solutions give the most flexibility and the best results, which is what prompted this new release of the AutoClassifier. We allow our customers to apply the best single technique, or to combine techniques in innovative ways when that’s needed. Our customers benefit from leading-edge text analytics technology, and also from the flexibility to apply our product in different ways to different content or different applications.
Back in November I did a session at Taxonomy Bootcamp explaining Rules-Based and Machine Learning-Based approaches to AutoClassification (you can download the slides from slideshare). There’s been a battle in the industry between these techniques, and there are pros and cons to each, but neither technique is best for everything. Here’s a high-level summary of the pros and cons from my talk:
Which is better? Neither, and both. If you are just getting started, it’s simpler to use rules. If you are looking for the most extensive coverage in a wide domain, you’ll want to use some machine learning. And if you want the best possible results, it turns out that a hybrid combination of both is the winner.
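To make the hybrid idea concrete, here is a minimal sketch in Python. It is not the AutoClassifier's engine or API: the rule format, the function names, and the toy word-frequency scorer (standing in for a real trained model) are all assumptions made for illustration. Rules run first because their output is easy to explain; the statistical scorer only fills in when no rule fires, and every tag records where it came from.

```python
from collections import Counter, defaultdict

# Hypothetical rules: tag -> trigger keywords (illustrative, not AutoClassifier syntax).
RULES = {
    "Contracts": {"agreement", "indemnity"},
    "HR": {"payroll", "onboarding"},
}


def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    return [w.strip(".,;:!?()").lower() for w in text.split()]


def rule_tags(text):
    """Return {tag: matched keywords} for every rule that fires."""
    words = set(tokenize(text))
    return {tag: sorted(words & kws) for tag, kws in RULES.items() if words & kws}


def train(examples):
    """Count words per tag from (text, tag) pairs; a toy stand-in for model training."""
    counts = defaultdict(Counter)
    for text, tag in examples:
        counts[tag].update(tokenize(text))
    return counts


def model_tag(counts, text):
    """Score each tag by the share of its training words that appear in the text."""
    words = tokenize(text)
    scores = {
        tag: sum(c[w] for w in words) / sum(c.values())
        for tag, c in counts.items()
    }
    if not scores:
        return None, 0.0
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] > 0 else (None, 0.0)


def classify(counts, text):
    """Hybrid: prefer explainable rule hits, fall back to the statistical scorer."""
    hits = rule_tags(text)
    if hits:
        return [(tag, "rule", kws) for tag, kws in sorted(hits.items())]
    tag, score = model_tag(counts, text)
    return [(tag, "model", round(score, 2))] if tag else []


counts = train([
    ("salary review and benefits enrollment", "HR"),
    ("supplier terms and liability clause", "Contracts"),
])
print(classify(counts, "The indemnity clause is attached"))
print(classify(counts, "Updated liability terms"))
```

The first document is tagged by a rule, and the result carries the keyword that triggered it. The second matches no rule, so the scorer supplies the tag along with a score. Keeping the source of each tag visible is one simple way to preserve transparency while still extending coverage.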
This reminds me of a similar “battle” 15 years ago between statistical and linguistic approaches to search. In the end, the best answer was to use both, and all modern search engines combine statistics and linguistics internally.
I believe we’ve been able to make a single product in the AutoClassifier that allows our customers unprecedented power without losing the simplicity that we are often complimented for. Our customers have told us that we made it simple to combine techniques, and easy to understand.
We’ll continue to invest in the AutoClassifier, and now that we have the ability to plug in different modules we can do some amazing things. I expect that our customers will come up with ideas and innovative solutions as well. Adding machine learning under the hood is exciting, and a hybrid combination of techniques provides the best of both worlds.
Yes, AutoClassifier V3 supports O365, SharePoint 2013 and SharePoint 2016. Hybrid SharePoint configurations are also supported.
If you can crawl it, you can process it. Combining the AutoClassifier with our connectors provides processing of content from over 60 different systems.
Yes. Follow the online documentation in the customer portal to upgrade, and contact customer support with any questions.
Even if you use machine learning, regular expression matching, or other modules that aren’t taxonomy based, we recommend starting with one or more core taxonomies and rule sets and then adding in the other techniques.
Any of these should be taken as a starting point – expect to tailor and maintain these over time to adapt them to your specific needs. We recommend using multiple simpler taxonomies rather than one giant one. You can often work with a simple “enterprise-wide” taxonomy that’s applied to all content and then apply different department-specific taxonomies to different sets of content.
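As a rough illustration of the layered approach, the Python sketch below applies one small enterprise-wide taxonomy to every document and then adds a department-specific taxonomy chosen by where the content comes from. The taxonomy terms, the phrase lists, and the function names are all hypothetical, and the simple substring matching is only a stand-in for a real rule set.

```python
# Hypothetical taxonomies: term -> phrases that indicate it (illustrative only).
ENTERPRISE = {
    "Confidential": ["confidential"],
    "Policy": ["policy"],
}
DEPARTMENT = {
    "legal": {"Litigation": ["lawsuit", "subpoena"]},
    "finance": {"Invoice": ["invoice"]},
}


def apply_taxonomy(taxonomy, text):
    """Return the terms whose phrases occur in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return sorted(
        term for term, phrases in taxonomy.items()
        if any(phrase in lowered for phrase in phrases)
    )


def tag_document(text, department):
    """Apply the enterprise taxonomy to everything, then the department's own taxonomy."""
    tags = apply_taxonomy(ENTERPRISE, text)
    tags += apply_taxonomy(DEPARTMENT.get(department, {}), text)
    return tags


print(tag_document("Confidential: subpoena received", "legal"))
print(tag_document("Confidential invoice", "legal"))
```

Because each taxonomy stays small, a department can maintain its own terms without touching the shared one, and content from a department with no taxonomy of its own still gets the enterprise-wide tags.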