SharePoint Metadata Makes Content Findable

Autoclassification enhanced with Natural Language Processing

SharePoint metadata is critical to making search work. BA Insight’s Classification software uses text analytics to create metadata, leveraging SharePoint’s Managed Metadata Service (see Microsoft’s overview of managed metadata in SharePoint Server 2013) and native SharePoint metadata navigation.

Microsoft SharePoint is one of the most widely used technologies in the world. Its place as the preferred platform for intranets, extranets, and portals makes it a key component of any organization’s content access and sharing initiatives. Many organizations, and their SharePoint designers, do not have a good metadata strategy, and thus users struggle to find the information they need. Microsoft has published useful guidance on the core metadata concepts. BA Insight takes these concepts one step further, applying machine-generated metadata and automatic content classification to the already robust metadata management capabilities within SharePoint.

Out of the box, SharePoint barely meets the minimum requirements for enterprise search and intranet deployments. Many organizations do not have a complete and functional taxonomy, which is critical to applying strong and consistent metadata. Through advanced machine learning, the AutoClassifier tool provides a semantic entity extraction capability which identifies and relates concepts found inside content in nearly any system or format. These machine learning capabilities greatly reduce the time required to create an organizational taxonomy.

Having a complete and functional taxonomy cannot solve all issues alone, as many classification tools on the market lead to ineffective tagging, making it difficult for users to discern which tags are helpful, and which are noise. Through an advanced and easy to use set of rules and thresholds, users gain complete control over the tagging accuracy, ensuring that only the most relevant tags are applied to content. This allows organizations to fully realize the potential of metadata-driven search, pushing the most relevant content directly to users based on their searches.

The AutoClassifier works natively within SharePoint 2010, SharePoint 2013, SharePoint 2016, SharePoint Online, and Office 365. Working with the software is very similar to how administrators currently manage metadata within SharePoint, a process covered in Microsoft’s own documentation.

Organizations use AutoClassifier for SharePoint for:

Automatic Classification Leveraging the SharePoint Managed Metadata Service

Assigns concepts from a taxonomy or ontology to content automatically, providing high quality, consistent metadata. Content is tagged based on clear rules that are easy to understand and manage; rules are automatically generated and also provide very precise control. For example: content can be tagged when it matches particular concepts and also has specified metadata; matching can be hierarchical to disambiguate terms that may have different meanings in different contexts.
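The rule behavior described above can be illustrated with a minimal Python sketch. It is not the AutoClassifier's rule engine or syntax: the taxonomy, the concept names, the `Department` metadata field, and the functions `matches` and `classify` are all hypothetical, chosen only to show how a required metadata value and a parent concept can disambiguate a term such as "mercury".

```python
# Hypothetical taxonomy: each concept has trigger terms and, optionally,
# a parent concept that must also match (hierarchical matching).
TAXONOMY = {
    "Astronomy": {"terms": ["planet", "orbit"], "parent": None},
    "Chemistry": {"terms": ["element", "compound"], "parent": None},
    "Mercury (planet)": {"terms": ["mercury"], "parent": "Astronomy"},
    "Mercury (element)": {"terms": ["mercury"], "parent": "Chemistry"},
}


def matches(concept, text):
    """True if a trigger term occurs in the text and the parent (if any) also matches."""
    node = TAXONOMY[concept]
    if not any(term in text for term in node["terms"]):
        return False
    return node["parent"] is None or matches(node["parent"], text)


def classify(text, metadata, required=None):
    """Return the sorted matching concepts, optionally requiring metadata values."""
    required = required or {}
    # Skip tagging entirely when the content lacks the specified metadata.
    if any(metadata.get(key) != value for key, value in required.items()):
        return []
    lowered = text.lower()
    return sorted(concept for concept in TAXONOMY if matches(concept, lowered))


doc_text = "Mercury has the shortest orbit of any planet."
print(classify(doc_text, {"Department": "Research"}, {"Department": "Research"}))
```

In this sketch the ambiguous term resolves to the planet because only the Astronomy parent also matches; a document about chemical compounds would resolve the same term to the element.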

Machine Learning-based Semantic Entity Extraction

Prescribed Extraction

The AutoClassifier’s machine learning capabilities automatically identify and extract taxonomy concepts and structures. This automates the process and greatly reduces the time required to create an organizational taxonomy. Furthermore, the AutoClassifier’s smart algorithms make correlations and patterns within information apparent to users when they were not previously identifiable.

Derived Extraction

Recognizes terms, phrases and regular expressions within content and assigns them to a managed property or managed metadata column. You can extract, for example, part numbers, project names, or customer names from a document. This can also be used for detection of PII, or for similar compliance and content auditing applications.
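As a rough illustration of pattern-based extraction, the Python sketch below pulls values out of text with regular expressions and groups them under property names. The `PN-` part-number format, the property names, and the function `extract_properties` are assumptions made for this example; they are not formats or APIs defined by the product.

```python
import re

# Hypothetical patterns keyed by the property they would populate.
PATTERNS = {
    "PartNumber": re.compile(r"\bPN-\d{5}\b"),
    "EmailAddress": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def extract_properties(text):
    """Return a dict mapping each property name to its unique, sorted matches."""
    found = {}
    for name, pattern in PATTERNS.items():
        values = sorted(set(pattern.findall(text)))
        if values:
            found[name] = values
    return found


sample = "Order PN-12345 and PN-00042; contact jane.doe@example.com."
print(extract_properties(sample))
```

An email pattern like this one is the kind of rule that could support PII detection, although real compliance rules would need broader and more carefully validated patterns.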

Automatic Content Types

Designates a content type to a document. For example, if documents have the term “invoice” in them, then you can set the content type to “Invoice” using Automatic Content Types. This also helps in maintaining governance and compliance by ensuring proper use of content types.

Entity Extraction

Smart Tagging

Uses a rules-based approach, with a powerful yet familiar full-text query language complete with Boolean, proximity, and fielded search capabilities. There are no ‘black box’ algorithms, so you can understand and control how content is enriched and classified. Rule generation is scriptable and starts with intelligent defaults, minimizing the effort needed to maintain rules. A Test Bench lets you preview categorization results in real time against your documents.

Smart Tagging also combines advanced scoring rules and threshold settings to allow granular control of which tags are applied to content, based on how and in what context concepts were identified. Users can apply greater weight to content within the document title, an abstract, a summary, or any other property of a document, specifically tuning the tagging results to the business needs and structure of content. End users are provided a far superior experience based on highly accurate concept tagging, allowing search relevancy improvement not previously available.
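The idea of weighting fields and applying a threshold can be sketched in a few lines of Python. The weights, the threshold value, the field names, and the functions `score_tag` and `select_tags` are illustrative assumptions, not the product's actual scoring model.

```python
# Hypothetical weights: a hit in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def score_tag(term, document):
    """Sum weighted occurrence counts of the term across the document fields."""
    term = term.lower()
    return sum(
        weight * document.get(field, "").lower().count(term)
        for field, weight in FIELD_WEIGHTS.items()
    )


def select_tags(terms, document, threshold=4.0):
    """Keep only the terms whose weighted score reaches the threshold."""
    return [term for term in terms if score_tag(term, document) >= threshold]


document = {
    "title": "Invoice processing guide",
    "abstract": "How to approve an invoice.",
    "body": "Each invoice needs a contract reference.",
}
print(select_tags(["invoice", "contract"], document))
```

Here "invoice" scores 6.0 (one hit in each field) and is kept, while "contract" scores 1.0 and is filtered out as noise. Raising or lowering the threshold trades recall against precision.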

Smart Pipeline Integration

The Smart Pipeline component allows for complete control of the Content Enrichment Web Service (CEWS) pipeline within SharePoint. It solves the core problem of out-of-the-box CEWS being limited to a single process and allows a wide range of advanced enrichment scenarios through a flexible and powerful interface.

Taxonomy Management, Available Natively within SharePoint

Create and modify multiple taxonomies or ontologies, with drag-and-drop simplicity for rearranging categories and editing category rules. Taxonomy information is stored and managed in the native SharePoint format, while also being enhanced with new features to allow auto-tagging. You can import and export taxonomies from industry formats such as SKOS, RDF, CSV, and the SharePoint term store interchange format, so you have the flexibility to use other taxonomy and ontology tools in combination with the BAI Software Portfolio.

DataSet Connectors

Connect to external systems or processes and enrich content by adding relevant metadata and/or normalizing terms. For example, people names can be matched to a master directory using a fuzzy match, so that misspellings can be cleaned up and different name formats can be normalized. Custom dataset connectors can be built to any enrichment process, for example domain-specific processing to recognize chemical names, protein sequences, etc.
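The name-matching example above can be approximated with Python's standard `difflib` module. This is only a sketch of the general fuzzy-matching technique, not how a DataSet Connector is implemented; the directory contents, the 0.8 cutoff, and the function `normalize_name` are assumptions.

```python
import difflib

# Hypothetical master directory of canonical names.
MASTER_DIRECTORY = ["Jane Doe", "John Smith", "Maria Garcia"]


def normalize_name(raw_name, cutoff=0.8):
    """Return the closest directory entry, or None if nothing is similar enough."""
    candidate = " ".join(raw_name.split())  # collapse stray whitespace
    if "," in candidate:
        # Convert "Last, First" into "First Last" before matching.
        last, first = [part.strip() for part in candidate.split(",", 1)]
        candidate = f"{first} {last}"
    lookup = {name.lower(): name for name in MASTER_DIRECTORY}
    hits = difflib.get_close_matches(candidate.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


print(normalize_name("Jon Smith"))
print(normalize_name("Garcia,  Maria"))
```

The cutoff controls how aggressive the cleanup is: a lower value repairs more misspellings but increases the risk of mapping a name to the wrong person.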

Scripting

Supports complex content and metadata gathering using familiar VBScript. You can solve the trickiest and most demanding scenarios: modify content, metadata, and mappings in any way desired, and combine multiple metadata fields together using scripting.

Automatic Property Mapping

Removes the effort of assigning source system metadata field names to crawled properties and managed properties and maintaining these associations. You can also script or manually assign names and associations, giving you precise control when you need it. Smart Mapping includes dataset connectors, scripting, automatic property mapping, and more to provide sophisticated content enrichment. Metadata can be created and mapped based on other metadata – giving you the ability to flag content flexibly according to the needs of specific groups. For example, you can create a tag that indicates that an organization or project spans multiple countries and has more than a specified number of people associated with it.
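The closing example, a tag derived from other metadata, might look like the following Python sketch. The field names `countries` and `people_count`, the tag label, the default limit of 50 people, and the function `derive_tags` are all hypothetical; in the product this logic would be expressed through its own mapping and scripting features.

```python
def derive_tags(item, min_people=50):
    """Derive new tags from existing metadata fields on an item."""
    tags = []
    countries = set(item.get("countries", []))
    # Flag items that span several countries and exceed the headcount limit.
    if len(countries) > 1 and item.get("people_count", 0) > min_people:
        tags.append("Large multinational project")
    return tags


project = {"countries": ["US", "DE", "US"], "people_count": 120}
print(derive_tags(project))
```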

KEY CONCEPTS:

Autoclassification

Classification is the process of describing a piece of content (a document, an email, a process), what it is about (products, projects, people, clients, topics, and whatever else is important to the individual business), and how it should be managed (distributed, secured, archived) by adding one or more tags to the metadata. Classification can be carried out manually or automatically, through classification software. Manual classification depends on humans to classify content and add metadata tags, and it often lacks consistency and scalability. Automatic classification is the process by which technology is used to assign these tags and metadata. The advantages of automatic classification are consistency, scalability, and time savings.

Rules-Based Autoclassification

In rules-based autoclassification, rules are created to match terms and entities in order to decide how content should be categorized and tagged with the correct metadata. Rules-based classifiers are not dependent on the information in the collection: a new collection can use the same rules. Hence, rules-based classifiers offer greater flexibility than statistics-based classifiers; however, until recently the creation and management of rules required a lot of effort and a person skilled in rules formats. BA Insight’s AutoClassifier provides a simple and intuitive rules authoring engine, allowing business users to manage and implement rules quickly and easily.