Each content object in KARL can be adapted to extract the text provided to the text index. The adapter API is specified in karl.models.interfaces.ITextIndexData: each such adapter called with no additional arguments, and is expected to return a string representing the searchable text of its context.
A trivial implementation which returned only the title of its context might look like:
class TitleOnlySearchableText(object):
def __init__(self, context):
self.context = context
def __call__(self):
return self.context.title
The relevance of a particular document with respect to a given query term depends on the weight assigned to the occurences of the term in the document’s searchable text. By default, this weight is computed using the number of occurences of the term.
karl.content.models.adapters.makeFlexibleTextData() is a “factory-factory”, which defines full-text search adapters using parameterized weighting for different fields. The same adapter defined as a class above could be defined using the factory-factory as follows:
from karl.content.models.adapters import makeFlexibleTextData
TitleOnlySearchableText = makeFlexibleTextData([('title', 1, None)])
The argument passed to the factory-factory is an ordered list of tuples, each of the form (extractor, weight, cleaner).
To define an adapter which can be used across a number of different content types, we might provide an extractor function which tried different attributes:
from karl.content.models.adapters import makeFlexibleTextData
def _find_headline_or_title(context):
result = getattr(context, 'headline', None)
if result is None:
result = getattr(context, 'title', '')
return result
HeadlineOrTitleTextData = makeFlexibleTextData(
[(_find_headline_or_title, 1, None)])
Assume that we have a “news story” content object which has a blurb element, containing HTML, in addition to a plain text headline. We might define the adapter for such objects using the _html_cleaner utility function defined in karl.content.models.adapters:
from karl.content.models.adapters import makeFlexibleTextData
from karl.content.models.adapters import _html_cleaner
HeadlineBlurbTextData = makeFlexibleTextData([('headline', 1, None),
('blurb', 1, _html_cleaner),
])
To weight terms found in the headline three times more strongly than those found in the blurb:
HeadlineBlurbTextData = makeFlexibleTextData([('headline', 3, None),
('blurb', 1, _html_cleaner),
])
Some KARL content keeps a portion of its indexable data in “blob” files on the filesystem. To index such data, we can use another utility function, _extract_file_data, defined in karl.content.models.adapters:
from karl.content.models.adapters import makeFlexibleTextData
from karl.content.models.adapters import _extract_file_data
HeadlineBlurbTextData = makeFlexibleTextData([('title', 4, None),
(_extract_file_data, 1, None),
])