search¶
Services for searching and matching of text.
lshtein¶
A class to calculate a similarity based on the Levenshtein distance.
See http://en.wikipedia.org/wiki/Levenshtein_distance.
If the python-Levenshtein package is available, it is used instead of the pure-Python code; it performs better because it is implemented natively.
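The fallback pattern described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the names `pure_python_distance`, `edit_distance` and `similarity_percent` are hypothetical, and normalising by the longer string's length is an assumption made for the example. Only `Levenshtein.distance` comes from python-Levenshtein.

```python
try:
    # python-Levenshtein, if installed, provides a native distance().
    from Levenshtein import distance as _native_distance
except ImportError:
    _native_distance = None


def pure_python_distance(a, b):
    """Classic dynamic-programming Levenshtein distance, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def edit_distance(a, b):
    """Use the native implementation when it was found, else the fallback."""
    if _native_distance is not None:
        return _native_distance(a, b)
    return pure_python_distance(a, b)


def similarity_percent(a, b):
    """Assumed normalisation: 100 * (1 - distance / length of longer string)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - edit_distance(a, b) / longest)
```

With this sketch, `similarity_percent("kitten", "sitting")` is based on a distance of 3 over a longest length of 7.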
-
translate.search.lshtein.
distance
(a, b, stopvalue=0)¶ Same as python_distance in functionality. This uses the fast C version if we detected it earlier.
Note that this does not support arbitrary sequence types, but only string types.
-
translate.search.lshtein.
native_distance
(a, b, stopvalue=0)¶ Same as python_distance in functionality. This uses the fast C version if we detected it earlier.
Note that this does not support arbitrary sequence types, but only string types.
-
translate.search.lshtein.
python_distance
(a, b, stopvalue=-1)¶ Calculates the distance for use in similarity calculation. Python version.
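One plausible reading of the stopvalue argument is a cut-off that lets the calculation give up early once the distance can no longer stay within the limit. The sketch below illustrates that idea only; `bounded_distance` is a hypothetical name, and treating -1 as "no limit" is an assumption, not documented behaviour.

```python
def bounded_distance(a, b, stopvalue=-1):
    """Levenshtein distance that stops early once it must exceed stopvalue.

    Assumption: a stopvalue of -1 means "no limit".
    """
    if stopvalue == -1:
        stopvalue = max(len(a), len(b))
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j - 1] + (char_a != char_b),  # substitution or match
                current[j - 1] + 1,                    # insertion
                previous[j] + 1,                       # deletion
            ))
        # The smallest value in a row never decreases in later rows, so it
        # is a lower bound on the final distance.
        least = min(current)
        if least > stopvalue:
            return least
        previous = current
    return previous[-1]
```

When the early exit triggers, the returned value is only a lower bound on the true distance, which is enough to reject a candidate.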
match¶
Class to perform translation memory matching from a store of translation units.
-
class
translate.search.match.
matcher
(store, max_candidates=10, min_similarity=75, max_length=70, comparer=None, usefuzzy=False)¶ A class that will do matching and store configuration for the matching process.
-
buildunits
(candidates)¶ Builds a list of units conforming to base API, with the score in the comment.
-
extendtm
(units, store=None, sort=True)¶ Extends the memory with extra unit(s).
- Parameters
units – The units to add to the TM.
store – Optional store from where some metadata can be retrieved and associated with each unit.
sort – Optional parameter that can be set to False to suppress sorting of the candidates list. This should probably only be used in
matcher.inittm()
.
-
getstartlength
(min_similarity, text)¶ Calculates the minimum length we are interested in. Some extra margin is allowed because matching does not rely on plain character distance alone.
-
getstoplength
(min_similarity, text)¶ Calculates a length beyond which we are not interested. Some extra margin is allowed because matching does not rely on plain character distance alone.
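The reasoning behind these two length limits can be sketched as follows, assuming a similarity defined as 100 * (1 - distance / length of the longer string) and a plain character distance. Under that assumption a candidate of length m compared with a text of length n can score at most 100 * min(m, n) / max(m, n), which gives strict bounds. The function name `length_window` is hypothetical, and the extra margin mentioned above is deliberately left out because its size is not stated here.

```python
def length_window(min_similarity, text):
    """Return (shortest, longest) candidate lengths that could still reach
    min_similarity (an integer percentage greater than 0) against text.

    Strict bounds for plain character distance; no extra margin is added.
    """
    n = len(text)
    # Shorter candidate: at most 100 * m / n, so m >= n * min_similarity / 100.
    shortest = -(-n * min_similarity // 100)  # ceiling division
    # Longer candidate: at most 100 * n / m, so m <= n * 100 / min_similarity.
    longest = n * 100 // min_similarity       # floor division
    return shortest, longest
```

For a 40-character text and a threshold of 75, this yields a window of 30 to 53 characters; candidates outside it can be skipped without computing any distance.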
-
inittm
(stores, reverse=False)¶ Initialises the memory for later use. We use simple base units for speedup.
-
matches
(text)¶ Returns a list of possible matches for given source text.
- Parameters
text (String) – The text that will be searched for in the translation memory.
- Return type
list
- Returns
a list of units with the source and target strings from the translation memory. If
self.addpercentage
is True (default) the match quality is given as a percentage in the notes.
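To make the roles of max_candidates and min_similarity concrete, here is a toy translation memory lookup. It is a sketch under stated assumptions, not the matcher class itself: `toy_matches` and the `memory` list are hypothetical, plain (source, target) tuples stand in for translation units, and the standard library's `difflib.SequenceMatcher` ratio is swapped in for the Levenshtein-based comparer.

```python
from difflib import SequenceMatcher


def toy_matches(text, memory, max_candidates=10, min_similarity=75):
    """Return up to max_candidates (score, source, target) tuples, best first."""
    scored = []
    for source, target in memory:
        # Stand-in similarity in percent; the real comparer differs.
        score = 100 * SequenceMatcher(None, text, source).ratio()
        if score >= min_similarity:
            scored.append((score, source, target))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:max_candidates]


memory = [
    ("Open file", "Ouvrir le fichier"),
    ("Open files", "Ouvrir les fichiers"),
    ("Quit", "Quitter"),
]
results = toy_matches("Open file", memory)
```

Here `results` holds the exact match first, followed by the near match "Open files"; "Quit" falls below the threshold and is dropped.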
-
setparameters
(max_candidates=10, min_similarity=75, max_length=70)¶ Sets the parameters without reinitialising the TM. A parameter that is not specified is reset to its default value; it does not keep its previous value.
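The consequence of this behaviour is easy to miss, so a small stand-in class may help. `ToyMatcher` and its attribute names are hypothetical and only mimic the signature shown above; the sketch shows that omitting an argument resets it to the default.

```python
class ToyMatcher:
    """Hypothetical stand-in that mimics the setparameters() signature."""

    def __init__(self, max_candidates=10, min_similarity=75, max_length=70):
        self.setparameters(max_candidates, min_similarity, max_length)

    def setparameters(self, max_candidates=10, min_similarity=75, max_length=70):
        # Every call overwrites all three settings, so omitted arguments
        # fall back to the defaults instead of keeping earlier values.
        self.max_candidates = max_candidates
        self.min_similarity = min_similarity
        self.max_length = max_length


toy = ToyMatcher(min_similarity=90)
toy.setparameters(max_candidates=5)
# min_similarity is back at 75 here, not 90.
```

To change a single setting while keeping the others, pass the current values of the remaining parameters explicitly.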
-
usable
(unit)¶ Returns whether this translation unit is usable for TM.
-
-
translate.search.match.
sourcelen
(unit)¶ Returns the length of the source string.
-
class
translate.search.match.
terminologymatcher
(store, max_candidates=10, min_similarity=75, max_length=500, comparer=None)¶ A matcher with settings specifically for terminology matching.
-
buildunits
(candidates)¶ Builds a list of units conforming to base API, with the score in the comment.
-
extendtm
(units, store=None, sort=True)¶ Extends the memory with extra unit(s).
- Parameters
units – The units to add to the TM.
store – Optional store from where some metadata can be retrieved and associated with each unit.
sort – Optional parameter that can be set to False to suppress sorting of the candidates list. This should probably only be used in
matcher.inittm()
.
-
getstartlength
(min_similarity, text)¶ Calculates the minimum length we are interested in. Some extra margin is allowed because matching does not rely on plain character distance alone.
-
getstoplength
(min_similarity, text)¶ Calculates a length beyond which we are not interested. Some extra margin is allowed because matching does not rely on plain character distance alone.
-
inittm
(store)¶ Normal initialisation, but converts all source strings to lower case.
-
matches
(text)¶ Performs normal matching after converting text to lower case, then replaces each match with the original unit to retain comments, etc.
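The idea of matching on lower-cased text and then handing back the original entry can be sketched like this. The helper names `build_term_index` and `lookup_term` are hypothetical, (source, target) tuples stand in for units, and an exact dictionary lookup is swapped in for the fuzzy comparison so that the example stays short.

```python
def build_term_index(entries):
    """Map each lower-cased source string to its original (source, target)."""
    return {source.lower(): (source, target) for source, target in entries}


def lookup_term(index, text):
    """Match case-insensitively but return the entry as originally stored."""
    return index.get(text.lower())


terms = build_term_index([("Translation Memory", "Vertaalgeheue")])
found = lookup_term(terms, "translation memory")
```

The lookup succeeds regardless of case, and `found` still carries the original capitalisation of the stored source string.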
-
setparameters
(max_candidates=10, min_similarity=75, max_length=70)¶ Sets the parameters without reinitialising the TM. A parameter that is not specified is reset to its default value; it does not keep its previous value.
-
usable
(unit)¶ Returns whether this translation unit is usable for terminology.
-
-
translate.search.match.
unit2dict
(unit)¶ Converts a pounit to a simple dict structure for use over the web.
terminology¶
A class that does terminology matching.