Releases: J535D165/recordlinkage

Version 0.9.0 (21 June 2017)

28 Dec 12:57
  • A new index API. The new index API is no longer a single class
    (recordlinkage.Pairs(...)) with all the functionality in it. The new API
    is based on TensorFlow and FEBRL. With the new structure, it is easier to
    parallelise the record linkage process. In future releases, this will be
    implemented natively. See the reference page for more information and
    migration instructions: http://recordlinkage.readthedocs.io/en/latest/ref-index.html
  • Significant speed improvement of the Sorted Neighbourhood Indexing
    algorithm. Thanks to @perryvais (PR #32).
  • The function binary_comparisons is renamed to binary_vectors.
    Documentation added to RTD.
  • Added unit tests to test the generation of random comparison vectors.
  • Logging module added to separate module logs from user logs. The
    implementation is based on TensorFlow.
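
For readers unfamiliar with sorted neighbourhood indexing, the general idea can be sketched in plain Python: sort the records on a key and pair each record with the records that fall within a small window around it in the sorted order. This is only an illustrative sketch and not the toolkit's implementation; the function name `sorted_neighbourhood_pairs` and its `window` parameter are assumptions made for this example.

```python
def sorted_neighbourhood_pairs(keys, window=3):
    """Return candidate pairs (i, j), with i < j, of record positions.

    keys   -- list of sortable key values, one per record
    window -- odd window size; each record is paired with the
              (window - 1) // 2 records that follow it in sorted order
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")

    # Positions of the records, ordered by their key value.
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    reach = (window - 1) // 2

    pairs = set()
    for pos, i in enumerate(order):
        # Pair record i with the next `reach` records in sorted order.
        for j in order[pos + 1 : pos + 1 + reach]:
            pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


surnames = ["smith", "smyth", "jones", "johnson", "smit"]
print(sorted_neighbourhood_pairs(surnames, window=3))
```

A larger window yields more candidate pairs and so trades speed for recall.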

Version 0.8.1 (27 Jan 2017)

27 Jan 10:54
  • Issues solved with rendering docs on ReadTheDocs. Still not clear what is
    going on with the autodoc_mock_imports in the sphinx conf.py file. Maybe
    a bug in sphinx.
  • Move six to dependencies.
  • The reference part of the docs is split into separate subsections. This
    makes the reference more readable.
  • The landing page of the docs is slightly changed.

Version 0.8.0 (22 Jan 2017)

23 Jan 12:34
  • Add additional arguments to the function that downloads and loads the
    krebsregister data. The argument missing_values is used to fill missing
    values. Default: nothing is done. The argument shuffle is used to
    shuffle the records. Default is True.
  • Remove the last traces of the old package name. The new package name is
    'Python Record Linkage Toolkit'.
  • Better error messages when only matches or only non-matches are passed
    to train the classifier.
  • Add AirSpeedVelocity tests to test the performance.
  • Compare for deduplication fixed. It was broken.
  • Parameterized tests for the Compare class and its algorithms. Making use
    of nose-parameterized module.
  • Update documentation about contributing.
  • Bugfix/improvement when blocking on multiple columns with missing values.
  • Fix bug #29. Package
    not working with pandas 0.18 and 0.17. Dropped support for pandas 0.17 and
    fixed support for 0.18. Also added multi-dependency tests for TravisCI.
  • Support for dedicated deduplication algorithms.
  • Special algorithm for full index in case of finding duplicates. Performance
    is 100x better.
  • Function max_number_of_pairs to get the maximum number of pairs.
  • low_memory for compare class.
  • Improved performance in case of comparing a large number of record pairs.
  • New documentation about custom algorithms
  • New documentation about the use of classifiers.
  • Possible to compare arrays and series directly without using labels.
  • Make a dataframe with random comparison vectors with the
    binary_comparisons in the recordlinkage.datasets.random module.
  • Set KMeans cluster centers by hand.
  • Various documentation updates and improvements.
  • Jellyfish is now a required dependency. Fixes bug #30.
  • Added tox.ini to test packaging and installation of package.
  • Drop requirements.txt file.
  • Many small fixes and changes. Most of the changes cover the Compare
    module. Especially label handling is improved.
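
The deduplication items above rest on a simple counting argument: when a file is compared with itself, each unordered pair needs to be considered only once, so n records give n(n-1)/2 candidate pairs, whereas linking two files of n and m records gives n*m. A minimal sketch of that argument follows. The helper names (`max_pairs_link`, `max_pairs_dedup`, `full_index_dedup`) are hypothetical and are not the toolkit's API.

```python
from itertools import combinations


def max_pairs_link(n_a, n_b):
    """Upper bound on record pairs when linking two files."""
    return n_a * n_b


def max_pairs_dedup(n):
    """Upper bound on record pairs when deduplicating one file."""
    return n * (n - 1) // 2


def full_index_dedup(record_ids):
    """All unordered pairs of distinct records from a single file."""
    # combinations() never emits (a, a) or both (a, b) and (b, a).
    return list(combinations(record_ids, 2))


ids = ["rec-0", "rec-1", "rec-2", "rec-3"]
pairs = full_index_dedup(ids)
print(len(pairs), max_pairs_dedup(len(ids)))
```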

Version 0.7.2 (9 Nov 2016)

09 Nov 11:30
v0.7.2

Bugfix in levenshtein algorithms

Version 0.7.1 (9 Nov 2016)

09 Nov 10:58
v0.7.1

Improve importing workflow + dist bug fix

Version 0.6.0

12 Oct 12:32

This version includes the following updates:

  • Reformatting the code such that it follows PEP8.
  • Add Travis-CI and codecov support.
  • Switch to distributing wheels.
  • Fix bugs with deprecated pandas functions. __sub__ is no longer used for computing the difference of Index objects. It is now replaced by INDEX.difference(OTHER_INDEX).
  • Exclude pairs with NaN's on the index-key in Q-gram indexing.
  • Add tests for krebsregister dataset.
  • Fix Python3 bug on krebsregister dataset.
  • Improve unicode handling in phonetic encoding functions.
  • Strip accents with the clean function.
  • Add documentation
  • Bug for random indexing with incorrect arguments fixed and tests added.
  • Improved deployment workflow
  • And much more
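
The pandas change mentioned in the list above can be illustrated with a short example. It assumes pandas is installed; the index values are made up for illustration.

```python
import pandas as pd

all_ids = pd.Index(["a", "b", "c", "d"])
seen_ids = pd.Index(["b", "d"])

# Set difference of two Index objects: labels in all_ids
# that do not occur in seen_ids.
remaining = all_ids.difference(seen_ids)
print(list(remaining))
```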

Version 0.5.0 (9 Sep 2016)

09 Sep 12:13
  • Batch comparing added. Significant speed improvement.
  • rldatasets are now included in the package itself.
  • Added an experimental gender imputation tool.
  • Blocking and SNI skip missing values
  • No longer a need for different index names
  • FEBRL datasets included
  • Unit tests for indexing and comparing improved
  • Documentation updated

Version 0.4.0 (20 Aug 2016)

20 Aug 20:44
  • Fixes a serious bug with deduplication (thanks to https://github.com/dserban).
  • Fixes undesired behaviour for sorted neighbourhood indexing with missing values.
  • Add new datasets to the package like Febrl datasets
  • Move Krebsregister dataset to this package.
  • Improve and add some tests
  • Various documentation updates

Version 0.3.1: Fix installation bug

15 Jun 18:32
v0.3.1

Fix problems with installing with pip

Version 0.3 (11 June 2016)

11 Jun 11:36

This version contains a lot of changes to the API. Hopefully, there are no large API changes needed for now.

  • Total restructure of compare functions. (The API changes are nearly complete.)
  • Compare method numerical is now named numeric and fuzzy is now named string.
  • Add haversine formula to compare geographical records.
  • Use numexpr for computing numeric comparisons.
  • Add step, linear and squared comparing.
  • Add eye index method.
  • Improve, update and add new tests.
  • Remove iterative indexing functions.
  • New: add chunks for indexing functions. These chunks are defined in the class Pairs. If chunks are defined, the indexing functions return a generator with an Index for each element.
  • Update documentation.
  • Various bug fixes.
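
The haversine formula mentioned above gives the great-circle distance between two points from their latitudes and longitudes. A minimal standalone sketch using only the standard library follows. The function name `haversine_km` and the Earth radius of 6371 km are choices made for this example. How the toolkit turns such a distance into a comparison score is not shown here.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)

    # Haversine of the central angle between the two points.
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# A quarter of the equator: 0°N 0°E to 0°N 90°E.
print(round(haversine_km(0.0, 0.0, 0.0, 90.0), 1))
```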