The Wayback Machine - https://web.archive.org/web/20201010041705/https://github.com/graknlabs/grakn
## What is the goal of this PR?

We fixed an issue in import where, under rare conditions, the cache would look up an incorrect concept in the final stage of importing. This caused the import to fail if the concept was not a valid attribute, or produced incorrect results if it was.

At the end of the import, we insert any ownerships that could not be inserted earlier (because one of the ends, owner or attribute, had not yet been encountered in the data). During this step, we use a local cache to look up any already "live" attributes used in the same transaction, to avoid repeatedly reading the same attribute through the Concept API.

The bug was that the lookup ID for missing attribute ownerships was being translated from an "original" ID to an "imported" ID before lookup, whereas the cache is keyed by the "original" ID precisely to avoid unnecessary translation. Fortunately, while the bug could produce incorrect results, the conditions that trigger it are rare. The translated ID was also being used when caching attributes, so any attributes loaded during this stage would still be cached and looked up consistently.

The migration would fail if the "original" ID of an entity or relation collided with the "new" ID of an attribute. It could succeed with incorrect results if, during the same batch, an attribute cached in the phase before the final missing-ownerships phase had an "original" ID that collided with a "new" attribute ID. Such collisions would be uncommon in most cases, and very uncommon for the attribute case that might go undetected. Since the cache is cleared between transactions, the probability is also bounded (it does not scale with data set size beyond a point).
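The corrected lookup can be illustrated with a small sketch (the class and method names here are hypothetical, not Grakn's actual internals): the cache is keyed by the "original" ID from the export, and translation to the "imported" ID happens only on a cache miss, when the attribute must actually be read from the server.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the import-time attribute cache described above.
public class AttributeCacheSketch {
    // Cache keyed by the ORIGINAL (exported) ID -- the fix in this PR.
    private final Map<String, String> cache = new HashMap<>();
    // Mapping from original IDs to the IDs assigned during import.
    private final Map<String, String> idMapping;
    // Stand-in for reading an attribute through the Concept API by imported ID.
    private final Function<String, String> loadByImportedId;

    public AttributeCacheSketch(Map<String, String> idMapping,
                                Function<String, String> loadByImportedId) {
        this.idMapping = idMapping;
        this.loadByImportedId = loadByImportedId;
    }

    public String getAttribute(String originalId) {
        // Look up and store under the original ID; translate to the imported
        // ID only on a miss, when the attribute must be fetched.
        return cache.computeIfAbsent(originalId,
                id -> loadByImportedId.apply(idMapping.get(id)));
    }
}
```

The buggy version translated the ID *before* consulting the cache, so a lookup under an imported ID could hit an entry that had been stored under a colliding original ID.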

## What are the changes implemented in this PR?

- Fix the cache lookup to be by original ID for missing ownership insertions.
Commit: 55b9a69
README.md

# GRAKN.AI



Building intelligent systems starts at the database. Grakn is an intelligent database: a knowledge graph engine to organise complex networks of data and make it queryable.

Whether you are new to coding or an experienced developer, it’s easy to learn and use Grakn. Get set up quickly with the quickstart tutorial. Documentation for Grakn’s development libraries and the Graql language, along with tutorials and guides, is available online at our documentation portal. When you’re stuck on a problem, collaborating helps: ask your question on Stack Overflow or discuss it on our Discussion Forum.

## Meet Grakn and Graql

Grakn is an intelligent database: a knowledge graph engine to organise complex networks of data and make it queryable, by performing knowledge engineering. Rooted in Knowledge Representation and Automated Reasoning, Grakn provides the knowledge foundation for cognitive and intelligent (e.g. AI) systems, through an intelligent language for modelling, transactions and analytics. Being a distributed database, Grakn is designed to scale over a network of computers through partitioning and replication.

Under the hood, Grakn implements an expressive knowledge representation system based on hypergraph theory (a subfield of mathematics that generalises an edge to be a set of vertices), with a transactional query interface, Graql. Graql is Grakn’s declarative query language, providing reasoning (through OLTP) and analytics (through OLAP).

## Knowledge Schema

Grakn provides an enhanced entity-relationship schema to model complex datasets. The schema allows users to model type hierarchies, hyper-entities, hyper-relationships and rules. Hyper-entities are entities with multiple instances of a given attribute; hyper-relationships are nested relationships, cardinality-restricted relationships, or relationships between any number of entities. The schema can be updated and extended at any time in the database lifecycle, enabling the creation of complex knowledge models that can evolve flexibly.
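As a small illustration, here is a minimal schema sketch in Graql 1.x syntax (the types `person`, `company` and `employment` are hypothetical, and keywords differ in later Graql versions):

```graql
define

person sub entity,
  has full-name,
  has nickname,
  plays employee;

company sub entity,
  has name,
  plays employer;

employment sub relation,
  relates employee,
  relates employer;

full-name sub attribute, datatype string;
nickname sub attribute, datatype string;
name sub attribute, datatype string;
```

A person instance owning several `nickname` values at once is an example of a hyper-entity as described above.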

## Logical Inference

Grakn’s query language performs logical inference through deductive reasoning over entity types and relationships, inferring implicit facts, associations and conclusions in real time, during the runtime of OLTP queries. The inference is performed through entity and relationship type reasoning, as well as rule-based reasoning. This allows the discovery of facts that would otherwise be too hard to find, the abstraction of complex relationships into their simpler conclusions, and the translation of higher-level queries into the lower-level, more complex data representation.
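For example, a rule can make a relationship transitive, so that indirect containment is inferred at query time. The sketch below uses Graql 1.x rule syntax with a hypothetical `containment` relation:

```graql
define

containment-transitivity sub rule,
when {
    (container: $a, contained: $b) isa containment;
    (container: $b, contained: $c) isa containment;
}, then {
    (container: $a, contained: $c) isa containment;
};
```

With this rule defined, a match query for `containment` returns both the directly inserted relations and those deduced by the reasoner.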

## Distributed Analytics

Grakn’s query language performs distributed Pregel and MapReduce (BSP) algorithms abstracted as OLAP queries. These types of queries usually require custom development of distributed algorithms for every use case. However, Grakn creates an abstraction of these distributed algorithms and incorporates them as part of the language API. This enables large-scale computation of BSP algorithms through a declarative language, without the need to implement the algorithms.
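Such analytics are expressed as `compute` queries. A few illustrative examples in Graql 1.x syntax (the `person` type is hypothetical, and exact syntax varies across versions):

```graql
compute count in person;
compute centrality using degree;
compute cluster using connected-component;
```

Each of these is executed as a distributed BSP job under the hood, but to the user it is a one-line declarative query.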

## Higher-Level Language

With the expressivity of the schema, inference through OLTP and distributed algorithms through OLAP, Grakn provides strong abstraction over low-level data constructs and complicated relationships through its query language. The language provides a higher-level schema, OLTP, and OLAP query language that makes working with complex data a lot easier. When developers can achieve more by writing less code, productivity increases by orders of magnitude.

## Downloading and Running Grakn Core

To run Grakn Core (which you can download from the Download Centre or GitHub Releases), you need to have Java 8 (OpenJDK or Oracle Java) installed.

You can visit the Setup Guide for help with installation.

## Compiling Grakn Core from Source

Note: You don't need to compile Grakn Core from source if you just want to use Grakn. See the "Download and Running Grakn Core" section above.

  1. Make sure you have the following dependencies installed on your machine:
    • Java 8
    • Python >= 2.7 and Pip >= 18.1
    • Bazel. We use Bazelisk to manage Bazel versions; it runs the build with the Bazel version specified in .bazelversion. To install it, follow the platform-specific guide:
      • macOS (Darwin): brew install bazelbuild/tap/bazelisk
      • Linux: wget https://github.com/bazelbuild/bazelisk/releases/download/v1.4.0/bazelisk-linux-amd64 -O /usr/local/bin/bazel && chmod +x /usr/local/bin/bazel
  2. Depending on your operating system, you can build Grakn with one of the following commands:

```
$ bazel build //:assemble-linux-targz
```

Outputs to: `bazel-bin/grakn-core-all-linux.tar.gz`

```
$ bazel build //:assemble-mac-zip
```

Outputs to: `bazel-bin/grakn-core-all-mac.zip`

```
$ bazel build //:assemble-windows-zip
```

Outputs to: `bazel-bin/grakn-core-all-windows.zip`

## Contributions

Grakn Core is built using various state-of-the-art open-source graph and distributed-computing frameworks: ANTLR, Apache Cassandra, Apache Hadoop, Apache Spark, Apache TinkerPop, Bazel, gRPC and JanusGraph. Thank you!

## Licensing

This product includes software developed by Grakn Labs Ltd. It is released under the GNU Affero General Public License, Version 3, 29 June 2007. For license information, please see LICENSE. Grakn Labs Ltd also provides a commercial license for Grakn Enterprise KGMS; get in touch with our team at enterprise@grakn.ai.

Copyright (C) 2020 Grakn Labs
