Analytic engines that factor in security labels

[A version of this post appears on the O’Reilly Strata blog.]

Originated by the NSA, Apache Accumulo is a BigTable-inspired data store known for being highly scalable and for its interesting security model. Federal agencies and defense contractors have deployed Accumulo on clusters of a thousand or more servers. It also uses “cell-level” security to control access to values stored in individual cells (1).
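To make the idea of cell-level security concrete, here is a minimal sketch in Python. It is not Accumulo's API: the names `is_visible`, `scan`, and `CELLS`, and the sample data, are hypothetical. It assumes each cell carries a label written as a boolean expression over authorization tokens (`&` for AND, `|` for OR, with parentheses whenever the two are mixed), and that an empty label means the cell is visible to everyone.

```python
import re


def is_visible(label, auths):
    """Return True if the set of authorizations satisfies the label expression.

    Simplified evaluator: operators are applied left to right, so mixed
    & and | are assumed to be parenthesized. Malformed labels are not handled.
    """
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", label)
    if not tokens:
        return True  # assumption: an unlabeled cell is visible to everyone
    pos = 0

    def parse_term():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            pos += 1  # skip the closing ")"
            return value
        return tok in auths  # a bare token is true if the user holds it

    def parse_expr():
        nonlocal pos
        value = parse_term()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            pos += 1
            rhs = parse_term()
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    return parse_expr()


# Hypothetical table: (row, column, label, value). Each cell has its own label.
CELLS = [
    ("patient1", "name", "", "Alice"),
    ("patient1", "diagnosis", "doctor", "flu"),
    ("patient1", "billing", "billing|admin", "$120"),
    ("patient1", "ssn", "admin&(billing|doctor)", "xxx"),
]


def scan(cells, auths):
    """Return only the cells whose label the caller's authorizations satisfy."""
    return [(row, col, val) for row, col, label, val in cells
            if is_visible(label, auths)]
```

Two users scanning the same row get different cells back: `scan(CELLS, {"doctor"})` returns the name and diagnosis, while `scan(CELLS, {"admin", "billing"})` returns the name, billing, and ssn cells. A store that can only restrict rows or columns cannot express this without splitting the data.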

What Accumulo lacked were easy-to-use, standard analytic engines that let users interact with data. The release of Sqrrl Enterprise this past week fills that gap. Sqrrl Enterprise provides an initial set of analytic engines for the Accumulo ecosystem (2). It includes support for interactive SQL, full-text search, and queries over graph data. Each of these engines takes into account security labels placed on data: since every data object ingested into Sqrrl has a security label, query and analytic results incorporate those access levels. Analysts interact with data as they normally would. For example, Sqrrl’s indexing technology accounts for security labels, and search queries are written in standard Lucene syntax. Reminiscent of the Phoenix project for HBase (3), SQL queries (4) in Sqrrl are converted into optimized Accumulo iterators.
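One way to picture an index that "accounts for security labels" is a toy inverted index whose postings carry each document's label, so filtering happens inside the search and not as an afterthought. The sketch below is an illustration only and says nothing about how Sqrrl actually implements its indexing. The class `LabeledIndex` and its methods are hypothetical, and labels are simplified to a set of authorizations that must all be held.

```python
from collections import defaultdict


class LabeledIndex:
    """Toy inverted index; every posting remembers its document's label."""

    def __init__(self):
        # term -> list of (doc_id, required authorizations)
        self._postings = defaultdict(list)

    def add(self, doc_id, text, required_auths=()):
        """Index a document under a label (assumption: all tokens are required)."""
        required = frozenset(required_auths)
        for term in set(text.lower().split()):
            self._postings[term].append((doc_id, required))

    def search(self, term, auths):
        """Return sorted ids of matching documents the caller is allowed to see."""
        auths = set(auths)
        return sorted(
            doc_id
            for doc_id, required in self._postings.get(term.lower(), [])
            if required <= auths  # the label must be a subset of the user's auths
        )


index = LabeledIndex()
index.add("d1", "fever and cough", ["doctor"])
index.add("d2", "Cough syrup invoice", ["billing"])
index.add("d3", "public cough advisory")  # unlabeled: visible to everyone
```

The analyst writes the same query regardless of clearance: `index.search("cough", {"doctor"})` yields `d1` and `d3`, while a user with no authorizations sees only `d3`.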

As I’ve pointed out in recent posts, analytic engines are the natural next step after building a scale-out data store with batch processing capability. Application frameworks like Kiji can then leverage those engines to simplify the app development process. Sqrrl is building analytic capabilities without sacrificing Accumulo’s unique (5) security model. It certainly seems like a natural fit for industries, such as health care, where privacy is central. I’m just glad that data stores of all stripes are rolling out these basic engines in earnest.

Related posts:

  • HBase looks more appealing to data scientists
  • It’s getting easier to build Big Data applications
  • Improving options for unlocking your graph data

(1) In contrast, many data stores can only restrict what columns or rows users can access.
(2) Sqrrl Enterprise is a commercial product built on top of Accumulo. So the analytic tools described above aren’t technically freely available to users of Apache Accumulo. I think the HBase ecosystem has an advantage in this regard: their tools are available to all HBase users.
(3) Phoenix turns SQL into optimized native HBase calls.
(4) The current version of Sqrrl Enterprise does not yet support “joins” or “subselects”.
(5) Until other data stores implement some version of cell-level security, Accumulo has a distinguishing feature.
