Third Party Auditing for the Archistar Multi-Cloud Storage

Abstract

The significant cost savings and processing benefits of outsourcing data to third party service providers are attracting individuals and organizations alike. As a result, concerns about data security and privacy arise, and users need a way to check whether their data is stored properly on the distributed storage systems. One solution is to engage a third party auditor (which need not be trusted) to perform the auditing process. In addition, the auditing mechanism should guarantee data privacy to the user, even if the third party auditor cooperates with a set or subset of the storage servers. At the time of writing this thesis, third party auditing mechanisms exist only for single-server storage systems, not for distributed storage systems. In this thesis, we implement the auditing protocol in the Java language using four applications (client, server, share, reconstruct) and evaluate its performance. The protocol we implement is based on any homomorphic secret sharing scheme and is information-theoretically private, efficient, and keyless. In addition, it is backward compatible with existing secret-sharing-based storage solutions. The result of this thesis is the first implementation of a new kind of privacy-preserving remote data checking for dispersed storage systems.

Motivation

Cloud computing is becoming more popular day by day. Its main advantage is that resources are available whenever you need them, and you pay according to how much you use. The Internet is the most common way of accessing cloud computing resources; in the case of a private cloud, an intranet is used. However, it is hard for cloud storage customers to secure their data, because they have no direct control over its storage. There are encryption methods, which we will discuss, but the risk of losing data remains. One solution is to encrypt the data before putting it into cloud storage, so that companies regain direct control over the data stored on remote servers. When applying encryption, businesses that decide to migrate to the cloud should consider two options: encrypt the data before storing it in the cloud, or choose a SaaS provider that manages the encryption and decryption keys of the corporate data. The latter solution is rather ineffective, because the cloud service provider manages the keys and therefore still has full access to the stored data. On the other hand, if data is encrypted before being uploaded to a cloud storage provider, it is protected from provider access; however, if that data is then needed on a mobile or remote device that does not already have the decryption key, the resulting download is useless encrypted data. Furthermore, encryption is computationally expensive and introduces complexity, because it typically requires the use of both private-key and public-key encryption, and systems that perform this computational heavy lifting must always have resources available. Another point to remember is that the security of the data reduces to the security of the encryption key: losing the key is equal to losing the data. Finally, encrypting the data before storing it in the cloud will never be information-theoretically secure, a feature that is required for long-term secure data archiving.
Instead of encryption, we have another option: split the data into fragments and distribute them to multiple servers using threshold secret sharing techniques. Threshold secret sharing schemes protect the integrity and secrecy of information by distributing it over different locations. Their main working principle is that data is shared among n parties in such a way that it can easily be reconstructed from any k or more shares, while knowledge of (k − 1) or fewer shares gives no information about the data. A disadvantage of outsourced storage systems is that one must fully trust the storage nodes holding the data. To address this, Proofs of Retrievability and Proofs of Data Possession have been introduced. A Proof of Retrievability (PoR) is a proof by a cloud service provider to a client (auditor) that a file F can be fully recovered by the client. A Proof of Data Possession (PoDP) scheme provides an efficient audit that convinces a client that his or her file is available at the storage server, ready for retrieval when needed. This can be done by an auditor instead of the data owner itself; moreover, the auditor is not allowed to learn any additional information about the stored data.
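To make the (k, n) threshold principle concrete, here is a minimal, self-contained sketch of Shamir secret sharing over a toy prime field. The class name, the modulus P, and the method signatures are illustrative assumptions, not the thesis code; a real implementation would operate on raw bytes over a carefully chosen field.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

public class ShamirDemo {
    static final BigInteger P = BigInteger.valueOf(2147483647L); // toy prime modulus (assumption)
    static final SecureRandom RNG = new SecureRandom();

    /** Split a secret into n shares so that any k of them reconstruct it; share i is f(i). */
    static BigInteger[] share(BigInteger secret, int k, int n) {
        BigInteger[] coeff = new BigInteger[k];      // random degree-(k-1) polynomial f
        coeff[0] = secret.mod(P);                    // with f(0) = secret
        for (int j = 1; j < k; j++) coeff[j] = new BigInteger(P.bitLength() - 1, RNG);
        BigInteger[] shares = new BigInteger[n];
        for (int i = 1; i <= n; i++) {
            BigInteger x = BigInteger.valueOf(i), y = BigInteger.ZERO;
            for (int j = k - 1; j >= 0; j--) y = y.multiply(x).add(coeff[j]).mod(P); // Horner
            shares[i - 1] = y;
        }
        return shares;
    }

    /** Recover f(0) by Lagrange interpolation from k points (xs[i], ys[i]). */
    static BigInteger reconstruct(BigInteger[] ys, int[] xs) {
        BigInteger secret = BigInteger.ZERO;
        for (int i = 0; i < xs.length; i++) {
            BigInteger num = BigInteger.ONE, den = BigInteger.ONE;
            for (int j = 0; j < xs.length; j++) {
                if (i == j) continue;
                num = num.multiply(BigInteger.valueOf((long) -xs[j])).mod(P);
                den = den.multiply(BigInteger.valueOf((long) xs[i] - xs[j])).mod(P);
            }
            secret = secret.add(ys[i].multiply(num).multiply(den.modInverse(P))).mod(P);
        }
        return secret;
    }
}
```

Any 3 of the 5 shares produced by share(secret, 3, 5) recover the secret via reconstruct, while 2 or fewer shares reveal nothing about it.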

Methodological approach

This thesis aims at the implementation and evaluation of a new kind of privacy-preserving remote data checking for dispersed storage systems. The results of the thesis will further be integrated into an existing open source software prototype and will therefore have a substantial impact and broad distribution. The thesis makes use of the additive homomorphic properties of linear secret sharing, i.e., Shamir's scheme in our case, to realize a remote data checking protocol. The key novelties of this protocol are that it is keyless and privacy preserving. Moreover, the scheme offers additional advantages such as increased flexibility, compatibility with existing secret-sharing-based storage solutions as well as with those using Reed-Solomon based erasure coding, and compatibility with proactive security measures in storage systems. With this auditing scheme, we achieve constant communication complexity even when auditing an arbitrarily large set of shares simultaneously. The complexity on the server's side is essentially given by computing a sum over the audited data shares. On the auditor's side, the computation consists of one degree-t polynomial interpolation over n values and is therefore fully independent of the batch size. The theoretical part of the thesis, consisting of chapters 2 and 3, gives an overview of cloud computing: we present the cloud computing models and the benefits of using them, describe the three biggest cloud service providers in detail, and explain the basic principles of how remote data auditing works at different cloud service providers. The practical part, consisting of chapter 4, covers the implementation and evaluation of the new remote data auditing protocol in the Java language, deployed on the Amazon EC2 web service.
Furthermore, we give an evaluation of the implemented protocol and a table of values representing its performance, compared with existing research. The last chapter concludes the thesis.

Outline of the thesis

Step 1 will give an overview of cloud computing technology, its advantages and disadvantages, and how different cloud service providers deal with third party auditing.

Step 2 will present distributed storage systems, the ARCHISTAR dispersed storage system, remote data checking, and the proposed third party auditing protocol.

Step 3 will give the results of the implementation and evaluation.

Step 4 summarizes the contribution of the thesis and presents the conclusions.

Overview scheme of the proposed auditing algorithm

The main issue in the design of an auditing protocol is data privacy. This stems from the fact that, for public data, the auditor might obtain the data by recovering data blocks from the data proof. In this thesis, we overview and present a third party auditing protocol for distributed storage that is flexible, information-theoretically private, efficient, and auditable, and that is compatible with actual storage solutions. Moreover, the proposed scheme is entirely keyless. The scheme has constant communication complexity. The computational complexity on the server's side is given by computing a sum over the audited data shares, plus a constant overhead that is independent of the number of shares to be batch audited. On the auditor's side, it consists of one degree-t interpolation over n values. Furthermore, the scheme is very flexible because it requires no additional preprocessing or keys, it is compatible with actual Shamir based storage solutions, and it is compatible with proactive security measures in storage systems. Lastly, to obtain a practical storage system, customers must be prevented from distributing inconsistent data to the storage servers, which can be achieved by using verifiable secret sharing schemes. The proposed audit protocol consists of four steps, as shown in the figure below:

  1. The client shares random values when uploading the files, so that they can be used later during the audit. The final random value r is the sum of all these shared values. After this random value is generated, Shamir secret sharing is applied to it, so that every server i holds a share p_i of the random value r.
  2. The auditor broadcasts a challenge c to the servers.
  3. Each server then computes s_i = c · σ_i + p_i, where σ_i denotes the share that server i received when Shamir secret sharing was applied to the file.
  4. If all the s_i are consistent, the auditor accepts the data integrity.
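The four steps above can be sketched end to end for a single toy data block. Everything here (the class AuditRound, the prime modulus, the parameters n = 5 servers and degree t = 2) is an illustrative assumption rather than the thesis implementation; the point is the homomorphic check that the responses s_i = c · σ_i + p_i all lie on one degree-t polynomial.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

public class AuditRound {
    static final BigInteger P = BigInteger.valueOf(2147483647L); // toy prime modulus (assumption)
    static final SecureRandom RNG = new SecureRandom();
    static final int N = 5, T = 2; // 5 servers, degree-2 sharing polynomials (assumption)

    /** Shamir-share a value: f(1..N) for a random degree-T polynomial with f(0) = v. */
    static BigInteger[] share(BigInteger v) {
        BigInteger[] c = new BigInteger[T + 1];
        c[0] = v.mod(P);
        for (int j = 1; j <= T; j++) c[j] = new BigInteger(P.bitLength() - 1, RNG);
        BigInteger[] out = new BigInteger[N];
        for (int i = 1; i <= N; i++) {
            BigInteger x = BigInteger.valueOf(i), y = BigInteger.ZERO;
            for (int j = T; j >= 0; j--) y = y.multiply(x).add(c[j]).mod(P); // Horner's rule
            out[i - 1] = y;
        }
        return out;
    }

    /** Interpolate the degree-T polynomial through (xs[i], ys[i]) and evaluate it at x. */
    static BigInteger interpolate(int[] xs, BigInteger[] ys, int x) {
        BigInteger acc = BigInteger.ZERO;
        for (int i = 0; i < xs.length; i++) {
            BigInteger num = BigInteger.ONE, den = BigInteger.ONE;
            for (int j = 0; j < xs.length; j++) {
                if (i == j) continue;
                num = num.multiply(BigInteger.valueOf((long) x - xs[j])).mod(P);
                den = den.multiply(BigInteger.valueOf((long) xs[i] - xs[j])).mod(P);
            }
            acc = acc.add(ys[i].multiply(num).multiply(den.modInverse(P))).mod(P);
        }
        return acc;
    }

    public static void main(String[] args) {
        BigInteger file = BigInteger.valueOf(42);   // a toy data block
        BigInteger[] sigma = share(file);           // upload: file shares σ_i
        BigInteger r = new BigInteger(30, RNG);     // step 1: audit randomness r ...
        BigInteger[] p = share(r);                  //         ... shared as p_i
        BigInteger c = new BigInteger(30, RNG);     // step 2: broadcast challenge c
        BigInteger[] s = new BigInteger[N];         // step 3: server responses
        for (int i = 0; i < N; i++) s[i] = c.multiply(sigma[i]).add(p[i]).mod(P);
        // step 4: responses are consistent iff they all lie on one degree-T polynomial,
        // because s_i = (c*f + g)(i) for the two sharing polynomials f and g.
        boolean ok = true;
        int[] xs = {1, 2, 3};
        BigInteger[] ys = {s[0], s[1], s[2]};
        for (int i = T + 1; i < N; i++) ok &= interpolate(xs, ys, i + 1).equals(s[i]);
        System.out.println(ok ? "audit passed" : "audit failed");
    }
}
```

Note that the auditor only ever sees the blinded values s_i, never the shares σ_i themselves, which is what makes the check privacy preserving.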

Software Architecture

The auditing protocol is implemented in the Java programming language.
We use four applications, respectively called audit_client.java, audit_server.java, share.java and reconstruct.java. The figure below represents the architecture of the third party auditing protocol.

Firstly, the audit_server.java application has four methods, described below.

  1. randomShare() generates the jointly random number. It produces a set of shares from the random number; the input is the number of servers and the number of servers required to reconstruct a secret, and the output is a list of shares as raw bytes. In a nutshell, this method creates n random numbers and applies the Shamir secret sharing algorithm to each of them.
  2. evaluate() evaluates the polynomial.
  3. sumOfShares() sums the shares obtained from applying secret sharing to the random number; it holds the resulting values p_i.
  4. summation() computes the formula s_i = Σ c · σ_i + p_i, where σ_i denotes the shares that each server receives, p_i denotes the shares of the random value r, and c denotes the challenge.
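Assuming sumOfShares() aggregates a server's shares of the audited blocks, the server-side computation can be sketched as follows. The class name, modulus, and signatures are illustrative, not the actual audit_server.java code; the sketch only shows why the server's work is a sum over the audited shares plus constant overhead.

```java
import java.math.BigInteger;

public class AuditServerSketch {
    static final BigInteger P = BigInteger.valueOf(2147483647L); // toy prime modulus (assumption)

    /** sumOfShares: aggregate this server's shares σ_{i,j} over all audited blocks. */
    static BigInteger sumOfShares(BigInteger[] sigma) {
        BigInteger acc = BigInteger.ZERO;
        for (BigInteger s : sigma) acc = acc.add(s).mod(P);
        return acc;
    }

    /** summation: the batched audit response s_i = c * Σ_j σ_{i,j} + p_i (mod P). */
    static BigInteger summation(BigInteger c, BigInteger[] sigma, BigInteger p) {
        return c.multiply(sumOfShares(sigma)).add(p).mod(P);
    }
}
```

Because the challenge c and the blinding share p_i are applied once per server regardless of how many blocks are audited, the response stays a single field element, which is the constant communication complexity claimed above.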

Secondly, the audit_client.java application has two methods, described below.

  1. In the challenge() method, the auditor generates a random number c, and every server receives this challenge number. In our case this goes up to the 11th server, because this is the maximum size of the array that represents the list of servers.
  2. getPolynomialCoeff() decides whether the auditor accepts the servers' responses.

Thirdly, the share.java application has two methods, described below.

  1. calculateThresholdScheme() generates a set of shares from the provided secret. The input is the secret, the number of shares to generate, and the number of shares required to reconstruct the secret; the output is a list of shares as raw bytes.
  2. evaluate() multiplies the shares of the provided secret with the corresponding power.

Lastly, the reconstruct.java application has three methods, described below.

  1. recoverSecret() requires 'threshold' shares in order to correctly reconstruct a secret. The input is a list of shares as raw bytes, and the output is the secret.
  2. production() calls our next method, LagranP().
  3. LagranP() multiplies the value of our polynomial, which is obtained from the calculateThresholdScheme() method, with an index.
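The reconstruction path these methods describe can be sketched as below, with lagranP() computing the i-th Lagrange coefficient at x = 0 and recoverSecret() combining the shares. The names, the integer share values, and the toy field modulus are assumptions; the real reconstruct.java works on raw bytes.

```java
import java.math.BigInteger;

public class ReconstructSketch {
    static final BigInteger P = BigInteger.valueOf(2147483647L); // toy prime modulus (assumption)

    /** lagranP: the i-th Lagrange basis coefficient evaluated at x = 0. */
    static BigInteger lagranP(int[] xs, int i) {
        BigInteger num = BigInteger.ONE, den = BigInteger.ONE;
        for (int j = 0; j < xs.length; j++) {
            if (i == j) continue;
            num = num.multiply(BigInteger.valueOf((long) -xs[j])).mod(P);
            den = den.multiply(BigInteger.valueOf((long) xs[i] - xs[j])).mod(P);
        }
        return num.multiply(den.modInverse(P)).mod(P);
    }

    /** recoverSecret: needs exactly 'threshold' shares (xs[i], ys[i]); returns f(0). */
    static BigInteger recoverSecret(int[] xs, BigInteger[] ys) {
        BigInteger secret = BigInteger.ZERO;
        for (int i = 0; i < xs.length; i++)
            secret = secret.add(ys[i].multiply(lagranP(xs, i))).mod(P);
        return secret;
    }
}
```

For example, the threshold-2 shares f(1) = 7 and f(2) = 9 of the polynomial f(x) = 5 + 2x reconstruct the secret 5.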

Conclusions

We gave a detailed explanation of the first privacy-preserving third party auditing protocol for distributed storage systems. The protocol is based on any homomorphic secret sharing scheme and is information-theoretically private. It also allows arbitrary modifications of the stored data. The challenge we addressed was the implementation of this new kind of privacy-preserving remote data checking protocol for dispersed storage systems. The protocol was implemented in the Java language using four applications (audit_client.java, audit_server.java, share.java and reconstruct.java) and evaluated. The implemented protocol will have a substantial impact and broad distribution, because it is flexible, compatible with existing Shamir based storage solutions, and compatible with proactive security measures in storage systems.

Contact me, if you would like to read the full version of my master thesis.
