User-paid offload storage available at UPPMAX

Since this spring, we have been offering the Lutra offload storage system at
UPPMAX. This solution is intended for "cold" data which is not accessed or
changed frequently, but which still has to be kept available on our clusters.
This is in contrast to the active project storage provided by SNIC, which
is only intended for current analyses.

Due to the popularity of this service, the UPPMAX board wants to offer current
PIs at UPPMAX the possibility to sign up for additional storage. This time,
it will also be possible to buy storage suitable for sensitive personal data.

For storage mounted on Rackham, the price will be 500 SEK/TB/year, and users
will have to commit in units of 50 TB for a period of 4 years. That is, the
minimum cost is 100 000 SEK for storing 50 TB for 4 years (paid in multiple
installments per year).

The Rackham cluster, and hence its offload storage, is unsuitable for
sensitive data of the sort processed on Bianca. Handling sensitive data takes
more of our staff's time. It also requires different service contracts with
our vendors and a more expensive, encrypted and physically secured solution
for tape backup. Thus, the price will be 800 SEK/TB/year for the sensitive
system, i.e. a minimum of 160 000 SEK for 50 TB over 4 years.
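The pricing rules above amount to a simple calculation. The Python sketch below reproduces the quoted minimums. It assumes that a need which is not a multiple of 50 TB is rounded up to the next 50 TB unit, and the labels "non-sensitive" and "sensitive" are names chosen for this illustration only.

```python
import math

# Prices as quoted in the announcement, in SEK per TB per year.
# The dictionary keys are labels chosen for this sketch.
PRICE_SEK_PER_TB_YEAR = {"non-sensitive": 500, "sensitive": 800}
UNIT_TB = 50         # commitments are made in units of 50 TB
CONTRACT_YEARS = 4   # each commitment runs for 4 years


def contract_cost(needed_tb: float, storage_class: str) -> tuple[int, int]:
    """Return (committed TB, total SEK over the whole contract).

    Assumption: a need that is not a multiple of 50 TB is rounded up
    to the next whole 50 TB unit.
    """
    if needed_tb <= 0:
        raise ValueError("needed_tb must be positive")
    units = math.ceil(needed_tb / UNIT_TB)
    committed_tb = units * UNIT_TB
    total_sek = committed_tb * PRICE_SEK_PER_TB_YEAR[storage_class] * CONTRACT_YEARS
    return committed_tb, total_sek


if __name__ == "__main__":
    for tb, storage_class in [(50, "non-sensitive"), (50, "sensitive"), (120, "non-sensitive")]:
        committed, total = contract_cost(tb, storage_class)
        print(f"{tb} TB ({storage_class}): commit {committed} TB, {total} SEK over {CONTRACT_YEARS} years")
```

Under the rounding assumption, a 120 TB need on the non-sensitive system would mean committing to 150 TB, i.e. 300 000 SEK over four years.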

We intend to order the hardware in late December and to have the systems
online in February. A signed agreement, approved by your head of department,
is needed before we order the hardware.

If you are interested, contact support@uppmax.uu.se, the UPPMAX Technical
Coordinator at carl.nettelblad@uppmax.uu.se, or the UPPMAX Director at
elisabeth.larsson@uppmax.uu.se.

FREQUENTLY ASKED QUESTIONS
--------------------------
Q: How is this storage different from existing storage, e.g. Crex and Castor?
A: The normal UPPMAX storage systems are intended for active project data,
  i.e. the data needed during the course of a project. You have to justify
  your storage needs in your project applications, and the storage itself is
  paid for through SNIC. When these resources run out of space, storage can
  be rationed, and we have to be more aggressive in urging users to limit
  their storage needs.

  This storage solution is provided by us, but paid for by its users. We will
  not question your need to store data up to your quota. However, since it is
  not intended for active project data, the solution is tuned for large
  capacity, not for a high rate of write operations. If you need that, you
  should still apply for project storage.

Q: What kind of data can I put there?
A: Typical use cases are storing large data sets from completed projects,
  such as the primary results from specific experiments. If you ever need to
  re-analyze the data, you will have it readily available on our clusters.
  On the resource mounted on Rackham, you are not allowed to store sensitive
  data, with the same interpretation of that concept as is currently used
  for compute and storage project allocations. Typical examples of sensitive
  data we encounter are personally identifiable data from population
  registries, health information systems, and biomolecular assays
  (including genomic data).

Q: I don't need 50 TB. Why don't you offer a smaller volume?
A: We have chosen this limit to keep both the technical and financial
  administration cost-efficient. Even at this price point, a substantial part
  of our costs are staff costs for maintaining the solution and providing user
  support.

Q: What will the availability be like? My data is super-critical.
A: We will maintain the same level of availability as for other UPPMAX
  resources, that is, a best-effort intent to maintain continuous operations,
  with monthly service windows. In general, an outage outside office hours
  will not be addressed until the next working day. If you need to ensure
  immediate access to the data under all circumstances, we recommend that
  you choose another solution.

Q: What happens if the hardware breaks down?
A: We will have redundancy within the solution, so failure of individual disks
  will not affect user data. In addition, data will be backed up on tape at an
  off-site location.

Q: My budget is already set, I can't pay for this now, but I want to join. What
  do I do?
A: This is our second call of this sort. For now, we plan to continue making
  such calls once a year. Come back at the end of 2020.

Q: How does this relate to other storage offerings and future rules and
  solutions for long-term research data storage?
A: We are currently trying to serve a very concrete need for users who have
  data that cannot easily be considered active project data, but where the
  natural place to access the data, if it is ever needed again, would be
  UPPMAX. In those cases, we think it is better to provide a common solution,
  rather than individual groups buying and maintaining smaller storage systems.
  In addition, our solution will be directly connected to our core network.
  Even though it's a high-capacity solution, rather than a high-performance
  solution, it will give higher bandwidth to our clusters than any solution
  placed outside of our computer room.

  The technical and organizational frameworks for true long-term storage of
  research data will hopefully be clarified in the coming years. Even so, we
  believe there will still be a need for keeping data close to computational
  resources, but outside of truly active project storage. This solution
  should not be considered a replacement for permanent archiving and
  metadata tagging of data.

Q: When will this be available?
A: We intend to get agreements signed with all users and order the hardware
  during December. The solution should be fully online during February,
  barring significant delays from our vendors. Since this is the second time
  we are doing this, we expect the process to go smoothly.

Q: Can we be sure that the storage is going to be installed?
A: We assess the total interest for the two classes of data (sensitive and
  non-sensitive) separately, since the hardware and software setups are
  different. For each system, we need to collect allocations in the range of
  1 PB before we can go ahead with it. This ensures that we achieve
  sufficient economies of scale to offer our pricing model. Judging from
  previous interest, we believe it is likely that we will reach that point.
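The go-ahead rule can be pictured as a separate tally per class of data. The Python sketch below is illustrative only: it treats "in the range of 1 PB" as a hard cut-off of 1000 TB, which is a simplification of an approximate figure, and the function name `assess` is hypothetical.

```python
UNIT_TB = 50        # commitments are made in units of 50 TB
GO_AHEAD_TB = 1000  # assumption: "in the range of 1 PB" read as a hard 1000 TB cut-off


def assess(commitments: list[tuple[str, int]]) -> dict[str, bool]:
    """Return, per class of data, whether the summed commitments reach the threshold.

    The sensitive and non-sensitive classes are tallied separately,
    since they are built as separate systems.
    """
    totals: dict[str, int] = {}
    for storage_class, tb in commitments:
        # Each commitment must be a whole number of 50 TB units.
        if tb <= 0 or tb % UNIT_TB != 0:
            raise ValueError(f"{tb} TB is not a positive multiple of {UNIT_TB} TB")
        totals[storage_class] = totals.get(storage_class, 0) + tb
    return {c: total >= GO_AHEAD_TB for c, total in totals.items()}


if __name__ == "__main__":
    pledges = [("non-sensitive", 600), ("non-sensitive", 450), ("sensitive", 300)]
    print(assess(pledges))
```

With these made-up pledges, the non-sensitive class (1050 TB) would clear the assumed threshold while the sensitive class (300 TB) would not, showing why the two systems are decided independently.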

Q: What happens after four years?
A: The technology and organization landscape is always fluid, but after four
  years the hardware will have reached the end of its lifetime. At that
  point, you will be free to retrieve the data, e.g. over ssh. If there is no
  obvious replacement solution provided by another entity, it is likely that
  we will offer another 4-year contract at a similar price point.
