User-paid offload storage available at UPPMAX
Since this spring, we have been offering the Lutra offload storage system suitable for "cold" data which is not accessed or changed frequently, but which still has to be kept available on our clusters. Due to the popularity of this service, the UPPMAX board wants to offer current PIs at UPPMAX the possibility to sign up for additional storage. This time, it will also be possible to buy storage suitable for sensitive personal data.
Since this spring, we have been offering the Lutra offload storage system at
UPPMAX. This solution is intended for "cold" data which is not accessed or
changed frequently, but which still has to be kept available on our clusters.
This is in contrast to the active project storage provided by SNIC, which
is only intended for current analyses.
Due to the popularity of this service, the UPPMAX board wants to offer current
PIs at UPPMAX the possibility to sign up for additional storage. This time,
it will also be possible to buy storage suitable for sensitive personal data.
For data mounted at Rackham, the price will be 500 SEK/TB/year, and users
will have to commit in units of 50 TB for a period of 4 years. That is, the
minimum cost is 100 000 SEK for storing 50 TB for 4 years (paid in multiple
installments per year).
The Rackham cluster, and hence its offload storage, is unsuitable for
sensitive data of the sort processed on Bianca. The processing of sensitive
data is more time-consuming for our staff. It also requires different service
contracts with our vendors, and includes a more expensive encrypted and
physically secured solution for tape backup. Thus, the price will be
800 SEK/TB/year for the sensitive system, i.e. a minimum of 160 000 SEK for
50 TB over 4 years.
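
The price arithmetic above can be checked with a short sketch. The function and
constant names, and the "sensitive" label, are hypothetical and only used for
illustration. The sketch assumes that cost scales linearly with the number of
50 TB units, and that the total is spread evenly over the four years; the
actual installment schedule is not specified here.

```python
# Illustrative sketch only; names are hypothetical, prices are from the text.
UNIT_TB = 50          # storage is committed in units of 50 TB
CONTRACT_YEARS = 4    # commitment period in years

# Price in SEK per TB per year for each system.
PRICE_SEK_PER_TB_YEAR = {
    "rackham": 500,    # non-sensitive data, mounted at Rackham
    "sensitive": 800,  # system suitable for sensitive personal data
}


def contract_cost_sek(system: str, units: int = 1) -> int:
    """Total cost over the full contract for a number of 50 TB units."""
    if units < 1:
        raise ValueError("at least one 50 TB unit is required")
    return PRICE_SEK_PER_TB_YEAR[system] * UNIT_TB * units * CONTRACT_YEARS


def yearly_cost_sek(system: str, units: int = 1) -> int:
    """Cost per year, assuming the total is split evenly across the years."""
    return contract_cost_sek(system, units) // CONTRACT_YEARS


if __name__ == "__main__":
    for system in PRICE_SEK_PER_TB_YEAR:
        print(system, contract_cost_sek(system), yearly_cost_sek(system))
```

With one unit, this reproduces the minimum costs stated above: 100 000 SEK for
Rackham and 160 000 SEK for the sensitive system.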
The intent is to order the hardware in late December with the systems online in
February. A signed agreement with approval from your head of department will
be needed before we order the hardware.
If you are interested, contact support@uppmax.uu.se, UPPMAX Technical
Coordinator carl.nettelblad@uppmax.uu.se, or UPPMAX Director
elisabeth.larsson@uppmax.uu.se.
FREQUENTLY ASKED QUESTIONS
--------------------------
Q: How is this storage different from existing storage, e.g. Crex and Castor?
A: The normal UPPMAX storage systems are intended for active project data,
i.e. the data which is needed during the course of a project. You have to
justify your storage needs in your project applications and storage can be
rationed when we run out. The storage itself is paid for through SNIC in
that case. When we run out of space on these resources, we have to be more
aggressive in urging users to limit their storage needs.
This storage solution is provided by us, but paid for by its users. We will
not question your need to store data up to your quota. However, since it is
not intended for active project data, the solution is tuned for large
capacity, not for a high rate of write operations. If you need high write
performance, you should still apply for project storage.
Q: What kind of data can I put there?
A: The kind of use cases we see are storing various large data sets from old
projects. This can include the primary results from specific experiments.
If you ever need to re-analyze the data, you'll have it readily available
on our clusters. On the resource mounted on Rackham, you are not allowed to
store sensitive data, with the same interpretation of that concept as is
currently used for computation and storage project allocations. Typical
examples of sensitive data we encounter are personally identifiable data
from population registries, health information systems, and biomolecular
assays (including genomic data).
Q: I don't need 50 TB. Why don't you offer a smaller volume?
A: We have chosen this limit to keep both the technical and financial
administration cost-efficient. Even at this price point, a substantial part
of our costs is staff time for maintaining the solution and providing user
support.
Q: What will the availability be like? My data is super-critical.
A: We will maintain the same level of availability as for other UPPMAX
resources, that is, a best-effort intent to maintain continuous operations,
with monthly service windows. An outage outside of office hours will in
general not start to be addressed until the next working day. If you need to
ensure immediate access to the data under all circumstances, we recommend
that you choose another solution.
Q: What happens if the hardware breaks down?
A: We will have redundancy within the solution, so failure of individual disks
will not affect user data. In addition, data will be backed up on tape at an
off-site location.
Q: My budget is already set, I can't pay for this now, but I want to join. What
do I do?
A: This is our second call of this sort. For now, we plan to continue making
such calls once a year. Come back at the end of 2020.
Q: How does this relate to other storage offerings and future rules and
solutions for long-term research data storage?
A: We are currently trying to serve a very concrete need for users that have
data that cannot easily be considered active project data, but where the
natural place to access the data, if it is ever needed again, would be
UPPMAX. In those cases, we think it is better to provide a common solution,
rather than individual groups buying and maintaining smaller storage systems.
In addition, our solution will be directly connected to our core network.
Even though it's a high-capacity solution, rather than a high-performance
solution, it will give higher bandwidth to our clusters than any solution
placed outside of our computer room.
The technical and organizational frameworks for true long-term storage of
research data will hopefully be clarified in the coming years. Even so, we
believe there will still be some need for keeping data close to
computational resources, yet outside of truly active project storage.
This should not be considered a replacement for permanent archival and
metadata tagging of data.
Q: When will this be available?
A: We intend to get agreements signed with all users and order the hardware
during December. The solution should be fully online during February, if
there are no significant delays from our vendors. Since this is the second
time we are doing this, the process is expected to be smooth.
Q: Can we be sure that the storage is going to be installed?
A: We assess the total interest for the two classes of data (sensitive and
non-sensitive) separately, since the hardware and software setup is
different. We will need to collect allocations in the range of 1 PB
in order to be able to go ahead with each system. This ensures that we
achieve a sufficient economy of scale to be able to offer our pricing model.
Judging from previous interest, we believe it is likely that we will
reach that point.
Q: What happens after four years?
A: The technological and organizational landscape is always fluid, but after
four years the hardware will have reached the end of its lifetime. At that
point, you will be free to retrieve the data over e.g. ssh. If there is no obvious
replacement solution provided by another entity, it is likely that we will
offer another 4-year contract at a similar price point.