TLDR: MathWorks uses a cumbersome and error-prone DRM called MDCS to control the restrictions on their Parallel Computing Toolbox software for Matlab. The MDCS product is an abuse of the vendor lock-in that occurs when researchers invest years developing Matlab-based research code in an environment with gratis access to Matlab supplied by institutional licenses, and are subsequently squeezed for additional licensing fees for their (ostensibly) already licensed product when they need to run their code in parallel at scale. Research is harmed by lost time and/or unanticipated costs incurred directly because of MathWorks’ aggressive and unethical licensing strategies for MDCS.
Imagine that you are a young researcher who invests a great deal of time learning Matlab and creating Matlab codes for your research. The costs of using Matlab are covered by an institutional license provided by your university or company, so that you don’t have to think about them. As you grow as a researcher you tackle harder and harder computational problems that require greater and greater parallel computational power. After working very hard to develop your parallel code using Matlab’s Parallel Computing Toolbox software, you are ready to tackle your problem at scale: 20 cores, 100 cores, 1000 cores, 10,000 cores, depending on the hardware available to you, the nature of your research question, and the capabilities of your competitors. But you encounter a problem that you didn’t expect: your institution has not paid to run parallel Matlab code at large scale and is unwilling to purchase the necessary licenses. You discover that you must either pay unexpected costs of doing research with Matlab or go through the tremendous effort of porting your code to another technology.
The issue is MathWorks’ attitude toward distributed computing: they hold that the cost of a software license should increase with the high performance computing capacity it unlocks. They have opted to go the route of per-worker licensing fees and technological restrictions on their product, both of which hurt the researchers who use their products and their research. While MathWorks may have identified an opportunity to create revenue by selling greater computational capacity, the additional costs and inconvenience to researchers do not reflect any additional development effort or cost to MathWorks. Research is pointlessly held hostage by MathWorks’ restrictive licensing policies.
I hope that this product review of Matlab Distributed Computing Server (MDCS) can be useful to everyone who makes decisions related to research computing software, both individual researchers who choose their development platforms and those responsible for institutional licenses. The issues with MDCS should not only be considered by people currently considering purchasing or using MDCS, but by any Matlab user who believes that their Matlab code may someday be used on a distributed computing platform. Moreover, many of the problems that I point out with MDCS are not unique to MDCS or Matlab, and some are fairly common and accepted software licensing practices that should nevertheless be questioned and criticised as they can have important impacts on research computing.
Who is paying for Matlab?
In my opinion, one of the core issues that has led to this problem is the question of who pays for Matlab software licenses. In some cases, this cost is paid directly by the researchers who work directly with Matlab. However, in many (possibly most) cases, users are covered under various types of institutional licenses paid for by universities, colleges, businesses and other organizations. The result is that the people who are responsible for the research are often unaware of the costs of choosing to do that research with Matlab. Many institutions do not make the details of the license or the costs involved known to the researchers. From the researchers’ perspectives, the costs are covered and their use of Matlab is gratis. Often, the institutions negotiate licenses that contain many toolboxes and sometimes infinite simultaneous licenses for Matlab products. For researchers, this can create a harmful impression of freedom and obfuscate the costs and restrictions associated with using MathWorks products.
And so, students and young researchers invest their educational efforts into gaining Matlab skills and their research efforts into developing Matlab codes, unaware that they are doing something very expensive that may ultimately restrict their capacity to do research. Naturally, MathWorks encourages researchers to take advantage of these institutional licenses by promoting themselves on university campuses, and organizing free training/marketing workshops for their products. The result is a large number of research careers and research programs that are worryingly dependent on MathWorks products. So, there is a vendor lock-in problem. And then MathWorks abuses this lock-in with MDCS.
Licensing options for parallel Matlab
Most computational research programs grow in scope and eventually evolve to a point where the problems become too large to solve with serial code, and parallel computation capabilities become necessary. So, what options are available for a Matlab user to address this reality?
The first step is to invest in Matlab’s Parallel Computing Toolbox. This toolbox is currently listed on MathWorks’ website for $1000 on top of your existing $2150 standard individual Matlab license. The parallel toolbox gives you easy-to-use, high-level constructs for unleashing parallelism in your Matlab programs. This is a good product if you don’t mind the price tag and the proprietary restrictions. Purchasing the toolbox does increase Matlab’s capabilities beyond the standard version, and the toolbox represents real development effort on MathWorks’ part to create the high-level parallel constructs and tools.
The drawback of the Parallel Computing Toolbox is that it contains digital restrictions that only permit it to operate with 12 cores or fewer on a single computer.
The reason for this limitation is not related to the technical implementation of the parallel toolbox. Currently, MathWorks uses an MPI distribution called MPICH2 (version 1.4.1p1). MPI (Message Passing Interface) is an open standard for distributed, parallel computing, and MPICH2 is an open source implementation of that standard which is released under a license which permits its (gratis) use for commercial, closed-source software products like Matlab. Matlab uses MPICH2 to implement the parallelism in the parallel toolbox, and this free software is designed with scalability on multi-node computer clusters in mind.
The reason for the technical limitation of 12 cores in the Matlab Parallel Computing Toolbox is a business decision by MathWorks to charge licensing fees based on the amount of parallel computing capability their users wish to unlock. They could have chosen other ways to inconvenience users into paying additional fees: restrictions based on maximum memory, restrictions based on maximum lines of source code, restrictions based on maximum number of variables used. A maximum number of workers is equally arbitrary but is somehow easier to market in a culture where we have been trained to confuse software with private material goods that can’t be trivially copied as many times as we need.
MDCS does not really add functionality to Matlab so much as it partially removes technical limitations in the Parallel Computing Toolbox software. The Matlab Parallel Computing Toolbox (a prerequisite product for using MDCS) contains the technology to provide scalable parallel computing capability to as many nodes as one can afford to put into a cluster, but this capability is intentionally restricted with technical measures. The design and purpose of MDCS are to limit the parallel scalability of Matlab software, not to provide it (MPICH2, the free software that it is built on, provides it). Therefore, I refer to MDCS as digital restrictions management (DRM) software.
The costs of MDCS licenses are not publicly available on the MathWorks website and are only available by requesting a customized quote. To give a ballpark idea, licensing for hundreds of workers (i.e. a medium-sized computer cluster) will cost tens of thousands of dollars, with thousands of dollars of ongoing annual fees.
Is MDCS good software?
Unfortunately, in designing a product whose primary design goal is the enforcement of licensing restrictions, MathWorks has compromised their product’s usability and introduced a number of unnecessary and frustrating points of failure. The product that researchers deserve from MathWorks is a Parallel Computing Toolbox with no licensing or technical restrictions on the number of nodes or workers that can be used, so that Matlab users simply have a high-level interface to the MPI libraries. The MDCS product falls short of this hypothetical product not only in having restrictions on the number of workers that can be used, but also in terms of usability and stability.
With MDCS, you do not log in to the HPC cluster and submit your jobs as you would with other software. Instead, you open Matlab on your laptop or desktop computer, read the HPC site’s documentation on how to configure your computer to submit to the cluster, download integration scripts and install them onto your computer, set up metadata folders both on your computer and on the cluster’s file system, and finally you are able to submit jobs from Matlab, provided that you did not make any mistakes. This is a poor software model for an HPC environment because the configuration, job logs, and metadata used by the software are distributed between two different systems controlled by two different groups of people (the HPC system, and the submission system). This model not only increases the number of points of failure, but also ensures that when a problem arises, investigating and solving it requires a greater effort coordinated between multiple parties. MDCS results in a significant overhead in labour to operate compared to a hypothetical Matlab product with an unrestricted Parallel Computing Toolbox.
MDCS uses metadata about the submitted jobs that are stored both on the cluster’s file system and on the user’s personal computer. This metadata must be synchronized between the two different systems. If the user wants to submit jobs from a different computer, or from a different version of Matlab on the same computer, this can cause corruption of the metadata. If the user submits jobs to a second MDCS system, the user must carefully manage two separate sets of integration scripts, and must also be careful to avoid corruption of the metadata. The corrupted metadata does not produce straightforward errors, but rather strange behaviours and the presentation of misinformation to the user, and it is not always obvious that a problem has occurred. Some of these issues can be alleviated by taking special care to set up separate metadata folders for each combination of computer, Matlab version, and cluster that the user wishes to use. To switch between different sets of metadata folders, the user has to modify information in two places: the integration scripts for the target cluster, and the cluster profile. These metadata issues are not yet documented by MathWorks, and it is up to users to discover them by trial and error.
MDCS is also difficult to maintain from the HPC staff’s perspective, compared to ‘normal’ software. To modify some aspects of the software configuration on the cluster, simultaneous changes must be made to the MDCS configuration on the HPC system, as well as to the integration scripts living on the separate hard drives of many users. It can therefore be very difficult for HPC staff to deliver a reliable experience to researchers because improving the MDCS configuration may mean breaking the workflow of every user and forcing them to upgrade their local configurations before their jobs can run.
For many users, any type of distributed computing can seem complex and error-prone relative to regular desktop computing. An unrestricted Matlab Parallel Computing Toolbox could be an accessible entry point to distributed computing for many Matlab users, in addition to being a high-productivity research platform. It would not have the problems that I have described above and it would even require less development effort from MathWorks. However, instead of making distributed computing less complex, and less error-prone, MathWorks have done the opposite with MDCS.
The above-described problems with MDCS’s design make it a poor choice for users of other programming languages looking for distributed computing tools, so I assume that these users aren’t the target market for MDCS given the high quality of available alternatives and the high costs of MDCS. Rather, MDCS is a product designed to extort money from locked-in Matlab users.
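One way to keep those combinations apart is to derive each metadata folder name mechanically rather than by hand. The following is only a minimal sketch of such a naming convention, written in Python for brevity; the function name, the `mdcs-metadata` folder, and the example release and cluster names are all hypothetical, and the resulting path would still have to be entered in both the integration scripts and the cluster profile.

```python
import socket
from pathlib import Path
from typing import Optional


def job_storage_dir(base: Path, matlab_release: str, cluster: str,
                    host: Optional[str] = None) -> Path:
    """Return a folder unique to one (computer, Matlab version, cluster) combination.

    Hypothetical helper: the layout is an illustration, not something MDCS prescribes.
    """
    host = host or socket.gethostname()
    # Replace path separators so each component stays a single folder-name segment.
    parts = [p.replace("/", "_").replace("\\", "_")
             for p in (host, matlab_release, cluster)]
    return base / "mdcs-metadata" / "__".join(parts)


if __name__ == "__main__":
    # Example values only; substitute your own release and cluster names.
    print(job_storage_dir(Path.home(), "R2015a", "clusterA"))
```

Because the computer, the Matlab version, and the cluster are all part of the name, submitting from a second machine or a second Matlab version lands in a different folder instead of overwriting the first one's metadata.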
What else can researchers do?
There are a few workarounds that researchers have developed to try to do parallel computing without incurring the restrictions of Matlab’s MDCS licensing model. One method is to install an MPI distribution such as MPICH2 onto a computing cluster, and then compile wrappers to the MPI function calls using Matlab’s mex compiler so that they can be called from your Matlab program. Finally, using an institution’s many (possibly infinite) non-parallel Matlab licenses, one can launch many separate Matlab tasks that have the capability to communicate through MPI.
This technique works, but there are significant drawbacks. Software built this way is unable to use Matlab’s wonderful debugging capabilities on the parallel system. It is also unable to fully use other parallel debugging tools because the debuggers will be unable to see inside of Matlab’s proprietary binaries. Programs using MPI can be very hard to develop, debug, and maintain, and neither Matlab’s tools nor the tools used by MPI developers will work properly on the Matlab+MPI Frankenstein’s monster that has been created. Nevertheless, this is a common solution adopted by frustrated researchers.
Another technique is to use a cluster’s parallel file system to coordinate communication between separate non-parallel Matlab tasks instead of the network. Not only does this have the development, debugging, and maintenance problems discussed above, it also has slow performance because the file system is much slower than the network on a computer cluster.
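To make the file-system approach concrete, here is a minimal sketch of the pattern, written in Python rather than Matlab so that it is self-contained; the function names and the `result_<id>.json` file layout are assumptions for illustration, not part of any Matlab product. Each independent task publishes its result as a file in a shared directory, and a coordinator polls that directory until every result has appeared.

```python
import json
import os
import tempfile
import time
from pathlib import Path


def publish(shared: Path, task_id: int, payload: dict) -> None:
    """Write one task's result so that readers never see a half-written file."""
    final = shared / f"result_{task_id}.json"
    tmp = shared / f".result_{task_id}.json.tmp"
    tmp.write_text(json.dumps(payload))
    # Rename into place; on POSIX file systems this replacement is atomic
    # (assumed, not guaranteed, to hold on a given parallel file system).
    os.replace(tmp, final)


def collect(shared: Path, n_tasks: int, poll_s: float = 0.05,
            timeout_s: float = 10.0) -> list:
    """Poll the shared directory until every task has published, then load results."""
    deadline = time.monotonic() + timeout_s
    paths = [shared / f"result_{i}.json" for i in range(n_tasks)]
    while not all(p.exists() for p in paths):
        if time.monotonic() > deadline:
            raise TimeoutError("not all tasks published a result")
        time.sleep(poll_s)  # every poll costs a round of file-system lookups
    return [json.loads(p.read_text()) for p in paths]


if __name__ == "__main__":
    # Single-process demonstration; on a cluster each publish() call would run
    # in a separate task and `shared` would live on the parallel file system.
    with tempfile.TemporaryDirectory() as d:
        shared = Path(d)
        for i in range(3):
            publish(shared, i, {"task": i, "value": i * i})
        print(collect(shared, 3))
```

The polling loop is where the performance cost shows up: every exchange of data becomes a file write, a rename, and repeated existence checks, all of which go through the file system instead of a direct network message.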
Since Matlab contains technical limitations to limit its parallel scaling capacity and the only available products for parallel computing are the Parallel Computing Toolbox and MDCS, any creative methods that a researcher might devise for achieving parallelism with Matlab outside of these official products are at risk of being seen by MathWorks as circumvention of a technical limitation. If so, they could argue that these techniques are unlawful under the infamous anti-circumvention provisions of the USA’s Digital Millennium Copyright Act or similar laws in other countries.
Finally, researchers who cannot continue their research because of Matlab licensing restrictions may choose to port their code to another programming language. This means redoing much of the development work that went into creating the research software in a different language that offers more freedom. Porting code can represent an incredible investment of labour that could be invested in research instead, but this effort may be well worth it for the potential freedom it can bring. There is a free software project called GNU Octave which attempts to be compatible with the Matlab programming language, presumably a response to the many ex-Matlab users who wish to compute with freedom while reducing the costs of porting their code to another system.
MathWorks uses a free software product to provide parallel scaling capacity for Matlab’s Parallel Computing Toolbox, and then charges additional fees for the removal of arbitrary restrictions on that capacity through MDCS. The MDCS product is not appealing as a distributed computing platform because it is error-prone and unnecessarily complex, resulting in considerable labour overhead in using it. The market for this product is not researchers looking for a general-purpose distributed computing platform, but researchers who are already locked in to Matlab and for whom it is the only option. The costs of MDCS are very high relative to the costs of Matlab, the Parallel Computing Toolbox, or the free software library that provides its parallel scaling capabilities. However, its costs are not transparent to researchers planning for a long-term software project. They are a secret that is only revealed to them through a customized quote at the time they are ready to run their software at larger scales. This makes planning in advance for the costs of distributed computing with Matlab something that is rarely done in practice, forcing researchers to absorb unanticipated expenses late in the software development cycle.
It is a fact of research computing that problem sizes are becoming larger and distributed computing platforms are becoming more and more common and accessible. The marketing for MDCS indicates that you can scale up to cluster computing without the expense and hassle of changing your code. This is a reminder from MathWorks to researchers to think ahead when writing your research software and choose technologies that scale to address the modern (distributed!) reality of research computing. You do not want to get burned by choosing the wrong technology and discovering that you can’t run your code on the same problem sizes as your competitors, or are unable to make use of the new parallel computing hardware that is always becoming available. With Matlab, the software scales, thanks to MPICH2, but the licenses do not. If you are a researcher, you should think ahead and choose a programming environment where the technology scales without the expense and hassle of changing your code, or your license.
Ultimately, I propose that researchers and organizations that perform research divest from software products that arbitrarily restrict the capabilities of research code for want of more and more licensing fees. If there must be licensing fees, they should be tied to things that represent real development effort from the vendors. They should not represent the removal of arbitrary restrictions such as maximum number of characters in a source file, maximum number of workers, or anything else that might be dreamed up to inconvenience users into paying more.
From MathWorks, I would like to see the technical restrictions on the Parallel Computing Toolbox lifted so that it can scale with the full power of the free MPI distribution that fuels it. This model would improve the usability and stability of Matlab on distributed computing systems by removing restrictive DRM, and would serve to avoid damage done to research efforts for those who have chosen to use Matlab products. It would also reduce development costs on MathWorks’ part because the development of MDCS and restrictions on the Parallel Computing Toolbox would not be necessary.
I understand that MathWorks and their products exist to make money, and I am not necessarily advocating for reducing the overall costs of parallel computing with Matlab for researchers or institutions, or recommending that MathWorks reduce their total revenue: they can charge a fair license fee for Matlab and an additional fee for the Parallel Computing Toolbox. What I am advocating for is the removal of harmful proprietary restrictions from the parallel toolbox that limit its ability to scale to multiple nodes and many cores, and the abandonment of an unethical business strategy that exploits researchers through vendor lock-in.