How long does a PhD take?

The question is critically important to current and prospective graduate students. The answer affects their ability to plan their educations, personal lives, personal finances, and careers. Deciding to do a PhD is a little bit like signing a cell phone contract that says how much you owe every month, but not when you get to stop paying. The nature of research means that the duration of a PhD is very uncertain. But, you would be foolish to start one without first learning about how long they take, at least on average. I was foolish like that.

Graduate programs attract students by competing with each other to offer graduate students appealing funding packages. But, graduate students can find themselves racing against the clock to finish their programs before these attractive offers run out. In a UBC PhD program, students who remain in their program after 4 years will find that they no longer have access to the four-year fellowships, tri-council funding, or preferred TA hiring that attracted them to school in the first place. Perhaps it is necessary to cut students off so they don’t linger in the cushy student life forever. But, it is important to see if 4 years is a realistic goal that most students are actually able to achieve, or if the 4-year funding cliff is an unrealistic expectation that harms typical students for taking typical amounts of time to finish their program.

It is a very legitimate question to ask of an educational institution. Of course, the nature of research means that the length of a PhD is very uncertain. That’s to be expected. But prospective and current PhD students should have a realistic expectation of how long their degrees typically take. How many people take 4 years? How many take 5 years? How many take 6? Obviously, no one takes more than 6 years; UBC has a rule about that. It turns out that students and professors alike have a lot of misconceptions about the realities. It doesn’t help that UBC treats this information like it’s some kind of big secret.

I don’t know how long PhDs take at UBC. But, I did ask a lot of people, sent a lot of emails, did a lot of reading, analysed some data, and even made Freedom of Information Act requests. After all that, I had a small number of answers that didn’t agree with each other. I also had (I think) a number of people annoyed with me for asking too many questions that (I guess) students aren’t supposed to ask.

Asking around

I was involved for two or three years (depending on how you count) in my department’s prospective-student-wooing open house. If those are any indication, “How long do graduate degrees take” is being asked by a large fraction of prospective students coming to UBC. The journey you are currently reading about started because I was disappointed with the answers they were getting. I heard a lot of answers from professors along the lines of “2 years for Masters and 4 years for PhDs, although, obviously, a few slackers take 5 years or more.” I didn’t think that was quite consistent with reality.

In fact, it isn’t quite true. But, I don’t think the professors are being intentionally deceptive. I think they selectively remember their all-star students who wrap up in 3 years, and then make excuses for why they shouldn’t include people who take a long time in their average. Student X didn’t work hard. Student Y had health problems. Student Z had to work part time to feed her family. Obviously, those aren’t ‘average’ students who should be included in the average. The problem is that the students are just asking, when they should be saying “show me the data.”

Show Me The Data

Eventually, a friend of mine discovered that UBC actually had some graduate student completion data up on the PAIR website. It has since been placed behind a password (thanks to me: read on, dear reader), but here it is. Since this is a story about how hard it is to find out how long PhDs take, I should mention that the data was in the form of an Excel spreadsheet with 31832 entries instead of something sensible like a database. If you ever want to frustrate someone into not looking at large volumes of data, use Excel.

Here is a description of the data from a representative of FOGS:

The spreadsheet you have been looking at represents a study undertaken using a subset of Admission cohorts – that is taking a subset of each admission years new intake and tracking them to completion. The report you are looking at amalgamates the 1995 – 2003 Admission cohorts and presents the outcomes of these students. For the Outcome, students may have Graduated, Left UBC, Transferred to another program, Still be in Program or are Unknown if we don’t know what’s happened to them.

This data would easily answer my question, except that it has apparently been corrupted. Among lots of other information, it contains three dates: Program start date, Program end date, and graduation date for each student. The amount of time a student spends in their program is the Program end date minus the Program start date. I noticed something funny about this quantity: About 30% of PhD students in the represented cohorts completed their programs in 2 years or less and not a single person in those cohorts completed their program between 2 years + 1 day and 3.5 years. About 20% of all students who graduated finished in exactly 2 years to the day. There’s a very suspicious gap in the data between 2 years and 3.5 years. I asked FOGS about it. They responded:

[The students who finish in 2 years or less are] likely those that have come from other institutions to complete their studies at UBC or in very structured programs – but the majority graduate in the 4-7 year range (400; 559; 469; 246).

… If you break the TIP Years by CIP Division (the Statscan area of program study) you’ll see that the earlier graduants tend to be clustered in Education and Social Sciences. Social Sciences includes Psychology, a Department that streams students from Masters thru Doctoral levels which helps them complete in a timely (often quicker) fashion…

…It makes sense that very few students would complete in the 3rd year. The earlier 0-2 group are likely either transfers or in uniquely structured programs (ie: Education) where completion may occur more quickly than anticipated. The average time to completion (graduation) is just over 5 years.

Okay, it all makes sense now. It’s Education and Psychology students. Except it’s not. Here’s the breakdown for the PhD students who finished their programs in 2 years or less:
10 Business & Management
129 Education
160 Engineering
75 Health Sciences
23 Humanities
13 Professional
182 Science
96 Social Science

487 Domestic
195 International

Only 12 transferred from other institutions

This group of students is pretty statistically typical for UBC as a whole. They aren’t any more likely to be education students than engineers or scientists or anything else.

The students aren’t demographically unusual in any way, but there is something highly unusual about them as a group: They wait an incredibly long time to graduate after their programs end. Sometimes, they wait almost 10 years. Here is a plot where the vertical axis is how long the student waited to graduate after the end of their program ([Grad date] minus [program end date]) and the horizontal axis is time in program ([program end date] minus [program start date]).

The students who complete their programs in less than two years are looking very conspicuous.

Here is the histogram of the two groups of students. Homework: Use your favourite statistical technique to discover how likely it is they were sampled from the same distribution.

Use your favourite statistical test to discover whether these two groups of students are sampled from the same distribution.
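If you want a head start on the homework, here is a minimal sketch of one possible test, the two-sample Kolmogorov-Smirnov test, written with only the Python standard library. It is not the analysis behind the plots above. The function name and the sample numbers are made up for illustration, and the p-value uses the usual large-sample approximation, so treat it as rough for small groups.

```python
from bisect import bisect_right
from math import exp, sqrt


def ks_two_sample(a, b):
    """Return (D, approximate p-value) for a two-sample Kolmogorov-Smirnov test."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # D is the largest gap between the two empirical CDFs.
    # The gap can only change at an observed value, so checking those is enough.
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)
    # Large-sample approximation to the p-value (an assumption of this sketch).
    en = sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:
        # The series below converges slowly here, and the p-value is essentially 1.
        return d, 1.0
    p = 2 * sum((-1) ** (k - 1) * exp(-2 * k * k * lam * lam) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


if __name__ == "__main__":
    # Hypothetical waits (in years) between program end and graduation.
    short_group = [3.1, 4.0, 2.5, 5.2, 3.8, 4.4, 2.9, 3.3]
    long_group = [0.2, 0.4, 0.1, 0.3, 0.5, 0.2, 0.6, 0.3]
    d, p = ks_two_sample(short_group, long_group)
    print(f"D = {d:.2f}, approximate p = {p:.4f}")
```

A small p-value says the two groups are unlikely to be samples from the same distribution. With the real spreadsheet you would feed in the [Grad date] minus [program end date] values for each group.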

I showed these plots to the above-mentioned representative of FOGS. UBC’s response was to not answer me and to immediately place the data behind a password so that students could no longer get at it. (No worries, I put it here on the web.)

Freedom of Information

I had hit a roadblock with FOGS and still didn’t know how long PhDs took. But UBC had a report from 2010 with the title “Graduate Student Completion Rates and Times.” Perfect. I asked for it along with another report. Apparently, it is top secret:

I’ve talked with the Dean’s Assistant and neither reports are available for public readings as they contain privileged information.

Of course, when someone from the public requests personal or top secret information, employees of public institutions are required by the Freedom of Information and Protection of Privacy Act to inform them that an FOI request must be made to obtain the information. But, I guess they forgot. I made one anyway.

[Spoiler: It will turn out that neither report was top secret.]

FOI Requests and responses:
Request sent March 30, deadline May 17
1. Graduate Student Completion Rates and Times, March 2010, including all appendices

  • Report was given June 5.

2. UBC Faculty of Graduate Studies 2011 External Review Self-Study, including all appendices

  • Report available on the web. Appendices delivered on July 9th, except appendix 25 disclosure of which may have harmed the business interests of a third party

3. Any draft versions of the report Graduate Student Completion Rates and Times, March 2010

4. Any raw data used in preparing the report Graduate Student Completion Rates and Times, March 2010, in machine-readable format such as a spreadsheet.

  • “We were informed that draft versions of this report no longer exist in electronic or hard copy formats, and that the database containing the raw data for this report (drafts and final copy) is no longer available. Recreating that compiled data…would be extremely time consuming and would unreasonably interfere with our operations.”

[UBC: If you lost the data, don't even fret about it. You can just download it again here.]

5. All records related to the CUPE 2278 union’s bargaining to extend the preferred hiring preference for TAs. Date range is January 1, 2012 to March 30, 2012.

  • 30 day extension requested on June 5
  • 535 pages of responsive emails were identified, mostly emails between UBC staff. These were withheld in their entirety because all of them contained advice, recommendations, or legal advice, or would harm the right to personal privacy or the financial or economic interests of a public body. Fair enough. I didn’t get them on July 9.

6. All records related to graduate student completion rates and times in the date range January 1, 2012 to March 30, 2012

  • “We have been informed that creating these records would involve compiling and analysing data, which would be extremely time consuming and would unreasonably interfere with our operations” June 5

All that work and all I got was some lousy appendices. Well, I also got the report that I wanted. The only really useful thing about it is that it is full of interesting plots like this:

How many students graduated by a certain time in program. The colours are different groups of students, but you’ll have to guess what they are because the report doesn’t say.

Because FOGS lost the data used to compile this report, I have no way of knowing if it is affected by the problems I discovered above.

So, How Long Do PhDs Take?

It depends who you ask. It may be 4 or 5 years if you ask the professor-on-the-street. It may be just over 5 years on average if you ask FOGS. But, I’m writing this, so I guess we are asking me (i.e. I’m asking me).

I don’t trust FOGS’s data for the reasons outlined above. I believe that there is a problem with the program end date column. Dates in this column may be getting “toggled” prematurely (at or before 2 years) for about 30% of students in the cohort. I don’t know the reason, but the evidence is in the plots above. We can estimate the average completion time by avoiding this column for the corrupted group of students. [Grad date] minus [Start date] is one way, but it overcounts by the months that students spend waiting for a graduation ceremony after they finish. We can account for this time by subtracting off the average of ([Grad date] minus [Program end date]) for the uncorrupted group (students who took more than 2 years to graduate). Here is the histogram compared to the one for (Program end date – program start date):

Top: Time in program histogram as prescribed by FOGS
Bottom: Time in program histogram as prescribed by the author
The mean is about 6.5 months greater in the bottom histogram
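For anyone who wants to repeat this on their own department’s numbers, the correction can be sketched in a few lines of Python. This is a minimal sketch, not the code behind the histograms. The tuple layout, the function names, and the use of a 2-year cutoff to flag a suspect record are assumptions made for illustration, as is the choice to keep the recorded value for records that look fine.

```python
from datetime import date
from statistics import mean

DAYS_PER_YEAR = 365.25


def years_between(start, end):
    """Elapsed time between two dates, in years."""
    return (end - start).days / DAYS_PER_YEAR


def corrected_times(records, cutoff_years=2.0):
    """Estimate time in program for (program_start, program_end, grad_date) tuples."""
    records = list(records)
    # Records whose recorded time in program exceeds the cutoff are treated as trustworthy.
    clean = [r for r in records if years_between(r[0], r[1]) > cutoff_years]
    # Typical wait between finishing the program and the graduation ceremony.
    mean_wait = mean(years_between(end, grad) for _, end, grad in clean)
    times = []
    for start, end, grad in records:
        recorded = years_between(start, end)
        if recorded > cutoff_years:
            times.append(recorded)
        else:
            # Suspect end date: use the graduation date minus the typical wait instead.
            times.append(years_between(start, grad) - mean_wait)
    return times


if __name__ == "__main__":
    # Two made-up students; the second has a suspicious 2-year program end date.
    sample = [
        (date(2000, 9, 1), date(2005, 9, 1), date(2005, 11, 1)),
        (date(2000, 9, 1), date(2002, 9, 1), date(2006, 11, 1)),
    ]
    times = corrected_times(sample)
    print("mean time in program (years):", round(mean(times), 2))
    print("share needing more than 4 years:", sum(t > 4 for t in times) / len(times))
```

The same list of times gives the fraction of graduates who needed more than 4 or more than 6 years.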

The average time in program according to this is about 5.5 years. This is about 6.5 months longer than the bizarre-looking FOGS data would claim. You are welcome to believe FOGS instead of me, but it could cost you about $700 in tuition and a few months of your youth. Keep in mind that there are well-known variations according to what type of program students are in. By the way, programs where students finish quickly are correlated with programs where students are well funded. I’ll let you think about that.


Is a four year funding model reasonable? At UBC, 78% of PhD students need more time. Is a 6 year hard limit on PhDs reasonable? At UBC, 27% of students need to write a letter explaining why circumstances beyond their control resulted in them being unable to complete their degree in a timely manner. Those percentages only include the students who actually graduate. As we can see in FOGS’s data, many more end up leaving for various reasons.

I have the following requests for my various audiences:

Professors and Staff: Please download the graduation data, analyse the outcomes of students who graduated from your program and post the results where current and prospective students can get it. Then, everyone will have answers tailored to their department.

Graduate students: Please ask for the data when you ask “how long will my graduate degree take?”. Don’t let anyone tell you the answer in words because they really don’t know. Definitely don’t believe any program length indicators that involve someone giving you money (scholarship lengths, preferred TA hiring, etc.).

UBC Representatives: Please return the graduation data to the people of BC. I’m really sorry that I plotted it. In my defence you trained me for way too many years to compulsively plot things. Will you forgive me? Funny looking data is much better than none at all, and people would rather get it from you than from me. Keep new data flowing all the time. This information affects people’s lives. Also, if you have any answers to the questions I’ve raised, please join the conversation in the comments below.

Starting in September, I will be teaching first year undergraduate students computational physics. Well, that’s the plan, at least.

Mark Guzdial has posted a sobering series of blog posts that highlight how difficult achieving that plan is going to be. Mark’s blog posts summarize PhD research performed by Georgia Tech graduate Danny Caballero.

We experimented with a computational component in this course last year as one aspect of a series of major reforms. Our goal was to provide a small taste of how computer programming can be used in physics problem solving because this idea is emphasized in their textbook. However, we didn’t want to get distracted from our main goal: teaching physics. We provided the students with working vPython programs and problems that could be solved by making minor and obvious changes to certain lines of code, or changes to initial conditions, for example. We collected feedback from the students that indicated that no one was satisfied with this treatment: most students were not interested in programming and were frustrated at being forced to work with code they didn’t understand while the students who did have an interest in programming were frustrated at the lame, unchallenging activities they were presented with.

This year, we have decided to make a commitment to teaching programming, giving more challenging tasks that will satisfy the students who are interested in learning programming, while providing proper instruction in programming ideas so that (hopefully) no one feels the code they are working with is mysterious or scary.

In the first blog post, Mark explains why our students are less likely to learn the physics content of the course. Basically, by spending time on teaching programming, we have less time to spend on the physics, and the physics naturally suffers.

Of course, the main goal of the course is to teach physics, but it is a mistake to think that should be the only goal. It is especially a mistake to think that the point of the course is to achieve high student performance on the Force Concept Inventory. This course is an enriched physics course where most of the students are expected to go on to major in physics where they will take many more physics courses. This is very different from most first year physics courses which, for most students, are the last time in their lives they will receive any physics instruction. In these courses, it is reasonable to be concerned that your students have not become Newtonian thinkers. But a student who majors in physics will have Newtonian ideas reinforced over and over for several years, and so there is perhaps less pressure to ensure they are thinking Newtonianly at the end of their first course.

What is important for these students, in the context of their entire degree, is that they become well trained in all three pillars of physical reasoning: theory, experiment, and numerical computation. In my own undergraduate experience, I spent many hours each week developing myself as a theorist, and many hours each week in the lab developing myself as an experimenter throughout the entire duration of my degree. Numerical computation was relegated to a single course in a single semester taught by the math department without any explanation for how numerical analysis might be used by physicists. So, we would like to fix this problem and instill in students the idea that being able to solve problems by programming a computer is as important to their training as physicists as being able to analytically solve problems, or make careful experimental measurements. Like analytical problem solving, or experimental methods, numerical techniques should be introduced early and reinforced throughout the entire duration of the degree program.

So, it is good to teach numerics to first year students, even if it temporarily sets back their education in physics concepts like Newtonian mechanics. The important thing is finding the right balance, and using the time you spend well.

The third blog post in the series has even more troubling news for us. Caballero developed an instrument to assess students’ attitudes toward computational modelling. This instrument is a series of questions which probe various aspects of attitudes, and the student answers are compared with answers given by experts at computational modeling. So, it produces a measure of how “expert-like” the students have become over the course of instruction. But, the results from this instrument are negative.

In particular, students after instruction had less personal interest in computational modeling, agreed less with the importance of sense-making (the third bullet above), and agreed more with the importance of rote memorization (last bullet above).

These results remind me of how students’ attitudes toward physics and science often shift to becoming less expert-like after most physics courses. Learning is shaped like a U, with an initial decline in performance as knowledge is getting restructured.

If some negative progress early on is a normal part of learning computational physics, perhaps we shouldn’t be too concerned, given that this isn’t the only course they will take. After all, the grand plan is to keep developing each student’s ability at computational physics regularly over the course of 4 years. So, if the bottom of the U occurs after the first semester, there is still plenty of time to spend on getting the students to climb up the slope of the other side of the U over the next three years of computational physics instruction. By the end of that, they should be more expert-like than when they started school, even if they experience a dip somewhere in the middle…at least according to the U-shaped learning idea. If most of the students in a course will never do another physics course, leaving them at the bottom of the U is probably a bad idea. But, with students who are majors, the final state of the student brain is what it looks like at the end of the entire degree, not at the end of the first semester of instruction.

Unfortunately, it may be a long time before we can really start thinking about the efficacy of entire degree programs in a truly evidence-based way. What PhD student is going to want to research a whole 4-year degree for their thesis project?! For the moment it is very frustrating that in trying to move forward, you have actually taken a step backward and have to look for excuses that make you feel good about knowingly taking that step backward.

Now, I’m going to take a close look at the types of errors made by the students Caballero worked with and figure out how they might inform our computational physics instruction. I’ll post more as the brilliant insights come to me.

We can show evolution taking small steps in the lab. But evidence for large evolutionary steps is unfortunately a bit rarer. This leads some non-experts to deny that small evolutionary steps can compound into large changes over time. So, it’s interesting to explore some of the important evolutionary milestones in the lab. One of those milestones is the step from single-celled organisms to multicellular organisms. New Scientist is reporting that this step has been observed with yeast in the lab.

Yeast clumps

Sure enough, within 60 days – about 350 generations – every one of their 10 culture lines had evolved a clumped, “snowflake” form. Crucially, the snowflakes formed not from unrelated cells banding together but from cells that remained connected to one another after division, so that all the cells in a snowflake were genetically identical relatives. This relatedness provides the conditions necessary for individual cells to cooperate for the good of the whole snowflake.

After a few hundred further generations of selection, the snowflakes also began to show a rudimentary division of labour.

A new Canadian $100 bank note has been unveiled. The bill depicts medical research and innovation:
New Canadian bill depicting medical research and innovation

Along with the generic white lab coat scientist at a microscope, the image shows a bottle of insulin, a life-saving drug invented in part by Canadian Frederick Banting.

Ben Goldacre (of Bad Science fame) and some colleagues have done some research into the reliability of health claims made in UK newspapers. The results suggest that newspapers have a lot of misleading information.

The vast majority of these claims were only supported by evidence categorised as “insufficient” (62% under the WCRF system). After that, 10% were “possible”, 12% were “probable”, and in only 15% was the evidence “convincing”. Fewer low quality claims (“insufficient” or “possible”) were made in broadsheet newspapers, but there wasn’t much in it.

Ben describes his research here.

Wouldn’t it be nice to know what the weather is going to be like next week? What the stock market is going to be like tomorrow? What the housing market is going to be like over the next decade?

This BBC Radio 4 episode discusses how accurate experts are at making predictions. It turns out they are not any more accurate than random chance. It may seem like they get a lot of things right, but if we don’t keep careful records, we may be selectively remembering their hits better than their misses.

So, what can make an expert better at predicting?

According to experiments, a few of the experts have modest predictive abilities. These experts were characterized by a willingness to be honest about their uncertainties, using numerical, probabilistic terms, and getting regular, precise feedback.

There does seem to be a different kind of style of thinking in some experts which is much more cautious…These people tend to be a lot better at making predictions.

Don’t trust an expert who is certain about the future. They tend to be only as good as chance and they are unlikely to improve as the feedback comes in.
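If you want to keep those careful records yourself, one common way to turn probabilistic forecasts into precise feedback is the Brier score. The radio episode doesn’t mention it, so take this as my own illustration rather than the method used in the experiments. The function name and the example forecasts are made up.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    0.0 is a perfect record; always saying 50% earns 0.25; lower is better.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("need exactly one outcome per forecast")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


if __name__ == "__main__":
    # Hypothetical record: what happened (1 = it happened, 0 = it didn't).
    outcomes = [1, 0, 0, 1]
    certain_expert = [1.0, 1.0, 0.0, 0.0]   # always certain, right half the time
    cautious_expert = [0.7, 0.4, 0.2, 0.6]  # hedged, numerical forecasts
    print("certain: ", brier_score(certain_expert, outcomes))
    print("cautious:", brier_score(cautious_expert, outcomes))
```

The score punishes confident misses heavily, which is exactly the kind of feedback a certain expert never collects.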

Gambling, by careful design, is a losing proposition whenever you are not the ‘house’. But, it takes some familiarity with statistics and critical thinking to see what’s going on.

This clever game helps you understand why you shouldn’t go to McDonald’s just to play their Monopoly game. You choose a prize you want to win and the game buys fictional Big Macs for you until, according to the official odds of winning and a statistical model, you win the prize. Afterwards, you get to know how many calories you consumed and how much money you spent playing the game.

You might get lucky and win the prize using less money than you spent at McD’s. But, much more often, you will learn that a cheaper contest is to go to a store and pay for the prize directly. You will have plenty of money left over to buy some balloons and champagne to celebrate your victory!
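The idea behind the game can be sketched as a tiny simulation. This is not the game’s actual model. It assumes every purchase is an independent draw with the same chance of winning, and the odds and price in the example are invented, not the official ones.

```python
import random


def purchases_until_win(p_win, rng):
    """Count purchases up to and including the first winning game piece."""
    count = 1
    while rng.random() >= p_win:
        count += 1
    return count


def average_cost(p_win, price, trials=10_000, seed=0):
    """Average amount spent before winning, estimated over many simulated players."""
    rng = random.Random(seed)
    total = sum(purchases_until_win(p_win, rng) for _ in range(trials))
    return price * total / trials


if __name__ == "__main__":
    # Hypothetical numbers: a 1-in-100 chance per purchase at $5 per purchase.
    print("average spend to win:", round(average_cost(0.01, 5.0, trials=2_000), 2))
```

Under these assumptions the average number of purchases works out to about one over the chance of winning, so compare the average spend with the sticker price of the prize before you start eating.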

Many countries have no purchase necessary versions of popular contests.

The Unscientific Realm

October 8, 2010

Science strives to collect the best tools for collecting and evaluating evidence and making inferences. During this process, some tools are found to be broken and are discarded: Intuition, personal experience, authoritative dogma, eyewitness accounts, magical reasoning. None of these ways of knowing form an acceptable basis for argument in science because they are known to be unreliable. Unfortunately, people wandering by the tool shed of science are rummaging through the pile of discarded tools, hoping to find something useful.

Science is not the only way that we humans try to understand our world. There are other domains of inquiry which hope to comprehend spirituality, the purpose of the universe, ethics, the meaning of life, and the role of government, to name a few. Perhaps these tools which were turned away by science still have a place as we try to understand the unscientific realm.

There are many issues which are not easily dealt with by science. It is not clear how to define the terms, and it’s not clear how to make measurements. It’s not clear that some issues are even approachable to science. In this unscientific realm where notions are vague and evidence is lacking, it is very difficult to know what things are true and what things are false. From a scientific perspective, it seems that we have to throw up our hands and say that no conclusions can be reached.

But, does this mean that topics such as ethics, or spirituality are the domain of personal experiences, intuitions, and feelings? After all, personal experiences are anecdotal, intuitions are unreliable, and feelings are subjective, according to science. These ways of knowing are ancient tools which have been banned from the realm of scientific inquiry. Is it possible that these tools, which fail on scientific topics, are still relevant and useful when discussing nonscientific topics?

If this were true, it would imply that there is something special about scientific topics. For example, perhaps scientific topics are more difficult for our brains to comprehend than spiritual topics. If this were true, it would make sense that we need to protect ourselves from getting fooled with elaborate tools while we were doing science, even though we could let our guard down while doing theology.

But, there is nothing special about scientific topics. Our non-scientific tools are not failing because we are applying them to solve problems which are too hard. Our tools are failing because they never worked in the first place. The real problem is not with the nature of the problems we try to solve, scientific, or nonscientific. The problem is with the human mind. A small computer made of firm, pink tofu is simply not as good at comprehending the complexities of the universe as it perceives itself to be.

The result is a perception of reality which is plagued by biases, fallacies, and illusions. We see patterns where none exist. We see magic where there is only misunderstanding.

Science is not just one idea, and it is not just one way of knowing. It is a collection of ideas and tools that humans have found useful for understanding. They are not just appropriate for understanding forces, or cells, or chemicals, they are our tools for understanding every aspect of our world. In fact, the word science encompasses the full set of tools that are believed to be reliable; if there is a tool which can demonstrate its reliability, that tool is adopted into the set, and when a tool is found to be unreliable, it is removed from the set. So, the only tools for understanding which are not scientific are the broken, discarded ones, or new ones which have not yet proven themselves.

So, what does this mean for the topics of human inquiry which are not accessible to our scientific tools? How do we understand things when we can’t perform controlled experiments, make careful observations, and collect systematic evidence?

When a question comes up and the scientific toolbox isn’t equipped to answer it, we shouldn’t disgrace ourselves by rummaging through the pre-scientific tools that have been rightly banished from the realm of scholarly inquiry. We don’t get to manufacture truth from our gut, or our intuition. We don’t get to choose our favourite authority figure to tell us what reality is like. What we need are *new*, *scientific* tools for dealing with these questions. Ones that aren’t already known to be broken.

The reality is that there is no unscientific realm of inquiry. Humanity has always been trying to figure out what is true and false about reality. And, we have always done it by looking at various kinds of evidence and trying our best to make sense of it. We must do this in all of our inquiry, whether we are asking about science topics, or about non-science topics. It is true that our tools don’t probe into every dark corner of our understanding, but don’t be fooled into thinking there are other tools that do.

Whatever you base your understandings on, the only respectable position to take is to be honest about the limits of your understanding. None of us understands anything beyond the evidence which is made available to us, and the models we formulate to account for that evidence. Oh, you’ll tell yourself you understand things better than that, but never believe yourself; you’d be placing your trust in firm, pink tofu.

Numeracy is a super power

October 7, 2010

One of the new shows in ABC’s lineup this season is No Ordinary Family. It is a show about an otherwise ordinary family who unexpectedly develop super powers.

The father becomes ultra-strong and bulletproof. The mother can heal quickly and has super human speed. The daughter can read minds. The teenage son…wait for it…develops enough numeracy to succeed in his high school courses.

This amazing ability is portrayed as supernatural, as amazing as the rest of his family’s super powers. When he looks at geometry textbooks, he can visualize diagrams. When he looks at a problem on the chalkboard, he can visualize a few steps and then see the answer. At one point he is holding a textbook and says “I just get it”.

All of these super powers are just normal aspects of being numerate. Imagine watching a show made in a culture where not many people could read and watching a character suddenly being able to understand the words in a book after being sprayed with magical ooze. This is a terrific accomplishment for that character, but in a well educated society where literacy is common, we would scoff at the idea that magical ooze was needed to get the character to read.

This character tells us two things about the zeitgeist amongst TV producers and/or ABC’s target audiences. Firstly, that they see being able to understand math and use it in everyday situations as a desirable trait. A trait one would fantasize about having. That’s great! The bad news, however, is that they are also portraying it as mystical. JJ wasn’t born with good math ability, so he needs a supernatural solution in order to pass his exams.

Well, I have good news for all the kids watching this show fantasizing about JJ’s amazing powers. You can have them too! You don’t even need to crash in a plane. All you need is an appreciation for the ways you can apply math in your life, and some willingness to spend time puzzling over some math concepts. Numeracy isn’t a gift, it is something that is earned and developed through work–just like learning to read.

JJ is the late bloomer of the family. So, it will be interesting to see how the show will portray his abilities when they are fully developed. Being numerate may be an amazing accomplishment for JJ, but super human it is not.

How to be a science journalist

September 28, 2010

This science news story template helps make science journalism easy!

I especially like how this article deals with the issue of controversy in science reporting. This issue has exactly two sides to it and there are no tools for determining which side has more validity than the other. Many journalists don’t stress this enough.

