The MTA publishes train service-change announcements by email. A few years ago, a smart guy named Pierre wrote a program to publish these email announcements on Twitter. A few days ago I wrote a program to pull the tweets in, parse them, and neatly file them into a database. Then I graphed them with R.
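The post doesn't show the parser, so here is a minimal sketch of the "parse and file into a database" step, using Python's built-in `sqlite3`. It assumes (hypothetically) that each announcement names the affected lines as standalone capital letters, such as "G trains run in two sections"; the real tweet wording, and whatever parser produced the graphs below, may differ. The names `make_db`, `store`, and `counts_by_line` and the sample dates are made up for illustration.

```python
import re
import sqlite3

# Hypothetical: assumes affected lines appear as standalone capital letters.
# Only the five lines covered in this post are matched. A stray "B" (as in
# "Plan B") would be a false positive, which a real parser would need to handle.
LINE_PATTERN = re.compile(r"\b([BQFGL])\b")


def make_db() -> sqlite3.Connection:
    """Create an in-memory table with one row per (announcement, line) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE announcements (posted_at TEXT, line TEXT, text TEXT)")
    return conn


def store(conn: sqlite3.Connection, posted_at: str, text: str) -> None:
    """File one announcement under every line it mentions."""
    lines = sorted(set(LINE_PATTERN.findall(text)))
    conn.executemany(
        "INSERT INTO announcements (posted_at, line, text) VALUES (?, ?, ?)",
        [(posted_at, line, text) for line in lines],
    )


def counts_by_line(conn: sqlite3.Connection) -> dict[str, int]:
    """Number of announcements filed under each line."""
    rows = conn.execute(
        "SELECT line, COUNT(*) FROM announcements GROUP BY line ORDER BY line"
    )
    return dict(rows)


# Made-up sample announcements and dates, for illustration only.
conn = make_db()
store(conn, "2012-03-01", "G trains run in two sections")
store(conn, "2012-03-02", "F and G trains are delayed")
print(counts_by_line(conn))  # {'F': 1, 'G': 2}
```

Filing one row per line means an announcement that affects both the F and the G counts once for each, which matches how a per-line bar chart would tally it.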
The results: two years' worth of service announcements for the B, Q, F, G, and L trains, which show a different side of a train we love to hate.
Side-by-side comparison of all negative service changes. Note that some trains run shorter routes, so you should expect them to have fewer problems.
The G train is interesting here: it is one of the most hated trains in the system, and yet it had the fewest incidents of the entire sample.
“But hey, the G makes half as many stops as the F train!” So I normalized the results to make the comparison more apples to apples, dividing the number of incidents by the number of stations served.
Shockingly, the G train beats my workhorse favorites, the Q and the F.
The MTA publishes its own statistics:
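The normalization is a single division per line. A minimal sketch follows; the incident and station counts are made-up placeholders, not the numbers behind the graphs in this post.

```python
def incidents_per_station(
    incidents: dict[str, int], stations: dict[str, int]
) -> dict[str, float]:
    """Divide each line's announcement count by the number of stations it serves."""
    return {line: incidents[line] / stations[line] for line in incidents}


# Made-up placeholder numbers, NOT the counts behind the graphs in this post.
incidents = {"G": 100, "F": 300}
stations = {"G": 20, "F": 50}

rates = incidents_per_station(incidents, stations)
for line in sorted(rates, key=rates.get):
    print(f"{line}: {rates[line]:.1f} incidents per station")
```

Dividing by stations assumes incidents scale with how much route a line covers. Route miles, or trains per hour (a point raised in the comments), would be other reasonable denominators.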
OTP – the percent of commuter trains that arrive at their destinations within 5 minutes and 59 seconds of the scheduled time.
Subway Wait Assessment – the percent of actual intervals between trains that are no more than the scheduled interval plus 25%. You can see more details on MTA Stat’s homepage.
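Both definitions boil down to "what share of observations fall under a threshold." Below is a simplified sketch of each as described above. The sample delays and intervals are made up, the function names are illustrative, and MTA's actual methodology may include details not captured here. Treating early arrivals as on time is an assumption.

```python
def on_time_performance(delays_seconds: list[float]) -> float:
    """Percent of trains arriving no more than 5 min 59 s after schedule.

    Assumption: negative delays (early arrivals) count as on time.
    """
    limit = 5 * 60 + 59  # 359 seconds
    on_time = sum(1 for delay in delays_seconds if delay <= limit)
    return 100.0 * on_time / len(delays_seconds)


def wait_assessment(actual_intervals_min: list[float], scheduled_interval_min: float) -> float:
    """Percent of gaps between trains no longer than the scheduled gap plus 25%."""
    threshold = scheduled_interval_min * 1.25
    ok = sum(1 for gap in actual_intervals_min if gap <= threshold)
    return 100.0 * ok / len(actual_intervals_min)


# Made-up sample: an 8-minute scheduled interval allows gaps of up to 10 minutes.
print(wait_assessment([8, 9, 10, 12], 8))  # 75.0
print(on_time_performance([-30, 120, 359, 360]))  # 75.0
```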
The data suggests that the G line has fewer problems and arrives on time more consistently than other popular lines. In other words, the idea that the G train sucks is supported neither by the service-change announcements nor by the on-schedule metrics. So why do people hate this line so much?
Edit: Now with more prongs!
My shot-in-the-dark guess is that the G hate is three-pronged (make that four-pronged):
- The G train runs through popular areas whose commuters are more vocal and visible and stay out later. Catching a train after a party at 3am is much more painful than catching it sober at 7pm.
- Some of the G train stops are packed with rats. An extra minute there is more excruciating than at a station that merely smells like piss, or is perhaps outdoors.
- The few service changes the G suffers are actually more debilitating to the commuter because there are no good alternatives. Coming home late from work? Sorry, you gotta take the shuttle bus!
Because the G is the only crosstown train in Brooklyn, when it's down you go from a 20-minute train ride to a 1+ hour bus ride (if you can catch the bus). That stays with you.
- The G is a very short line with an average wait time equal to that of a much longer line. There may be an optimal balance between waiting time and travel time that the G is violating. (More on this soon.)
Edit: I’m responding to suggestions with a Part 2.
I think the problem wasn’t necessarily the delays, but how infrequently the damned thing used to run.
It’s easy to have no delays if you only send a train through the tunnels once a day.
Good point, Pavel.
During the morning and evening rush hours, the F and L run every 5 minutes, while the G, Q, and B run every 8 minutes. That’s another objective prong to evaluate :)
Your first problem is in assuming that the source data is of actual value. The MTA simply doesn’t bother to announce when the G train service is altered.
The problem with the G train is that they frequently skip trains. It’s not at all uncommon to find yourself waiting for 20 minutes between trains in the times when it is scheduled to run every 8-10 minutes. The idea of a G train schedule is a joke.
Otherwise I agree that it’s solid. The G’s problem is the way the MTA runs it, not anything inherent with the trains, tracks, etc.
I think you also need to at least consider the number of trains running and adjust for that. You have to think of it in terms of lost rider opportunity: in other words, how many rides can be provided by a particular line at any given time, based on frequency, number of trains, and their capacity? That would at least be an upper bound, but it still wouldn't account for people's traffic patterns and time of day, if you cared about determining exactly how many people are affected. Then, after all that, you apply the announcements to determine how many people are affected and how often.
It's also important to note that the G (and L) train cannot physically be rerouted, run express, or run local, so those incidents should be taken out to make a better comparison.
I frequently take the G between Metropolitan and Hoyt-Schermerhorn, and I had to laugh at the only two mentions of running in sections. For a long time, almost every time I took it at night, I had to switch to a shuttle at Bed-Nostrand.
@Andy You are right, I remember the late-night service changes at Metropolitan Ave; they are underrepresented here, but not by much. Many of those announcements were labeled as “Bypassing Stations” because the same announcements said the trains were bypassing all the stations between Hoyt-Schermerhorn and Church Ave.
Keep in mind that these graphs show the number of service changes announced by the MTA, not the number of days the service changes were in effect. I'd love to get my hands on the latter numbers :)
I lived in Greenpoint a decade ago, a block from a G train stop. Hilarious to hear that even after the neighborhood took off, getting to and from it is still a nightmare. The only train that doesn’t run into the city! Let’s move to that neighborhood.
How young and foolish I was. This data is awesome though. :)
Thank you for doing this study. I’ve been aware of the on-time performance of the G train for some time, and feel that people’s gripes were unwarranted.
In fact, I've used the Google Maps transit map, and have found the G train is almost magically on time every time I use it. However, I am not a regular commuter, so I may not have the same sensitivity as other riders.
I really feel that the G train gets the negative hype simply because it does not come as often as other trains do, especially in Manhattan. People expect the same regularity of trains that they get in Manhattan, but they do not understand that the areas the G train serves are nowhere near as dense as the areas served by trains in Manhattan. Further, since a lot of the lines that run through Manhattan also go through the outer boroughs, the service on those lines is probably perceived as better because those lines have to serve the dense areas of Manhattan. The G train does not have that burden on its schedule.
So good to get these thoughts out of my head. Thanks for your work!
As you acknowledge is possible, the data is wrong, and that is the problem the campaign is talking about here! The MTA's service advisories frequently do not tell G train riders that there is a major problem with the train, even for hour-long delays. So the data you're working from is inaccurate.
You bring up a great point, and I agree that the data could be better: oh, how I wish it were automated (the way the 1-6 trains are) and easy to parse.
But it’s the best I can find and I think it represents the whole system fairly well. The G train isn’t alone in not communicating delays perfectly. So while yes, the data may miss problems, if it misses the problems evenly for all trains, then the relative comparisons are worthwhile.
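The "misses problems evenly" argument can be written out. Suppose line $G$ truly has $n_G$ problems and line $F$ has $n_F$, and the announcements capture the same fraction $p$ of each line's problems (an assumption, and exactly what the comment above questions). Then the observed counts $\hat{n}_G = p\,n_G$ and $\hat{n}_F = p\,n_F$ preserve the true ratio:

$$
\frac{\hat{n}_G}{\hat{n}_F} = \frac{p\,n_G}{p\,n_F} = \frac{n_G}{n_F}
$$

If instead the G's problems are announced less often than the F's ($p_G < p_F$), the observed ratio becomes $\frac{p_G}{p_F}\cdot\frac{n_G}{n_F}$, which understates the G's true share of problems.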
I will reiterate some points already made here. It's a good idea to test the perception that the G train sucks, but there are a few more things that need to be considered:
1. Rush hour trains on the 6 line run every 2 minutes, and on the L line every 4 minutes. On the G? Every 8 minutes. So this is a train with significant wait times even without a delay, and when there is a delay, it's that much more frustrating.
2. The trains are only four cars long. This means you must walk (or run) across roughly 1/3 of the platform to reach your train, adding an extra minute of commute time each way. It increases the chance of missing your train and contributes to a general feeling that this line is somehow inferior.
3. When the R train runs in two segments, it terminates at Jay Street-Metrotech. When the 4 train runs in two segments, it terminates at Franklin Ave. These are destination hubs in busy areas with express and local transfers, so being dropped off there is a minor travel interruption. When the G runs in two segments (which feels very frequent when you rely on this one line to get around every weekend or evening), the terminal is Bedford-Nostrand. This nearly unavoidable station sits dead in the middle of the route and offers no transfers or alternatives. Considering that most G train riders have already transferred from another line to begin with, being asked to get off and wait just to continue your ride on the same line feels like a slap in the face. If Hoyt-Schermerhorn or Court Sq were the middle terminus of the two segments, it would make sense: you would be asked to transfer at a stop where most people get off anyway because it connects with many other lines. But those are actually the END points of the segmented service, and the middle terminus is a local stop in the middle of Bed-Stuy. Imagine if the MTA segmented the 1 train in Manhattan at the 18th Street stop. That would be really irritating because most riders have to pass that stop to get where they are going, and there are no transfer options there.
So it’s not just the frequency of service disruptions, it’s about how those disruptions are managed and about the quality of the service on a day-to-day basis which is being impacted.
@Joe that’s an EXCELLENT break down of the deeper issue! I’m trying to put data together to analyze the “pain level” of the interruptions. Because as you and some others suggested (and so far the data agrees): it’s not how often the trains come, but how much pain is involved when they don’t.
Unfortunately, this is very difficult to automate (10,000 messages to process JUST for these 5 trains). The MTA's data isn't good for parsing, so I need to write a lot of code to “understand” what each announcement means. This will take me a lot longer than I was hoping!
I know the G train's service changes without any announcement, particularly when it switches to shuttles that stop at Bedford-Nostrand. That adds a transfer to the trip, which can easily add 20 minutes of wait time.
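One cheap first pass at "understanding" announcements is keyword matching. Below is a minimal sketch. The category names and phrases are hypothetical, borrowed from wording that appears in this thread ("Bypassing Stations", running in sections, shuttle buses, reroutes). The MTA's real phrasing and the categories actually used for the graphs may differ, and this is not the code described above.

```python
import re

# Hypothetical keyword rules built from phrases mentioned in this thread;
# the MTA's actual wording and the categories used for the graphs may differ.
RULES = [
    ("Shuttle Bus", re.compile(r"shuttle bus", re.IGNORECASE)),
    ("Running in Sections", re.compile(r"in (two )?sections", re.IGNORECASE)),
    ("Bypassing Stations", re.compile(r"bypass", re.IGNORECASE)),
    ("Rerouted", re.compile(r"re-?routed", re.IGNORECASE)),
]


def classify(text: str) -> list[str]:
    """Return every category whose keyword appears in the announcement."""
    labels = [label for label, pattern in RULES if pattern.search(text)]
    return labels or ["Unclassified"]


print(classify("G trains run in two sections"))  # ['Running in Sections']
print(classify("Free shuttle buses replace G trains"))  # ['Shuttle Bus']
print(classify("Good service on the L"))  # ['Unclassified']
```

An announcement can land in several categories at once, so per-category totals can add up to more than the number of announcements. Anything left as "Unclassified" is a signal that the rules need another phrase.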
What you did here is awesome given the MTA data. But some kind of crowdsourced data might be more accurate, at least for the G-train.
I knew someone who worked for the MTA who told me the MTA had ways of massaging the statistics: they'd change the (internal) name of a running train so that the train wouldn't be counted as running late. The “new” train would be running on time.
Finally: now that I don't use the G train as much, my perception of the MTA has improved. Even though my average trip time is longer than it used to be, it's more predictable and therefore less frustrating.
I also wonder whether these announcements account for multiple service changes, such as the current Greenpoint Improvement projects/Sandy Recovery. It's technically one announcement but covers multiple weekends. How does this factor into their statistics?