TIME Education

Here’s the New Way Colleges Are Predicting Student Grades

Predictive algorithms comb millions of grades from thousands of former students

For years, Stephanie Dupaul would jokingly consult her collection of Magic 8 Balls when students asked her questions such as, “Will I get an A in that class?” Now, she can give them an answer far more accurate than anything predicted by a toy fortune-teller.

Dupaul, the associate provost for enrollment management at Southern Methodist University, is one of a growing number of university administrators consulting the performance data of former students to predict the outcomes of current ones. The little-known effort is being quietly employed by about 125 schools around the U.S., and often includes combing years of data covering millions of grades earned by thousands of former students.

It’s the same kind of process tech behemoths like Amazon and Google employ to predict the buying behavior of consumers. And many of the universities and colleges that are applying it have seen impressive declines in the number of students who drop out, and increases in the proportion who graduate. The early returns are promising enough that it has caught the attention of the Obama Administration, which pushed for schools to make heavier use of data to improve graduation rates at a White House higher education summit last week.

The payoff for schools goes beyond graduation rates: tracking data in this way keeps tuition coming in from students who stay, and avoids the cost of recruiting new ones, which the enrollment consulting firm Noel-Levitz estimates at $2,433 per undergraduate at private colleges and $457 at four-year public universities.

“It’s a resource issue, it’s a reputational issue, it does impact — I’ll say it — the rankings” by improving graduation rates, Dupaul says.

At SMU, for instance, data analysis showed that students who applied early in the admissions process were more likely to ultimately earn degrees. So were those who visited the campus before enrolling, joined a fraternity or sorority, or registered for a higher-than-average number of classes.

From this and other knowledge, the university has built a predictive algorithm that can gauge the probability that a student will finish school, and prop up those who might not by sending academic advisors or deans to intervene.

Other universities also use detailed data to make sure students stay on track once they’ve arrived. Georgia State, for instance, has analyzed 2.5 million grades of former students to learn what may trip up current ones. That early-warning system, begun in 2012 to address a lower-than-the-national-average graduation rate, triggered 34,000 alerts last year about students who may have been in trouble, but didn’t know it yet.

It works by identifying risk patterns that can help catch students before they fall. For example, Georgia State’s data shows that students’ grades in the first course in their majors can predict whether or not they will graduate. Eighty-five percent of political science majors who get an A or B will earn degrees, but only 25% of those who score a C or lower will.

“What we used to do, and what other universities do, is let the C student go along until it was too late to help them,” says Timothy Renick, Georgia State’s vice president for enrollment management and student success. “Now we have a flag that goes off as soon as we spot a C in the first course.”

That student is invited to meet with an advisor and given the option of switching majors before spending more time and money on a losing proposition.
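
To make the mechanics concrete, here is a minimal sketch of how a first-course warning flag like the one Renick describes might be encoded. The field names, grade scale and threshold are hypothetical illustrations, not Georgia State's actual system.

```python
# Hypothetical sketch of a first-course early-warning flag; not Georgia State's actual system.
# Assumes each record carries the student's grade in the first course of their declared major.

GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def needs_intervention(first_major_course_grade: str, threshold: float = 3.0) -> bool:
    """Flag any student whose first course in the major falls below a B."""
    return GRADE_POINTS.get(first_major_course_grade.upper(), 0.0) < threshold

students = [
    {"name": "Student 1", "first_major_course_grade": "A"},
    {"name": "Student 2", "first_major_course_grade": "C"},
]

for s in students:
    if needs_intervention(s["first_major_course_grade"]):
        print(f"{s['name']}: flag for advising (first major-course grade {s['first_major_course_grade']})")
```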

The university also uses its predictive algorithm to channel incoming freshmen with higher risk factors — like those who come from high schools where earlier graduates have been poorly prepared — into a seven-week summer session. Nine out of 10 of these students make it to the end of the first year, a higher rate than that of their classmates who entered without red flags.

And the analysis isn’t limited to first year students. Last year, some 2,000 Georgia State upperclassmen were hauled in for one-on-one sessions with an advisor when they signed up for courses that didn’t satisfy requirements for their majors — which the data showed would probably derail them — and moved to classes that did.

“Most students, when they take classes that don’t apply to their program, it’s not because they’ve always wanted to take a course in Greek philosophy,” says Renick. “It’s because they don’t understand the maze of rules that big institutions like Georgia State have created. And when they go off course, it’s a difference between graduating and not graduating.”

The university also uses 12 years of data from former students to nudge current ones toward majors that track more closely with their academic strengths, thereby increasing their chances of graduating.

“It’s a really simple process,” Renick says, “but it’s the kind of thing that higher education hasn’t been doing.”

Despite the promising early returns, most institutions have not embraced predictive data. Only about 125 of the more than 4,000 degree-granting postsecondary institutions are using data in this way, according to the Education Advisory Board, a firm that helps Georgia State and other schools run such programs.

More will sign on, experts say, because it can do as much for the bottom line as it does for students. For every 1 percentage point improvement in the proportion of students data tracking keeps from dropping out, Renick says, Georgia State keeps $3 million in tuition and fees that would have otherwise been lost. So far, that rate has increased by five percentage points since the university started tapping this data two years ago, meaning it has more than recouped the $100,000-a-year cost of running the system and the $1.7 million per year it takes to pay an extra 42 advisors hired to help the students it predicts might fall between the cracks.
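
For readers who want to check that arithmetic, here is a back-of-the-envelope calculation using only the figures quoted above; it is a rough sketch, not Georgia State's actual accounting.

```python
# Rough check using only the figures quoted in the story (not actual accounting).
tuition_kept_per_point = 3_000_000   # tuition and fees retained per 1-point retention gain
points_gained = 5                    # improvement since the program began (over two years)
system_cost_per_year = 100_000       # annual cost of running the analytics system
advisor_cost_per_year = 1_700_000    # annual cost of 42 additional advisors

retained = tuition_kept_per_point * points_gained
annual_cost = system_cost_per_year + advisor_cost_per_year

print(f"Retained revenue: ${retained:,}")          # $15,000,000
print(f"Annual program cost: ${annual_cost:,}")    # $1,800,000
print(f"Difference: ${retained - annual_cost:,}")  # $13,200,000
```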

“It’s no longer just a moral imperative. It’s a financial imperative,” says Ed Venit, a senior director at the Education Advisory Board. “The students who are on their campuses now, they have to keep them around, hopefully ’til graduation.”

Yet graduation rates overall are down, not up, since 2008, according to the National Student Clearinghouse. Only 55% of students earn their two- or four-year degrees within even six years, as they switch majors, flounder through required courses, and take classes they don’t need.

To Venit, analyzing that information — which schools already collect — can help avert such stumbles. “The data is so accurate that we can see the problems coming a mile away,” he says. “Higher education is lagging behind other industries in the use of this.”

That’s begun to change as students, parents, and policymakers press universities to provide a better return on their investments, and as universities themselves — especially public schools, whose revenues are under strain — are forced to become more efficient.

At Georgia State — where 80% of students are racial minorities, low-income, the first in their families to go to college, or from other groups that often struggle to graduate — the six-year graduation rate had fallen to a dismal 32% before the university began to look at data. It’s since increased to 53%.

“Think of going through college as driving a car and the destination of the car is graduation,” says Mark Becker, Georgia State’s president, a first-generation college student who went on to earn a PhD in statistics. “If you start drifting off the road, we want to straighten you out and keep you driving forward.”

Such aid is becoming increasingly important as the students arriving on campuses look more like the ones at Georgia State: less affluent, nonwhite, and often the first in their families to attend college.

“A lot of these are students who are just barely able to afford college,” Renick says. “Taking the wrong course, getting a couple of Fs, losing a scholarship, wasting credit hours all can stop them from getting a degree.”

Now the university is poring over its data to determine how to predict when financial problems might force students to drop out, and offering “micro grants,” with stringent conditions, to keep them enrolled. Nine out of 10 freshmen who were offered the grants last year stayed in school.

At Purdue University Calumet, where only 31% of students graduate in six years, 74% of students returned this fall — a 5% improvement over the year before. The gain preserved nearly $500,000 in tuition, and saved the school the expense of recruiting new students to fill those empty seats — an amount worth almost five times what the university says it paid to analyze and act on the data.

Southern Illinois University increased its return rate by an even larger 8.3 percentage points, to 68%, and its revenue by more than $2 million, according to John Nicklow, who was provost when the process was begun last year. Those gains came after the university used data to identify a much larger proportion of students who needed help than was previously thought. The cost was about $100,000, part of it paid for by a grant from the Bill & Melinda Gates Foundation.

“I can’t believe it’s taken us this long to dig into this data,” says Nicklow, an engineer by training. “More of us need to do it.”

Sitting amid her collection of 30 Magic 8 Balls at SMU, Stephanie Dupaul calls predictive data “one of those waves that’s coming. A lot of schools just haven’t caught the wave yet.” But she cautions that even the best algorithms can sometimes be about as precise as the toys that line her desk.

“We still have to remember that data alone is not always a predictor of individual destiny,” she says, “even when ‘Signs Point to Yes.’”

This story was produced by The Hechinger Report, a nonprofit, independent news website focused on inequality and innovation in education.

MONEY stocks

3 Things to Know About IBM’s Sinking Stock

Niall Carson—PA Wire/Press Association Images

IBM's shares plunged 7% Monday after a disappointing earnings report. Can tech's ultimate survivor transform itself one more time?

International Business Machines has long enjoyed a unique status on Wall Street — a tech growth powerhouse that investors also see as a reliable blue chip, with steady profit growth and a hefty dividend. But with the rise of new technologies like cloud computing, Big Blue has struggled to maintain that balancing act.

Now investor confidence has suffered a big blow.

On Monday the company announced the results of a pretty lousy quarter. IBM’s third-quarter operating profit was down by nearly one fifth, and the company failed to generate year-over-year revenue growth for the 10th consecutive quarter.

Big Blue also revealed plans to sell off its struggling semiconductor business, a move that involves taking a $4.7 billion pre-tax charge against IBM’s bottom line. In fact, it is paying another company to take the unit off its hands.

While CEO Virginia Rometty acknowledged she was “disappointed” with IBM’s recent performance, she’s also pledged to turn the company around, led in part by IBM’s own foray into the cloud.

Now, you don’t get to be a 103-year-old tech company without learning to adapt. That’s what IBM famously did in the ’90s, when the computer giant started to shift away from profitable PC hardware in favor of consulting and service contracts for businesses.

But Monday’s dismal earnings show just how hard repeating that trick could turn out to be.

Here’s what else you need to know about the stock:

1) You can’t really call IBM a growth company anymore since its sales aren’t rising.

When it comes to revenues, IBM ranks behind only Apple and Hewlett-Packard among U.S. tech companies. On a quarterly basis, though, sales have actually shrunk for 10 periods in a row, including a 4% slide in the third quarter. The big culprit is cloud computing, in which businesses can access computing services remotely via the Internet.

Since the 1990s, IBM’s model has been premised on selling powerful, expensive computers to large businesses, then earning added profits on contracts to help firms run those machines. But the cloud lets companies rent, not buy, this computing power. “You only pay for what you use,” says Janney Montgomery Scott analyst Joseph Foresi. The result: IBM’s hardware revenues sank 15% last quarter.

2) IBM is racing to be a leader in cloud computing, but with mixed results.

The company has identified four alternative areas of growth. One is the cloud, the very technology eating into IBM’s hardware sales. Big Blue has spent more than $7 billion on cloud-related acquisitions. It’s also going after mobile, IT security, and big data, the analysis of information sets that are too large for traditional computers. One example is Watson, IBM’s artificial-intelligence project, which won Jeopardy! in 2011 and is being marketed to businesses in finance and health care.

These initiatives have promise, but IBM’s size is a curse. For instance, the company’s cloud revenues jumped 69% to $4.4 billion last year, but with nearly $100 billion in overall sales, “it’s hard to move the needle,” says S&P Capital IQ analyst Scott Kessler.

3) The stock is now much cheaper than its tech peers, but it may deserve to be.

Investors willing to wait and see if these moves will transform IBM may take comfort in the fact that the stock looks cheap. What’s more, the shares yield 2.4%, vs. 2% for the broad market. This could make the company look like a good value.

But investors should tread carefully, says Ivan Feinseth, chief investment officer at Tigress Financial Partners. He notes IBM has spent $90 billion on stock buybacks in the past decade, which has kept the P/E low by increasing earnings per share. Yet none of that money was invested for growth, as evidenced by IBM’s sluggish annual growth rate. It is hard to imagine IBM outmuscling Amazon, Cisco, Microsoft, HP, and Google in the cloud — and there are better values in tech.

TIME Innovation

Five Best Ideas of the Day: October 7

1. Learning from our mistakes: Global response to the current Ebola crisis should improve our handling of the next outbreak.

By Lena H. Sun, Brady Dennis, Lenny Bernstein, Joel Achenbach in the Washington Post

2. A blueprint for reopening the tech industry to women: be deliberate, build a new pipeline that is openly focused on women, and attack the archetype of tech success.

By Ann Friedman in Matter

3. We need to change what’s taught in business schools and the narrative about business success that dominates boardrooms.

By Judy Samuelson in the Ford Forum

4. A health system that learns from its experience through data analysis can change medicine.

By Veronique Greenwood in the New York Times Magazine

5. A long overdue move to align our international development with climate reality could trigger sweeping policy changes around the world.

By Charles Cadwell and Mark Goldberg in the Baltimore Sun

The Aspen Institute is an educational and policy studies organization based in Washington, D.C.

TIME Ideas hosts the world's leading voices, providing commentary and expertise on the most compelling events in news, society, and culture. We welcome outside contributions. To submit a piece, email ideas@time.com.

TIME Data

Will We Have Any Privacy After the Big Data Revolution?

Operations inside the Facebook data center Bloomberg/Getty Images

Zocalo Public Square is a not-for-profit Ideas Exchange that blends live events and humanities journalism.

Corporations know more about their customers’ lives than ever before. But the information economy doesn't have to leave us exposed

Does the rise of big data mean the downfall of privacy? Mobile technologies now allow companies to map our every physical move, while our online activity is tracked click by click. Throughout 2014, BuzzFeed’s quizzes convinced millions of users to divulge seemingly private responses to a host of deeply personal questions. Although BuzzFeed claimed to mine only the larger trends of aggregate data, identifiable, personalized information could still be passed on to data brokers for a profit.

But the big data revolution also benefits individuals who give up some of their privacy. In January of this year, President Obama formed a Big Data and Privacy Working Group that decided big data was saving lives and saving taxpayer dollars, while also recommending new policies to govern big data practices. How much privacy do we really need? In advance of the Zócalo event “Does Corporate America Know Too Much About You?,” we asked experts the following question: How can we best balance the corporate desire for big data and the need for individual privacy?

Corporations need to protect vulnerable data

Last week, the government of Singapore announced an increase in the cost of a toll at Bangunan Sultan Iskandar, the customs point for travelers crossing between Singapore and Malaysia. Motorists, who will have to pay over five times more than they previously paid, are furious. In protest, a group of hackers, known simply as “The Knowns,” has decided to use its skills to hack into and release corporate data on customers. The group released the mobile numbers, identification, and addresses of more than 317,000 customers of Singapore-based karaoke company K Box.

In an era of “hacktivism,” data is necessarily vulnerable. So how do we negotiate between companies’ increasing needs to collect and store our personal digital data, individuals’ privacy and ethical needs, and governments that are often slow to gain an understanding of these needs and how to address changes in this area?

If we borrow from recent work by psychologists and ethicists, we can agree upon a few preliminary guidelines: 1) Before collecting private and personal data, consumers should be informed of what data a company intends to collect, how it will be stored and used, and what precautions are being taken to protect their information from data attacks. 2) Consumers should be given the ability to consent to or opt out of the collection of personal data. 3) Companies that collect and store personal data should periodically remind their customers about their data storage policies.

Although companies should have the freedom to be innovative in their business models (such as by collecting new types of consumer data), these methods should not compromise the individuals on whom companies ultimately depend.

Sean D. Young is the Director of the UCLA Center for Digital Behavior and a medical school professor in the Department of Family Medicine. He writes and teaches about topics at the intersection of psychology, technologies, medicine, and business, at seanyoungphd.com.

Big data isn’t magic

A big data society seems to be inevitable, and promises much, but privacy (properly understood) must be an important part of any such society. To have both privacy and the benefits of big data, we need to keep four principles in mind:

First, we need to think broadly about privacy as more than just the keeping of a secret, but as the rules that must govern personal information. Privacy rules are information rules. We have rules now protecting trade secrets, financial and medical data, library records, and computer security. We have to accept the inevitability that more rules (legal, social, and technological) will be needed to govern the creation of large data sets and the use of big data analytics.

Second, we need to realize that information does not lose legal protection just because it is held by another person. Most information has always existed in intermediate states. If I tell you (or my lawyer) a secret, it is still a secret; in fact, that’s the definition of a secret, or as we lawyers call it, a confidence. We must ensure that big data sets are held confidentially and in trust for the benefit of the people whose data is contained in them. Confidentiality rules will be essential in any big data future.

Third, we need to realize that big data isn’t magic, and it will not inevitably make our society better. We must insist that any solutions to social problems based on big data actually work. We must also insist that they will produce outputs and outcomes that support human values like privacy, freedom of speech, our right to define our own identities, and political, social, economic, and other forms of equality. In other words, we need to develop some big data ethics as a society.

Finally, it’s important to recognize that privacy and big data aren’t always in tension. Judicious privacy rules can promote social trust and make big data predictions better and fairer for all.

Neil Richards (@neilmrichards) is a Professor of Law at Washington University in St. Louis and an internationally-recognized expert in privacy and information law. His book, Intellectual Privacy, will be published in January 2015 by Oxford University Press.

Corporate research is always an unequal exchange

When asking “how can we best balance” the desires of corporations and the needs of individuals, we need to recognize that there are different “we”s involved here. Executives at Google and Facebook are interested in learning from big data, but they are, naturally, more concerned about their own individual privacy than the privacy of their users.

As a political scientist, I’m interested in what I can learn from moderately sized data such as opinion polls and big data such as voter files. And I naively act as if privacy is not a concern, since I’m not personally snooping through anyone’s particular data.

Survey organizations also profit from individuals’ data: They typically do not pay respondents, but rather rely on people’s goodwill and public-spiritedness to motivate them to participate voluntarily in helping researchers and answering surveys. In that sense, the issue of privacy is just part of the traditional one-way approach to research in which researchers, corporate and otherwise, profit from uncompensated contributions of the public. It is not clear how to balance this unequal exchange.

Andrew Gelman is a professor of statistics and political science at Columbia University. His books include Bayesian Data Analysis and Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do.

This discussion originally appeared on Zócalo Public Square.

TIME Ideas hosts the world's leading voices, providing commentary and expertise on the most compelling events in news, society, and culture. We welcome outside contributions. To submit a piece, email ideas@time.com.

TIME Books

9 Ugly Lessons About Sex From Big Data

Dataclysm Courtesy Random House

Christian Rudder, author of Dataclysm and a founder of OkCupid, dives into the numbers and surfaces with some revelations on love, sex, race and culture

Big Data: the friend you met at a bar after your usual two drinks, plus one. You leaned in, listening more intently than usual. “Digital footprint.” “Information Age.” You nodded and smiled, even though you didn’t understand. “Change the world.” “The future.” You were impressed—and even if you weren’t, you faked it well.

Come morning, you have only fuzzy recollections of Big Data, its tag lines and buzzwords. You also find it vaguely reprehensible.

If you’re still up for it, there’s another side of Big Data you haven’t seen—not the one that promised to use our digital world to our advantage to optimize, monetize, or systematize every last part of our lives. It’s the big data that rears its ugly head and tells us what we don’t want to know. And that, as Christian Rudder demonstrates in his new book, Dataclysm: Who We Are (When We Think No One’s Looking), is perhaps an equally worthwhile pursuit. Before we heighten the human experience, we should understand it first.

Rudder, a co-founder of OkCupid and Harvard-educated data scientist, analyzed millions of records and drew on related research to understand how we search and scramble for love. But the allure of Rudder’s work isn’t that the findings are particularly shocking. Instead, the insights are ones that most of us would prefer not to think about: a racial bias against black women and Asian men, or how “gay” is the top Google Search suggestion for “Is my husband… .”

Here are 9 revelations about sex and dating, courtesy of Rudder, Dataclysm, and, of course, big data.

1. Straight men think women have an expiration date.

Although women tend to seek men around their age, men of all ages are overwhelmingly looking for women in their early 20s, according to OkCupid data. While men often set their age filters for women into the 30s and beyond, rarely do they contact a woman over 29.

2. Straight women are far less likely to express sexual desire than are other demographics.

On OkCupid, 6.1% of straight men are explicitly looking for casual sex. For gay men, it’s 6.9%, and for lesbians, 6.9%. For straight women, it’s only 0.8%.

3. “Most men lead lives of quiet desperation and go to the grave with the song still in them.”

Like any good data scientist, Rudder lets literature—in this case, Thoreau—explain the human condition. Rudder cites a Google engineer who found that searches for “depictions of gay men” (by which the engineer meant gay porn) occur at the rate of 5% across every state, roughly the proportion of the world’s population that social scientists have estimated to be gay. So if a poll shows you that, for instance, 1% of a state’s population is gay, the other 4% is probably still out there.

4. Searches for “Is my husband gay?” occur in states where gay marriage is least accepted.

Here’s a Big Data nugget you can see for yourself: Type “Is my husband” in Google, and look at your first result. Rudder notes that this search is most common in South Carolina and Louisiana, two states with some of the lowest same-sex marriage approval rates.

5. According to Rudder’s research, Asian men are the least desirable racial group to women…

On OkCupid, users can rate each other on a 1 to 5 scale. While Asian women are more likely to give Asian men higher ratings, women of other races—black, Latina, white—give Asian men ratings between 1 and 2 stars lower than they usually give men. Black and Latin men face similar discrimination from women of other races, while white men’s ratings remain mostly high among women of all races.

6. …And black women are the least desirable racial group to men.

Pretty much the same story. Asian, Latin and white men tend to give black women 1 to 1.5 stars less, while black men’s ratings of black women are more consistent with their ratings of all races of women. But women who are Asian and Latina receive higher ratings from all men—in some cases, even more so than white women.

7. Users who send copy-and-paste messages get responses more efficiently.

OkCupid tracks how many characters users type in messages versus how many characters are actually sent. (For most users, it’s three characters typed for every one character sent.) In doing this analysis, Rudder found that up to 20% of users managed to send thousands of characters with five keystrokes or fewer—likely Control+C, Control+V, Enter. A little more digging showed that while from-scratch messages performed about 25% better, copy-and-paste messages received more replies per unit of effort.
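
Rudder’s “replies per unit of effort” point is easy to reproduce with toy numbers. In the sketch below, only the keystroke counts and the 25% gap come from the passage above; the baseline reply rate is invented for illustration.

```python
# Toy numbers: only the keystroke counts and the 25% gap come from the text above.
from_scratch_keystrokes = 500   # a typical typed-out first message (illustrative)
copy_paste_keystrokes = 5       # roughly Ctrl+C, Ctrl+V, Enter

copy_paste_reply_rate = 0.20                            # invented baseline
from_scratch_reply_rate = copy_paste_reply_rate * 1.25  # "performed about 25% better"

print(from_scratch_reply_rate / from_scratch_keystrokes)  # 0.0005 replies per keystroke
print(copy_paste_reply_rate / copy_paste_keystrokes)      # 0.04 replies per keystroke
```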

8. Your Facebook Likes can reveal your gender, race, sexuality and political views.

A group of UK researchers found that based on someone’s Facebook Likes alone, they can tell if a user is gay or straight with 88% accuracy; lesbian or straight, 75%; white or black, 95%; man or woman, 93%; Democrat or Republican, 85%.
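
The mechanics behind a result like that resemble any supervised-learning setup: a matrix of users by Likes, plus known labels for some users. The toy sketch below is entirely illustrative, using invented data rather than the researchers’ actual model.

```python
# Entirely illustrative: invented data, not the UK researchers' model or dataset.
from sklearn.linear_model import LogisticRegression

# Rows = users; columns = whether each user Liked pages A, B, C, D (1 = yes).
likes = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
labels = [1, 0, 1, 0]  # a known binary trait for each training user

model = LogisticRegression().fit(likes, labels)
print(model.predict([[1, 0, 1, 1]]))  # predict the trait for a new user from their Likes alone
```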

9. Vermont doesn’t shower a whole lot, relatively speaking.

Rudder has doled out some heavy info to ponder, so here’s some that’s a little lighter: in general, according to his research, in states where it’s hotter, people shower more; where it’s colder, people shower less. Still, the Northeast is relatively well-washed. Except, that is, for Vermont. Rudder has no idea why. Do you?

 

Rudder has a few takeaways from beyond the realm of love, too…

— On an insignificant July morning, Mitt Romney gained 20,000 Twitter followers within a few minutes.

Rudder dives further into social media data to show that Mitt Romney gained 18,860 new followers at 8 a.m. on July 22, 2012. Nothing particularly interesting happened on that day, and that spike in followers was about 200 times what he was getting immediately before and after. The secret? Likely purchasing followers. And Romney isn’t the only politician to do so—it’s a common practice, Rudder says, as we seek to strengthen our “personal brands.”

— Obama’s election and inauguration caused a massive spike in Google searches for “n-gger.”

According to Google Search data, search volume for “n-gger” more than doubled when Obama was elected in Nov. 2008, then fell rapidly within one month. When Obama was inaugurated in Jan. 2009, it similarly spiked, and then immediately fell. We don’t have national conversations on race, Rudder suggests, just national convulsions.

TIME Ideas hosts the world's leading voices, providing commentary and expertise on the most compelling events in news, society, and culture. We welcome outside contributions. To submit a piece, email ideas@time.com.

TIME Innovation

Five Best Ideas of the Day: August 4

1. Making the punishment fit the crime: A better way of calculating fines for the bad acts of big banks.

By Cathy O’Neill in Mathbabe

2. Lessons we can share: How three African countries made incredible progress in the fight against AIDS.

By Tina Rosenberg in the New York Times

3. Creative artists are turning to big data for inspiration — and a new window on our world.

By Charlie McCann in Prospect

4. We must give the sharing economy an opportunity to show its real potential.

By R.J. Lehmann in Reason

5. Technology investing has a gender problem, and it’s holding back innovation.

By Issie Lapowsky in Wired

The Aspen Institute is an educational and policy studies organization based in Washington, D.C.

TIME Data

Meet the Man Who Turned NYC Into His Own Lab

Steven Koonin, under secretary for science at the U.S. Department of Energy, listens during the 2011 CERAWEEK conference in Houston on March 11, 2011. Bloomberg—Bloomberg via Getty Images

Using big data to make a difference

In the mornings, Steven Koonin often dons a light blue shirt and khaki suit jacket, walks out of his apartment above Manhattan’s chic Washington Square Park and heads for the subway. As he beelines down the sidewalk, the West Village buildings burp up black clouds of smoke as their boilers are fired up. At Sixth Avenue, an express bus screeches to the curb and blocks the pedestrian crosswalk. And as Koonin sits in the subway, he notices some of the signs are badly placed. “Can we fix this?” he wonders. He gets off at Brooklyn’s Jay Street-Metrotech station and rides an elevator to the 19th floor of a commanding building perched high above his native city. Then he gets to work.

Koonin is the director of New York University’s Center for Urban Science and Progress (CUSP), which is to say, he is the big data guru of New York City. He’s been given a lot of cash, millions of data points and a broad mandate by the city of New York: make it better. No big data project of this scale has been attempted before, and that’s because the tools never existed, until now. “There’s an enormous amount of data out there,” Koonin says with the vestiges of a Brooklyn accent. “If we can use the data to understand what’s going on in cities, we can improve them in a rational way.”

CUSP is both a research laboratory and a school. This year, it will have more than 60 students and 8 full-time faculty members. The students collaborate with the faculty and the city on big projects while they work toward either a Master of Science degree or an educational certificate. About a quarter of students this year will have social science degrees, another quarter each are engineers or scientists by training, and the rest will hail from fields as miscellaneous as film and fashion design. Their collective challenge is to turn numbers, spreadsheets, graphs and charts into a model that makes New York City work faster, cleaner, and more efficiently.

The program is already starting to make policy recommendations to the city, and as the institute attracts more talent, it will begin to play an important role in everything from easing Manhattan’s nasty rush hour traffic congestion and advising on prekindergarten school placement to cutting back on city pollution and helping businesses decide where best to open a franchise. “CUSP is able to work on those projects and take it to a deeper level of making more vetted recommendations,” says Nicholas O’Brien, the chief of staff in the Mayor’s Office of Data Analytics. “They bridge the gap between city data and creating actionable policy for city agencies.”

Koonin grew up in Bensonhurst, Brooklyn and in the late 1960s attended the prestigious Stuyvesant High School, where he and his friends once used an old IBM computer (“it clunked along and had less power than your phone,” he says) to try to figure out the shortest time in which a subway rider could visit every single city stop on one fare. Koonin would go to the MTA headquarters to copy down timetables and input them into the computer.

Forty years later, Koonin has more data than he knows how to use. There are figures for household water consumption, purchases of goods, noise levels, taxi ridership, nutrition, traffic levels, restaurant inspections, parking violations and public park use; subway ridership, bus deployment, boiler lifespans, recycling rates, reservoir levels, street pedestrian counts; granular demographic breakdowns, household income, building permits, epidemic monitoring, toxin emissions, and on, and on and on. The challenge is making sense out of it, and that’s where CUSP comes in.

“The city has very little time to stand back and ask itself, ‘what are the patterns here?’” Koonin says. “That’s because they’re up to their asses in alligators, as you almost always are in government.”

Koonin would know. After receiving a Ph.D. from MIT, he taught as a theoretical physics professor at Caltech before eventually working for BP and then the Obama administration. As Undersecretary of Energy for Science in the Obama administration, he was frustrated by the glacial progress on energy policy. To get things done, Koonin concluded, he needed a novel approach. “I came up with this notion of, ‘I’m going to go instrument a city as a scientist would,’” he says. In April 2012, he was named director of the newly created CUSP program to make New York a living laboratory for urban improvement. Since then, Koonin has overseen a rapidly growing operation as it dances between 13 city agencies, the Metropolitan Transit Authority, the mayor’s office, and NYU, taking chunks of data and imagining actionable city policy.

CUSP’s temporary location (before it moves into the retrofitted Metropolitan Transit Authority headquarters) is an eclectic mix of high-tech and deep retro. The foyer, with firm orange chairs and dull wood paneling, looks like an Ikea designer recreated a 1970s-era therapist’s office, but inside, two robots patrol the halls wielding touchscreens. A glass-enclosed conference room has 60 high-resolution monitors that on one Wednesday displayed the city’s taxi pick-up and drop-off data from the evening of May 1, with hundreds of teal and black taxi icons scattered around a detailed digital map of Manhattan. In Koonin’s impressive corner office with magisterial vistas of downtown Brooklyn, he keeps a classic slate blackboard next to a keyboard. He can fluidly play “You Go To My Head,” the J. Fred Coots jazz standard, and “The Way You Look Tonight.”

“My dream is to be a lounge pianist,” Koonin the data-meister says drolly.

Like a doctor holding a prodigious stethoscope to New York City’s skyscrapers, Koonin needs to give the city a thorough physical before he can write a prescription. “The city has a pulse, it has a rhythm. It happens every day. There’s a characteristic pattern in the rise of economic activity, energy use, water use, taxi rides, et cetera,” Koonin says. “Can we measure the physiology of the city in its various dimensions? And define what normal is? What’s normal for a weekday, what’s normal for a weekend?”

“Then you can start to look for abnormalities,” he continues. “If subway ridership was low, was that correlated with the weather? When subway ridership is low, is taxi ridership high? You get a sense of what’s connected to what in the city. Can we look for anomalies, precursors of things? Epidemics, economic slowdown. So measuring the pulse of the city is one of the big things we’re after.”
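
One simple way to turn “measuring the pulse” and “looking for abnormalities” into a procedure is to compare each day’s reading against a baseline for that kind of day. The sketch below is a hypothetical illustration, not CUSP’s actual methodology.

```python
# Hypothetical illustration of baseline-and-anomaly logic; not CUSP's methodology.
from statistics import mean, stdev

def is_anomalous(todays_count: float, history: list[float], z_cutoff: float = 2.0) -> bool:
    """Flag a reading more than z_cutoff standard deviations from the historical baseline."""
    baseline, spread = mean(history), stdev(history)
    return abs(todays_count - baseline) > z_cutoff * spread

# Illustrative daily subway ridership counts for the same day type (weekdays).
weekday_ridership = [5.6e6, 5.5e6, 5.7e6, 5.4e6, 5.6e6]
print(is_anomalous(4.1e6, weekday_ridership))  # True: unusually low, worth cross-checking weather
```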

CUSP is creating a system to measure microbiological samples from the city’s sewage system, using genomic technology to learn more about people’s nutrition and disease based on their waste. Do certain neighborhoods need better nutritional or hygienic practices? Another project involves a camera fixed to the roof of CUSP headquarters that can see anonymized data of when people’s lights turn on and off and monitor energy usage. When do people go to sleep? How regular are people’s sleeping hours? The institute is also working out a way to help the city’s Parks Department measure how many people use city parks, and what they do in them. (Hint: it could involve lots of infrared video.) The city could then much more intelligently design its public spaces.

“This is opening the door to the possibility that we would very accurately and very comprehensively understand how people would use our public spaces,” says Jacqueline Lu, director of analytics at the Parks Department.

The city’s 8.3 million-strong crowds, packed together on the subway like brightly colored gumballs or streaming through the streets like grains of sand blown by the wind, will be the ultimate beneficiaries of Koonin’s work. On his morning commute, he notes how the city has changed since he was a kid coming up in the public schools. “Everyday it’s really interesting to look at the crowds and see how they interact with one another,” he says. “The city works better. The trains are pretty much on time. So it’s pretty good.”

MONEY privacy

7 Ways to Protect Your Privacy Online

Illustration
Robert A. Di Ieso, Jr.

Companies can buy info on your health, political affiliations, financial stability, and more. Here's how to keep data brokers in the dark.

Data brokers store personal information about almost every single American consumer–and there’s usually very little you can do to see, correct, or delete your file. In fact, the companies that sell your personal data may know more about you than your own family does. That’s Federal Trade Commission chairwoman Edith Ramirez’s striking conclusion about a new government study on the data broker industry.

Brokers collect a wide swath of data about your buying habits, online behavior, home, finances, health, and more, according to the FTC, including this information:

• Your name (and previously used names), age, birthday, and gender
• Your address (and previous addresses), phone numbers, and email addresses
• Your Social Security and driver’s license numbers
• Your children’s ages and birthdays
• Your height and weight
• Your race and ethnicity
• Your religion (based on your last name)
• What languages you speak
• Whether you’re married (and whether you’re a single parent)
• Who lives with you
• Your education level and occupation (or if you’re retired)
• Bankruptcies, convictions for crimes, and tax liens
• Your state licenses–whether you hunt, fish, or have a professional license
• Your voter registration and political party
• The electronics you buy
• Your friends on social media
• How much you use the Internet and various social networks, including Facebook, LinkedIn, and Twitter
• Whether you use long distance calling services or mobile devices
• What kind of home you live in and how long you’ve lived there
• Your home loan amount, interest rate, and lender
• Your home’s listing price and market price
• How many rooms and bathrooms are in your home
• Whether you have a fireplace, garage, or pool
• What kinds of clothes you like
• What kinds of sporting events you attend
• The charities and causes you donate to
• Whether you gamble at casinos or buy lottery tickets
• Whether you’re a newlywed or pregnant
• The magazines and catalogs you subscribe to
• The media channels you use
• Whether you golf, ski, or camp
• Whether you own pets
• The celebrities, movies, music, and books you like
• Whether you have upscale retail cards
• The daytime TV you watch
• What credit cards you carry and your credit worthiness
• Whether you own stocks and bonds
• How many investment properties you own
• Your estimated income and your discretionary income
• Whether you have life insurance
• What car brands you prefer
• The make and model of your cars
• Whether you own a boat
• The most you’ve ever spent on travel
• Whether you’re a frequent flyer and your favorite airline
• Whether you own vacation property
• What kinds of vacations you take (including casino, time share, cruises or RV vacations)
• How you pay for things
• What kinds of food you buy
• How much you buy from “high-scale catalogs”
• What kinds of products you frequently buy
• Whether you buy women’s plus-sized clothing or men’s big & tall clothing
• Whether you search for ailments online
• Whether you or someone in your household smokes
• The drugs you buy over-the-counter
• Whether you wear contacts
• Whether you suffer from allergies
• Whether you have an individual health insurance plan
• Whether you’ve bought supplemental Medicare or Medicaid insurance
• Whether you buy weight loss supplements

 

How do companies know that? You might be revealing details about your private life without realizing it. Whenever you post information online, register on a website, shop, or submit a public record like a mortgage or voter registration, data brokers can collect information, and then turn around and sell what they have on you to advertisers and other companies (like risk mitigation and people-finder services).

Data brokers also make guesses about you and your interests based on other information they have, then sort you into groups, called “segments.” That way, advertisers can buy lists of consumers who might be interested in particular products.

Privacy advocates fear that companies might use personal information–and particularly demographic information–to discriminate against certain consumers. For example, the FTC warns that lenders could target vulnerable groups with subprime loans, or insurers could decide that people with adventurous hobbies are high-risk.

The industry line is that those concerns are purely speculative and that some customers appreciate targeted ads. “One interesting thing about this [FTC] report is that after thousands of pages of documentation submitted over the two years of thorough inquiry by the FTC, the report finds no actual harm to consumers, and only suggests potential misuses that do not occur,” Peggy Hudson, senior vice president of government affairs at the Direct Marketing Association, said in a statement.

The FTC is urging Congress to give you access to your data and the ability to opt out of data broker services. In the meantime, here are seven easy things you can do to limit what you share.

1. Delete Cookies

The first step towards protecting your privacy online is to delete “cookies” from your browser, says Paul Stephens, director of policy and advocacy at the Privacy Rights Clearinghouse. Cookies let websites collect information about what else you do online. Most browsers have privacy settings that let you block third-party cookies. But it’s not foolproof. Stephens warns that trackers are now switching from cookies to a new kind of targeting called fingerprinting, which is much harder to avoid.

2. Log Out of Social Media Sites While You Browse the Web

Another simple strategy, says Stephens, is to use different browsers for different online services. That will limit how much information any one site can collect about your web activity. For example, he says, “don’t go to a shopping site while you are logged in to Facebook.”

3. Change Your Smartphone’s Privacy Settings

Advertisers can also track you when you’re browsing the web on your mobile device, warns Gautam Hans, attorney at the Center for Democracy and Technology. You can change the privacy settings on your iPhone or Android device to limit ad tracking.

4. Skip Store Loyalty Cards

Data brokers collect information from the real world too, Hans says. It’s impossible to limit brokers’ access to some kinds of personal information, like public records. But if privacy is really important to you, decline offers for store loyalty cards–a major way retailers gather information about your buying habits. The downside? You may miss out on discounts.

5. Employ Advanced Online Tools

For the especially privacy-conscious, there are a number of online tools that can ratchet up your defenses. Some browser add-ons, like Disconnect.me, help you see and block tracking requests as you spend time online. Instead of Google, you can try the DuckDuckGo search engine, which promises not to collect or share personal information. Or use the browser Tor, which lets you go online anonymously. But these extra measures may not be a good fit for everyone. Some websites don’t load properly when you use anonymous browsing, Hans notes.

6. Opt-out of Data Broker Collection—Whenever Possible

Ultimately, it’s difficult to get data brokers to stop collecting information about you, or even find out how much information brokers already have. The FTC concluded that to date, “consumer opt-out requests may not be completely effective.” But one major data broker made waves last year when it launched a portal that allows you to access your data and opt-out of certain services. Check AboutTheData.com to see what information Acxiom has stored on you.

7. Do a Digital Check-up

Many popular sites like Facebook, Amazon, and Twitter offer privacy controls, so use them. Every once in a while, check your settings and see if you’re happy with how you are limiting the ways your data is used. “What’s important is that people have the opportunity to meaningfully consent,” Hans says.

TIME Culture

#Selfie, Steampunk, Catfish: See This Year’s New Dictionary Words

This picture displays "nautical steampunk fashion." Renee Keith / Getty Images / Vetta

Merriam-Webster has revealed 150 new words that will be added to its collegiate dictionary this year, ranging from 'hashtag' and 'catfish' to 'dubstep' and 'crowdfunding,' most of which speak to some intersection of pop culture, technology and the Internet

Today Merriam-Webster, America’s best-known keeper of words, announced new entries for its collegiate dictionary in 2014. Among them are telling specimens like selfie, hashtag and steampunk, reflecting lasting cultural obsessions that have become widespread enough to earn a place in the big red book.

“So many of these new words show the impact of online connectivity to our lives and livelihoods,” says Editor-at-Large Peter Sokolowski, in a press release. And that’s not all.

Many of the 150 new words do indeed speak to some intersection of pop culture and technology, like Auto-Tune and paywall. But others, like freegan and turducken, remind us how many modern Americans are bravely pursuing alternative eating habits, refusing to forgo dumpsters as a regular food source or to consume merely one kind of poultry at a time. And though MW does not say as much, others remind us of the lasting influence Kate Middleton has on our society (See: baby bump, fangirl).

Here is a selection of the new words, with their definitions and the earliest year Merriam-Webster editors could find them being used:

Auto-Tune (v., 2003): to adjust or alter (a recording of a voice) with Auto-Tune software or other audio-editing software esp. to correct sung notes that are out of tune

baby bump (n., 2003): the enlarged abdomen of a pregnant woman

big data (n., 1980): an accumulation of data that is too large and complex for processing by traditional database management tools

brilliant (adj., new sense): British: very good, excellent

cap-and-trade (adj.,1995): relating to or being a system that caps the amount of carbon emissions a given company may produce but allows it to buy rights to produce additional emissions from a company that does not use the equivalent amount of its own allowance

catfish (n., new sense): a person who sets up a false personal profile on a social networking site for fraudulent or deceptive purposes

crowdfunding (n., 2006): the practice of soliciting financial contributions from a large number of people esp. from the online community

digital divide (n., 1996): the economic, educational, and social inequalities between those who have computers and online access and those who do not

dubstep (n., 2002): a type of electronic dance music having prominent bass lines and syncopated drum patterns

e-waste (n., 2004): waste consisting of discarded electronic products (as computers, televisions, and cell phones)

fangirl (n., 1934): a girl or woman who is an extremely or overly enthusiastic fan of someone or something

fracking (n., 1953): the injection of fluid into shale beds at high pressure in order to free up petroleum resources (such as oil or natural gas)

freegan (n., 2006): an activist who scavenges for free food (as in waste receptacles at stores and restaurants) as a means of reducing consumption of resources

gamification (n., 2010): the process of adding game or gamelike elements to something (as a task) so as to encourage participation

hashtag (n., 2008): a word or phrase preceded by the symbol # that clarifies or categorizes the accompanying text (such as a tweet)

hot spot (n., new sense): a place where a wireless Internet connection is available

insource (v., 1983): to procure (as some goods or services needed by a business or organization) under contract with a domestic or in-house supplier

motion capture (n., 1992): a technology for digitally recording specific movements of a person (as an actor) and translating them into computer-animated images

paywall (n., 2004): a system that prevents Internet users from accessing certain Web content without a paid subscription

pepita (n., 1942): the edible seed of a pumpkin or squash often dried or toasted

pho (n., 1935): a soup made of beef or chicken broth and rice noodles

poutine (n., 1982): chiefly Canada: a dish of French fries covered with brown gravy and cheese curds

selfie (n., 2002): an image of oneself taken by oneself using a digital camera esp. for posting on social networks.

social networking (n., 1998): the creation and maintenance of personal and business relationships esp. online

spoiler alert (n., 1994): a reviewer’s warning that a plot spoiler is about to be revealed

steampunk (n., 1987): science fiction dealing with 19th-century societies dominated by historical or imagined steam-powered technology

turducken (n., 1982): a boneless chicken stuffed into a boneless duck stuffed into a boneless turkey

tweep (n., 2008): a person who uses the Twitter online message service to send and receive tweets

unfriend (v., 2003): to remove (someone) from a list of designated friends on a person’s social networking Web site

Yooper (n., 1977): a native or resident of the Upper Peninsula of Michigan — used as a nickname

TIME technology

My Experiment Opting Out of Big Data Made Me Look Like a Criminal

The Facebook Inc. and Twitter Inc. company logos are seen on an advertising sign during the Apps World Multi-Platform Developer Show in London, U.K., on Wednesday, Oct. 23, 2013. Bloomberg/Getty Images

Here's what happened when I tried to hide my pregnancy from the Internet and marketing companies

This week, the President is expected to release a report on big data, the result of a 90-day study that brought together experts and the public to weigh in on the opportunities and pitfalls of the collection and use of personal information in government, academia and industry. Many people say that the solution to this discomfiting level of personal-data collection is simple: if you don’t like it, just opt out. But as my experience shows, it’s not as simple as that. And it may leave you feeling like a criminal.

It all started with a personal experiment to see if I could keep a secret from the bots, trackers, cookies and other data sniffers online that feed the databases that companies use for targeted advertising. As a sociologist of technology, I was launching a study of how people keep their personal information on the Internet, which led me to wonder: Could I go the entire nine months of my pregnancy without letting these companies know that I was expecting?

This is a difficult thing to do, given how hungry marketing companies are to identify pregnant women. Prospective mothers are busy making big purchases and new choices (which diapers? Which bottles?) that will become their patterns for the next several years. In the big-data era of targeted advertising, detection algorithms sniff out potentially pregnant clients based on their shopping and browsing patterns. It’s a lucrative business; according to a report in the Financial Times, identifying a single pregnant woman is worth as much as knowing the age, sex and location of up to 200 people. Some of these systems can even guess which trimester you’re in.

Avoiding this layer of data detectors isn’t a question of checking a box. Last year, many people were shocked by the story of the teenager in Minnesota whose local Target store knew she was expecting before her father did. Based on her in-store purchasing patterns tracked with credit cards and loyalty programs, Target started sending her ads for diapers and baby supplies, effectively outing her to her family. Like the girl in the Target store, I knew that similar systems would infer my status based on my actions. So keeping my secret required new habits, both online and off.

Social media is one of the most pervasive data-collection platforms, so it was obvious that I couldn’t say anything on Facebook or Twitter, or click on baby-related link bait. But social interactions online are not just about what you say but also what others say about you. One tagged photo with a visible bump and the cascade of “Congratulations!” would let the cat out of the bag. So when we phoned our friends and families to tell them the good news, we told them about our experiment, requesting that they not put anything about the pregnancy online.

Social media isn’t the only offender. Many websites and companies, especially baby-related ones, follow you around the Internet. So I downloaded Tor, a private browser that routes your traffic through foreign servers. While it has a reputation for facilitating illicit activities, I used it to visit BabyCenter.com and to look up possible names. And when it came to shopping, I did all my purchasing—from prenatal vitamins to baby gear and maternity wear—in cash. No matter how good the deal, I turned down loyalty-card swipes. I even set up an Amazon.com account tied to an email address hosted on a personal server, delivering to a locker, and paid with gift cards purchased with cash.

It’s been an inconvenient nine months, but the experiment has exposed harsh realities behind the opt-out myth. For example, seven months in, my uncle sent me a Facebook message congratulating me on my pregnancy. My response was downright rude: I deleted the thread and unfriended him immediately. When I emailed to ask why he did it, he explained, “I didn’t put it on your wall.” Another family member who reached out on Facebook chat a few weeks later exclaimed, “I didn’t know that a private message wasn’t private!”

This sleight of hand is intentional. Internet companies hope that users will not only accept the trade-off between “free” services and private information but will also forget that there is a trade-off in the first place. Once those companies have that personal data, users don’t have any control over where it goes or who might have access to it in the future. And unlike the early days of the Internet, in which digital interactions were ephemeral, today’s Internet services have considerable economic incentives to track and remember—indefinitely.

Attempting to opt out forced me into increasingly awkward interactions with my family and friends. But, as I discovered when I tried to buy a stroller, opting out is not only antisocial, but it can appear criminal.

For months I had joked to my family that I was probably on a watch list for my excessive use of Tor and cash withdrawals. But then my husband headed to our local corner store to buy enough gift cards to afford a stroller listed on Amazon. There, a warning sign behind the cashier informed him that the store “reserves the right to limit the daily amount of prepaid card purchases and has an obligation to report excessive transactions to the authorities.”

It was no joke that taken together, the things I had to do to evade marketing detection looked suspiciously like illicit activities. All I was trying to do was to fight for the right for a transaction to be just a transaction, not an excuse for a thousand little trackers to follow me around. But avoiding the big-data dragnet meant that I not only looked like a rude family member or an inconsiderate friend, but I also looked like a bad citizen.

The myth that users will “vote with their feet” is simply wrong if opting out comes at such a high price. With social, financial and even potentially legal repercussions involved, the barriers to exit are high. This leaves users and consumers with neither a real choice nor a voice to express our concerns.

No one should have to act like a criminal just to have some privacy from marketers and tech giants. But the data-driven path we are currently on—paved with the heartwarming rhetoric of openness, sharing and connectivity—actually undermines civic values and circumvents checks and balances. The President’s report can’t come soon enough. When it comes to our personal data, we need better choices than either “leave if you don’t like it” or no choice at all. It’s time for a frank public discussion about how to make personal-information privacy not just a series of check boxes but a basic human right, both online and off.
