
Whedonesque - a community weblog about Joss Whedon
"It's a nativity scene, except nobody here is wise."

April 18 2011

Whose show is it anyway? A statistical analysis of characters' screen presence throughout the seven seasons of Buffy the Vampire Slayer.

"Because I'm an insane fan, I did a line and scene count of each episode, put out season-by-season summaries, then played with the numbers to try to find meaning. In particular, I started out interested in the trends revolving around Spike's character (testing the oft-repeated claim that he took over the show). However, my interest grew more general as the project went on. In addition to counting lines and scenes, I kept track of how many times characters were referred to by other characters. Doing so revealed interesting patterns not apparent in the line/scene counts." - Gabrielleabelle

A girl after my own mathematical heart. Could it be love? Only the numbers can tell...
Now that is one dedicated viewer and my mind is just too tiny to actually even think about someone else doing such a thing, let alone doing it myself! She has earned my respect...and fear! Fear of her big brain.
Well it doesn't bear out the claim that Spike took over the show, but if you had "Xander was buried" arguments you could make that case based on his use as the show progressed. Not a huge case mind you, but you do see him fade to the background in numbers.

That said, I'm kinda glad most of my assumptions were at least statistically valid. Although the good point that I think was made in this was that number of lines doesn't necessarily mean screen time since some people speak faster or slower or in some cases mean more with less.
Also very cool is the correlation measures. What I find interesting is that Buffy's lines seem to be negatively correlated to everyone except Giles, Spike, and Dawn. I might have thought Angel would be on that list, but who knew?
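For anyone who wants to try the correlation idea on their own counts, here is a minimal sketch of a Pearson correlation between two characters' per-episode line counts. The episode numbers below are made up purely for illustration, and the linked analysis may well have computed its tables differently.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-episode line counts, NOT taken from the linked project.
buffy_lines = [120, 95, 140, 80, 110]
willow_lines = [40, 60, 30, 70, 45]
spike_lines = [50, 35, 65, 25, 45]

r_buffy_willow = pearson(buffy_lines, willow_lines)  # negative for this toy data
r_buffy_spike = pearson(buffy_lines, spike_lines)    # positive for this toy data
```

A negative value only says that episodes where one character talks more tend to be episodes where the other talks less; it says nothing about why.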

Also, Xander seems to be negatively correlated to love interests or plot related players and positively correlated to all the scoobies (save Buffy.) Exactly what you'd expect, but perhaps why he got lost in the series.

[ edited by azzers on 2011-04-18 05:53 ]
This is extremely interesting. I love statistics!

It makes me wish we had gotten a bit more Tara-centric stuff. The only one I can think of is "Family." We should have at least gotten another one.
Well, it isn't Fermat's last theorem, but I'm good with it. :)
There's also this epic work as well.

http://buffyandangeltrainspotters.blogspot.com/2010/05/top-fifty-buffyverse-characters_21.html

Sadly the top three characters haven't been done yet but it's a useful companion piece to the entry.
I haven't seen this post before. Thank you, Simon!
Holy poop, this is awesome.
So it is Buffy's show in the end.

Now if only we could see if AtS is Angel's show also.
I'd like to see a Season 8 analysis and look at how that compares with previous seasons.
These are fascinating, thanks for the links Emmie and Simon.
Egads, I never thought I'd be looking at correlation tables for fun. That was interesting stuff, I especially liked the part about the 'featured character' in each of the seasons who didn't get loads of lines or screen time but was Buffy's emotional centrepoint and was referred to loads by other characters. Cool.
Textual analysis! I love it. This ought to go to slayage, for sure. [I feel like I should send a link to Amber Benson to show her how Tara grew in importance (modestly) from S4-6].

Now, interpreting the meaning of this- let the debate begin!
Very cool. As the author says themselves, what they actually tell us is very open to debate (e.g. with the correlation matrix, long-term couples aside, most of the significant correlations seem to be negative which is about as you'd expect - when one person's the focus of the episode everyone else isn't) but just collatin' the data is an amazing job of work and I totally respect the attempt to actually quantify contentious points of view like e.g. whether Spike takes over the show or not.

(tiniest of nitpicks with the graphs for percentage of lines, maybe a barchart would be better there since joining the points with a line implies there's a continuous relationship when there really isn't. Which is why - unless i'm reading it wrong - you have e.g. a graph apparently saying Faith had some positive percentage of the lines in Buffy S5/6 even though she didn't actually appear)

((tiniest of nitpicks with the post blurb itself rather than the linked article - it's actually explicitly not a measure of screen-time, the author mentions that a couple of times))
Just looking at it briefly, and not studying it in detail, a couple points jump out at me. 1) It includes BtVS, and Angel. 2) Wesley was missed out on.
I think Joss is writing another verse to "Heart (Broken)" at this very moment.
Excellent. One more verse after that and we'll have enough words for a frequency analysis.

... what ?

1) It includes BtVS, and Angel. 2) Wesley was missed out on.

It only includes BtVS doesn't it Cutlass ? And even then, only the main characters. Though it's somewhat arbitrary it seems reasonable to accept that in BtVS Wesley wasn't a main character (Jenny Calendar isn't in it either for instance).
Saje, I was going for the whole 'time is relative' aka this is the easiest way I can think of to sum it up in one sentence. I'll substitute 'presence' for 'time', though. ;-)

Excellent. One more verse after that and we'll have enough words for a frequency analysis.

... what ?


If one were to quantify affection and life narrative connection on a moment-to-moment deviating scale, the statistical analysis would show that you're my Featured Character on April 18, 2011 at 10:24 EST.

[ edited by Emmie on 2011-04-18 15:24 ]
Didn't Ocipital do something like this about 9 years back? Or maybe it was someone else. I forget. I just remember someone counting every character's lines in season 7.
Was it just season 7, eddy?
I believe so. They counted minutes of screen time and everything. It might've been posted on Buffy Cross and Stake.
That must have taken at least 3 days to complete! (Or maybe more like 3 years?)

Do you guys know if anyone has ever done a "knocked out" count? I'd be curious how many times Giles and Xander were knocked unconscious. It's a wonder that Xander doesn't have brain damage from all the concussions.
This is grade-A awesome! Been working for some time on a similar analysis project looking at character screen time and apparent audience reaction in Dollhouse*, and - considering how that show is only two short seasons long - to say I am in awe of the author's analytical fortitude would be a HUGE understatement. Bravo!

* Coming soon somewhere on these virtual streams, once real-life permits...

[ edited by brinderwalt on 2011-04-18 15:53 ]
Actually alexreager, you're in luck. Saturn5 on buffy-boards.com kept a count of how many times the scoobies got knocked out on his ep by ep breakdowns. Buffy got knocked out the most, 23 times. Xander got knocked out 17 times and Giles got knocked out 12 times. That's all over the course of the series.
lol @Saje

Analysis shows that King Duncan's lines sharply decline going into Act II and audience polling indicates a decidedly unfavorable viewer reaction. Text your vote to SHAKES. Act III to be rewritten to bring the popular character back.

Lady MacBeth: "Out damn statistics!"

Twain agrees. Details coming all in the next hour, but first up: Three witches, but only double toil and trouble. Are witches statistically under-performing?
Cutlass the main link only includes data for BtVS and includes only those who have appeared in the opening credits for the show plus Faith.

The link Simon posts further down uses data from BtVS and Angel and includes a Wesley analysis if you're interested.
I love this. So I'll add my voice to the chorus: great links, both of them! (The linked 'article' and the one Simon linked to.)

ETA: If you ask me the world can never have enough 'insane' fans like these!

[ edited by the Groosalugg on 2011-04-18 16:37 ]
If one were to quantify affection and life narrative connection on a moment-to-moment deviating scale, the statistical analysis would show that you're my Featured Character on April 18, 2011 at 10:24 EST.

That's easily one of the most mathematically coherent things anyone's ever said about me Emmie, ta ;).
BringItOn5x5 --

As someone who's spent most of her academic and professional career analyzing literature and studying the writing craft, I can certainly appreciate the value of formulas and statistics. Even the most basic forms of literary analysis embrace that stories often adhere to a pattern. This is also true in other mediums of art, as music theorists will use mathematics to understand music.

I don't think there's a higher ground to stand on in terms of the best way to analyze and appreciate art. Mathematics, statistics, and formulas have great value in providing another level of understanding.

Also, graphs and pie charts are sexy. I ship it. Graph/Pie Chart OTP. ♥

[ edited by Emmie on 2011-04-18 23:29 ]
eddy it might be on ATPoBtVS somewhere, their archives tend to be better than BC&S.
Emmie, I think I've heard this argument before.

'Understanding Poetry' by Dr. J. Evans Pritchard, Ph.D.

To fully understand poetry, we must first be fluent with its meter, rhyme and figures of speech, then ask two questions: 1) How artfully has the objective of the poem been rendered and 2) How important is that objective? Question 1 rates the poem's perfection; question 2 rates its importance. And once these questions have been answered, determining the poem's greatness becomes a relatively simple matter.

If the poem's score for perfection is plotted on the horizontal of a graph and its importance is plotted on the vertical, then calculating the total area of the poem yields the measure of its greatness.

A sonnet by Byron might score high on the vertical but only average on the horizontal. A Shakespearean sonnet, on the other hand, would score high both horizontally and vertically, yielding a massive total area, thereby revealing the poem to be truly great. As you proceed through the poetry in this book, practice this rating method. As your ability to evaluate poems in this manner grows, so will your enjoyment and understanding of poetry.


Leaving aside all the dangers of collecting and shaping data, from a practical standpoint, is this data going to add to anyone's understanding of BtVS or is it going to be selectively used as pseudo-factual backup for already existing opinions about the series? I strongly suspect the latter.
One of the Buffy books also included a head knock count for each episode.
I respectfully disagree. Statistical analysis is no more or less likely to be warped by bias than traditional literary analysis. I've read plenty of lit analysis where the author is clearly warping the text to prove a point (or grind an axe); you can tell the difference between forced conclusions and analysis that comes to conclusions more organically. The same is true of statistics -- you let the numbers display the pattern rather than forcing the numbers into a pattern that reinforces what you already believe to be true.

[ edited by Emmie on 2011-04-18 19:09 ]
On the other hand, we are a fandom where perceptions will always be more important than actual facts. Sometimes they meet, other times unfortunately they don't.
Emmie, I think I've heard this argument before.

Analytical analysis is not (always) synonymous with over-simplification...
When it comes to discovering "actual facts" in literature, it's more like trying to hit the bullseye as best you can and hoping enough people concur that you did in fact get that close to bullseye territory. And then someone will point out that there is another equally valid bullseye on the other side of the room only that bullseye is The True Bullseye. Blah bitty blah, I'm so stuffy, gimme a scone.

[ edited by Emmie on 2011-04-18 19:24 ]
Leaving aside all the dangers of collecting and shaping data, from a practical standpoint, is this data going to add to anyone's understanding of BtVS or is it going to be selectively used as pseudo-factual backup for already existing opinions about the series?

Well it certainly doesn't detract from our understanding of BtVS (because our previous understanding is still there) and the use/misuse argument applies to all information (as Emmie says, you see non-numerical analysis of the show that's biased all the time, bias isn't the sole purview of statistics and at least with numbers there're tools for checking for bias and the same information is there for all to check/use as they see fit).

To me it's a bit of fun that also might add to our understanding or suggest other ways to think about the show (the correlation data for instance could suggest unexpected ways characters may or may not be connected - if only Ben and Glory were on there !), it probably won't radically alter many fans' perceptions of Buffy but the idea that it's somehow a bad thing just because it's a numerical analysis seems pretty daft to me (I don't buy the "Dead Poets Society" idea that art works in a different way to everything else we do, that art appreciation by necessity means throwing off the shackles of analytical thinking and "just feeling" - sometimes that's true, sometimes not).


ETR some unnecessary speech marks.

[ edited by Saje on 2011-04-18 19:26 ]
Simon, I'm afraid we're too busy arguing the validity of statistical analysis to go looking for "actual facts" at the moment. Rain check?

[ edited by Emmie on 2011-04-18 19:29 ]
Well as someone once said "you can prove anything with statistics even the truth".
80% of quotes are made up.
"Lies, damned lies, and statistics."

Hadda. Sorry.
1) How artfully has the objective of the poem been rendered and 2) How important is that objective? ....

A sonnet by Byron might score high on the vertical but only average on the horizontal. A Shakespearean sonnet, on the other hand, would score high both horizontally and vertically, yielding a massive total area, thereby revealing the poem to be truly great. As you proceed through the poetry in this book, practice this rating method. As your ability to evaluate poems in this manner grows, so will your enjoyment and understanding of poetry.


If you look at this closely you can see that it attempts to give statistical validity to subjective judgements - "how artfully" and "how important" are not defined and cannot be quantified. Byron's long satirical poems attempt to do quite different things from Shakespeare's concise, intense sonnets. Don Juan is much better at being an epic satire than Sonnet 130. That's why the argument is ridiculous.

These statistics do no such thing, however - they count and quantify what is quantifiable, and reveal certain facts as a consequence. There are flaws - a silent, broody individual will talk less by definition than a lively, reckless, garrulous individual, for example. The writer/compiler acknowledges the potential fallibility of her methodology openly at the start of her work.

I still think it is awesome - in particular the way in which she demonstrates how each season has a character who dominates not the speaking of lines but the conversation when s/he is not there. It's a real labour of love and a really interesting way of exploring characters.
Back to what Eddy was saying earlier, I did finally remember about 'The Line League'.

There's references to it here.

http://www.phi-phenomenon.org/buffy/characters.htm

But the main site, www.hellmouthhigh.co.uk where the data was housed is sadly no more.

HOWEVER!

It is available through Way Back When so if anyone wants to do a comparison between Gabrielleabelle's and octipal's research I would love to read it.

Here's a link to the Line League. You may have to go to a different month/year in WayBackWhen to see particular pages.
In the end, this is just data, and it has no meaning per se. There was no real research question to begin with, but the idea is sound- let's do a textual analysis based on time and line and collate the numbers. As I noted, interpreting this could be fun: what does it mean, if anything? I am not sure this can be answered, since we started with the data in the absence of specific aims. There was no real question save to see how Spike fared in the analysis, but the results provide no actual answer.
@Emmie It's a little funny: Someone with a background in literary analysis promoting statistics and an engineer decrying their use. I don't dispute that there's plenty of bias to be found without numbers, but numbers have a special way of lying to you. They look so simple... so pure... so convincing. For example...

@Gill You've fallen into the trap. These numbers are fully quantified. So let me ask... if a character in a dream sequence in Restless says a line, does it count? Technically, the character didn't say it "in the real world". What about a thought Buffy heard in Earshot, but was not actually uttered? What about Buffy in Faith or Faith in Buffy in This Year's Girl? Whose line is it - the physical body or the mental self? Are Willow, Vampire Willow and Dark Willow the same character? To Dana5140's point, without a goal in mind, it's impossible to even try to answer those questions "correctly" - it depends on what you're trying to address. And even so, the eventual decision is ultimately an arbitrary one. It can't be counted or quantified, so what does that do to the numbers that change depending on the choice that is made? The numbers are only quantified to the extent that the methodology is fully specified and the resulting statistics are used strictly in that context... and for something of any complexity, those details will be dropped. It happens all the time in even the most precise environments.
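To make the methodology point concrete, here is a small sketch of a line counter where the attribution rule is an explicit parameter. The record layout, the function name and the sample rows are all invented for illustration; they are not the format of the project's raw data files.

```python
from collections import Counter

# Hypothetical transcript records: (body on screen, mind speaking, is_dream).
# These rows are invented for illustration, not copied from any transcript.
LINES = [
    ("Buffy", "Buffy", False),
    ("Buffy", "Faith", False),   # body swap: Faith speaking in Buffy's body
    ("Buffy", "Faith", False),   # body swap: Faith speaking in Buffy's body
    ("Faith", "Buffy", False),   # body swap: Buffy speaking in Faith's body
    ("Willow", "Willow", True),  # dream-sequence line
]


def count_lines(lines, credit="body", include_dreams=True):
    """Tally lines per character under an explicit attribution rule.

    credit: "body" credits whoever is physically on screen,
            "mind" credits whoever is actually doing the talking.
    include_dreams: whether dream-sequence lines are counted at all.
    """
    if credit not in ("body", "mind"):
        raise ValueError("credit must be 'body' or 'mind'")
    tally = Counter()
    for body, mind, is_dream in lines:
        if is_dream and not include_dreams:
            continue
        tally[body if credit == "body" else mind] += 1
    return tally


by_body = count_lines(LINES, credit="body")
by_mind = count_lines(LINES, credit="mind", include_dreams=False)
```

The same five rows give different totals under the two rules, which is why a count is only comparable to another count made with the same stated choices.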

@Saje Are you trying to use statistics to suggest some kind of connection between Ben and Glory? Clearly another fallacy.

And I don't know that I'm succeeding, but I'm actually not trying to offend anyone. If there was enjoyment in putting this together, that's fine. If others enjoy browsing the results, that's also fine. To me personally, it seems a little crass though. If you're seeing a play, musical, TV show, it's the ideas that matter. Slicing and dicing it to find which character said how much and when... that doesn't address the thought that went into it. It's blind to the thing's entire purpose for existing and so I do have to question what these numbers add to the discussion. If someone can write a paper with an interesting take on this data, I'll be happy to be wrong.
Yes, the primary focus of literary analysis is a reflection on the human condition through a story's themes, motifs, characterization, etc. Is observing this relevant?

This stats project isn't purporting to take the place of traditional literary analysis. Instead, it's useful in denoting patterns in storytelling structure. These stats aren't a tool for clarifying meaning in stories (e.g. what are the themes explored in "Chosen"?) -- they're a quantifiable means of highlighting patterns in how these stories are crafted (e.g. the episode-to-episode evolution of the relationship between Buffy and the Featured Character serving as Buffy's foil).

The difference is key. The question isn't why, the question is how. As an engineer, I'd think you would appreciate that. (My physicist father drilled this into my brain growing up -- don't ask why, Emily, ask how.)

To me, this stats project is most useful in analyzing craft. If one were expecting these stats to answer questions like, What is the meaning of the recurring use of fire imagery in relation to Buffy's emotional journey?, then I'm not surprised the stats are being deemed insufficient. As Spike would say, "Ask the right questions."

I'm actually not trying to offend anyone


Calling the project "crass" is uncalled for.

[ edited by Emmie on 2011-04-19 02:41 ]
I don't think you're offending anyone BringItOn, but I think there is disagreement to be had here.

I honestly think there isn't a person on here who would argue that the best way to analyze Joss's work is to build a statistics table. What I think we're finding is a lot of people are incredibly interested because no one has really bothered to do it (except for maybe Simon's link but it seems to be stuck in an endless loop for me.)

The thing about these statistics is that they are either accurate or they're not. Gabrielleabelle isn't providing sampled data (so there are a number of biases that can't occur by definition); she's giving complete population counts based on her definitions of events. So the only place she can really be wrong is if she's made mistakes, or used a questionable methodology like allowing a long pause in between one person speaking to count as two lines (which she admits she did, due to defects in some transcripts.) Your points about Earshot or Faith in Buffy's body are well taken, but amongst the population these events should at best be noise. They only occur in one episode a season usually. Gabrielleabelle would have been wise to list standard deviation along with her average lines in addition to what she did because it indicates just how noisy her statistics actually are.
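As a rough illustration of why a standard deviation next to the average is informative, the sketch below compares two invented characters with the same average lines per episode. It uses the population standard deviation on the assumption that every episode is counted rather than sampled; the numbers are hypothetical.

```python
from statistics import mean, pstdev


def summarize(lines_per_episode):
    """Return (average, population standard deviation) of per-episode counts."""
    return mean(lines_per_episode), pstdev(lines_per_episode)


# Hypothetical per-episode line counts for two characters.
steady = [50, 52, 48, 50, 50]    # about the same presence every episode
spiky = [10, 90, 20, 100, 30]    # a few centric episodes, little otherwise

steady_avg, steady_sd = summarize(steady)
spiky_avg, spiky_sd = summarize(spiky)
```

Both lists average 50 lines an episode, but the second has a far larger spread, so its average says much less about any single episode.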

I think the natural gut reaction of someone from any (and I do mean any) profession is to hate statistics. Even a person working on an assembly line probably has reasons why his average output is or is not representative of his work (he's sick, his partner keeps bugging him, etc.) He doesn't have a choice in the matter. In baseball, a sport that lives in statistics, people will still argue their efficacy. And even in art, statistics have been used to prove everything from a dominant trend, fad words, to systematic racism and sexism. The reason? Statistics don't care what the artist was trying to do, just like they don't care if a baseball player had an injury or was going through a divorce when he stopped hitting the baseball. The reality was the artist was habitually doing the same thing.

So backing up to Joss for a second, I find it incredibly fascinating how these things were broken down because I think it does show the care he put in to balancing his characters out. But I also think it tends to show where on some level, certain qualified observations people had were also valid. He DID squeeze the core four's total time as the series progressed.

I'm willing to give the author the benefit of the doubt and say she was not cheating the stats to fit an argument. That said, if someone wants to go over her work, that would be a more productive task I would think than simply insinuating that numbers can't be trusted because of bias.

Twain himself was a worldly man and was responding to other men's attempts to hoodwink the others with numbers because they are convincing (often because they had a weak argument to begin with) and to ignore statistics that don't support their argument. He was not attacking statistics by themselves.

* Quick addition, no one run statistics on the number of times I use the phrase "the reality is" or "I think." I really don't want to know.

[ edited by azzers on 2011-04-19 04:43 ]
On The Line League, which Simon, Emmie and Eddy have discussed: I believe Occipital (or was it someone else?--I remember the site) did all seasons of the show, and AtS--but ultimately due to differences in methodology the total number of lines counted were different between those results and gabrielleabelle's. Gabrielleabelle actually checked the transcript for each episode directly, whereas Occipital did a first pass with a computer program. In a sense those stats on lines are more 'complete'--more characters, more information, etc. But less precise, because there were mistakes (I even found a few) due to a less careful approach. So Gabrielleabelle wins on that front. :) That site also didn't track the number of scenes per character, or references.

Re: BringItOn5x5 and the statistics argument. I suppose if the author of this project were trying to present her findings as a definitive anything, it might be misleading. However, it seems to me she is trying to present her results as...the number of lines and scenes each character has. Wow. If that's of limited usefulness in drawing any absolute conclusions, that's probably because there are no absolute conclusions to be drawn. But it seems pretty clear that there is going to be a correlation between a character's lines and their importance. The raw data files are available (in .txt) so that you can resolve most of the questions you have--whether to count Faith-in-Buffy and Buffy-in-Faith as Buffy and Faith respectively if you feel you take issue with the author's choices, which are, in the individual season threads, stated outright.

It just seems downright bizarre to me to criticize the project for failing to provide literary analysis when that's obviously not its intent. Its intent is to give some data and information that a lot of people find interesting. Because there is going to be a correlation between lines and presence, it also leads to possibly drawing tentative conclusions about characters' importance in each season--conclusions which are presented as exactly that--tentative.
@Emmie How is counting number of lines/scenes/references for characters a relevant metric for storytelling structure? If I give you counts for a given episode, can you tell me if it was primarily comedy, drama, etc.? That would seem to be the bare minimum bar for a structure correlation. How does it capture Buffy's relationship with the featured character episode to episode? They each may individually have many lines/scenes - and all or none might be with the other. "Ask the right questions" is spot on. "Use the right tool for the right job" would be another. Is there a correlation between these counts and that which we want to analyze or is it merely interesting trivia that says nothing more than what it literally implies? Personally, I don't see a lot of application, but I do have my concerns as to how it might be abused. It's not uncommon for people to elevate controversial events in the show to a moral issue. Not only did they not like what Joss wrote, but Joss was morally wrong for writing it. Does this just provide the ammunition to claim his writing was not just morally flawed, but factually/statistically flawed as well? Let's remember, the whole study came about over the "Spike taking over the show in S7" gripe. If that's the upside to it, I can't see it as a good.

@azzers I really don't suspect a deliberate attempt at bias in collecting the data. The nature of compiling statistics forces one to make subjective calls though, even without any intent to skew. It's actually a minor point, but there's always going to be some squish in supposedly "hard data". That's just the nature of summarizing the output of a complex system. I agree, statistics measure what actually is accomplished. But a statistic only continues to be kept if it offers some kind of relevant indication of achievement or lack thereof - and possibly a future projection for the purpose at hand. For the assembly line worker, it's output. For sports, it's winning. For an artist, actors and agents might have a different take, but the purpose of a show is not line/scene allocation.

@WilliamTheB Yes, I'm well into the downright bizarre now, but in all fairness, I've probably had some help getting here.
It just seems downright bizarre to me criticize the project for failing to provide literary analysis when that's obviously not its intent.

Yeah, that's the thing. Not to pile on BringItOn5x5 (i'm not offended at all BTW, we're just talkin', same as always) but she states up front that interpretation is a whole other kettle of worms. As others have said, this is more about what Joss (et al) did than what Buffy/Willow/Xander/Giles did, to me it's much more metatextual analysis than textual but since it's new information from an unusual perspective it may jog a few interpretive ideas loose too.

(mentioned this before but I read a series of books years ago where the tech savvy, scientifically minded detective lead character would use Tarot cards to kind of "nudge" his thinking down paths he may not have considered before. He knew Tarot cards didn't tell him anything about the world but he also knew that thinking and interpreting facts often benefits from different perspectives)

He DID squeeze the core four's total time as the series progressed.

Well, the very-slightly-less-core three anyway azzers (Buffy's lines stayed fairly consistent throughout it seems). As noted though, whether that means anything beyond simply more characters joining the regular cast is one of those matters of interpretation (no is my own feeling).

Let's remember, the whole study came about over the "Spike taking over the show in S7" gripe. If that's the upside to it, I can't see it as a good.

Well, surely if it highlights certain controversial claims as spurious then that's a good thing ? Or similarly, if it provides solid evidence that Spike did "take over" then that's also an end to the controversy. Sure, lots of people won't accept it either way (maybe for good reasons - lines != screen time or arguably significance for instance) but it at least provides a basis beyond individual's impressions to judge empirical claims like "Spike got too much attention" or "Because of the Potentials we had less time with the core Scoobs". As another example, in times past we've seen claims related to how black characters are treated on the show, wouldn't it be useful to actually know e.g. how many black vamps actually were killed by fire ?

I dunno, i'm just really struggling to understand how more information can be a bad thing. Maybe not useful, granted but actually a negative ? As I say, use/misuse arguments apply to all data but I very much doubt you're suggesting we just stop gathering information on anything BringItOn5x5 ;).

(maybe it's just a fundamental perspective thing - i'm very much an "I'd rather know" kind of person than a "Don't tell me if it's bad news" type)
When I say core 4, it's because I simply use the scoobies as a block. If you want to get right down to it, it's not even the Scoobies. It's specifically Xander and Giles. Willow actually increases over time and Buffy fluctuates but stays in the commanding lead.

For an artist, actors and agents might have a different take, but the purpose of a show is not line/scene allocation.


Agreed, but scene allocation is a component of the finished product (just like building a widget can be a component of a finished manufactured good but that finished good will vary depending on what widget you're producing). For example, an episode dominated by Willow and Giles in terms of scenes is going to be very different from one dominated by Buffy and Spike. Therefore I don't think scene allocation is an unreasonable measure if you're trying to quantify what kind of show Buffy was over its 7 year run.

[ edited by azzers on 2011-04-19 08:17 ]
When I say core 4, it's because I simply use the scoobies as a block. If you want to get right down to it, it's not even the Scoobies. It's specifically Xander and Giles. Willow actually increases over time and Buffy fluctuates but stays in the commanding lead.

Having defended "more information" above one issue I can see with this sort of analysis is, by its nature, we might start to get caught up in minutiae but bearing that in mind (i.e. having hung a lantern on it i'm now going to get caught up in minutiae ;), where do you get Willow goes up from azzers ? Her share of main character lines starts at 16% (in S1) and ends at 13% (in S7), which personally i'd call basically no change (but certainly not upwards) and her significant/centric episodes don't really trend upwards either.

Absolutely true though: as a unit the core 4 drop from a whopping 90% of the lines in S1 to around 60% in S6 and S7 (as a single character it's mainly from Giles though - Buffy's line share actually drops more or less in line with Xander's, for instance). But again, the number of characters counted increases by 50% between S1 and S7. Adding characters just seems an essential part of long-form storytelling to me; we might as well observe that the season number got higher as time went on ;).
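To put a rough number on the dilution point, here is a minimal sketch. It assumes six counted characters in S1 (the figure quoted elsewhere in this thread) and nine in S7 (six plus the 50% increase). The function name and the equal-split baseline are illustrative assumptions, not anything taken from the article.

```python
from fractions import Fraction


def equal_split_share(group_size, cast_size):
    """Share of lines a group would hold if every counted character spoke equally."""
    return Fraction(group_size, cast_size)


# Assumed cast sizes: 6 counted characters in S1, 9 in S7 (a 50% increase).
s1_baseline = equal_split_share(4, 6)
s7_baseline = equal_split_share(4, 9)

# Factor by which a fixed group of four is diluted purely by adding characters.
dilution = s7_baseline / s1_baseline

# Apply the same factor to the observed S1 figure of roughly 90%.
projected_s7_percent = 90 * dilution

print(f"dilution factor: {dilution}")
print(f"projected S7 share: {float(projected_s7_percent):.0f}%")
```

Under those assumptions the projected figure can be compared directly with the roughly 60% observed for S6 and S7. It is only a baseline: it says nothing about which individual characters gained or lost lines.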

(interestingly, as the article notes, Willow's lines peak in S4 NOT S6 but references to Willow are highest in S6 i.e. she wasn't saying that much but she was doing things such that people were talking about her a great deal. "Show don't tell" and all that)


ET break a monster para (somewhat arbitrarily but whaddya do ?).

[ edited by Saje on 2011-04-19 10:22 ]
Awesome! I love statistics and I love Buffy even more - so this was great fun to read.
Ah, but Saje, Willow's peak may have been in season four but season six is tied with season one for a close second--17% vs. 16%. The other thing to consider is that those totals represent the percentage of lines spoken by the major cast members. So--there were six major cast members in season one (including Angel) and nine in season six (if you include Giles and, because of one episode, Riley). In terms of average absolute percentage of lines per episode, Willow's season six total is considerably higher than her season one total (13 vs. 12) and just below her season four total (14 vs. 13). So--season six was closely behind season four as her 'best season.' And thus I disagree that she wasn't saying that much. Since she also was referenced often, perhaps it's less 'show don't tell' than 'show, and also tell'. :)

[ edited by WilliamTheB on 2011-04-19 12:15 ]
@Saje If Joss wants to use Tarot cards as a source for inspiration, that's fine. If others start using Tarot cards as a serious method for evaluating Joss' work, that's more of an issue for me. Should I also be open minded to an evaluation of the Venus de Milo that counts arms or is that just wrong-headed?

I dunno, I'm just really struggling to understand how more information can be a bad thing.

Paul Ballard: We split the atom, make a bomb. We come up with anything new, the first thing we do is destroy, manipulate, control. It's human nature.


If this is made into a bomb, someone is going to look very foolish. No, to be serious, it's not that this is “bad” in the evil sense, but I do feel it's taking the collective discussion of Joss' work in the wrong direction. If I walked into a code review and someone had prepared an analysis of how many times I use "i" as a for loop iterator vs other variable names, I'd have some serious questions as to why that person was in my code review in the first place and what on earth would come out of it. Likely, nothing good. And then my trusted colleague, Emmie, stands up and says, yes, this is going to be a wonderful tool for code analysis. Now, I have big concerns as to how exactly this is going to be used. It's simply not the appropriate metric for addressing the sum and substance of the thing I've created. I don't think it's wrong to voice that.
But no-one is saying it encompasses the entire sum and substance of what's been created - where does the analysis tell me where to cry at 'The Gift' or that Rob Duncan's Chosen theme is triumphant ? Of course it doesn't do that but there's a difference between "doesn't tell us everything" and "tells us nothing".

... it's not that this is “bad” in the evil sense, but I do feel it's taking the collective discussion of Joss' work in the wrong direction.

Personally I don't think it's taking the collective discussion of Joss' work in any direction at all (any more than the various papers looking at Buffy through a Freudian or Marxist lens are). It's just providing another direction to approach it from, one that's better at answering some questions than other methods. It's a fun, harmless diversion that might also suggest other avenues of consideration.

If I walked into a code review and someone had prepared an analysis of how many times I use "i" as a for loop iterator vs other variable names, I'd have some serious questions as to why that person was in my code review in the first place and what on earth would come out of it.

True, but a closer comparison would (IMO) be to things like how many lines you use, which methods/classes/functions etc. you use, not which specific variable names you use but whether you employ a consistent naming scheme, and so on. Those help indicate how verbose your code is, how easy it'll be to maintain, how comfortable you are with a particular language etc. (as it sounds like you'll already know, BringItOn5x5, most programming languages allow both a generic and what might be called a more "idiomatic" solution to the same problem, and which you use is indicative). Can that kind of inspection decide whether e.g. your code is elegant or ingenious? Can it separate genius programmers from jobbers who can get it done and stick to the in-house procedures but not much else? Not really (outside of some broad indicators), but that doesn't mean it's entirely useless either.

So--season six was closely behind season four as her 'best season.' And thus I disagree that she wasn't saying that much. Since she also was referenced often, perhaps it's less 'show don't tell' than 'show, and also tell'. :)

Yeah, you're right in fact, WilliamTheB. From the text files I got an average for Willow of 50 lines/episode in S4 and just over 51 for S6, so even with more characters to spread the lines around, she still actually says more per episode (on average) in S6. On average there were ~40 more total lines/episode in S6 than in S4 (~408 vs ~368), i.e. more was said in general, which probably explains the percentage being lower even though the average no. of lines was higher.
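The arithmetic behind that can be checked in a few lines. This is a minimal sketch using only the approximate per-episode averages quoted above; the function name is made up for illustration, S6 is rounded down to 51 lines, and it assumes the total lines/episode figures are the right denominators.

```python
def line_share(character_lines, total_lines):
    """Return a character's share of all lines as a percentage."""
    return 100.0 * character_lines / total_lines


# Approximate per-episode averages quoted in the comment above.
s4_share = line_share(50, 368)  # Willow in S4
s6_share = line_share(51, 408)  # Willow in S6 ("just over 51", rounded down)

print(f"S4: {s4_share:.1f}%")  # prints "S4: 13.6%"
print(f"S6: {s6_share:.1f}%")  # prints "S6: 12.5%"
```

So a slightly higher line count in S6 still yields a smaller share, because the total per episode grew faster than Willow's own count did.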
@BringItOn5x5, you say

If I walked into a code review and someone had prepared an analysis of how many times I use "i" as a for loop iterator vs other variable names, I'd have some serious questions as to why that person was in my code review in the first place and what on earth would come out of it. Likely, nothing good. And then my trusted colleague, Emmie, stands up and says, yes, this is going to be a wonderful tool for code analysis.

Is this colleague supposed to stand in for the Emmie from this thread you've been conversing with? I think it's a big leap to say that because she thinks the line counts are of interest that she would think your use of variable names is of interest.

@Saje: Ooh, thanks for that!
Consult your doctor about whether viewing statistics is right for you. Some statistics-viewers report blurred vision, pie-chart PTSD, and exploding brains. You should be aware that simple descriptive statistics such as 'frequency counts' can be subject to bias in their inclusion but the real mental-health hazards lie in their interpretation. Use with caution.
Great analysis.

Just as an example to show its usefulness:
Look at Giles and his lines per scene. The number is very high. Especially in the early seasons, you can clearly see how it lines up with him being the "exposition guy".

Or those same numbers for Oz in season two and three. You can see how the (already low) number went down. Obviously the writers tried to underline his stoic image.

To quote Emmie:
"The question isn't why, the question is how."

This analysis does shed light on at least a few aspects of "how".

[ edited by kurna on 2011-04-20 02:46 ]
