Episode 7.3 Cephalic Version, Interpreting Studies Part 1, and Jackie Kennedy

In this episode, we discuss four tips for external cephalic version. Then we dive into the basics of interpreting scientific studies. Finally, we discuss the pregnancy history of Jackie Kennedy.

00:00:20 Tips for External Cephalic Version

00:17:10 Evaluating Clinical Studies

00:53:13 Jackie Kennedy’s Impact on Cesarean Deliveries

Links Discussed

Reducing the cesarean delivery rates for breech presentations: administration of spinal anesthesia facilitates manipulation to cephalic presentation, but is it cost saving?

Eliciting primitive fetal reflexes in the intrauterine environment: A new concept to aid external cephalic version


Howard 00:18


Antonia  00:18


Howard 00:19

What are we thinking about on today’s episode?

Antonia  00:22

Well, we’re going to talk about how to interpret articles and we’ll continue that in the next episode too and we’ll plan to apply some of the things we’ve learned into just looking at some newer articles. But we’re also going to discuss four practical tips for external cephalic version, and we also have some interesting tidbits about Jackie Kennedy’s obstetric history saved up towards the end.

Howard 00:46

Tidbit seems like such an outdated word. Do you mean like gossip or tea, is it? You could be hipper.

Antonia  00:53

I think if I start to use the term tea to refer to anything besides what I’m drinking right now, a steeped beverage, then it’s probably officially outdated and no one will ever want to use it, and I’m younger than you.

Howard 01:07

Well, no cap.

Antonia  01:08

Well, that might be another phrase that’s now outdated. I’m still not really sure what that meant, but it’s okay. Well, let’s move on. Let’s discuss our four tips for external cephalic version.

Howard 01:20

Let’s do it.

Antonia  01:21

All right. So external cephalic version is one of those things that comes up relatively rarely compared to many other things we do in OB-GYN. That’s because the incidence of breech presentation isn’t that high. It’s about 4% at term. So even if 100% of our pregnant patients were always going to be amenable to ECV, which they’re not, it would still only come up 4% of the time at the very most. What that practically means is that certain physicians really never do them, while others might end up doing them fairly routinely. Partially this could be chance, but there are usually other things at play: physician preference and comfort, maybe local practice patterns, how the patients feel, how they’re counseled, etc.


And ECV is just one example of a whole category of conditions treated and procedures done that necessarily are lower volume because they’re related to conditions that are more rare, especially for generalists. So it’s one of those things that is either rarely or never done by a given provider, depending on that specific setting and context.


So some other examples might include breech extraction of a second twin, manual rotation of the fetal occiput during labor, forceps-assisted vaginal deliveries, rotational forceps, cervical cerclages, amniocentesis, and possibly even vaginal hysterectomy in some cases. These are all examples of things that, in the right patients and for the right indications, have really good safety and outcome evidence. I’d say an amniocentesis for a generalist would maybe be in the context of a fetal demise, getting a sample for that, and sometimes for antenatal diagnosis too if they don’t have access to an MFM. So especially vaginal hysterectomy and the other obstetric things I listed are recommended by our guidelines and professional societies, but they might be more difficult to perform and may be perceived as risky by the physician, likely because the opportunities to do them and to maintain the skill sets are rare. But ECV, external cephalic version, is successful roughly two-thirds of the time, it saves most women who have it done an unnecessary cesarean, and it really is a fairly safe procedure. But I think because it’s so rare, it’s something that a lot of people struggle with.

Howard 04:03

A lot of the decisions we make clinically are influenced by internal factors that are emotional or psychological. So if something doesn’t reimburse well, it has a bit of a learning curve, it makes you feel like a failure when you’re unsuccessful, and it takes some deliberate practice and work to be good at it and maintain your skill set, and if it’s not absolutely necessary that you do it because there’s another way to deliver the baby in this case, then yeah, I think a lot of folks will just take the path of least resistance and give it up over time, particularly if they’ve struggled, in favor of what they think is safest in their hands. And that’s true of the things you mentioned.


ECV is easily replaced by cesarean delivery and, yes, the cesarean delivery is more dangerous for the mother, but not in a way that you would really perceive on an individual patient level. ECV doesn’t reimburse well, and when you fail at it, the patient and the nursing staff and everybody else knows that you failed at it, and it doesn’t feel great emotionally. There’s actually a study from Israel where 98% of breech pregnancies were delivered surgically and the ECV rates were very low, and they did a cost analysis: cesarean for breech was 11 times as expensive as an external cephalic version and twice as expensive as a vaginal delivery following a successful ECV, even if you did a spinal at the time of it. So it is the right thing to do. It does save money, it has better outcomes, but all those other factors are headwinds against people doing it.
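
As a back-of-the-envelope illustration of why offering ECV can still save money when some attempts fail, the sketch below combines only the two cost ratios quoted above with the roughly two-thirds success rate mentioned earlier in the episode. It is not the study's own analysis. Costs are expressed in units of "one ECV", and two assumptions are made for illustration: the "twice as expensive" figure is treated as excluding the cost of the ECV attempt itself, and a failed attempt is assumed to be followed by a cesarean at full cost.

```python
from fractions import Fraction

# All costs are in units of "one ECV attempt" (ratios quoted in the episode).
ECV_COST = Fraction(1)
CESAREAN_COST = 11 * ECV_COST               # cesarean quoted as 11x the cost of an ECV
VAGINAL_AFTER_ECV_COST = CESAREAN_COST / 2  # cesarean quoted as 2x this delivery cost

# Assumption: the roughly two-thirds success rate mentioned earlier in the episode.
SUCCESS_RATE = Fraction(2, 3)


def expected_cost_with_ecv(success_rate=SUCCESS_RATE):
    """Expected cost per patient when ECV is attempted first.

    Assumes every patient pays for the attempt, successes deliver vaginally,
    and failures go on to a cesarean at full cost.
    """
    return (
        ECV_COST
        + success_rate * VAGINAL_AFTER_ECV_COST
        + (1 - success_rate) * CESAREAN_COST
    )


expected = expected_cost_with_ecv()
print(f"ECV-first strategy: {float(expected):.2f} units")
print(f"Planned cesarean:   {float(CESAREAN_COST):.2f} units")
```

Under these assumptions the ECV-first strategy costs less on average than a planned cesarean; different success rates can be passed to `expected_cost_with_ecv` to see how sensitive that conclusion is.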

Antonia  05:31

Well, we’ve already discussed previously about the safety and efficacy profile of ECV, so let’s try to make it easier and take away at least that as a negative incentive. It’s not a more dangerous thing to do than a C-section. It is the right thing to offer patients that have a noncephalic baby in most cases if they don’t have any contraindications to vaginal delivery. So most patients should be getting it done. So let’s go through our four tips. The first tip is going to be maternal relaxation.

Howard 06:03

Right. So it’s definitely true that uterine relaxation helps the success rate of external cephalic version. I say uterine, but we should also include the abdominal wall. This practically means giving a dose of terbutaline, which has the best literature to support it as an agent, and you might also give a narcotic or an anxiolytic intravenously at the same time, remembering that the maternal abdominal wall is also something that fights against us.


But relaxing the uterus with terbutaline nearly doubles the rate of success, at least in the best large randomized controlled trial, and data from other studies show that it’s effective even if the patient’s already in labor. Routine use is associated with a higher successful vaginal delivery rate, which implies a higher rate of successful version, and a lower complication rate. So it’s a must. There is no good data that calcium channel blockers do the same thing, nor do nitric oxide donors seem to benefit, and there’s really no good data either that a spinal or epidural is worthwhile for external cephalic version. The use of regional anesthesia should probably be limited to women who are already in labor and want an epidural anyway, or to people who perhaps have had a prior attempt at ECV that failed and really want to do another attempt, especially if that prior attempt failed due to maternal discomfort causing the procedure to be cut short or something like that. But overall the literature doesn’t support what I think is becoming an increasingly common practice of getting an epidural or a spinal for ECV attempts.

Antonia  07:29

Yeah, and at least in the past, I’ve had the impression that it made a big difference on the abdominal wall relaxation. So in the past I’ve recommended it more strongly, but more recently I’ve been seeing that, yeah, it seems to work just as well with the routine practice of not doing regional anesthesia. So just a good thing for me to be updated on as well. And it’s also not an issue of cost to omit the regional. That study from Israel found that even if you did use a spinal at the time of ECV, it’s still cost effective. So either way, it’s cost effective whether or not you use it. And it turns out you really don’t get a huge benefit in terms of success from using it. So the next tip is create space.

Howard 08:14

Right. So I’m sure everyone listening, at least those in OB-GYN, knows the general idea of how you do an external cephalic version. You use some oil or lubricant to reduce friction on the skin, and you use an ultrasound to figure out how the baby’s lying. With your ultrasound you can periodically, or in some cases continuously, check on the fetal heart rate. You try to rotate the baby, usually in a forward roll, and if that doesn’t work and you’ve tried a couple of times, then maybe you can rotate the other way and do a backflip. Practically speaking, you need to dislodge the fetal breech from the pelvis by pushing it upwards and laterally, and then you try to direct the fetal head forwards, in the direction that the fetus is looking if you’re doing a forward roll, and then, while pushing the breech up, you guide the head down. So it sounds easy and straightforward, and there’s not much more to it than that, but the idea behind this tip is that creating space is really important. So realize that the lower part of the uterus going down into the pelvis is the most narrow portion of the cavity, and in this case it’s usually filled with the breech when we start this process. That breech needs to be elevated up and out in order to be successful, and you have to create space for the fetal head to go down there.


So I think a mistake that’s often made is worrying about the top, the head, too soon. Almost all the force and activity that you’re initially using should be directed at the bottom, elevating the breech upwards and slightly laterally towards the back, or the opposite side that you want to roll from, and there’s really not much point in doing anything else until that breech is up there. So any pushing on the top, even though you might be pushing the head to the side, can potentially make it harder for you to elevate the breech. You’re fighting yourself. The fetus will almost turn by itself if you can get the breech elevated successfully out of the maternal pelvis. Too much time and effort, I think, is spent with a hand at the top, or a colleague’s hand at the top or something like that, trying to guide the head in the initial part of this procedure, and that can create a situation where you’re fighting yourself.

Antonia  10:19

Yeah, I’ve definitely seen that play out. We all know that we need to elevate the breech, but it is a balance of power. There’s some literature that talks about 70-30, where 70% of your effort and force are on the bottom while 30% are on the top. But even that might be too much directed towards the baby’s head. Initially you really just want to gently ease it forward, just so that the head is flexed or the chin is flexed, and that really takes almost no power at all. Then you put almost all the pressure and force on the bottom so that it creates an opportunity, kind of creates a void, for the baby to almost naturally put their own head forward. So yeah, it’s just a little bit counterintuitive. I think, especially with earlier trainees, I really would see them wanting to just put both hands on, brace themselves against the floor, and push with all their might.

Howard 11:15

It’s more of a finesse technique than people realize, yeah.

Antonia  11:18

Okay, so the next tip has to do with either diamonds or baseball, depending on your preference. It’s meant to be a visual. So why don’t you explain that one a little bit?

Howard 11:28

Well. So we think about the uterine cavity in geometric terms sometimes, and we typically think about it as an inverted triangle. If you think about it in that sense, the top part is the wide part and it narrows down to the bottom. People are just trying to make the point that the bottom is narrow and you have more space at the top. And at that top part of the inverted triangle, the head can bobble side to side where there’s space to do that. But I’d encourage you to think of the shape instead as a diamond. So we can use a baseball analogy here.


If home plate is the lower uterine segment, where the breech is located, now imagine that the head is up at the top, at second base, and it’s looking down towards third base. So, to continue with what we were saying, you want to push the head gently and roll it towards third base, just causing the head to be flexed downward towards the chin, and that opens up first base for the breech to go from home plate to first while you just leave the head on third base. You’re not trying to do any more than that and, as you said, it doesn’t take much force to go from second to third base by just flexing that head. The head doesn’t need to advance past that point until the breech has reached first base.


When versions are successful, you’ll have a moment where you realize that the fetus is basically transverse. Once you have a player on first and third, then they can run at the same time. That’s when people are doing that 70-30, I think, where you put some downward pressure on the head, maybe 30% of your force, while simultaneously putting pressure on the breech to go on up to second base and the head to home. That’s the simultaneous part, and it’s the easy part of a version. If you’ve ever turned a baby that was already transverse, you know that it basically just does itself. The hard part is getting the breech to first base, and the point about the previous tip is that as long as the head is on third base, you’ve got space at second base and you don’t need to worry about doing anything with the head until the breech is up there at first base.

Antonia  13:27

Yeah, I can see why this analogy would be a really useful teaching technique. It’s not that easy for me to visualize things, but this one’s pretty easy even for me to imagine. And it does also describe the fact that there’s a diamond shape internally in the uterine cavity, at least of a full-term, gravid, normal uterus. So it helps you understand where the spaces are and where you need to move things. So I guess the last tip, since we want to have four tips, that’s our thing. So the last tip is less is more, right?

Howard 14:01

Right. And again, the big lesson I think that’s difficult to learn is that you actually don’t need that much force and power. Pressing too hard can make the uterus tense up, and then you’re fighting yourself. Putting too much pressure at the top, as we said, fights against yourself as you elevate the breech. Using too much force makes the mother uncomfortable, and then she contracts her abdominal muscles and she’s fighting against you. So this technique, as I said, is more subtle than I think people realize. It’s more nudging in the first part than it is pushing.


I think a lot of folks have figured this out over time, and with people who are good at versions, you don’t see a lot of force being used. I’ll put a link to an article by an obstetrician and researcher in Germany, at the Charité hospital, who’s actually shown that you can elicit the fetal Galant reflex and the stepping, or what we call walking, reflex with more gentle movements and even just stroking the fetal back. They’ve used ultrasound to show that with this more gentle technique you can elicit these reflexes. You can have the fetus helping you. These primitive reflexes will put the fetus into a more favorable position, curve the back and bring the head down, and then actually pull the legs up with the stepping reflex. And all of these subtle things go away when you use too much force and overwhelm the fetus with pressure. So it’s a subtle technique, and less is more.

Antonia  15:21

Yeah, so, as we said, this procedure is successful about two-thirds of the time, which does mean that about a third of the time, for whatever reason, it isn’t. Either the fetus’s breech is firmly stuck there, maybe due to the baby’s anatomical factors or the mother’s, maybe even the baby’s tone or uterine tone, and probably at least some of the time it does have to do with the provider’s technique. Maybe the maternal habitus as well might get in the way of getting just the right angle to actually get the breech elevated. I don’t know that that rate can be purely explained by the provider just not knowing how to do it a third of the time, but maybe to some extent technique plays a role. But in those cases where it’s not going to go, it does get very demoralizing to first push a little bit, and then push a little bit more, and the breech isn’t going anywhere. It can be easy to just start pushing as hard as the mother or the fetus will allow, and be sweating and still get nowhere. But yeah, we just need to appreciate that by pushing harder and harder, we’re probably making it less and less likely to succeed. I’ve definitely, thankfully, had wonderful successes and seen my colleagues also have really great successes using very minimal force. In those cases, the bonus is that the patient is comfortable and will say something like, oh, that was a lot easier than I thought, and then the nurses and everyone are happy. We should talk about those fetal reflexes maybe some other time and get really into it. That could be fun.


All right, so we’ll summarize these four tips on our Instagram with a nice little graphic. So just look out for that. And now we’ll get into what really will be part one of interpreting journal articles in this episode, and then next time we’ll take this through some articles just to show it in action, so to speak. We tend to spend a lot of time on this podcast dissecting literature and articles to try to understand if their findings are legitimate and whether or not we should change our practices based on them. But in this episode and next time we want to break down more explicitly how we try to judge articles. Obviously there are a lot of articles published every month that we never even mention. So we’re reading through all of this stuff constantly, and when we’re thinking about what to discuss, we need to be able to filter through a huge number of things to decide what’s impactful, what’s relevant, what’s quality, and what isn’t.

Howard 18:06

Yeah, I think in general, this is one of the most valuable skills any physician can have. You probably shouldn’t even be reading new literature in our journals if you don’t have some sense of how to do it well. And of course, we also talk about articles that are low quality on here, sometimes because they’re getting a lot of publicity or a lot of promotion, or they just need some context and analysis to keep us from making bad decisions. But this is an incredibly valuable skill for physicians.

Antonia  18:32

Yeah, and unfortunately most new published articles are not high quality and will not be validated if they’re repeated in the future. A lot of them probably are never even attempted for replication, but if they are, a lot of them will fail that. So the challenge is always how to read this month’s journal and understand whether the article or study you’re reading about should change your practice or even just change how you think about that specific topic.

Howard 19:01

I feel like I’ve written a book about this.

Antonia  19:02

You might have actually, but when we started this podcast we did want to spend more time talking about some of those things in your book, and one of the most important parts of the clinical reasoning book is how to interpret studies. It’s really the foundation for what we do. We don’t just make up stuff out of the blue. We rely on evidence and what’s been shown to be true in studies, so we have to trust that those studies are done well. And ironically, now several seasons into this podcast, we really haven’t spent any substantial time talking about the material in your books. So we’re gonna rectify that a little bit. First we’ll talk about interpreting scientific studies today and next time, and then you also have a book on vaginal hysterectomy, so we’re also gonna talk about that.

Howard 19:50

Well, a lot of us obviously have a background of some sort in interpreting studies, and that’s the purpose of journal clubs and article presentations in medical school and residency, but what I find is that most people aren’t great at it. Most med students and residents, and therefore attendings, who used to be med students and residents, have not received any formal training on how to do this. They pick up a few things through osmosis, they sometimes perpetuate mistaken assumptions, and that means that they can sometimes be easily deceived by low quality studies.

Antonia  20:20

Yeah, honestly, I’ve heard about little courses on how to interpret studies really being done in fellowships, and generalists just miss out on them because it’s not offered to them or it’s not incentivized. So this will not be a formal course, but hopefully it’ll be better than nothing for whoever’s listening. We can get into lots of detail, but we’re gonna at least start with an overview here and then, like I said, practice this overview on a few articles next time, and hopefully we’ll at least pique interest, if nothing else.


Okay, so there’s a couple things you talk about quite a lot and one is a sort of a process. You have this five step process for interpreting an article, and then the other thing you talk about a lot is the idea of how to adopt a new intervention or a new finding into your practice. So we’ll try to cover both of those, at least broadly speaking, and then maybe we’ll give some examples with some recent articles. So take us through your five step process for interpreting an article, because this is a large section of your clinical reasoning book. So I think the first step is how good is the study?

Howard 21:35

Yeah, and I think that this step is the part that a lot of folks focus on in a journal club, and there’s a ton of stuff to know even to be able to do this.


But this is just answering the question of whether the study design, the methodology, and the procedures used in the experiment were appropriate and have generated a reliable result.


So, as you said, most published research findings are eventually falsified and most findings are not replicated over time. There’s a lot of medical reversal, and there’s a lot of ability by study authors to, either intentionally or unintentionally, manipulate the design of a study or the statistical analysis or tools used, through things like which inclusion and exclusion criteria they choose, the starting and stopping points of the study, the way in which participants are blinded, and a ton of things that we call researcher degrees of freedom that can change what a study appears to show. So the key here is to remember that about 80% of published findings will eventually be falsified, even though they look pretty good when they’re published; they’ve gotten through some kind of peer review and been selected by some journal. So you have to look well beyond the abstract and ask some basic questions. So, in a typical journal club format you might go through something called the...

Antonia  22:43

IMRaD approach. So that stands for introduction, methods, results, and discussion, and each of those things is just covered separately. So what are some highlights here? Well, in broad strokes, in the introduction, I think about the quality of the journal that the paper appears in.

Howard 23:00

I think about stated conflicts of interest for the authors and who might have funded the research or paid for the paper.

Antonia  23:06

Yeah, and usually a journal will recognize this is important to address any possible bias, so they’ll publish financial relationships or conflicts of interest that are present at the time of publication.

Howard 23:18

Right, but many times authors will be paid by a company after the paper is published, so they can avoid stating that there’s a conflict of interest.


So one of the things I also commonly do is look up authors on openpayments.org to see if they’ve received monies from a drug company or a device manufacturer right after the paper was published, and this actually happens a lot. For example, there’s a randomized controlled trial of the fetal pillow which showed a benefit of faster delivery time by a few seconds compared to no intervention or no inflation of the pillow, and the paper declares no conflicts of interest. But the lead author did receive money from the device maker in the month or two after the paper was published. I’ll also sometimes just do a Google search of the author and the drug or product to see if, after the paper was published, they’ve been doing talks for the company, and this is helpful too for a lot of European authors who are not in our open payments database in the United States. Yeah, I actually didn’t know that website was a thing, so that’s helpful knowledge. Everyone’s going there right now to look themselves up.

Antonia  24:14

We know that when papers are funded by or influenced by drug companies or device manufacturers, they will typically show double the effect size compared to studies on the same thing that are not funded by those companies. So there is a lot of financial incentive for authors. When they’re being paid by someone who wants the device to show benefit, there’s incentive for the authors to make it look like there’s as good of a benefit as possible, or even some benefit when there actually is no benefit. Yeah, definitely.

Howard 24:51

Well then, I try to understand what specific question the researchers are trying to answer. What is their null hypothesis? What are the primary outcome and secondary outcomes that they’re interested in? I’m also curious about what’s already known about the topic. You have to be very careful not to interpret any study in a silo by itself, but realize that any new study may often stand in stark contrast to other studies that already exist, and in many cases the older studies might actually be higher quality or better studies. So you have to do a general literature search to even understand what’s going on, and it’s not enough just to read the introduction of the paper, where the authors have done that. Authors will often exclude prior studies or data that tend to go against their findings or their hypothesis. So you have to do this work yourself.

Antonia  25:35

It’s also useful sometimes to put the article in PubMed and see what related articles come up, as a way of starting that search, and sometimes, like you said, just doing a Google search of the article is helpful. You may find people who have commented about issues that you were unaware of. You might find some nice letters to the editor or criticisms of it that you might not have thought of yourself. Or in some cases you might even find that papers have already been retracted some years after they were published. So you mentioned the fetal pillow, for example, and one of the most significant studies on the fetal pillow was retracted at the end of last year. And so if you didn’t specifically look for that, you wouldn’t necessarily know. You also wouldn’t know that meta-analyses based on that paper that was later retracted have now become irrelevant, unless you look for those papers individually.

Howard 26:27

Well, okay, well, that gets into the methods portion of the paper, and there I like to think about how would I design the ideal study to answer the question that the authors are asking and then compare that to what the authors actually did and try to make some sense of that.


So I looked at a paper recently that tried to see if osteopathic manipulation had a beneficial effect on labor outcomes. So the study authors might want to know if patients who received OMT have quicker labors or perhaps a lower cesarean delivery rate or something like that. So before I think about how they actually did their study and get biased by that, I might ask, well, what would be the ideal way of figuring out if adding OMT during labor would benefit women? The ideal way in my mind would be to add OMT versus some sham procedure, an attempt at a placebo, to patients in labor, assign them in a randomized way, try to blind as many people as possible as to whether the patient received real OMT or a sham process, and then try to keep everything else that I possibly can control the same between the two groups and see what happens.

Antonia  27:38

The randomized, placebo-controlled, triple-blinded trial is always the most ideal for controlling for known and unknown variables whenever that’s possible. And sometimes it’s just not possible to do a study in that way. But if it is, that really should be the preferred study design for most questions. So in that case you’d want the doctors and nurses taking care of the patients to be the same. You want the same labor protocols, same oxytocin protocols, the same planned rupture of membranes timing and things like that. You want to make sure the patients had initial cervical exams or Bishop scores that were similar, just everything you can do to make the two groups exactly identical, except for the addition of OMT in one group. And then you would even do your best to provide some kind of sham or placebo group, if you will, like, I don’t know, maybe just massage or, yeah, Reiki, which can be hard.

Howard 28:40

So I think about all that. I think about what is the best way to do this study. Now, in some cases, as you said, you can’t always do the best study, and this happens a lot in obstetrics. We have ethical constraints on randomizing patients to potentially harmful drugs or interventions, especially while they’re pregnant, and we also ethically cannot withhold care that we know works. So there are all these sorts of ethical and pragmatic reasons, expense reasons, timing reasons why you can’t always do the very best study, and it’s important to think about that and consider why the authors made the choices they did. Those could be legitimate. Now, I then compare what the ideal study design would be to what the authors actually did. In the paper I just mentioned about OMT, they had different groups of obstetricians managing different sets of patients. There was no randomization, there was no placebo control, and there was no standardization of the patients based upon cervical exam or pretty much any of the things that we listed.

Antonia  29:40

Okay, yeah, and so for OMT in labor, it seems like there’s no ethical or pragmatic reason to avoid controlling for those things. So right off the bat you get the impression that this is not a good study and there may be quite a significant degree of bias in the way it was designed.

Howard 29:55

Exactly. So what you’re doing is you’re looking for bias that’s not obvious. Those sorts of choices and those sorts of issues are why the study, were it replicated, might find a different finding. The study I’m talking about has not been replicated, so the findings of it should be taken with a grain of salt, or really with extreme caution. And the differences that they found in outcomes are likely due to chance alone, or in other words, a type one error.

Antonia  30:20

Yeah, and in the same way, you have to think about which patients they chose to include or exclude, and there could be good reasons for including certain patients, excluding others, or it may just be that the authors focused on criteria that would give them the highest chance of giving them the outcome they were looking for.

Howard 30:40

Yes, it’s been shown that more often than not, by manipulating just inclusion and exclusion criteria and starting and stopping points, you can take random data sets and make them appear statistically significant. And again, there might be reasons why you chose the exclusion criteria and inclusion criteria. I don’t know what the ideal is, but just remember that it’s a powerful thing. Inclusion and exclusion criteria can completely change the findings of a paper, and they also are relevant as to whether or not the findings of the paper are clinically appropriate to the patients that you’re taking care of.
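The point about stopping points can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a reconstruction of any study mentioned here: the data are pure noise (no real effect), the test is a simple z-test with a known standard deviation of 1, and the function names, number of looks, and batch sizes are hypothetical choices made for illustration.

```python
import math
import random

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def trial_is_significant(rng, looks, batch, alpha=0.05):
    """Simulate one study with NO real effect.

    The data are checked after every batch of patients, and the study
    stops and counts as 'positive' the first time p drops below alpha.
    """
    total, n = 0.0, 0
    for _ in range(looks):
        for _ in range(batch):
            total += rng.gauss(0.0, 1.0)  # outcome difference, true mean 0
            n += 1
        z = total / math.sqrt(n)  # z-test assuming a known SD of 1
        if two_sided_p(z) < alpha:
            return True
    return False

def false_positive_rate(looks, batch, sims=2000, seed=0):
    """Share of simulated null studies that end up 'significant'."""
    rng = random.Random(seed)
    hits = sum(trial_is_significant(rng, looks, batch) for _ in range(sims))
    return hits / sims

# Same maximum sample size (100), analyzed once versus peeked at ten times.
fixed = false_positive_rate(looks=1, batch=100)
peeking = false_positive_rate(looks=10, batch=10)
print(f"one planned analysis:            {fixed:.3f}")
print(f"stop at first p < 0.05 (10 looks): {peeking:.3f}")
```

With one planned analysis the false positive rate should sit near the nominal 5%, while choosing the stopping point after looking at the data pushes it well above that, even though nothing real is there.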

Antonia  31:11

Yeah, and all of those things are related to what variables they controlled for. So it’s a good kind of mental exercise to think about what they did not control for. There are always uncontrolled variables, or so-called lurking variables, so that’s why randomization is important.

Howard 31:28

Yes, and a lot of times the variables that are controlled for in a study are rather just convenient, or they are available things like demographic info that was in the chart, but they’re not always the important ones, say genetic variation or things that are not in the chart. So once you get outside of a randomized controlled trial, the reproducibility of a study drops off dramatically because it has to do with how many of those things you’ve controlled for. Randomizing controls for things you don’t know you need to control for. Okay, the other big question I look for in this section is whether or not a power analysis was done. A power analysis is absolutely essential to making sure that the proper number of patients were enrolled in the study. If you enroll too few patients, then you risk getting a type 2 error, but you also, if you go on and enroll too many patients, risk getting a type 1 error.
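To make the idea of a power analysis concrete, here is a minimal sketch of the textbook normal-approximation sample size formula for comparing two group means. The effect size, standard deviation, alpha of 0.05, and power of 80% are illustrative assumptions, and `patients_per_group` is a hypothetical helper name, not something taken from any paper discussed here.

```python
import math
from statistics import NormalDist

def patients_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect a mean difference.

    delta: smallest difference considered clinically important
    sd:    assumed standard deviation of the outcome in each group
    Uses the two-sided normal approximation:
        n = 2 * ((z_(1 - alpha/2) + z_power) * sd / delta) ** 2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd / delta) ** 2
    return math.ceil(n)

# Hypothetical example: detect a difference of half a standard deviation.
print(patients_per_group(delta=0.5, sd=1.0))   # 63 per group
# Halving the difference of interest roughly quadruples the enrollment.
print(patients_per_group(delta=0.25, sd=1.0))
```

The useful part is the direction of the dependence: the enrollment target follows from the effect size chosen in advance, rather than the effect size being chosen after the data are in.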

Antonia  32:15

Yeah, and I think that’s not always intuitive, because we can easily assume that the larger number of patients there are in a study, the better it must be, like surely a thousand subjects must be better than 200 subjects. But that’s not true.

Howard 32:31

Yeah, I think it’s one of those things we learn casually. We don’t study this systematically, but as you increase in the number of subjects, the p value will decrease. It’s just a mathematical fact. So the easiest way to make something appear to have a statistically significant p value is to over enroll, and this happens quite a bit. The other part of a power analysis is determining, before you do the study, what variables and outcomes you’re interested in and what magnitude of effect is important to you. A lot of studies find statistically significant differences in outcomes, but the actual difference is so small that it’s an irrelevant difference in clinical practice. A power analysis helps guard against finding clinically insignificant but perhaps statistically significant findings. If you’re reading a paper that doesn’t have a power analysis and it is a randomized controlled trial, then you should think of it as a preliminary paper that could be used to inform a future power analysis. And almost never should you take the findings from that paper and allow it to influence your clinical practice.
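The relationship between enrollment and the p-value can be sketched numerically. The example assumes the observed difference between groups stays fixed at a small value (0.05 standard deviations) while enrollment grows, which is what you would expect on average only when some true nonzero difference exists. The numbers and the helper name are hypothetical.

```python
import math

def p_value_for_difference(diff, sd, n_per_group):
    """Two-sided z-test p-value for a difference in means between two equal groups."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = diff / standard_error
    return math.erfc(abs(z) / math.sqrt(2))

# Same tiny observed difference, increasingly large studies.
for n in (50, 200, 1000, 5000, 20000):
    p = p_value_for_difference(diff=0.05, sd=1.0, n_per_group=n)
    print(f"n per group = {n:>6}: p = {p:.4g}")
```

The difference itself never changes in this sketch. Only the sample size does, yet the result moves from clearly non-significant to highly significant, which is why the magnitude of effect has to be judged separately from the p-value.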

Antonia  33:31

Well, that’s probably a lot of papers then. That should not influence clinical practice and, frankly, a large portion of published papers are not even randomized controlled trials. They might be retrospective papers or even qualitative, which that has its own value in its own way, but for this segment right now we’re just really talking about randomized controlled trials.

Howard 33:53

Exactly. And of course, we have levels of evidence, and appreciate that everything in that area is level three or lower. When we talk about retrospective data, please just remember that correlation doesn’t equal causation, and retrospective or case-control type studies are among the lower or lowest quality papers that get published in terms of their reproducibility. Now, more advanced questions for the methods section here have to do with whether or not the statistical methods used were appropriate. But honestly, unless you do have advanced training, you’re probably not going to do too well with answering those sorts of questions.

Antonia  34:27

Usually in good publications there’s a statistician that’s been involved in the study process, and you would need to ask a statistician whether those statistical methods that they list are appropriate or not, and they might not even be able to explain to you in a way that you understand what they mean, or maybe not you specifically, but me, for example. So I know that in journal clubs that I’ve done in the past we would often try to dig into what a specific statistical term meant that was listed in the methods. And especially if it’s a room full of stressed out, sleep deprived doctors that are there because it’s a requirement, that part would be fairly abstract and difficult to grasp, and it would just make people’s heads spin, and they would just want to get to: what’s the bottom line? Do I need to do this?

Howard 35:17

Frankly, it forces people not to do the work and, as you said, they just say, tell me the bottom line and the abstract. And they don’t do any work. So don’t try to get into things you don’t understand.

Antonia  35:31

Yeah, yeah. Okay, well, we’re still talking about whether the paper is good or not, the first step. So let’s get to the results section of the paper.

Howard 35:37

Okay, and things to look for here would be to determine whether the patient populations in the two groups were matched appropriately, how the authors accounted for relevant confounders, and then just what did the study find and was that finding statistically significant? And make sure you distinguish between the primary outcome the study was designed to look at and findings from subset analyses or secondary outcomes, because the statistics of the primary outcome, around which the power analysis hinges, are not often relevant to the subset analysis. So keep in mind that findings from a subset analysis generally should not change clinical practice. They’re preliminary, and also subset analyses are vulnerable to the multiple comparators problem, and this is rarely accounted for in papers, unfortunately. I think it’s one of the most common issues that you can see in a typical good journal.

Antonia  36:28

Yeah, let’s hit on that one for a moment. I know we’ve talked briefly about it in prior episodes where it’s come up, but just explain again what you mean by the multiple comparators problem.

Howard 36:39

Let’s say we did a study comparing vaginal hysterectomy and laparoscopic hysterectomy.


If it were a proper randomized controlled trial, we would have to agree upon a primary outcome and then design a power analysis to make sure that we enroll the proper number of patients, not too many, not too few, to see a difference in the primary outcome. And let’s say our primary outcome was length of stay. So we would have to decide how much of a difference in length of stay was clinically relevant. Let’s say we decided that a half a day was relevant, so a difference of 12 hours or more. So we use this information to make a power analysis to enroll the proper number of patients to find a difference that we think is clinically important. And we usually use some assumptions from prior preliminary data to figure that out. But of course, along the way our resident or med student who’s helping us with this project collects a whole bunch of other data about the hysterectomies. They collect information on estimated blood loss, length of surgery, pain score data, pre- and postoperative hemoglobins, amount of narcotics administered in the first 24, 48, 72 hours after surgery, whether there were bladder or ureter injuries, the cost of the procedure, time to return to work and everything else you can imagine, 50 other things. So let’s say that we randomize our 100 patients to vaginal hysterectomy versus laparoscopic hysterectomy and in the end we find that there’s no difference in the length of stay. All the patients went home on average about the same day, and we don’t find that difference we thought we were going to find. That was our primary outcome. But then, hope springs eternal.


In our subset analyses we’ve collected all these other things.


So we picked 10 of them, and maybe we did that before we collected data, maybe we didn’t, maybe we did it afterwards (that matters), and we do statistical analysis on those and we publish them in a table of secondary outcomes.


And when we do that we find, let’s just say, for example, that vaginal hysterectomy had a shorter length of surgery and less blood loss and quicker return to work, just for example. Well, the multiple comparators problem comes in with this sort of analysis because essentially we’re repeating an experiment multiple times, testing multiple different hypotheses each time. One hypothesis was that vaginal hysterectomy had a shorter length of stay, another was that it’s associated with less blood loss, another is that it’s associated with a quicker return to work, another one is that it costs less, et cetera. We’re running multiple comparisons, and if we want to keep our type one error rate for our study at 5%, we have to account for the fact that we’re testing multiple different hypotheses, because for each hypothesis, statistically, when we implement a null hypothesis test that generates a p-value, well, there’s an independent 5% rate of returning a p-value of less than 0.05 for each of those individual comparators.

Antonia  39:24

And just as a reminder, type one error is finding a relationship that doesn’t actually exist in reality, while the type two error is not finding a relationship that actually does exist in reality. So if the accepted level of significance in your experiment is a p-value less than 0.05, that means you’re accepting a 5% chance of falsely discovering something that isn’t actually there. And that’s for one experiment. So if you do one experiment, you have the 5% chance of the type one error. So if you just look at the length of stay, there’s a 5% chance that you find a difference, even if there isn’t one. But then if you do 10 experiments that are independent, unrelated, then you have a 40% chance of finding some false finding that isn’t there. And then if you do 20 experiments, so basically you’re testing 20 different hypotheses, you have a 64% chance of finding at least one positive that is truly false.
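The 40% and 64% figures follow from a one-line calculation: if each of k independent tests has a 5% false positive rate when nothing is really there, the chance that at least one comes up positive is 1 - 0.95^k. A minimal sketch, assuming the tests are independent (the function name is just for illustration):

```python
def chance_of_any_false_positive(n_tests, alpha=0.05):
    """Probability of at least one false positive across independent null tests."""
    return 1 - (1 - alpha) ** n_tests

for k in (1, 10, 20):
    print(f"{k:>2} tests: {chance_of_any_false_positive(k):.0%}")
```

Secondary outcomes from the same patients are usually correlated rather than independent, so in a real table the exact figure will differ, but the inflation is in the same direction.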

Howard 40:27

Yeah, you did better than I did. Yeah, exactly. So maybe I find no difference between vaginal hysterectomy and laparoscopic hysterectomy in the area I investigated, say less operative time. Now the thing is, I have to account for the fact that I’ve tested so many different hypotheses in this subset analysis at once, and the simplest way of doing that, although not always the correct way, is to take the p-value that I’m interested in as being significant, usually 0.05 in biological sciences, and then divide that by the number of hypotheses that were tested, so let’s say 10. And then the new p-value of 0.005 would be the p-value that I could claim statistical significance for, while still protecting the overall type one error rate for my study and experiment at 5%.
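The adjustment described here, dividing the significance threshold by the number of hypotheses tested, is commonly known as the Bonferroni correction. A short sketch follows. The ten secondary outcomes echo the hysterectomy example, but every p-value in the dictionary is invented purely for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the overall error rate near alpha."""
    return alpha / n_tests

def significant_after_bonferroni(p_values, alpha=0.05):
    """Map each outcome name to True if its p-value survives the correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return {name: p < threshold for name, p in p_values.items()}

# Hypothetical p-values for ten secondary outcomes (made up for this sketch).
secondary_outcomes = {
    "estimated blood loss": 0.03,
    "length of surgery": 0.004,
    "pain score": 0.20,
    "postoperative hemoglobin": 0.45,
    "narcotics in first 24 hours": 0.08,
    "narcotics in first 48 hours": 0.12,
    "bladder injury": 0.60,
    "ureter injury": 0.75,
    "cost of procedure": 0.04,
    "time to return to work": 0.02,
}

print(bonferroni_threshold(0.05, len(secondary_outcomes)))
for name, keeps in significant_after_bonferroni(secondary_outcomes).items():
    print(f"{name:<30} {'significant' if keeps else 'not significant'}")
```

In this made-up table, four outcomes fall under 0.05 individually, but only one clears the corrected threshold of 0.005.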

Antonia  41:14

Yeah, another way to look at it would be that you’re instead creating a composite outcome, but you’ve only done the power analysis for one part of that composite, which is not a valid way to look at things. So you would be looking for a certain magnitude of benefit in any number of this big bucket of different outcomes. But papers don’t do this adjustment in the p-values. They’ll usually flag anything that individually is under 0.05 in their little table of secondary outcomes and then claim it’s a true positive. Right, think about that next time you look at one of these tables.

Howard 41:55

There are also plenty of papers that do have true findings in there, and you can see these p-values that are less than 0.0001. It would still be valid if they did the correct approach, and that’s okay. That’s the way it’s supposed to work. But these corrections for multiple comparators are not often done, and claims of a finding in a subset analysis, particularly one that may suffer from the multiple comparators problem, are often used to make a paper that didn’t find a difference in the primary outcome all of a sudden have a positive finding. And we have a bias in our literature publishing arena towards publishing positive findings, not negative ones, and that’s a problem.


Yeah, authors are incentivized to find something significant, and readers want to read about significant things, and the NIH wants to pay for authors who find positive things, and drug companies want to sell products with p-values under 0.05, so we’re definitely incentivized. Okay, and otherwise, in the results section I also want to think about the certainty of the finding and the magnitude of effect of the finding. And then in the discussion section I want to see if what the authors claim in their discussion is consistent with what they actually found. So commonly authors will go off on a tangent, talking about an alternative hypothesis or a theory that they have, but the findings of the paper itself don’t have anything to do with that alternative hypothesis or their theory.


The study just tends to not support the null hypothesis. In other words, finding that there is a statistical association between some outcome and some intervention doesn’t always mean that the intervention caused that outcome. Now it likely does in a true randomized controlled trial. But this is more important to think about with retrospective studies or population studies that find an association between a couple of variables and then the authors imagine a hypothesis that follows from that, and that’s a slippery slope. Today in the email blast from ACOG there was information about a slight uptick in the preterm birth rate, and a CNN journalist interviewed several docs around the country to ask about it, and hypotheses are just coming from all different directions, but none of those hypotheses are supported by merely observing a statistically increased preterm birth rate. Those are called narrative fallacies, and they’re dangerous.

Antonia  44:10

Okay, well, there’s a lot more to talk about in terms of study design and things like that, but this is in summary. This is just step one of your process and this obviously can take up the whole journal club, and it normally does, but it’s determining whether the paper is good. Is it a good study design, good methodology? Is the discussion relevant to what they found? And yeah, this is usually just the sole focus of journal clubs. But let’s move on to step two, and that is asking what is the probability that the discovered association is true?

Howard 44:45

Right. So once we’ve decided that the paper was pretty good, if it passed muster there and there seems to be a true association between the variables studied, the next question is: is that association that was discovered true, given everything we know about the field? Was it a true discovery or a false discovery? And the p-value doesn’t tell us that. So this gets into Bayes’ theorem and learning to interpret the paper in the context of all that’s known about the subject matter. So if a new paper were published today that said that smoking cigarettes doesn’t cause lung cancer, the paper itself may be high quality and the p-value and the association in the study may be legitimate.


Remember that there’s a 5% chance that even an untrue hypothesis will generate a p-value less than 0.05 by accident, even if there’s no trickery, and most of these studies get published.


But the short answer here is that Bayes’ theorem would tell us to look at what all’s known about smoking and lung cancer and determine the pre-test or pre-experimental probability of the hypothesis. And then we would learn to discount this new paper as likely untrue or a false discovery, because the pre-test probability that smoking doesn’t cause lung cancer is a tiny, tiny number. But this gets to the core of the so-called reproducibility crisis, where so many scientific papers eventually fail replication, and the phenomenon of medical reversal, where things that we’ve done for years are found later not to work. You cannot interpret any paper in isolation. You have to look at what all’s known about the subject, because the p-value doesn’t tell us by itself whether the association discovered is likely to be true in a bigger ontologic sense. You have to use Bayes’ theorem for that, and that means you have to determine the probability that the hypothesis was true based upon everything we know before this new evidence in this experiment came into being.
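One simplified way to put numbers on this reasoning is to treat a statistically significant result like a positive diagnostic test and apply Bayes’ theorem. The sketch below assumes a significance level of 0.05 and a study power of 80%. Both the pre-study probabilities and the function name are hypothetical, chosen only to show how strongly the answer depends on the prior.

```python
def probability_finding_is_true(prior, alpha=0.05, power=0.80):
    """Post-study probability that a 'significant' finding reflects a real effect.

    prior: probability the hypothesis was true before the study was run
    alpha: chance a false hypothesis still yields p < alpha
    power: chance a true hypothesis yields p < alpha
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)

# Hypothetical priors: a coin-flip hypothesis, a long shot, and a claim that
# contradicts decades of evidence.
for prior in (0.5, 0.1, 0.001):
    post = probability_finding_is_true(prior)
    print(f"prior {prior:>5}: probability the finding is true = {post:.1%}")
```

With the same p-value threshold and the same study quality, a finding that started as a coin flip ends up very likely true, while a finding that started at one in a thousand remains very likely false.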

Antonia  46:29

Yeah, and I would say people should read your book about that issue in particular. But the smoking hypothetical is a pretty good one. So of course, in real life we’re so certain that smoking causes lung cancer, based on what has already been observed and studied, that we wouldn’t want to waste the time and money to make a new high quality study testing whether it causes lung cancer or not. Inherently, you know that if a new paper says smoking doesn’t cause lung cancer, even if it’s in the New England Journal or wherever, you would suspect there’s got to be an issue either with the quality of the paper or just with the finding being some freak chance accident, a finding that’s just a statistical anomaly. And I think most people assume that the quality of the paper is the problem.


But it may not be in every case. It may have been the most perfectly designed and conducted study that anyone has ever done, but random chance has still led to an apparent statistical association or a lack of one. We guard against that by understanding that in this example, decades of science has told us that smoking causes lung cancer. We can say that that’s a pretty undeniable fact at this point and that example is only good because of how much we know about that association. But let’s say, another new paper comes along and it’s about some chemical causing birth defects, or at least addressing that question. That’s going to be harder because we don’t have that intuition or that scientific background to know beforehand, does it or not, and so then we’re really just left with looking at the quality and then also looking at chance.

Howard 48:14

And that’s why we need replication in that situation to see if that effect stands up. Over time we build that body of knowledge.

Antonia  48:21

Yeah, so basically, if something hasn’t been replicated, then view it with caution.


Yes. Okay. So step three: ask what other hypotheses might explain the findings besides what they’ve talked about or tested. So in other words, let’s say we’ve decided this is a good paper and we’ve decided, thinking about Bayes’ theorem, that the association likely is true based on what we know about the field already. That still doesn’t mean that there might not be some other hypothesis that also could explain the data. So an example, and this is just something I made up: let’s say there’s a new medication and it’s found to be associated with birth defects. In a study we could conclude that the medication directly caused the birth defects. But maybe an alternative explanation is that the underlying medical condition that that medication is treating could actually be what causes the birth defect. Or maybe not even the medication or the condition itself, but maybe a treatable side effect of that medication is what caused the birth defect. So that one study alone is likely not going to account for every possible mechanism of causation or even just association.

Howard 49:39

And this again is just the reminder that correlation doesn’t equal causation. We all know it, we all say it all the time, we have t-shirts, I’m assuming you all have t-shirts, but then we all make the mistake of assuming that correlation equals causation almost every day in every new paper that comes out. And we don’t have to beat the dead horse about this here, but the truth is correlation almost never equals causation. So again, this is less of a problem with randomized controlled trials, where you’ve done your best to only change that one variable between the two groups in a prospective and randomized and blinded manner, but in retrospective data or observational data, this is of the utmost importance.


We discussed recently on the podcast studies that look at caffeine consumption in the first trimester and an increased risk of miscarriage, and okay, so even if that correlation is true, it doesn’t mean that caffeine causes miscarriage. And we discussed a paper that indicated that in fact there is a common set of genes that seem to explain both why some people may have an increased rate of miscarriage and the propensity to drink more caffeine or smoke more cigarettes, in fact. So the association can be true, but there could be a different hypothesis that explains it. So we have to appreciate that limitation. Correlation does not equal causation.
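The shared-gene explanation can be illustrated with a toy simulation of confounding. Everything in it is invented for the sketch: the 30% carrier frequency, the caffeine and miscarriage probabilities, and the function names are hypothetical and are not taken from the paper discussed. In this model the gene raises both caffeine use and miscarriage risk, and caffeine has no effect at all.

```python
import random

def simulate_pregnancies(n=40000, seed=1):
    """Generate (gene, caffeine, miscarriage) records under a confounding-only model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        gene = rng.random() < 0.30                              # hypothetical carrier rate
        caffeine = rng.random() < (0.70 if gene else 0.30)      # gene raises caffeine use
        miscarriage = rng.random() < (0.25 if gene else 0.10)   # risk depends on gene only
        rows.append((gene, caffeine, miscarriage))
    return rows

def miscarriage_rate(rows, caffeine, gene=None):
    """Miscarriage rate among records with the given caffeine (and optionally gene) status."""
    outcomes = [m for g, c, m in rows
                if c == caffeine and (gene is None or g == gene)]
    return sum(outcomes) / len(outcomes)

rows = simulate_pregnancies()
crude_gap = miscarriage_rate(rows, True) - miscarriage_rate(rows, False)
carrier_gap = (miscarriage_rate(rows, True, gene=True)
               - miscarriage_rate(rows, False, gene=True))
noncarrier_gap = (miscarriage_rate(rows, True, gene=False)
                  - miscarriage_rate(rows, False, gene=False))

print(f"caffeine vs none, everyone:     {crude_gap:+.3f}")
print(f"caffeine vs none, carriers:     {carrier_gap:+.3f}")
print(f"caffeine vs none, non-carriers: {noncarrier_gap:+.3f}")
```

Looking at everyone together, caffeine drinkers show a clearly higher miscarriage rate. Within carriers and within non-carriers the gap largely disappears, because the association was produced by the shared cause rather than by caffeine.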

Antonia  50:50

Okay, step four: is the magnitude of the discovered effect clinically significant?

Howard 50:56

Right, and we’ve talked about this and we don’t have to spend too much time on that today. But the magnitude of effect is something you have to consider. An author should consider what magnitude of effect is important before even doing the study. Take the finding of, say, a hemoglobin A1C that’s one tenth of a point lower with your intervention: is that clinically significant for the patient, especially if that intervention is very expensive or has significant side effects or unintended consequences? So there are definitely a lot of things that are, in fact, statistically significant, but the magnitude of effect is so minimal or the cost or expense or side effect profile so high that it’s not worth using that intervention.

Antonia  51:34

Alright, and question five: what is the cost of adopting the intervention into practice? Let’s say all of the other stuff checks out.

Howard 51:43


Antonia  51:44

And this is a true finding, clinically significant. What would it take? Is it prohibitive to still adopt this or not?

Howard 51:51

Yeah, and the two questions are related. But I would always want to see those last two questions addressed, certainly at a journal club. It’d be nice if authors did it in their paper. And cost is not just economic here, it’s also unintended consequences or side effects or other implications of using the intervention. So we have to think a lot about what outcome has been selected and does it represent what the patient wants or needs? For example, a drug that lowers blood pressure but doesn’t prevent morbidity or mortality related to heart disease or renal disease probably isn’t worth taking. We’re just making a number different, but we’re not actually affecting a patient oriented outcome that they care about.

Antonia  52:28

Okay, so that was the five step process. So in the next episode let’s talk about some actual studies as examples and some types of trials, and we’ll go through these questions briefly with them. We’re not going to try to get too much into the weeds, especially with the p-value hacking or multiple comparator problems or other statistical methodologies, but we’ll just look at a few studies, especially some newer, current, relevant interventions and medications out there that we see in OBGYN. We’ll go through this five step process and we’ll try to understand whether or not we should adopt into our practice whatever they’re concluding that we should. So we’ll do that in the next episode. We’re getting low on time today, and we still wanted to talk about Jackie Kennedy, but we’ll have to be quick.

Howard 53:19

Well, I can make this like a case presentation at checkout. So let’s pretend that we’re in November of 1963, a few weeks before John F Kennedy was assassinated in Dallas, and Mrs Kennedy comes to our clinic to review her pregnancy history and I can be the resident and you can be the attending.

Antonia  53:35

Okay. So she’s coming in for kind of a general follow up. I do ask a lot of pimp questions. I’ll just yeah, that’s okay. Yeah, I’m ready.

Howard 53:44

Okay, Dr. Roberts. So Mrs Kennedy is a 34 year old gravida five, para 2212, who’s now four months postpartum after a repeat cesarean delivery of her son Patrick Bouvier Kennedy, who unfortunately died just a few days after birth after he was delivered by emergency repeat cesarean for a suspected abruption at about 34 weeks of gestation.

Antonia  54:08

Okay, well, that’s very sad. Let’s go through her other pregnancies. So what was her first pregnancy like?

Howard 54:14

Her first pregnancy was an early miscarriage in the first trimester, a little bit after the Kennedys got married.

Antonia  54:21

Okay, and how about second pregnancy?

Howard 54:24

Her second pregnancy was a baby girl named Arabella, who was born by cesarean delivery in August of 1956, in the middle or so of the third trimester, after Mrs Kennedy abrupted. She presented with significant vaginal bleeding and pain and was delivered presumably by classical cesarean. I don’t know that for sure, but Arabella was stillborn at the time of the birth.

Antonia  54:47

Also very sad. Okay, what was her third pregnancy like?

Howard 54:52

Well, the next pregnancy, you’ve heard more about this, was her daughter, Caroline Kennedy, who was born in November of 1957, again by a cesarean at or near term, and this was a planned repeat cesarean in a pregnancy that went pretty well after she’d had the cesarean with Arabella.

Antonia  55:09

Okay, good, and then what about her fourth pregnancy?

Howard 55:13

Born while Jack was president, that was John Jr., three years later, by again another scheduled repeat cesarean delivery at or near term.

Antonia  55:22

All right. So then her fifth and final pregnancy was with Patrick. Was she just doing well until she abrupted again?

Howard 55:29

Yeah, and she was at an equestrian event and started having pains and bleeding, and it’s a rather dramatic story that we don’t have time to tell. She was taken by the Secret Service by ambulance to a scheduled helicopter rendezvous, where arrangements were already made in case this happened, and taken to a hospital set up for this. And Patrick was born alive, successfully, by cesarean delivery, but due to prematurity he died of hyaline membrane disease.

Antonia  55:56

Wow. Well, it’s almost unthinkable that someone would die of respiratory distress or hyaline membrane disease at 34 weeks today in the US, let alone the son of the president.

Howard 56:08

Well, that’s right, and his death ultimately had a lot to do with why our emphasis in the United States on research shifted heavily towards understanding premature birth and advancing neonatal technology. So the focus of the Eunice Kennedy Shriver National Institute of Child Health originally had been things like intellectual disability, but it shifted after this to things like how can we prevent neonatal morbidity and mortality?

Antonia  56:32

An amazing number of improvements in neonatal care have occurred, largely due to studies funded by the NICHD. So today we’ve extended by many, many weeks the gestational age at which pre-term newborns tend to survive, with usually little to no health problems that are related to prematurity. So let’s go back to our case. Did Mrs Kennedy have any risk factors for placental abruption?

Howard 57:00

Well, surprisingly, she was a very heavy chain smoker. Mrs Kennedy worked hard to hide this fact from the public and tried to control any pictures that were taken of her while smoking, but we will put a pic of her smoking while pregnant on the Instagram account.

Antonia  57:16

And people should follow this account. A lot of good stuff going on there, and share with friends, of course. Do it for us and do it for Maddie.

Howard 57:24

Anyway, the other interesting thing about her pregnancies is that a lot of Americans first learned about cesareans by reading about her and in particular about John Jr’s birth, which occurred after Kennedy became president. So that was a big deal in the age of new media and television and things like that, and there were a lot of magazines and newspapers that ran explainers telling people what a cesarean delivery is, why they are sometimes necessary and how they are done. And this had a great normalizing effect about cesareans, when the public in many cases were first learning about them because Mrs Kennedy had one. Even if people didn’t always like John, most people were enamored with Jackie and they paid attention. There was also a belief at the time that most women should have a hysterectomy after the third cesarean because of the classical incision that was usually used and the amount of damage done to the uterus.


But in Europe this wasn’t the practice and the low transverse incision was commonplace, as we discussed when we talked about Queen Elizabeth’s births in a previous episode. So the public became educated by the news media that the transverse incision was safer, and they also learned, of course, that Jackie had had four cesareans now without, at least, maternal complications. Of course two children died of prematurity because she abrupted, but cesareans could be safe and normal. And all of this had again this normalizing effect on cesarean delivery in the US and in particular encouraged public pressure for the use of the low transverse incision, like it was being done in Europe, instead of the vertical classical incision on the uterus. And consequently, for that and many other reasons, the incidence of cesarean exploded in this decade, rising about 400% in the next 20 years or less, and it became a routine part of care by the end of the 1960s.

Antonia  59:14

It’s interesting that that’s what people took away from Jackie Kennedy having C-sections. I guess it would be kind of like if Taylor Swift had some new or some uncommon surgical procedure today in terms of that cultural impact.

Howard 59:29

Yeah, I’ll put a link to an article written by an academic historian who’s also written a nice book, which I have, about the history of cesareans in the US, and this is a big thesis of hers, and she does a good job with it, about how Jackie Kennedy changed our perspectives. In any event, though, I think it’s undeniable that through her obstetric history and then, of course, the tragic death of Patrick Kennedy in particular, the US gained an increasing public acceptance of cesarean. We saw an increased demand for and popularity of the low transverse incision, and we shifted research dollars and emphasis towards advancing the science and care of premature neonates to the point where it is today.

Antonia  01:00:10

Well, I think they should have also taken away that pregnant women especially shouldn’t smoke.

Howard 01:00:17

That too. I will say that it’s interesting even by that point that Jackie did such a good job of hiding it, because that implies there was a little bit. She knew this wasn’t the right thing. Yeah.

Antonia  01:00:28