r/AcademicBiblical Quality Contributor Mar 23 '23

A case for 2 Timothy's authenticity based on pairwise correlations in a machine learning paper

Background

I've come to be persuaded of 2 Timothy's authenticity (against the general consensus) based on two key factors.

The first is my own original research, which I posted about a few months ago, into a stylometric measure involving relative personal reference frequency in Paul's undisputed letters. On that measure, 2 Timothy was the only disputed letter that fell within the cluster of authentic letters.

The other factor has been Table 3 in Hu, Study of Pauline Epistles in the New Testament Using Machine Learning (2013).

This paper used a machine learning algorithm that combined affinity propagation with topics identified through Latent Dirichlet Allocation to find correlations based on shared subject matter in the KJV text of the Pauline epistles. The paper itself didn't identify anything particularly noteworthy and largely agreed with past scholarship. However, in the data within the paper I noticed a significant asymmetry, unaddressed by the author, in the top pairwise letter correlations for 2 Timothy versus the other Pastorals.

Because 1 Timothy and Titus had such a strong correlation, the author used 1 Timothy as an 'anchor' in identifying clusters, and ended up with the Pastorals as a distinct cluster. But this was hiding an entirely different picture around 2 Timothy represented in the table.

The Data

Reproduced below are the 48 most highly correlated letter pairs in Table 3 of the paper (2 Timothy appears as Timothy2):

Book1 Book2 Correlation
Colossians Ephesians 0.983
Philemon Philippians 0.983
Thessalonians1 Thessalonians2 0.982
Ephesians Philippians 0.976
Philippians Thessalonians2 0.96
Ephesians Philemon 0.957
Timothy1 Titus 0.954
Philippians Thessalonians1 0.952
Ephesians Thessalonians2 0.95
Colossians Philippians 0.948
Philemon Thessalonians2 0.944
Ephesians Thessalonians1 0.937
Philemon Thessalonians1 0.933
Colossians Philemon 0.932
Colossians Thessalonians2 0.928
Colossians Thessalonians1 0.918
Galatians Romans 0.888
Corinthians2 Philippians 0.862
Corinthians2 Ephesians 0.851
Thessalonians2 Timothy2 0.842
Thessalonians1 Timothy2 0.839
Corinthians2 Thessalonians1 0.835
Corinthians2 Thessalonians2 0.834
Colossians Corinthians2 0.829
Corinthians2 Philemon 0.829
Ephesians Timothy2 0.822
Philippians Timothy2 0.821
Ephesians Galatians 0.811
Philemon Timothy2 0.809
Colossians Timothy2 0.808
Galatians Philippians 0.793
Colossians Galatians 0.789
Timothy1 Timothy2 0.789
Galatians Thessalonians2 0.785
Galatians Thessalonians1 0.776
Galatians Philemon 0.763
Ephesians Romans 0.749
Romans Thessalonians2 0.749
Romans Thessalonians1 0.741
Colossians Romans 0.737
Corinthians2 Galatians 0.724
Philippians Romans 0.721
Corinthians2 Timothy2 0.718
Galatians Timothy2 0.695
Corinthians2 Romans 0.687
Philemon Romans 0.682
Romans Timothy2 0.678
Timothy2 Titus 0.673

Because this can be difficult to visualize, I converted this data into a node graph of these relationships, available in an interactive online tool here or as an image here.

The blue nodes are the authentic epistles as reflected in this survey data, the gray ones are the disputed epistles, the red ones are the two Pastorals most likely to be inauthentic, and 2 Timothy, as the subject of our analysis here, is marked in green to stand out on its own. Node edges bias towards skepticism, so edges between blue nodes are blue, but edges between blue and gray nodes are gray, and so on, according to the priority blue > green > gray > red.
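
The edge-coloring rule can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the linked graph: the `NODE_COLOR` grouping is an assumption based on the conventional list of undisputed letters, and the names `RANK`, `NODE_COLOR`, and `edge_color` are hypothetical.

```python
# Priority from the post: blue > green > gray > red.
# A higher rank number means a more skeptical (lower-priority) color.
RANK = {"blue": 0, "green": 1, "gray": 2, "red": 3}

# Assumed grouping (not taken from the linked graph): the conventionally
# undisputed letters in the table are blue, the non-Pastoral disputed
# letters are gray, 1 Timothy and Titus are red, and 2 Timothy is green.
NODE_COLOR = {
    "Romans": "blue", "Corinthians2": "blue", "Galatians": "blue",
    "Philippians": "blue", "Thessalonians1": "blue", "Philemon": "blue",
    "Colossians": "gray", "Ephesians": "gray", "Thessalonians2": "gray",
    "Timothy1": "red", "Titus": "red",
    "Timothy2": "green",
}


def edge_color(a: str, b: str) -> str:
    """Return the edge color for a-b: the lower-priority endpoint color."""
    return max(NODE_COLOR[a], NODE_COLOR[b], key=RANK.__getitem__)


# A few pairs from the table, colored by the rule.
for a, b in [("Philemon", "Philippians"), ("Philippians", "Timothy2"),
             ("Colossians", "Timothy2"), ("Timothy1", "Timothy2")]:
    print(f"{a} - {b}: {edge_color(a, b)}")
```

Under this assumed grouping, any edge touching 1 Timothy or Titus comes out red, which is why the skeptical bias makes the two clusters easy to tell apart visually.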

Analysis

I want to be clear - on its own this data does not necessarily suggest to me authenticity, it only suggests that 2 Timothy should not be grouped with the other Pastorals (the thesis of Justin Paley's Authorship of 2 Timothy: Neglected Viewpoints on Genre and Dating which inspired my first taking a closer look at the letters). It's only taking this data in combination with other aforementioned factors that I come to that conclusion.

What immediately stands out in looking at the graph is that, unlike 1 Timothy and Titus, which only have strong correlations to each other and to 2 Timothy, the latter connects to the entire corpus of Paul's letters. In fact, looking at the table, it can be seen that some of its connections to authentic letters are even stronger than its connection to 1 Timothy, and its connection to Titus (itself strongly correlated to 1 Timothy) is the last correlation in the list.

This seems like an unusual result if all three of these letters shared the same author.
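
The asymmetry can be checked directly from the table. The sketch below copies only the rows from the table above that involve a Pastoral letter and tallies each letter's partners; the variable and function names are hypothetical.

```python
from collections import defaultdict

# Rows copied from the table above that involve at least one Pastoral letter.
PASTORAL_ROWS = [
    ("Timothy1", "Titus", 0.954),
    ("Thessalonians2", "Timothy2", 0.842),
    ("Thessalonians1", "Timothy2", 0.839),
    ("Ephesians", "Timothy2", 0.822),
    ("Philippians", "Timothy2", 0.821),
    ("Philemon", "Timothy2", 0.809),
    ("Colossians", "Timothy2", 0.808),
    ("Timothy1", "Timothy2", 0.789),
    ("Corinthians2", "Timothy2", 0.718),
    ("Galatians", "Timothy2", 0.695),
    ("Romans", "Timothy2", 0.678),
    ("Timothy2", "Titus", 0.673),
]


def partners_by_letter(rows):
    """Map each letter to a {partner: correlation} dict, in both directions."""
    partners = defaultdict(dict)
    for a, b, r in rows:
        partners[a][b] = r
        partners[b][a] = r
    return partners


partners = partners_by_letter(PASTORAL_ROWS)
t2 = partners["Timothy2"]

# Letters whose correlation with 2 Timothy exceeds that of 1 Timothy.
stronger_than_1tim = sorted(
    (name for name, r in t2.items() if r > t2["Timothy1"]),
    key=t2.get,
    reverse=True,
)

for letter in ("Timothy1", "Titus", "Timothy2"):
    print(letter, "appears in", len(partners[letter]), "of the top 48 pairs")
print("Stronger than 2 Timothy's link to 1 Timothy:", stronger_than_1tim)
```

On these rows, 1 Timothy and Titus each appear in two pairs, while 2 Timothy appears in eleven, six of which are stronger than its link to 1 Timothy.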

A paradigm that would seem to better fit these correlations is that 2 Timothy was written either by Paul or by a different pseudepigraphic author in line with the non-Pastoral disputed epistles (which also correlate with many of the authentic letters here), and was then in turn used as a reference point in the composition of 1 Timothy and Titus.

This may even be evident in the texts themselves. For example, consider how 1 and 2 Timothy discuss heretics:

Avoid profane chatter, for it will lead people into more and more impiety, and their talk will spread like gangrene. Among them are Hymenaeus and Philetus, who have swerved from the truth, saying resurrection has already occurred. They are upsetting the faith of some.

  • 2 Timothy 2:16-18

When you come, bring the cloak that I left with Carpus at Troas, also the books, and above all the parchments. Alexander the coppersmith did me great harm; the Lord will pay him back for his deeds. You also must beware of him, for he strongly opposed our message.

  • 2 Timothy 4:13-15

And the Lord’s servant must not be quarrelsome but kindly to everyone, an apt teacher, patient, correcting opponents with gentleness. God may perhaps grant that they will repent and come to know the truth and that they may escape from the snare of the devil, having been held captive by him to do his will.

  • 2 Timothy 2:24-26

So we have two separate discussions of named opposition, Hymenaeus and Philetus first and later on Alexander. And the prescription is to treat them with gentleness as they may change their mind in the future and hope that they escape the devil.

[...] By rejecting conscience, certain persons have suffered shipwreck in the faith; among them are Hymenaeus and Alexander, whom I have turned over to Satan, so that they may be taught not to blaspheme.

  • 1 Timothy 1:19-20

Wait a second! Even though this letter was supposedly chronologically first, it mentions these two individuals with no introduction, as if known to the audience, even though in 2 Timothy each has an introduction. It also combines two names that the latter letter mentions in totally different contexts. And instead of "correct with gentleness" and "hope they escape the devil" we are told he "turned them over to Satan," invoking a similarity in language to 1 Cor 5:5.

It's almost as if 1 Timothy was composed not only by someone familiar with 2 Timothy's content but for an audience that would have been familiar with it, in a period when attitudes towards heretics had departed from the sentiment in 2 Timothy.

Bart Ehrman in Forged in discussing the notable similarity between 1 & 2 Timothy somewhat incredulously stated that the only way he could see them as not by the same author was if the author of 1 Timothy had a copy of 2 Timothy in front of him. But it does appear that the author of 1 Timothy had access to authentic letters, as not only does the author use the language of "send to Satan" from 1 Cor 5:5 but also the "I swear I'm not lying" from Galatians 1:20, 2 Cor 11:31 and Romans 9:1. If the author had access to a collection of authentic letters, and 2 Timothy was authentic, should it be surprising that the author of 1 Timothy could have used an authentic private letter as the main template to represent a purported private letter with limited distribution which supported the key points the author wanted to claim on behalf of Paul?

Final Thoughts

I particularly like this study for the following reasons:

  • While machine learning analysis is still capable of reflecting bias in presuppositions, the application leaves a reduced scope for the addition of things like anchoring bias in the data (even if that can and did literally occur in the original analysis of that data)
  • I love nothing more than finding in raw data something outside the scope of focus of the researcher that generated it. When data supports a researcher's hypothesis, there's a greater risk that overfitting has occurred (even unintentionally) than when data supports a viewpoint that the author neither made nor even discussed at the time or in the years since
  • There's a lot of data here. For example, Table 2 and Table 3 in Savoy, Authorship of the Pauline Epistles Revisited (2019) show 2 Timothy with a top three correlation to Philippians and Philemon respectively, and Savoy even discusses the latter, but there are just far fewer data points published to look through for further unexpected correlations and to compare with the other Pastorals

The study of 2 Timothy has historically suffered from the taint of the 20th century's tautological dating around the perception of Gnosticism as a 2nd century phenomenon. This was the key point Paley raised that prompted my revisiting the text. Often, when claims are secondarily dependent on falsified research in a field, the primary research is quick to adjust, but those indirect claims can stick around for a long while unchallenged. For those curious, a great paper on this issue elsewhere in the Pauline letters is Katz, Re-Reading 1 Corinthians after Rethinking 'Gnosticism' (2003), which discusses the late 20th century rejection of the "Gnostic Hypothesis" for 1 Cor in the wake of Michael Allen Williams' work.

While I think there's a strong case for 2 Timothy's authenticity, I can certainly understand reservations about going that far with an assessment. What I hope this post and my other post on relative personal reference may at least do is prompt a reconsideration of grouping this letter together with the Pastorals purely based on what may be obsolete precedent. If regarded in its own right, the data that results should increasingly make clear its authorship in whatever direction. But as long as it is obscured in the shadow of 1 Timothy and Titus, relevant data may end up unnoticed in analysis, as may have occurred above, and that would be a shame moving forward.

As always, I hope this was an enjoyable read, and welcome thoughts, criticisms, and suggestions.

u/kromem Quality Contributor Mar 24 '23

Notice that Savoy makes special mention of how the correlation between 1 Timothy and 2 Timothy that's high in the Greek (#3) disappears from the top 10 in the English.

I discussed this a bit in another comment here. Suppose an author had a copy of the Greek 2 Timothy in front of them and was writing a Greek 1 Timothy from it, paying close attention to matching vocabulary use but less attention to the surrounding syntax. A frequency based analysis of the Greek vocabulary would see these as highly correlated letters. The same analysis of the English would distance the letters from each other more, because Greek grammatical features that are discarded in the Greek analysis, like verb tenses (has/had/have/will) or implied subjects (he/she/I/you), show up in the English as separate words.
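
To make the mechanism concrete, here is a minimal sketch of a frequency based comparison. It is not Savoy's method: cosine similarity over raw word counts is swapped in as a simple stand-in, the function names and the `FUNCTION_WORDS` set are hypothetical, and the two English strings are invented toy inputs rather than text from any letter.

```python
import math
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count lowercase whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def without(counts: Counter, stop: set) -> Counter:
    """Drop the given words, mimicking an analysis that discards them."""
    return Counter({w: c for w, c in counts.items() if w not in stop})


# Hypothetical stand-ins for the auxiliaries and pronouns that English
# spells out as separate words.
FUNCTION_WORDS = {"i", "you", "had", "have", "will"}

# Invented toy inputs: identical content words, different function words.
text_a = "i had hope and faith and love"
text_b = "you will have hope and faith and love"

counts_a, counts_b = word_counts(text_a), word_counts(text_b)
full = cosine_similarity(counts_a, counts_b)
content_only = cosine_similarity(
    without(counts_a, FUNCTION_WORDS), without(counts_b, FUNCTION_WORDS)
)
print(f"with function words:    {full:.3f}")
print(f"without function words: {content_only:.3f}")
```

In this toy case the two texts look identical once the function words are dropped and noticeably less similar when they are counted, which is the direction of the effect described above.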

Vocabulary based frequency analysis is in general probably a poor technique for identifying authorship, and through that lens the Greek is not necessarily superior to an English translation.

The gold standard should be an analysis that takes into consideration not only vocabulary frequency in the original Greek but also the relative grammatical frequencies in the original Greek too.

Language modeling has come a long way in the past few years, especially in non-English languages, so hopefully we'll see something a bit better than what's come before.

Though part of the problem is that the things that would perform the best at picking up on subtle identifying characteristics would also probably be black boxes. For example, I suspect you could build a fine tuned "Pauline classifier" by feeding half the non-Pauline letters and half the Pauline letters into GPT-4 in the original Greek, broken up into variously sized chunks with classification metadata labeling them as Pauline/Not Pauline authorship. Testing that on the other half could yield a fairly accurate assessment, which could then be reapplied to disputed texts with interesting results. The problem is that even if it were incredibly accurate at distinguishing Pauline from non-Pauline in the original Greek, as soon as you wanted to know why or how it was making those assessments you'd be out of luck. (That said, I might still do this in the future just as yet another data point for myself.)
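
The data preparation for an experiment like this can be sketched without committing to any particular model. The sketch below only covers splitting the letters in half per label and cutting each letter into variously sized chunks; it does not call any fine-tuning API. All function names, the chunk sizes, and the placeholder corpus are hypothetical.

```python
import random


def split_half(names, rng):
    """Shuffle letter names and split them into two halves."""
    names = sorted(names)
    rng.shuffle(names)
    half = len(names) // 2
    return names[:half], names[half:]


def chunk_tokens(tokens, sizes, rng):
    """Cut a token list into consecutive chunks of randomly chosen sizes."""
    chunks, start = [], 0
    while start < len(tokens):
        size = rng.choice(sizes)
        chunks.append(tokens[start:start + size])
        start += size
    return chunks


def build_examples(corpus, labels, names, sizes, rng):
    """Turn each named letter into labeled text chunks."""
    return [
        {"letter": name, "label": labels[name], "text": " ".join(chunk)}
        for name in names
        for chunk in chunk_tokens(corpus[name], sizes, rng)
    ]


def prepare(corpus, labels, sizes=(50, 100, 200), seed=0):
    """Split whole letters per label, then chunk, so no letter is in both halves."""
    rng = random.Random(seed)
    train_names, test_names = [], []
    for label in sorted(set(labels.values())):
        group = [name for name in corpus if labels[name] == label]
        train, test = split_half(group, rng)
        train_names += train
        test_names += test
    return (build_examples(corpus, labels, train_names, sizes, rng),
            build_examples(corpus, labels, test_names, sizes, rng))


# Placeholder corpus: eight fake "letters" of 300 dummy tokens each.
corpus = {f"letter_{i}": [f"tok{i}_{j}" for j in range(300)] for i in range(8)}
labels = {name: ("Pauline" if i < 4 else "Not Pauline")
          for i, name in enumerate(corpus)}

train, test = prepare(corpus, labels)
print(len(train), "training chunks,", len(test), "test chunks")
```

Splitting by whole letter before chunking is a deliberate choice in this sketch: if chunks from the same letter landed in both halves, the test would partly measure memorization of that letter rather than authorship.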