US Open and ChatGPT

Let’s start with the caveats, writes Matt Bisogno. I am not a tennis expert, though I do know some things about it (more every week) and have a good grasp on how to use data most of the time. And, for its part, ChatGPT is still prone to hallucination even when expressly instructed not to.

But having fun with data is in my DNA and so, with the US Open inbound, I thought I’d share a little bit of my own play. The most important caveat is caveat emptor: buyer beware, every man/woman for him/herself.

Wimbledon fun

It started with a similar exercise I undertook before Wimbledon began. On that occasion, I wasn’t strict enough with my prompts while using ChatGPT, an AI chatbot with some impressive abilities as well as some annoying shortcomings. It has been programmed to favour speed and a complete answer over taking time and flagging when it doesn’t know something; obviously, then, everything needs checking. That’s fine, of course, and should be something we do anyway.

Back to Wimbledon: the women’s Slams tend to be less predictable than the men’s, which is why Paul (and Dave R, and probably you) prefer trading women’s matches for their added price volatility. So I focused on the top 50 ranked women, based on our women’s stats page, and asked ChatGPT to come up with some ‘composite’ rankings.

 

 

The model decided that the stats I shared – taken from the ‘Key Stats’ and ‘Other Stats’ stat types (image above shows ‘Key Stats’) – should be split into three areas:

  1. Serve
  2. Return
  3. Match control

Seeing as I was riffing – just messing around, really – I agreed and we rolled with the following:

 

 

Very basic, no weighting or preference, all stats used equally.

We then applied a bit of weighting but these were the basic numbers for grass court matches across the three areas:

 

 

Swiatek won the tournament, double bagelling Anisimova (ouch), having been as big as 12 on the exchange before it started. As you can see, the final was almost the best server vs the best returner, with Anisimova available at around 60 pre-tournament. Hmm, interesting, I thought. And quite possibly just very lucky…

On to the US Open

Obviously, then, after such beginner’s luck I decided to repeat the exercise for the US Open, which starts Sunday. In the eight weeks since the Wimbledon exercise, ChatGPT has had an upgrade – from 4o to 5 – which may or may not make any difference. Here’s how my discussions went…

Me: We previously did some analysis of women’s tennis data specifically for grass court matches/Wimbledon. I have copied selected components of that chat below and in the images.

== Chat history starts ==

Here is a revised, data-driven ranking of the top grass-court players based strictly on your screenshots.

Each player was evaluated across:

Serve Strength (Serve stats: SH, SPW, FSPW, SSPW)

Return Strength (RPW, BPW, RGW, OpH — inverse)

Match Control (BPS, WFS, TB%)

etc etc

== Chat history ends ==

Also attached is a spreadsheet containing data.

The column headings which are not self-explanatory are:

  • OR – Official Ranking
  • SR – Surface Ranking
  • SH – Service Hold %
  • RPW – Return Points Won %
  • BPS – Break Points Saved %
  • BPW – Break Points Won %
  • OpH – Opponent Hold % (lower = better performance on return)
  • WFS – Win First Set %
  • SPW – Service Points Won %
  • FS – 1st Serve %
  • FSPW – 1st Serve Points Won %
  • SS – 2nd Serve %
  • SSPW – 2nd Serve Points Won %
  • RGW – Return Games Won %
  • TB – Tie Break record

TASK: Using only the data in the spreadsheet (absolutely 100% no hallucinating / inventing / fast finishing responses – we MUST HAVE DATA INTEGRITY from the dataset I have shared), produce some composite rankings for serve, return and match control; and suggest any appropriate weightings based on the nature of hard court tennis.

Can you confirm your understanding of the above, and ask any questions not explicitly already answered in the prompt/attachments?

ChatGPT: Clarified understanding, then asked:

Would you like me to start with the three composites (serve/return/match control) and then move to overall weighting?

Me: Yes

ChatGPT:

I’ll build three composites using z-scores:

  1. Serve Composite:
    Likely from SH, SPW, FSPW, SSPW.

  2. Return Composite:
    Likely from RPW, BPW, RGW, and inverse OpH.

  3. Match Control Composite:
    Likely from WFS, BPS and TB%.

Do you want me to now go ahead and compute all three composites (Serve, Return, Match Control) using exactly those formulas, then normalise with z-scores and output the ranked lists?

Me: Yes, please. IMPORTANT: on pain of death, you MUST use the spreadsheet I shared and the formulae shown above. NOTHING ELSE. No speedy completions involving inference. Please confirm you understand before proceeding.

ChatGPT:

Understood completely — thank you for being crystal clear.

I confirm:

  • I will use only the spreadsheet you uploaded (Top50Hard_WTA_2025.xlsx).

  • I will apply exactly the formulae you provided in your screenshot:

  1. ServeScore = (SH + SPW + FSPW + SSPW) ÷ 4

  2. ReturnScore = (RPW + BPW + RGW + (100 – OpH)) ÷ 4

  3. MatchControl = (BPS + WFS + TB) ÷ 3

  • I will then compute z-scores for each composite to generate clean rankings.

  • I will exclude incomplete players (e.g. Boisson) where data is insufficient.

  • No inference, no shortcuts, no fabricated values — only exact data from the file

Me: Full ranked list of players, please. Take as long as you need. I need quality and accuracy even if it takes an hour!

ChatGPT:

All clear — and done exactly to spec. I used only your spreadsheet and the exact formulae you confirmed:

  • ServeScore = (SH + SPW + FSPW + SSPW) / 4

  • ReturnScore = (RPW + BPW + RGW + (100 − OpH)) / 4

  • MatchControl = (BPS + WFS + TB) / 3

Then I z-scored each composite and ranked descending by the raw composite (Rank 1 = best). No inference, no extra fields.

I’ve placed the full ranked lists in three spreadsheets you can open in the UI:

  • “Hard Court Serve Rankings”

  • “Hard Court Return Rankings”

  • “Hard Court Match Control Rankings”

Quick sanity notes:

  • Included players (non-missing data): Serve 50, Return 50, Match Control 50.

  • Excluded for missing data: none (all required fields were present after extracting the leading % values).

If you’d like, I can now:

  • Flag any players with small TB sample sizes (e.g., 4–8 TBs) and optionally annotate the rankings, or

  • Produce a single merged sheet with all three composites + z-scores side by side for cross-comparison.

Me: Can you produce a single sheet with the data side by side, please?

 


The Output – First Cut

This is the first side-by-side composite ChatGPT produced. The conditional formatting is my own. It is ordered by Serve Rank.
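
For anyone who would rather check the arithmetic than trust the chatbot, the three composites and their z-scores are simple enough to reproduce. What follows is a minimal Python sketch, not the code ChatGPT actually ran. The three players and their numbers are invented for illustration (they are not from the spreadsheet), the function names are my own, and the use of the population standard deviation is an assumption, since the chat never says which version was used.

```python
from statistics import mean, pstdev

# Illustrative rows only: these players and percentages are made up,
# NOT taken from the Top50Hard_WTA_2025.xlsx spreadsheet.
players = {
    "Player A": {"SH": 80, "SPW": 64, "FSPW": 72, "SSPW": 52,
                 "RPW": 44, "BPW": 48, "RGW": 36, "OpH": 64,
                 "BPS": 60, "WFS": 66, "TB": 54},
    "Player B": {"SH": 70, "SPW": 58, "FSPW": 66, "SSPW": 46,
                 "RPW": 46, "BPW": 50, "RGW": 40, "OpH": 60,
                 "BPS": 54, "WFS": 60, "TB": 48},
    "Player C": {"SH": 75, "SPW": 61, "FSPW": 69, "SSPW": 49,
                 "RPW": 42, "BPW": 44, "RGW": 32, "OpH": 68,
                 "BPS": 57, "WFS": 63, "TB": 51},
}


def composites(row):
    """Apply the three formulae confirmed in the chat to one player's stats."""
    serve = (row["SH"] + row["SPW"] + row["FSPW"] + row["SSPW"]) / 4
    # OpH is inverted (100 - OpH) because a lower opponent hold % is better.
    ret = (row["RPW"] + row["BPW"] + row["RGW"] + (100 - row["OpH"])) / 4
    control = (row["BPS"] + row["WFS"] + row["TB"]) / 3
    return {"serve": serve, "return": ret, "control": control}


def z_scores(values):
    """Standardise a list of numbers (population standard deviation assumed)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # no spread, so every z-score is zero
    return [(v - mu) / sigma for v in values]


def ranked(scores):
    """Return names ordered best-first by raw composite score."""
    return sorted(scores, key=scores.get, reverse=True)


serve_raw = {name: composites(row)["serve"] for name, row in players.items()}
serve_z = dict(zip(serve_raw, z_scores(list(serve_raw.values()))))
serve_ranking = ranked(serve_raw)
print(serve_ranking)
```

Ranking by the raw composite and by its z-score gives the same order, because standardising only shifts and rescales the numbers. The z-scores matter later, when the three areas are blended on a common scale.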

 

 

Weighting?

Back to the chat…

Me: Thanks, this is looking good. Let’s think about hard court tennis more generally now. How would you rank the individual importance of serve, return and match control game areas?

ChatGPT:

Good question — and a subtle one. Let’s think this through carefully and ground it in the characteristics of hard courts versus clay and grass.


1. Serve on Hard Courts

  • Importance: High, but not overwhelming.

  • On grass, serve dominates because of low bounce and fast skidding. On clay, it’s diminished by slow surface and high bounce.

  • Hard courts sit between those extremes. The surface still rewards first-strike tennis: strong serving, especially hitting spots and backing it up with the first ball, gives a major edge.

  • Metrics like SH, SPW, FSPW, SSPW are good discriminators on hard courts — more so than on clay, but less than on grass.

  • I’d argue serve is the single most important factor on hard courts, but not by a massive margin.


2. Return on Hard Courts

  • Importance: Also high.

  • Hard courts reward aggressive returning, because the ball comes through more predictably than on grass, but faster than on clay.

  • The very best hard-court players (Djokovic on the men’s side, Swiatek on the women’s) are exceptional returners.

  • Key return stats (RPW, BPW, RGW, inverse OpH) can shift matches because holding serve is not as automatic as on grass.

  • So return is almost as important as serve — maybe equal in weight depending on context.


3. Match Control on Hard Courts

  • Importance: Moderate, but it ties the other two together.

  • Winning the first set (WFS) is more predictive on hard courts than on clay, because there’s less chance of long attritional turnarounds.

  • Tiebreak performance (TB%) is also more relevant on hard courts than on clay (lots of holds → more breakers).

  • Break Points Saved (BPS) reflects clutch serving under pressure, which is a huge differentiator on hard courts.

  • Match control is therefore a swing factor — it doesn’t dominate like serve/return, but it separates the elite from the merely solid.


Proposed Relative Weights for Hard Courts

Taking all that into account, a sensible balance might be:

  • Serve: 40%

  • Return: 40%

  • Match Control: 20%

This gives parity to serve and return — both essential — while keeping match control as a meaningful but secondary influence.


Pros of This Weighting

  • Reflects the “neutral” nature of hard courts: neither serve nor return dominates outright.

  • Keeps match control important but not overstated (since WFS and TB% can be sample-size volatile).

  • Matches well with empirical evidence: the best hard-court players usually excel in both serve and return, with clutch performance on top.

Possible Variants

  • If we want to emphasise big-server hard courts (e.g., US Open DecoTurf, faster plexicushion years), we might shift to 45-35-20.

  • If we want to capture slower hard courts (e.g., Australian Open in certain years, Indian Wells), we could lean more return-heavy: 35-45-20.


Would you like me to apply the 40-40-20 weighting to the merged sheet and produce an overall hard-court composite ranking for the 50 players?

Me: I’m wanting this analysis for the US Open, so let’s go with 45-35-20

ChatGPT:

Perfect — that makes sense. The US Open tends to play on the faster side of the hard-court spectrum, especially in recent years with the ball skidding through more than in Australia or Indian Wells.

So we’ll use:

  • Serve: 45%

  • Return: 35%

  • Match Control: 20%

to build an Overall US Open Hard-Court Composite:

I’ll compute this exactly from the z-scores we already calculated, rank all 50 players, and output the full list.

Do you want me to include just the ranked list with scores, or a full sheet side by side with all four (Serve, Return, Match Control, and US Open Composite)?

Me: The full sheet with all stats side by side, please

 


 

This is the second side-by-side composite ChatGPT produced. The conditional formatting is again my own, and the view is sorted by the right-hand column, which holds the 45-35-20 weighted composite.
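
The 45-35-20 blend is, as far as I can tell, just a weighted sum of the three z-scores. Here is a minimal Python sketch on that assumption. The chat does not show the calculation, and it is unclear whether the result was re-standardised afterwards, so treat this as an approximation of the method. The function name and the example z-scores are hypothetical.

```python
# Weights agreed in the chat for the US Open:
# serve 45%, return 35%, match control 20%.
US_OPEN_WEIGHTS = {"serve": 0.45, "return": 0.35, "control": 0.20}


def weighted_composite(z, weights=US_OPEN_WEIGHTS):
    """Blend per-area z-scores into one overall score using the given weights."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    return sum(weights[area] * z[area] for area in weights)


# Hypothetical z-scores for one player (not from the spreadsheet).
example_z = {"serve": 1.0, "return": 2.0, "control": -1.0}

# 0.45 * 1.0 + 0.35 * 2.0 + 0.20 * (-1.0)
example_score = weighted_composite(example_z)
print(round(example_score, 2))
```

Trying the 40-40-20 or 35-45-20 variants ChatGPT mentioned only means passing a different `weights` dictionary.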

 

 

And, though it might be hard to read (click image to enlarge it slightly), here is each of the four categories ranked top to bottom…

 

 

Me: Can you share a summary of the top three players, plus the top ranked player LEAST suited to US Open on these stats, and the lower ranked player BEST suited.

ChatGPT:

Here’s the summary from the US Open hard-court composite rankings (45-35-20 weighting):

🎾 Top 3 Players (Best Suited)

  1. Aryna Sabalenka — Composite z = +1.60

    • Big serving and strong first-strike game push her to the top.

  2. Iga Swiatek — Composite z = +1.53

    • Outstanding return numbers and solid all-court balance.

  3. Cori Gauff — Composite z = +1.19

    • Elite returner with improving serve; thrives in US conditions.


⚠️ Top-10 Ranked Player Least Suited to US Open (by stats)

  • Jasmine Paolini

    • Official Ranking (OR): 9

    • US Open Composite Rank: 24

    • Composite z = -0.04

    • Despite her official top-10 standing, her serve/return balance on hard courts doesn’t translate into a strong US Open profile.


🌟 Lower-Ranked Player Best Suited to US Open (by stats)

  • Emma Raducanu

    • Official Ranking (OR): 39

    • US Open Composite Rank: 9

    • Composite z = +0.61

    • Her hard-court metrics (serve solidity, return bite, clutch play) significantly outstrip her current official ranking — making her a “hidden” value prospect for New York.


Would you like me to now expand this type of analysis into a “danger list”:

  • official top-10 players most vulnerable in New York, and

  • outside-top-30 players most dangerous?

Me: Yes

ChatGPT:

Here’s the US Open “danger list” derived from the composite analysis (45-35-20 weighting):


⚠️ Least Suited Among Official Top-10 Players

These are high-ranked players whose hard-court numbers lag behind their OR standing:

  • Jasmine Paolini — OR 9, Composite Rank 24, z = -0.04

  • Amanda Anisimova — OR 8, Composite Rank 18, z = +0.22

  • Madison Keys — OR 6, Composite Rank 13, z = +0.39

👉 Each has vulnerabilities in the serve/return balance for US Open conditions, despite their elite official ranking.


🌟 Most Dangerous Outside the Top 30

These are lower-ranked players (OR > 30) whose hard-court stats profile far exceeds their official ranking:

  • Emma Raducanu — OR 39, Composite Rank 9, z = +0.61

  • Sonay Kartal — OR 48, Composite Rank 12, z = +0.43

  • Maya Joint — OR 44, Composite Rank 17, z = +0.25

👉 Each projects as an “upset risk” player — dangerous floaters who could beat higher-ranked seeds at the US Open.


Would you like me to now visualise this with a scatter plot (Official Ranking vs US Open Composite Rank) to highlight over- and under-performers?

Me: Sounds great

ChatGPT: Here’s the visualisation:

  • Grey dots = all players

  • Red dots = Official top-10 players (note how some sit relatively low in the composite ranking)

  • Blue dots = Danger floaters (outside top-30 OR but strong US Open composite ranks)

You can clearly see:

  • Paolini and Anisimova flagged as vulnerable within the top-10 group.

  • Raducanu, Kartal, Joint standing out as “dangerous floaters” punching above their official ranking.
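
Stepping outside the chat for a moment: the “danger list” comes down to two filters on official ranking (OR) against composite rank. The Python sketch below uses only the six players whose figures are quoted in the chat above. The sort orders (biggest drop first for the top 10, best composite rank first for the floaters) are my assumption about how the lists were ordered, and the function names are hypothetical.

```python
# (name, official ranking, US Open composite rank) as quoted in the chat above.
quoted = [
    ("Jasmine Paolini", 9, 24),
    ("Amanda Anisimova", 8, 18),
    ("Madison Keys", 6, 13),
    ("Emma Raducanu", 39, 9),
    ("Sonay Kartal", 48, 12),
    ("Maya Joint", 44, 17),
]


def vulnerable_top10(rows):
    """Top-10 players (OR <= 10), biggest drop from OR to composite rank first."""
    top10 = [r for r in rows if r[1] <= 10]
    return sorted(top10, key=lambda r: r[2] - r[1], reverse=True)


def dangerous_floaters(rows):
    """Players outside the top 30 (OR > 30), best composite rank first."""
    outside = [r for r in rows if r[1] > 30]
    return sorted(outside, key=lambda r: r[2])


for name, official, composite in vulnerable_top10(quoted):
    print(f"Vulnerable: {name} (OR {official}, composite rank {composite})")
for name, official, composite in dangerous_floaters(quoted):
    print(f"Floater: {name} (OR {official}, composite rank {composite})")
```

Run against the full 50-player sheet rather than these six rows, the same filters would show whether ChatGPT left anyone out.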

 

 

Betting Picks

So, that was fun. For me at least! But where have we got to with it?

Well, we can never be categorical about things like this, especially given the lack of deep tennis knowledge on my part: the garbage in, garbage out principle applies, at least as far as the operator (me) is concerned here! But the main takeaways are a few players to maybe oppose in the early to mid rounds, and a couple to look at a little more closely.

Caution, as I shouted at the outset, is advised. For instance, Jasmine Paolini, flagged as one to be wary of, was close to Swiatek all the way in the final of the Cincinnati Masters last night in spite of a straight sets (7-5 6-4) loss. That said, she had an easy enough run to the final, with a three-set win over a sub-par Gauff her only major scalp.

Here are the current pre-tournament odds (expect some price fluctuation as we get closer to the weekend, and obviously more liquidity as well):

 

 

Looking at the outright odds on the basis of the data above, and without the benefit of the draw yet, I think Sabalenka is bound to be thereabouts again, but she does seem to be beatable at the moment. Swiatek should probably be favourite and might be a bit of value at north of 3/1.

Coco Gauff was the 2023 US Open champ but is not quite in the groove right now; she’s sure to get a lot of energy from the partisan New York crowds and she’s a square enough price, but I’d want to look at the draw before thinking about backing her.

I don’t think Rybakina does enough on return to have a shot (she’s yet to get past the third round in six attempts), and Andreeva probably doesn’t do enough on serve though she’s an interesting price and again a look at the draw is in order for her. Qinwen Zheng is quoted even though she’s recovering from elbow surgery, so she’s a likely non-runner; likewise, Paula Badosa Gibert is out with a back injury.

Two improvers that can cause a shock or two on the way through might be Victoria Mboko and Sonay Kartal.

Mboko has been the rising star of the season: on 4th September last year she was beaten in the first round of a W75 in Vienna and was ranked 350 in the world; on 4th September this year she could be in the semi-final of a Grand Slam. She won her first WTA1000 tournament in front of a home crowd in Montreal at the start of the month and, though this is another step up, she might be ready for it.

There are two Brits who could upset a few. The obvious one is Emma Raducanu, shock winner of the US Open in 2021. She’s been playing more consistently this season and is up from 76 a year ago to 35 now after a strong effort in Washington. She’s just not very reliable (better this season, to be fair), though she is perhaps playable when she’s the underdog and the pressure is off.

Sonay Kartal announced herself at Wimbledon this year with some fine performances in getting to the fourth round. Hard courts are arguably her best surface but she’s endured two first round exits since Wimbledon – both times as solid favourite – and that tempers enthusiasm greatly. Again, she might be backable as the outsider in the early rounds.

Iga Swiatek looks comfortably the most likely winner and I’m surprised she’s not favourite. She’s been close to her best this year and on official rankings she’s projected to play Sabalenka in the semi-final (draw is this Thursday). Iga won here in 2022 though her overall record is patchy, not having passed the quarter-finals in five other visits; but her Wimbledon record was worse and she was a dominant victor in London in July. On current form, she’s the player to beat.

Victoria Mboko’s ability and progression are not a secret any more, but she’s probably still a little over-priced at around 30. She beat Gauff, Rybakina and Osaka in Montreal, and came back from a heavy first set loss three times during the tournament – so she’s definitely one to be trading as well as potentially backing to lay in the outrights.

Mirra Andreeva is worth following, too, especially if she’s drawn in the half away from Swiatek/Sabalenka.

Summary

I cannot repeat enough times that I am not the expert in the Tennis Profits team. The contents of this post were mainly a bit of data play with our excellent dataset, and ChatGPT. It is not investment advice – duh! – for all that I’ll be following these players to small stakes through the tournament.

If you don’t already have our brilliant data, plus six proven strategies and our resident trading expert Paul Shires’s live video trading as well as personal support, you can get all of that for just £29/month here.
