Hi Peter,
Just to give you a bit more context, you can assume that your database (just like everybody else’s) has been already scraped by all major makers of "AI" models long ago, and now they're just making sure they're up to date.
I was also personally hit by this ever-escalating insane scraping wave. Not that I care much about the data, but the bots were so ruthless that it reminded me of the early search engine days, and I had to ban them just to keep my server running. Here is an article that you might like with some background:
https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/
What Thomas did was to take an "open source" model (a misuse of terminology, but that's what they are called anyway) and have it generate summaries per airfield. I don't know which one he picked, but it's not important. The point is, guess where the information that this model was trained on was sourced from, be it OpenAI, or Google, or Microsoft? You bet, it's your database. So it's no wonder that, when answering very specific queries, it produces output with high similarity to the reviews in your database...
This is to explain that he didn't scrape anything like you seem to think. It's all been done already and you'd better go after OpenAI if you don't like it. Not sure it makes any difference to you though. But I think it's better to stick to reality instead of some conspiracy fantasies.
P.S. I find it funny that it took 6 pages of conversation for you to make it clear that you're against sharing the reviews, because you just don't want them to be accessible through any interface other than yours, be it a commercial project or not. That's your thing, but why not just put it this way in your original answer and end the discussion at that?
P.P.S. I don't like either database (yours or eddh) exactly for this reason, and I neither use nor contribute to either of those. I do occasionally contribute reviews to FF, and I will be using this new project because it provides an API and uses unrestrictive licensing.
|
|
|
"You are an aviation expert writing a concise, practical executive-style summary about the GA airfield Colonsay Airport (ICAO: EGEY), aimed at non-commercial general aviation pilots. Use external sources (e.g. airport official website, Oban & The Isles Airports, council fees pages, pilot forums such as EuroGA, eddh.de, you-fly, etc.) as input—your summary must be evidence-based and cite sources. Include operational procedures (e.g. PPR requirements, out-of-hours permitting), fees (landing, parking, permit; rounded per rules), safety notes (bird activity, turbulence), and any on-site/nearby services or constraints. Do not add uncited claims. Use plain text, English only, direct declarative style, one or two short paragraphs, no headers, no storytelling—just facts."
|
|
|
And wherever you get them, these facts can never be "copyrighted" anyway. Even if the PIREPs created by AI contained some sentences or rewritten personal experience, I see little to zero chance there could be a copyright issue.
For a text to be copyright-protected, it must reach a sufficient level of originality (often referred to as the “threshold of originality”).
|
|
|
To be fair, I think that the jury is still out on how to handle "AI" hallucinations in terms of copyright, as evidenced by the battles going on in Hollywood right now, or emerging commotions in the gaming industry. That much should be clear though: the precedent-setting decision won't be made on the basis of EuroGA vs. airfield.directory PIREPs :)
|
|
|
I am not even talking about copyright for AI content. In general, small fact-based texts like these pilot reports do not qualify for "copyright". No court will be interested in dealing with this stuff.
|
|
|
You cannot control where "AI" scrapes its semi-garbage from. The moment this project receives a credible legal threat it will have to pack up. Then everybody contributing will have wasted their time.
|
|
|
Let's ask the Defendant:
You are building your own site with pilot reports and want AI to collect content only from certain sources.
Here’s how that works in practice:
Yes, you can restrict AI to specific sites.
If you tell me “only use EuroGA.net and Pilots of America, never AOPA or Reddit”, then I can be directed to search or scrape only those whitelisted sites.
How it’s done technically:
When fetching fresh info, you (or a developer) set the AI/web crawler to query only those domains. Example: a search like site:euroga.org "PIREP" ensures only EuroGA is searched.
You can also exclude others using -site:aopa.org.
********
This does not mean that it "should be done", but technically it is absolutely possible. And the risk is small, because the fact-based content of PIREPs (especially when rewritten by the AI) can hardly be "copyrighted".
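The whitelisting described in the quoted answer can be sketched roughly as follows. This is only an illustration, not what any particular crawler or AI provider actually does: the function names `build_query` and `is_allowed` and the two domain sets are hypothetical, and the `site:` / `-site:` operators are simply the search syntax quoted above.

```python
from urllib.parse import urlparse

# Hypothetical allow/block lists, taken from the example in the quoted answer.
ALLOWED_DOMAINS = {"euroga.org"}
BLOCKED_DOMAINS = {"aopa.org"}


def build_query(term, allowed=ALLOWED_DOMAINS, blocked=BLOCKED_DOMAINS):
    """Build a search string restricted to the allowed domains."""
    parts = [f'"{term}"']
    if allowed:
        sites = " OR ".join(f"site:{d}" for d in sorted(allowed))
        # Parenthesize only when several domains are OR-ed together.
        parts.insert(0, f"({sites})" if len(allowed) > 1 else sites)
    parts.extend(f"-site:{d}" for d in sorted(blocked))
    return " ".join(parts)


def is_allowed(url, allowed=ALLOWED_DOMAINS):
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)


print(build_query("PIREP"))  # site:euroga.org "PIREP" -site:aopa.org
print(is_allowed("https://www.euroga.org/forums/x"))  # True
print(is_allowed("https://www.aopa.org/"))            # False
```

Note that this only restricts what is fetched at query time. It says nothing about what the underlying model was trained on.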
|
|
|
This does not work. LLM scrapers disregard robots.txt and similar controls.
The whole LLM copyright debate is about this. If LLM scrapers respected these controls, they would be almost useless. They would just be a "better Google".
Anyway, doing this would be admitting that specific sources are targeted because they have the quality data Rockhopper is looking for ;)
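For reference, this is roughly what honoring robots.txt looks like on the crawler side. It is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the `ExampleAIBot` user agent, the `example.org` URLs and the `may_fetch` helper are all made up for illustration. The check is entirely voluntary: nothing stops a scraper from skipping it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that bans one AI crawler and restricts everyone else
# from a single directory.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots.txt permits this user agent to fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_fetch("ExampleAIBot", "https://example.org/airfields/EGEY"))  # False
print(may_fetch("SomeBrowser", "https://example.org/airfields/EGEY"))   # True
print(may_fetch("SomeBrowser", "https://example.org/private/notes"))    # False
```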
|
|
|
|
|
|
The AI-generated summaries posted on Airfield Directory are legally completely fine for several reasons: they do not consist of copied texts but only of facts expressed in new wording:
- Facts such as fees, PPR rules, phone numbers, or typical wind conditions are not protected by copyright and may be freely shared.
- The texts are not taken 1:1 from databases but are automatically reformulated. This means they do not infringe copyright protection.
- The AI-generated texts themselves are freely usable, including commercially, according to the AI provider.
Apart from that:
- The copyright of a PIREP always lies with the author, i.e. the person who wrote the text, and not with the forum in which it was published. The forum or its operator has at most the usage rights defined in its terms and conditions (e.g. for displaying or archiving posts) – if such rules exist at all. It does not thereby become the copyright holder. Copyright remains with the writer, not the forum – and a fact-based summary does not violate this right. If, nevertheless, an individual airfield operator or author (but not a forum operator) feels their copyright or other rights have been infringed, the procedure for notice & takedown is set out in the Terms of Service, section 14.
- I would like to remind everybody that under Article 20 of the GDPR, every author also has a right to ask the service operator to export their own data in a machine-readable format, which applies to forums as well.
Every forum, by the way, is free to attribute and import PIREPs from Airfield Directory.
The claim “The moment this project receives a credible legal threat it will have to pack up. Then everybody contributing will have wasted their time” is of course just Trump-style fear, uncertainty and doubt.
Because the posted PIREPs on Airfield Directory are Creative Commons and are regularly exported, they remain free forever, entirely independent of the continued existence of Airfield Directory. This seems, however, to be very difficult for some minds to grasp.