[SAC] [OSGeo] #2978: repo.osgeo.org contains 0 byte files

#2978: repo.osgeo.org contains 0 byte files
---------------------------+-----------------------
Reporter: jive | Owner: sac@…
     Type: task | Status: new
Priority: normal | Milestone: Unplanned
Component: Systems Admin | Keywords:
---------------------------+-----------------------
Alexander, via the discuss email list:

> We're thankful consumers of osgeo's maven repository, though this
> morning we noticed that empty files are being pulled. Inspecting certain
> cases, we saw that since this morning files with size 0 seem to be in the
> repo, e.g. the files at
> https://repo.osgeo.org/service/rest/repository/browse/release/org/jboss/jboss-parent/36/

Kévin Belellou has reported via
https://osgeo-org.atlassian.net/browse/GEOT-7433:

> Since this morning (September 7th 2023), we have encountered a bug with
> your Maven Nexus instance (Nexus Repository Manager), which we proxy in
> our own instance.
>
> A lot of artifacts with a size of 0 bytes have appeared in the
> geonetwork-cache repository, which caused our compile workflows to fail.
>
> The weird thing is that some of these “ghost” artifacts are ours
> (com.total.* for example).
>
> My theory is that our Nexus asked yours for these artifacts (which you
> shouldn’t have) and your Nexus somehow created empty ones and returned
> them.
>
> A minor thing that may be important: all the artifacts in package
> com.total.* have a classifier such as sources, config or javadoc.
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2978>
OSGeo <https://osgeo.org/>
OSGeo committee and general foundation issue tracker.

Comment (by jive):

Frank Gasdorf provides the following troubleshooting:

> I recently stumbled over this problem too, and my investigation is as
> follows:
>
> * your company uses a repository manager with the osgeo repo configured
> as a proxy repository, AND
> * an internal artefact has been requested that is not available in the
> internal repositories, so an HTTP GET is sent to the osgeo repo as well
> * this leads to entries like that, with size 0, for each artefact
>
> IMHO it seems to be an issue in the Nexus configuration, and I am
> investigating how to configure filters for proxy repositories in Nexus.
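The lookup order Frank describes can be sketched with a toy resolver. This is a minimal illustration under stated assumptions, not how Nexus is implemented: the repository names, the `com/example` and `org/example` paths, and the `Repository`, `resolve` and `leaked_to` helpers are all hypothetical. It shows why a coordinate missing from the internal hosted repositories is still requested from every external proxy further down the list.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical repository in a repository manager's lookup order."""
    name: str
    external: bool
    contents: set = field(default_factory=set)
    requests_seen: list = field(default_factory=list)


def resolve(path, repositories):
    """Ask each repository in order until one has the path.

    Every repository consulted records the request, so a path that is
    missing internally is also sent to each external proxy.
    """
    for repo in repositories:
        repo.requests_seen.append(path)
        if path in repo.contents:
            return repo.name
    return None


def leaked_to(path, repositories):
    """Names of external repositories that received a request for path."""
    return [r.name for r in repositories
            if r.external and path in r.requests_seen]


internal = Repository("internal-hosted", external=False)
osgeo = Repository("osgeo-proxy", external=True,
                   contents={"org/example/public-lib/1.0/public-lib-1.0.pom"})
chain = [internal, osgeo]

# An internal artefact that was never deployed to the hosted repository.
missing = "com/example/internal-lib/1.0/internal-lib-1.0-sources.jar"
print(resolve(missing, chain))    # None
print(leaked_to(missing, chain))  # ['osgeo-proxy']
```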
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2978#comment:1>

Comment (by jive):

I am trying to determine whether there is anything for me to do as an
admin of repo.osgeo.org.

I would like to determine if this is a cache problem or someone in the
community uploading zero-sized files by accident.

Browsing *release*, I immediately see junk:

{{{
Repository release
Format maven2
Component Group $%7Bbusm.datasync.groupid%7D
Component Name dataconfig-service
Component Version $%7Bbusm.version%7D
Path $%7Bbusm/datasync/groupid%7D/dataconfig-service/$%7Bbusm.version%7D/dataconfig-service-$%7Bbusm.version%7D.pom
Content type application/xml
File size 0 bytes
Blob created Wed Sep 06 2023 19:46:12 GMT-0700 (Pacific Daylight Saving Time)
Blob updated Wed Sep 06 2023 19:46:12 GMT-0700 (Pacific Daylight Saving Time)
Last downloaded Wed Sep 06 2023
Locally cached false
Blob reference cache@EF9560A3-97C6ABE4-329A592F-5E3FCFA2-D8F8BD08:4d71c425-4b81-4dfb-abc2-e05174a8c61d
Containing repo geonetwork-cache
Uploader anonymous
Uploader's IP Address 14.137.135.63
}}}

This appears to be in geonetwork-cache.
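The group and version segments in that record are percent-encoded (`%7B` and `%7D` are `{` and `}`). Decoding them, as in this minimal Python sketch (the `unresolved_placeholders` helper is hypothetical), yields `${busm.datasync.groupid}` and `${busm.version}`. These look like Maven property placeholders that were never interpolated, which suggests, though nothing in this ticket confirms it, that the requests came from a build using those property names literally.

```python
import re
from urllib.parse import unquote

# Matches ${...} tokens in the style of Maven property interpolation.
PLACEHOLDER = re.compile(r"\$\{[^}]+\}")


def unresolved_placeholders(segment):
    """Percent-decode a repository path segment and list any ${...} tokens."""
    return PLACEHOLDER.findall(unquote(segment))


# Values copied from the Nexus asset details above.
for value in ["$%7Bbusm.datasync.groupid%7D",
              "dataconfig-service-$%7Bbusm.version%7D.pom"]:
    print(unquote(value), unresolved_placeholders(value))
```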
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2978#comment:2>

Comment (by jive):

Yep, the `geonetwork-cache` seems very troubled, collecting zero byte
files on a wide range of topics?!?

* https://repo.osgeo.org/#browse/browse:geonetwork-cache

Going to focus on this configuration for now.

* cleanup policy does not allow me to remove based on size (ha!)
* It is a cache of https://github.com/geonetwork/core-maven-repo
* GitHub provided guidance that using GitHub as a Maven repository was
impolite, and the project moved to repo.osgeo.org two years ago.
* I assume GitHub cut off even this use as a cache yesterday, and as a
result geonetwork-cache is collecting timeouts and zero-sized files for
everyone.

I am going to take geonetwork-cache out of `release` for now.
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2978#comment:3>

Comment (by jive):

Please test and let me know if that addresses the problem for everyone
except the geonetwork community.
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2978#comment:4>

#2978: repo.osgeo.org contains 0 byte files
---------------------------+------------------------
Reporter: jive | Owner: jive
     Type: task | Status: new
Priority: critical | Milestone: Unplanned
Component: Systems Admin | Resolution:
Keywords: |
---------------------------+------------------------
Changes (by jive):

* owner: sac@… => jive
* priority: normal => critical

--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2978#comment:5>

Comment (by fgdrf):

IMHO this ticket has two aspects: 1) the repository setup itself, such
that external requests might not be logged on the osgeo instance, AND 2)
the recommended setup for repository managers that proxy osgeo
repositories.

It's possible to define routing rules in the repository manager (in this
case Nexus), as described here:
https://help.sonatype.com/repomanager3/using-nexus-repository/repository-manager-concepts/proxy-repository-concepts#ProxyRepositoryConcepts-RoutingRules
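To make the idea concrete, here is a simplified model of a routing rule, assuming the behaviour described in the Sonatype documentation: a rule has an Allow or Block mode and a list of regular-expression matchers applied to the request path. The `RoutingRule` class, the assumption that a matcher must match the whole path, and the `/com/example/` group are illustrative only; this is not Nexus code.

```python
import re
from dataclasses import dataclass


@dataclass
class RoutingRule:
    """Simplified model of a proxy routing rule (illustrative, not Nexus code)."""
    mode: str       # "ALLOW" or "BLOCK"
    matchers: list  # regular expressions tested against the request path

    def permits(self, path):
        """Return True if a request for path may be sent upstream."""
        matched = any(re.fullmatch(m, path) for m in self.matchers)
        if self.mode == "ALLOW":
            return matched        # only matching paths go upstream
        if self.mode == "BLOCK":
            return not matched    # matching paths never go upstream
        raise ValueError(f"unknown mode: {self.mode}")


# Hypothetical rules for a proxy of repo.osgeo.org.
only_geotools = RoutingRule("ALLOW", [r"/org/geotools/.*"])
no_internal = RoutingRule("BLOCK", [r"/com/example/.*"])

internal_pom = "/com/example/app/1.0/app-1.0.pom"
print(only_geotools.permits("/org/geotools/gt-main/maven-metadata.xml"))  # True
print(only_geotools.permits(internal_pom))                               # False
print(no_internal.permits(internal_pom))                                 # False
```

An Allow rule keeps everything except the listed groups away from the upstream; a Block rule only stops the groups you name, so it suits the "keep our internal coordinates private" case.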

I will try this on my end. Nevertheless, why do these "wrong" requests
lead to zero-size artefacts in the osgeo release repository? Is it a Nexus
bug?
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2978#comment:6>

Comment (by fgdrf):

More details about how to set up routing rules:
https://help.sonatype.com/repomanager3/nexus-repository-administration/repository-management/routing-rules#RoutingRules-CreatingorModifyingaRoutingRule
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2978#comment:7>

Comment (by jive):

Okay:

a) Cutting out geonetwork-cache has restored service. Downstream caching
repositories may need to clear their cache for osgeo release (I could not
find a way to clear only the zero-sized files).

b) The repository setup for geonetwork-cache *could* have used routing
rules to shortlist only the content contained in the cache. But it was a
random collection of patched jars that the community made over time, and I
thought it was just a temporary fix while they got their act together. For
this I apologize; I should have followed up with that community again.
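Point a) concerns the repository manager's cache, but build machines that already pulled the empty files also keep them in their local Maven repository. A minimal Python sketch for finding them, assuming the default `~/.m2/repository` layout; the helper names are hypothetical, and the dry-run list should be reviewed before deleting anything:

```python
import shutil
from pathlib import Path


def find_zero_byte_files(repo_root):
    """Return every file of size 0 under a local Maven repository."""
    root = Path(repo_root).expanduser()
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*")
                  if p.is_file() and p.stat().st_size == 0)


def purge_affected_versions(repo_root, dry_run=True):
    """List (and optionally delete) version directories holding empty files.

    Removing the whole version directory should make Maven fetch that
    version again on the next build.
    """
    version_dirs = sorted({p.parent for p in find_zero_byte_files(repo_root)})
    if not dry_run:
        for directory in version_dirs:
            shutil.rmtree(directory, ignore_errors=True)
    return version_dirs


if __name__ == "__main__":
    # Dry run: only print what would be removed.
    for directory in purge_affected_versions("~/.m2/repository"):
        print(directory)
```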
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2978#comment:8>

Comment (by jive):

Frank, I am replying to your comment as I am not sure I understand it.
There are six components in play and I cannot match your words to each
component.

-- upstream --

PROBLEM REPOSITORY

-- osgeo repository --

GEONETWORK-CACHE (cache of the upstream problem repository)

OSGEO-RELEASE

-- downstream --

MAVEN BUILD 1 (against osgeo-release)

DOWNSTREAM CACHING REPOSITORY or MIRROR (cache of osgeo-release above)

MAVEN BUILD 2 (against downstream caching repository or mirror)

Replying to [comment:6 fgdrf]:

> IMHO this ticket has two aspects:
>
> 1) The repository setup itself that external request might not be logged
> on osgeo instance

I do not understand what you mean by external requests that might not be
logged on osgeo instances.

- Do you mean how the "downstream caching repository" is set up?
- Do you mean the setup of "geonetwork-cache" in the osgeo repository?

> 2) the recommend setup on the repository manager that proxy osgeo
> repositories.

There is no good answer to this as each project has a different
constellation of jars and artefacts it requires for health and happiness.

The "downstream caching repository" or "mirror" could choose to use
routing rules to pick and choose what to cache from "osgeo release", or,
for greater control, use routing rules against "geonetwork-release",
"geoserver-release" and "geotools-release" to tightly control which jars
they obtain from where.

The "osgeo-release" repository gathers up lots of sources to optimize
"maven build 1". Since Maven checks *each* repository listed in the
pom.xml file, it is much faster to use a single mirror like
"osgeo-release", which reduces build times.

For "maven build 2", the downstream organization has set up its own
mirror.

> I will try this on my end. Nevertheless, why are these "wrong" requests
> lead to zero size artefacts in osgeo release repository? Is it a Nexus
> bug?

Yes, this appears to be a Nexus bug. The service that caused the issue is
an old-style Maven 2 repository:
https://raw.githubusercontent.com/geonetwork/core-maven-repo/master

This now returns: 400: Invalid Request

I assume it is a Nexus bug that this response is being stored as a 0 byte
artifact.
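The behaviour one would expect from a proxy here can be sketched with a toy cache: store upstream content only when the fetch succeeded with a non-empty body, and otherwise report "not found" without caching anything. This is a hypothetical illustration of the failure mode described above, not Nexus internals; `UpstreamResponse`, `ProxyCache` and the empty-body check are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class UpstreamResponse:
    status: int
    body: bytes


@dataclass
class ProxyCache:
    """Toy proxy cache (illustrative only, not how Nexus is written)."""
    fetch: Callable[[str], UpstreamResponse]
    store: Dict[str, bytes] = field(default_factory=dict)

    def get(self, path: str) -> Optional[bytes]:
        if path in self.store:
            return self.store[path]
        response = self.fetch(path)
        # Cache only real content. An upstream error such as
        # "400: Invalid Request" should become "not found",
        # never a stored zero-byte artifact.
        if response.status == 200 and response.body:
            self.store[path] = response.body
            return response.body
        return None


broken_upstream = ProxyCache(lambda path: UpstreamResponse(400, b""))
print(broken_upstream.get("org/example/lib/1.0/lib-1.0.pom"))  # None
print(broken_upstream.store)                                   # {}
```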
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2978#comment:9>

Comment (by jive):

I have confirmed that the core-geonetwork builds are broken; they were in
fact using some patched jars from the geonetwork-cache repository.

I have instructed the project to recover the artifacts from their version
history, or from local repositories, before they are lost, and to upload
them to the geonetwork-releases repository.
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2978#comment:10>

Comment (by jive):

Please hold this ticket open until core-geonetwork team is happy again.
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2978#comment:11>

Comment (by juanluisrp):

I've been able to build all the active GeoNetwork branches (3.12.x, 4.0.x,
4.2.x and main) with an empty local maven repository, so I'd say the
problem for GN is fixed.

What puzzles me is why this has happened now, given that two years ago we
removed the contents of https://github.com/geonetwork/core-maven-repo/. I
don't think GitHub has stopped serving contents from there. Maybe the
contents were still cached in https://repo.osgeo.org/geonetwork-cache and
somehow they have been evicted / cleared from there.

Anyway, I think it's safe to delete the geonetwork-cache repository from
repo.osgeo.org, since all the dependencies are already in the release repo
and there are instructions for legacy GeoNetwork versions to update the
configuration in case anybody needs to build an old version.
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2978#comment:12>

Comment (by fgdrf):

Thanks for your help and support. IMHO it's solved once the geonetwork
builds are fine (again).

However, on my end I configured a routing rule in Nexus to avoid external
requests for internal artefacts.
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2978#comment:13>

Comment (by fgdrf):

We configured https://repo.osgeo.org/repository/releases as a proxy
repository. This was the repo with empty files which should not be there
(we haven't deployed these internal artefacts).

We narrowed the problem down: these artefacts were requested internally
but did not exist in our hosted repositories, so they were requested from
the external proxied repositories as well.

Due to the possible bug in Nexus, files with size 0 were written in the
remote repository.

Today I tried once more to request a non-existing artefact, and it did not
appear here:

{{{
     <dependency>
       <groupId>com.group.id</groupId>
       <artifactId>whatever</artifactId>
       <version>1.0.15</version>
     </dependency>
}}}

Nothing there - this is great!
https://repo.osgeo.org/#browse/browse:release:com%2Fgroup%2Fid%2Fwhatever
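For checking a coordinate like this by hand, the standard Maven 2 layout maps the `groupId` dots to directories, followed by `artifactId/version/artifactId-version[-classifier].extension`. A small Python sketch; the helpers are hypothetical, and the browse-link format is copied from the URL above rather than from any documented API:

```python
from urllib.parse import quote


def artifact_path(group_id, artifact_id, version, extension="pom",
                  classifier=None):
    """Path of one artifact file in the standard Maven 2 repository layout."""
    file_name = f"{artifact_id}-{version}"
    if classifier:  # e.g. sources, javadoc
        file_name += f"-{classifier}"
    return "/".join([*group_id.split("."), artifact_id, version,
                     f"{file_name}.{extension}"])


def browse_url(group_id, artifact_id, repository="release"):
    """repo.osgeo.org browse link, mirroring the format of the link above."""
    folder = "/".join([*group_id.split("."), artifact_id])
    return ("https://repo.osgeo.org/#browse/browse:"
            f"{repository}:{quote(folder, safe='')}")


print(artifact_path("com.group.id", "whatever", "1.0.15"))
# com/group/id/whatever/1.0.15/whatever-1.0.15.pom
print(browse_url("com.group.id", "whatever"))
# https://repo.osgeo.org/#browse/browse:release:com%2Fgroup%2Fid%2Fwhatever
```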

Does this scenario description help to explain the problems we were
faced with?

Again, thank you for your help and clean-up.

Replying to [comment:9 jive].
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2978#comment:14>

#2978: repo.osgeo.org contains 0 byte files
---------------------------+------------------------
Reporter: jive | Owner: jive
     Type: task | Status: closed
Priority: critical | Milestone: Unplanned
Component: Systems Admin | Resolution: fixed
Keywords: |
---------------------------+------------------------
Changes (by jive):

* status: new => closed
* resolution: => fixed

Comment:

> I've been able to build all the active GeoNetwork branches (3.12.x,
> 4.0.x, 4.2.x and main) with an empty local maven repository, so I'd say
> the problem for GN is fixed.

Thanks Juan, we will mark this closed.
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2978#comment:15>

Comment (by jive):

I have removed the now-unused geonetwork-cache from repo.osgeo.org.
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2978#comment:16>

Comment (by jive):

For organizations encountering this issue and wondering how an external
Maven repository came to be filled with your "internal" metadata files:

1. This seems to have occurred due to a bug in the Nexus software used by
repo.osgeo.org (the zero-sized files). One of the caches we had set up,
geonetwork-cache, was incorrectly recording zero-byte files for every
request that came in.

2. I have completely removed geonetwork-cache and, to my knowledge, all
traces of these zero-byte files. So your "metadata" is no longer visible
in geonetwork-cache or osgeo-release.

3. Keep in mind that the requests for your "internal" metadata.xml files
are still coming in. It is just that we now correctly answer that we do
not have this information.

If you are concerned about the visibility of your organization's metadata
files, this is an ongoing concern with the configuration of the Maven or
Gradle builds used within your development team.

Although these files are not present within our infrastructure, your
developers are constantly making requests for these files to
repo.osgeo.org (and to any other Maven repository your team uses
worldwide).

It is not necessarily a problem to ask public repositories for the
artifact groups and artifact names used "internally" by your team. Just
keep in mind that such requests will appear in the network traffic and
logs of each external repository your team makes use of.

To manage your team's configuration:

1. You should be running your own Nexus Maven repository.
2. Your team should configure ~/.m2/settings.xml to mirror any external
repositories, such as repo.osgeo.org, so that they operate via your mirror.
3. Your mirror should cache repo.osgeo.org, with rules to only fetch the
jars (such as org.geotools.*) that are required to support your operations.
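Step 2 amounts to a `<mirrors>` entry in `~/.m2/settings.xml`. As a sketch, the Python below generates a minimal settings document with one mirror; the mirror id and the `nexus.example.com` URL are placeholders for your own Nexus, and setting `mirrorOf` to `*` (every repository) is only one of the options described in the Maven mirror settings guide. Merge the result into any existing settings rather than overwriting them.

```python
import xml.etree.ElementTree as ET

SETTINGS_NS = "http://maven.apache.org/SETTINGS/1.0.0"
ET.register_namespace("", SETTINGS_NS)  # write it as the default xmlns


def q(tag):
    """Qualify a tag name with the Maven settings namespace."""
    return f"{{{SETTINGS_NS}}}{tag}"


def mirror_settings(mirror_id, mirror_url, mirror_of="*"):
    """Build a minimal settings.xml that routes repositories via one mirror."""
    settings = ET.Element(q("settings"))
    mirror = ET.SubElement(ET.SubElement(settings, q("mirrors")), q("mirror"))
    for tag, text in (("id", mirror_id), ("url", mirror_url),
                      ("mirrorOf", mirror_of)):
        ET.SubElement(mirror, q(tag)).text = text
    return ET.tostring(settings, encoding="unicode")


# Placeholder internal Nexus group that in turn proxies repo.osgeo.org.
print(mirror_settings("company-nexus",
                      "https://nexus.example.com/repository/maven-public/"))
```

With `mirrorOf` set to `*`, Maven sends every request to that one mirror, so the routing rules configured on it (step 3) decide what is ever asked of repo.osgeo.org.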

I also note that Gradle allows fine-grained control over how each
repository is used, with includes/excludes control:

* https://docs.gradle.org/current/userguide/declaring_repositories.html#sec:repository-content-filtering

I personally use Maven, which does not offer such a facility as part of a
default install; instead I use mirrors as described above:

* https://maven.apache.org/guides/mini/guide-mirror-settings.html
* https://maven.apache.org/guides/mini/guide-multiple-repositories.html
* https://maven.apache.org/resolver/remote-repository-filtering.html
(advanced!)

Aside: I am volunteering to look at repo.osgeo.org on behalf of my
employer !GeoCat BV and our customers. We take part in a number of
projects including !GeoServer and !GeoNetwork. If you need further
assistance please reach out on these tickets.
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2978#comment:17>

Comment (by robe):

Just a heads up. I plan to upgrade repo.osgeo.org to the latest Nexus
version in about 2 weeks. I'm wondering whether such a change would help,
hurt, or make no difference with this kind of issue.
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2978#comment:18>

Comment (by jive):

Replying to [comment:18 robe]:
> Just a heads up. I have plans to upgrade repo.osgeo.org to latest nexus
> version in about 2 weeks. I'm wondering if such a change would help or
> hurt or not make a difference with this kind of issue.

The update should be fine. We have not reported this issue, or even
checked the Nexus bug tracker to see if it is a known issue.
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2978#comment:19>