René's URL Explorer Experiment

Title: Multiword expression tagger · Issue #2 · RedHenLab/NLP · GitHub

Description: Tagger for a lexicon of multiword expressions. See description in English. This is a research task on language and gesture associated with the NewsScape International Library of Television News. NewsScape is a...

Opengraph URL: https://github.com/RedHenLab/NLP/issues/2

Domain: patch-diff.githubusercontent.com

Hey, it has json ld scripts:

{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"Multiword expression tagger","articleBody":"\u003ch3\u003eTagger for a lexicon of multiword expressions\u003c/h3\u003e\n\n\n[See description in English](https://github.com/RedHenLab/NLP/issues/1).\n\nThis is a research task on language and gesture associated with the [NewsScape International Library of Television News](https://sites.google.com/site/distributedlittleredhen/home/tutorials-and-educational-resources/csa-overview). NewsScape is a resource hosted by the Library of the University of California, Los Angeles, and developed by the [Red Hen Group](https://sites.google.com/site/distributedlittleredhen/) for Research on Multimodal Communication. Besides UCLA, Red Hen has recording nodes and research teams at Case Western Reserve University, the University of Illinois at Urbana-Champaign, the University of Southern Denmark, the University of Oxford, Osnabrück University, Texas Tech, the National Institute of Advanced Studies in Bangalore, the University of Navarra, the University of Murcia, and elsewhere (the consortium is growing constantly). NewsScape contains more than 200,000 hours of television news in English, Spanish, and other European languages, indexed by its subtitles (more than 3 billion words). Among other functions, NewsScape is the first database of audiovisual content that supports synchronized search of subtitles and image, taking us to the exact moment in the program at which the words captured in the subtitles were spoken.\n\nUntil now, large-scale linguistic corpora have been almost exclusively written (Corpus of American English, the CREA and CORDE corpora of the Real Academia Española, newspaper archives, etc.). 
NewsScape opens new horizons for the study of oral communication in relation to the wide variety of elements that accompany speech: gesture and intonation and, in the case of television, music, image and sound effects, graphics, etc. Of course, NewsScape also makes it possible to follow news stories, topics, statements by public figures, etc. We are developing tools for automatic and manual search and annotation of semantic patterns. Beyond verbal patterns, we are also developing tools to detect faces, visual patterns, narrative segments, etc. The Navarra and Murcia research groups are developing the SCHEMOTIME project, which compares language and gesture in the expression of emotions and of time, two concepts central to theories of metaphor and cognition. In addition, the Navarra-Murcia collaboration leads the development of NewsScape in Spanish.\n\nThe ultimate goal of this task is to write a program that takes a natural-language text as input and identifies grammatical structures in it.\n\nPython is possibly the most suitable programming language because of the libraries available (we recommend [mwetoolkit](http://mwetoolkit.sourceforge.net)).\n\nThe first part of the task is carried out by a preprocessor (which already exists) that tags the parts of speech (nouns, adverbs, prepositions, etc.). \n\nThe second part, which is the work to be done now, is to find those pre-tagged constructions in a lexicon of multiword expressions. \n\nThe program will initially be used with texts in both English and Spanish. If it is well designed, it should work well with practically any language, and the quality of the result will depend solely on the quality of the lexicon. 
\n\nPreparing the lexicon is not a goal of this project; it will be supplied to us in advance, as will a considerable number of input files.\n\nAdvanced knowledge of linguistics is not required (lexemes, lexicons, sentence types, etc.); a quick read of the relevant Wikipedia pages or any other source is enough.\n\nFor example, a text (we give it in English because that is the language for which we already have a lexicon) could be:\n\n\"AND SO THE YEARS ROLLED BY.\"\n\nA tool called BSP, from the CLiPS research group at the University of Antwerp, tags it as follows:\n\n\"and/CC/O/O/and|so/IN/I-ADVP/O/so|the/DT/I-NP/O/the|years/NNS/I-NP/O/year|rolled/VBN/I-VP/O/roll|by/RP/I-PRT/O/by|././O/O/.\"\n\nIt is not important to understand these annotations yet; what matters is knowing that they exist and that they are what the program to be written will use.\n\nThe list of multiword expressions in the lexicon is specified through a combination of word lists and tags.\n\nFor example, an expression may have (in English) the structure \"As + time unit + motion verb + preposition\", as in: As centuries float slowly by, As the seconds trickled past, As the holidays slowly snuck up on her. \n\nNote that knowing English is not important: what matters is identifying the structure correctly using the list of words and tags.\n\nIn the example, the construction is further specified as follows:\n- A list of words that denote a time unit, such as afternoon, age, autumn, century, dawn, decade, evening, and November.\n- A list of motion verbs, including fly, shuffle, sneak up, come tumbling down, and roll past.\n- The PREPOSITION will be available in the part-of-speech tags.\n\nSo the lexicon defines the multiword expression and the program must locate that expression in the source text. 
Three steps are needed:\n- Identify the lemmatized form of each word (the lemmas are\n  available in the part-of-speech tags).\n- Compare the lexicon's word list with the candidate word \n  in the source text.\n- Compare the lexicon's tags with those identified in the source\n  text.\n\nThe final application will have a client-server architecture (the application itself being the server side) so that it can be used as a service by any other program.\n\nThe project will have mentors at both the University of Navarra in Spain and the University of California, Los Angeles. \n\n\u003ch3\u003eSample multiword lexicon of time expressions\u003c/h3\u003e\n\n\n\u003col\u003e\n\u003cli\u003e TIME UNITS + VERB (pasar, durar) + VPG/IN+(DT)+NN\n-La clase se pasó en un santiamén. –La película duró un suspiro.\n-La semana se ha pasado volando.\n\nTIME UNITS: tarde, era, otoño, siglo, alba, amanecer, década, tarde, noche, vacaciones, hora, mediodía, medianoche, milenio, milésima de segundo, minuto, momento, mes, mañana (morning and tomorrow), periodo, época, segundo, primavera, verano, hoy, crepúsculo, ocaso, atardecer, anochecer, puesta de sol, semana, fin de semana, invierno, ayer. Lunes, martes, miércoles, jueves, viernes, sábado, domingo. Enero, febrero, marzo, abril, mayo, junio, julio, agosto, septiembre, octubre, noviembre, diciembre. \n\nTime: nouns for processes or entities with duration: prórroga, partido, clase, película, vacaciones, relación, viaje, trayecto, vida, encierro, guerra, estancia, curso, conferencia, fiesta, velada, temporada, Navidades, carrera, visita, intermedio, recreo, concierto, trimestre, semestre, función, la primera/segunda/última parte, clase, jornada, obra, corto, verbena, cita, lección, explicación, audición, presentación, discurso. 
*This list can be extended\u003c/li\u003e\n\n\u003cli\u003e (PREPOSITION: con, al, al cabo de…) + NOUN WITH TEMPORAL DURATION (pasar, el paso, transcurso, transcurrir) + TIME UNIT + (ADJECTIVE: lento/rápido). (Equivalent to 2, 3, 4 in English when combined with an adverb.)\n-Con el pasar de los años. –Al transcurrir los años, a la larga, a largo/corto plazo, con el paso del tiempo\n-Con el (lento) transcurso de las décadas. - Al cabo de un tiempo\u003c/li\u003e\n\n\u003cli\u003e (PERSONAL PRONOUN) + VERB (llevar/tomar/durar) + TIME UNITS (mucho tiempo, poco tiempo, casi nada)/ADJECTIVES OF TEMPORAL DURATION (lento/rápido/pesado/interminable/largo/corto/)\n-Nos llevó mucho tiempo. –Duró casi nada. – Os tomó poco tiempo.\n-Se hizo interminable.\u003c/li\u003e\n\n\u003cli\u003e VERB OF PROCESS START/END (empezar/comenzar/terminar/finalizar) vs. VERB WITH EMOTIONAL VALUE (nacer/explotar/estallar/arrancar).\nExamples:\nLa guerra/revolución/revuelta empezó/estalló en el 36.\nLa persecución del cristianismo se cierra con el edicto de Milán.\n(Many things can \"estallar\": discusiones, peleas, crisis, tiroteo, tormenta)\nSynonyms of empezar: nacer, originar, germinar, abrir, brotar\nSynonyms of terminar: expirar, extinguir, declinar, morir, fenecer, decaer, amainar\u003c/li\u003e\n\u003c/ol\u003e\n\nThe lexicon can be extended, but we prefer to run a pilot with only these expressions.\n","author":{"url":"https://github.com/Liontooth","@type":"Person","name":"Liontooth"},"datePublished":"2014-09-17T13:15:50.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":0},"url":"https://github.com/RedHenLab/NLP/issues/2"}
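
The three matching steps described in the issue body (read the lemma, compare it with the lexicon's word lists, compare the part-of-speech tags) can be sketched roughly as follows. This is only an illustration, not the project's implementation. It assumes each token in the tagged string has five "/"-separated fields in the order word/POS/chunk/fourth field/lemma, which is inferred from the single example in the issue. The names Token, parse_tagged, slot_matches and find_matches are hypothetical. The mini-lexicon is also hypothetical: "year" and "roll" are added so that the example sentence matches, and the pattern leaves out the leading "As" of the structure quoted in the issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    word: str
    pos: str
    chunk: str
    extra: str  # fourth field; its meaning is not given in the issue, kept opaque
    lemma: str


def parse_tagged(line):
    """Split 'word/POS/chunk/x/lemma|...' into Token objects."""
    tokens = []
    for item in line.split("|"):
        # rsplit keeps a word such as "." (item "././O/O/.") in one piece.
        word, pos, chunk, extra, lemma = item.rsplit("/", 4)
        tokens.append(Token(word, pos, chunk, extra, lemma))
    return tokens


# Hypothetical mini-lexicon, for illustration only.
TIME_UNITS = {"afternoon", "age", "autumn", "century", "dawn", "decade",
              "evening", "year"}
MOTION_VERBS = {"fly", "shuffle", "roll"}

# One slot per token: compare either the lemma or the POS tag with a set.
PATTERN = [
    ("lemma", TIME_UNITS),
    ("lemma", MOTION_VERBS),
    ("pos", {"IN", "RP"}),  # preposition or particle tag
]


def slot_matches(token, slot):
    kind, allowed = slot
    value = token.lemma.lower() if kind == "lemma" else token.pos
    return value in allowed


def find_matches(tokens, pattern):
    """Return (start, end) index pairs where every slot matches in order."""
    hits = []
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        if all(slot_matches(t, s) for t, s in zip(window, pattern)):
            hits.append((start, start + len(pattern)))
    return hits


TAGGED = ("and/CC/O/O/and|so/IN/I-ADVP/O/so|the/DT/I-NP/O/the|"
          "years/NNS/I-NP/O/year|rolled/VBN/I-VP/O/roll|by/RP/I-PRT/O/by|"
          "././O/O/.")

tokens = parse_tagged(TAGGED)
spans = find_matches(tokens, PATTERN)
print([" ".join(t.word for t in tokens[a:b]) for a, b in spans])
# prints ['years rolled by']
```

This sketch only matches adjacent tokens. Optional words between slots (for example "slowly" in "As centuries float slowly by") and multiword lexicon entries such as "sneak up" would need a richer pattern representation.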

Links:

Multiword expression tagger	https://patch-diff.githubusercontent.com/RedHenLab/NLP/issues/2#top
Liontooth	https://github.com/Liontooth
on Sep 17, 2014	https://github.com/RedHenLab/NLP/issues/2#issue-43009316
See description in English	https://github.com/RedHenLab/NLP/issues/1
NewsScape International Library of Television News	https://sites.google.com/site/distributedlittleredhen/home/tutorials-and-educational-resources/csa-overview
Red Hen Group	https://sites.google.com/site/distributedlittleredhen/
mwetoolkit	http://mwetoolkit.sourceforge.net
cpcanovas	https://patch-diff.githubusercontent.com/cpcanovas